Asynchronous analysis of repository activity on GitHub


To change the consequences, you need to create causes.

Perhaps this statement is a paradox of time, but that is a topic for a separate post. In this article, I want to share my experience of building a repository activity parser for GitHub on top of its API. Requests to the server and their processing are performed asynchronously, which speeds up the algorithm; I may also provide a timing comparison against synchronous execution. The task was a test assignment for a Python developer position at Playrix.
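To make the speed argument concrete, here is a minimal sketch that is not part of the project itself. It assumes that asyncio.sleep() is an acceptable stand-in for network latency; the names fake_request, run_sequential, run_concurrent, and timed are invented for this illustration.

```python
import asyncio
import time


async def fake_request(delay):
    # Stand-in for an HTTP call: sleeping simulates network latency.
    await asyncio.sleep(delay)
    return delay


async def run_sequential(delays):
    # Await each "request" before starting the next one.
    return [await fake_request(d) for d in delays]


async def run_concurrent(delays):
    # Start all "requests" at once and wait for them together.
    return await asyncio.gather(*(fake_request(d) for d in delays))


def timed(coro_factory, delays):
    # Run a coroutine to completion and return (result, elapsed seconds).
    started = time.perf_counter()
    result = asyncio.run(coro_factory(delays))
    return result, time.perf_counter() - started


if __name__ == "__main__":
    delays = [0.1] * 5
    _, sequential = timed(run_sequential, delays)
    _, concurrent = timed(run_concurrent, delays)
    print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```

With sequential awaiting, the total time grows with the sum of the delays; with asyncio.gather() it stays close to the longest single delay.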

 

 

The task:

Perform repository analysis using the GitHub REST API. The analysis results are output to stdout. It is necessary to derive the following results:

 

Parameters:

• URL of the public repository on github.com.
• Start date of analysis. If empty, then unlimited.
• The end date of the analysis. If empty, then unlimited.
• Repository branch. The default is master.
• Parameters must be passed to the script through the command line.

 

Result:

• The most active participants. Table of 2 columns: author login, number of commits. The table is sorted by the number of commits in descending order. No more than 30 lines. The analysis is performed on a given period and a given branch.

• The number of open and closed pull requests for a given period by the date the PR was created and the specified branch, which is the base for this PR.

• The number of “old” pull requests for a given period by the date the PR was created and the specified branch, which is the base for this PR. A pull request is considered old if it has not been closed within 30 days and is still open.

• The number of open and closed issues for a given period by the date the issue was created.

• The number of “old” issues for a given period by the date the issue was created. An issue is considered old if it has not been closed within 14 days.
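The two definitions of "old" above can be read in more than one way, so here is a hedged sketch of one possible interpretation. The function names are hypothetical, and the reading of the issue rule (an issue counts as old if it stayed open longer than 14 days, even if it was closed later) is an assumption, not something the task states explicitly.

```python
from datetime import datetime, timedelta


def is_old_pull_request(created_at, closed_at, now):
    # "Old" PR: still open and created more than 30 days ago.
    return closed_at is None and now - created_at > timedelta(days=30)


def is_old_issue(created_at, closed_at, now):
    # "Old" issue (assumed reading): it stayed open longer than 14 days,
    # whether or not it was eventually closed.
    finished = closed_at if closed_at is not None else now
    return finished - created_at > timedelta(days=14)


if __name__ == "__main__":
    now = datetime(2020, 3, 1)
    print(is_old_pull_request(datetime(2020, 1, 1), None, now))
    print(is_old_issue(datetime(2020, 2, 25), None, now))
```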

► We need to make the script as reliable and fault-tolerant as possible, including handling the API limits on the number of requests.
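As an illustration of the request-limit requirement, the GitHub REST API reports the remaining quota in the X-RateLimit-Remaining response header and the reset time (epoch seconds, UTC) in X-RateLimit-Reset. The sketch below only computes how long a client might pause; the function name seconds_until_reset is hypothetical, and it assumes the headers arrive as a plain dict with exactly this capitalization.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to pause before the next request, in seconds."""
    # Assumption: a missing header means we are not limited yet.
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset holds the reset moment as epoch seconds (UTC).
    reset_at = float(headers.get("X-RateLimit-Reset", 0))
    now = time.time() if now is None else now
    # Never return a negative pause if the reset time is already in the past.
    return max(0.0, reset_at - now)


if __name__ == "__main__":
    sample = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1000"}
    print(seconds_until_reset(sample, now=940))
```

In an asynchronous client the returned value could be passed to asyncio.sleep() before retrying, so that other tasks keep running during the pause.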

 

Introductions:

Since the script assumes data is output to stdout, the interface for interaction will also be organized from the command line. There is one required argument and 3 optional, they will have a default value. To inform the user that the script is running and work, we implement the animation of the process. You can get started, but first, let's get to know the development team. A candidate in the person of Labrador was proposed as an architect, a cat acted as Timlid, and I just wrote code.


While writing the code, I used a Lenovo B590, Sublime Text, and Bandcamp. Let's sketch a plan of action: the project is divided into several units, each of which does its own job, and we will describe and study them one by one.

                                                               

  • argv_parse.py - contains the logic for parsing command-line arguments when the script is run
  • config.py - stores the GitHub API access token
  • loading.py - a simple process animation
  • man.py - the script description
  • models.py - describes the models of user, parameters, and queries
  • start.py - the main logic: requests to the GitHub API to get the data, process it, and print the results

So let's start with argv_parse.py. The standard argparse module is used to build the command-line interface. It is simple and quick to configure, which is what we need here. There are four arguments, one of which (the URL) is required. The others get a default value when omitted. argparse can also convert and validate argument values through the type parameter, and it lets us provide informational messages that are shown by the help command.

First, let's import the necessary modules, along with the introductory text about the version and authors.

from textwrap import dedent
from argparse import ArgumentParser, RawDescriptionHelpFormatter
from datetime import datetime
from man import info

Create an instance of the ArgumentParser class. 

parser = ArgumentParser(
	formatter_class=RawDescriptionHelpFormatter,
	description=dedent(info))

Next, using the add_argument() method of the newly created instance, we create the arguments one by one. The add_argument() method accepts several parameters, which we can tune depending on the input we expect. Here we can specify the flags, description, default value, type, and whether the argument is required.

parser.add_argument(
	'url',
	help='URL for public github repository\nhttps://github.com/username/repository'
)
parser.add_argument(
	'-s',
	'--start_date',
	default=None,
	type=datetime.fromisoformat,
	required=False,
	help='Start date value of period for data analysis \nFormat: YYYY-MM-DDTHH:MM:SSZ or YYYY-MM-DD',
)
parser.add_argument(
	'-e',
	'--end_date',
	default=datetime.now(),
	type=datetime.fromisoformat,
	required=False,
	help='End date value of period for data analysis \nFormat: YYYY-MM-DDTHH:MM:SSZ or YYYY-MM-DD'
)
parser.add_argument(
	'-b',
	'--branch',
	default='master',
	type=str,
	required=False,
	help='branch'
)
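One caveat about the date arguments: the help text advertises the YYYY-MM-DDTHH:MM:SSZ format, but before Python 3.11 datetime.fromisoformat() rejects a trailing "Z". A small wrapper could be passed as type= instead. This is a sketch, not the original script's code; the name parse_date is hypothetical, and treating "Z" as UTC is the assumption it relies on.

```python
from datetime import datetime


def parse_date(value):
    # Assumption: a trailing "Z" means UTC, so rewrite it as an explicit
    # offset that fromisoformat() accepts on older Python versions too.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


if __name__ == "__main__":
    print(parse_date("2020-01-02"))
    print(parse_date("2020-01-02T03:04:05Z"))
```

Note that the "Z" form produces a timezone-aware datetime while the date-only form produces a naive one, and comparing the two with < or > raises TypeError, so the values may need to be normalized before they are compared.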

The required arguments for the interface have been created. Now, to get the argument values from the command line, we call the parse_args() method. It returns an object whose attributes are named after the arguments: the positional name (url) and the long option names without the two leading hyphens (start_date, end_date, branch).

arguments = parser.parse_args()
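To see what parse_args() returns without touching the real command line, it can be given an explicit list of strings. The sketch below rebuilds a reduced version of the parser for illustration only; build_parser is a hypothetical helper, and the URL is the placeholder from the help text.

```python
from argparse import ArgumentParser
from datetime import datetime


def build_parser():
    # Reduced copy of the parser described above, for illustration only.
    parser = ArgumentParser()
    parser.add_argument('url')
    parser.add_argument('-s', '--start_date', default=None,
                        type=datetime.fromisoformat)
    parser.add_argument('-b', '--branch', default='master')
    return parser


if __name__ == "__main__":
    # Passing a list makes parse_args() ignore sys.argv.
    args = build_parser().parse_args(
        ['https://github.com/username/repository', '-s', '2020-01-01'])
    print(args.url, args.start_date, args.branch)
```

The branch option is not given here, so its attribute falls back to the default, and the start date arrives already converted to a datetime by the type callable.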

The next file is config.py, which stores your GitHub API access token. The script works through the GitHub API, which it accesses with your token. If you already have a token that you saved for repeated use, you can use it. If not, you can generate a new one; it's quick and easy. Log in to github.com and go to Settings > Developer settings > Personal access tokens. Click the Generate new token button and follow the instructions. Put the resulting token in the TOKEN variable.

TOKEN = '<your token>'
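How the token ends up in a request is not shown at this point, so the following is only a sketch of one common approach: GitHub accepts the token in the Authorization header. The helper names load_token and build_headers, and the GITHUB_TOKEN environment variable, are assumptions made for this example; reading the token from the environment is an optional alternative to hard-coding it in config.py.

```python
import os


def load_token(environ=os.environ, fallback=''):
    # GITHUB_TOKEN is a hypothetical variable name chosen for this sketch.
    return environ.get('GITHUB_TOKEN', fallback)


def build_headers(token):
    # Ask for the JSON media type of the GitHub REST API.
    headers = {'Accept': 'application/vnd.github+json'}
    if token:
        # Without a token the request is anonymous and far more rate-limited.
        headers['Authorization'] = f'token {token}'
    return headers


if __name__ == "__main__":
    print(build_headers(load_token(fallback='<your token>')))
```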

The next stage and file is loading.py. I think it is an important piece, because it lets the user know that the script is running and doing its work. In our case, a simple animation like this was a good solution that is quick to implement. The animation below shows how it looks while the script runs.

Let's look at the code and see how it works and what the movement of the hyphen is based on. The algorithm is simple: a generator with an endless loop builds a string, replacing the last character with the next character from the list, prints it to stdout, and then hands control back to the caller.

import sys
import time

def loading():
	chars = ['/', '-', '\\', '|', ]
	while True:
		for i in chars:
			yield print(f"\rloading: {i}", end='', flush=True)
. . .
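The rest of loading.py is not shown here. As a separate, self-contained sketch of the same idea (not the author's code), the frames can be produced with itertools.cycle() and written to any stream; the names frames and animate are hypothetical.

```python
import itertools
import sys


def frames():
    # Endless stream of status lines; "\r" moves the cursor back to the
    # start of the line so each frame overwrites the previous one.
    return (f"\rloading: {char}" for char in itertools.cycle('/-\\|'))


def animate(steps, stream=sys.stdout):
    # Write a fixed number of frames to the stream, flushing after each
    # one so the change is visible immediately.
    for frame in itertools.islice(frames(), steps):
        stream.write(frame)
        stream.flush()


if __name__ == "__main__":
    animate(8)
```

Separating frame generation from output keeps the generator free of side effects, which makes it easy to check against an in-memory stream.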

