API Reference
GitRepository
This module includes 1 class, GitRepository, representing a repository in Git.
-
class pydriller.git_repository.GitRepository(path: str)
Class representing a repository in Git. It contains most of the logic of PyDriller: obtaining the list of commits, checkout, reset, etc.
-
__init__(path: str)
Initialize the GitRepository.
Parameters: path (str) – path to the repository
-
__module__ = 'pydriller.git_repository'
-
checkout(_hash: str) → None
Check out the repo at the specified commit. BE CAREFUL: this changes the state of the repo, so it should not be used from more than one thread.
Parameters: _hash – commit hash to checkout
-
files() → List[str]
Obtain the list of the files (excluding the .git directory).
Returns: List[str], the list of the files
-
get_commit(commit_id: str) → pydriller.domain.commit.Commit
Get the specified commit.
Parameters: commit_id (str) – hash of the commit to analyze
Returns: Commit
-
get_commit_from_gitpython(commit: git.objects.commit.Commit) → pydriller.domain.commit.Commit
Build a PyDriller commit object from a GitPython commit object. This is internal to PyDriller; users generally will not need it.
Parameters: commit (GitCommit) – GitPython commit
Returns: Commit, the PyDriller commit
-
get_commit_from_tag(tag: str) → pydriller.domain.commit.Commit
Obtain the tagged commit.
Parameters: tag (str) – the tag
Returns: Commit, the commit the tag refers to
-
get_commits_last_modified_lines(commit: pydriller.domain.commit.Commit, modification: pydriller.domain.commit.Modification = None, hashes_to_ignore_path: str = None) → Dict[str, Set[str]]
Given a Commit object, return, for each file modified in the commit, the set of commits that last “touched” the lines this commit modifies. It applies the SZZ algorithm.
IMPORTANT: for better results, we suggest installing Google depot_tools first (see https://dev.chromium.org/developers/how-tos/install-depot-tools). This allows PyDriller to use “git hyper-blame” instead of the normal blame. If depot_tools is not installed, PyDriller automatically falls back to the normal blame.
The algorithm works as follows (for every file in the commit):
1- obtain the diff
2- obtain the list of deleted lines
3- blame the file and obtain the commits where those lines were added
A single Modification can also be passed as a parameter; in this case, only that file is analyzed.
Parameters: - commit (Commit) – the commit to analyze
- modification (Modification) – single modification to analyze
- hashes_to_ignore_path (str) – path to a file containing hashes of commits to ignore. Requires “git hyper-blame”.
Returns: a dictionary mapping each file path to the set of hashes of the bug-inducing commits for that file
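The three steps above can be sketched in plain Python. This is a simplified illustration and not PyDriller's implementation. It assumes the deleted lines of each file are already available as (line number, text) pairs, as parse_diff returns them. It also assumes that a blame of the parent revision is available as a list of commit hashes, one per line. `last_modified_commits` and the blame structure are hypothetical. Skipping blank lines is a design choice of this sketch.

```python
from typing import Dict, List, Set, Tuple


def last_modified_commits(
    deleted_lines: Dict[str, List[Tuple[int, str]]],
    blame: Dict[str, List[str]],
) -> Dict[str, Set[str]]:
    """Map each file to the commits that last touched its deleted lines.

    deleted_lines: file path -> [(line number in the old file, line text)]
    blame: file path -> commit hash for each line of the old file (index 0 = line 1)
    """
    result: Dict[str, Set[str]] = {}
    for path, lines in deleted_lines.items():
        for line_no, text in lines:
            if not text.strip():
                # Blank lines say little about where a bug came from.
                continue
            # Diff line numbers are 1-based; the blame list is 0-based.
            result.setdefault(path, set()).add(blame[path][line_no - 1])
    return result
```

Deleted lines are numbered in the old version of the file. The blame must therefore describe that version (the parent commit) and not the commit under analysis.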
-
get_commits_modified_file(filepath: str) → List[str]
Given a filepath, return all the commits that modified this file (following renames).
Parameters: filepath (str) – path to the file
Returns: the list of the commits' hashes
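“Following renames” means the following: while walking history from newest to oldest, the tracked path switches to the old name whenever a commit renamed the file. The sketch below illustrates this on an in-memory history. The `history` format is an assumption for illustration, and so is `commits_touching`; neither is PyDriller's API. In this format, commits are listed newest first, and each commit carries (old_path, new_path) pairs, with None marking the missing side of an addition or deletion. Real Git detects renames by content similarity, which this sketch takes as given.

```python
from typing import List, Optional, Tuple

# (old_path, new_path); None marks the missing side of an add or a delete.
Change = Tuple[Optional[str], Optional[str]]
# (commit hash, changes), listed newest first.
History = List[Tuple[str, List[Change]]]


def commits_touching(history: History, filepath: str) -> List[str]:
    """Return hashes (newest first) of commits that modified filepath, following renames."""
    current = filepath
    hashes: List[str] = []
    for commit_hash, changes in history:
        for old_path, new_path in changes:
            # For a deletion, the only path available is the old one.
            touched = new_path if new_path is not None else old_path
            if touched != current:
                continue
            hashes.append(commit_hash)
            if old_path is None:
                # The file was created here: there is no older history to follow.
                return hashes
            # Keep tracking the file under its previous name.
            current = old_path
            break
    return hashes
```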
-
get_head() → pydriller.domain.commit.Commit
Get the head commit.
Returns: Commit of the head commit
-
get_list_commits(branch: str = None, reverse_order: bool = True) → Generator[pydriller.domain.commit.Commit, None, None]
Return a generator over all the commits in the repo.
Returns: Generator[Commit], the generator of all the commits in the repo
-
get_tagged_commits()
Obtain the hashes of all the tagged commits.
Returns: list of the hashes of the tagged commits (empty if there are no tags)
-
git
GitPython object Git.
Returns: Git
-
hyper_blame_available
-
parse_diff(diff: str) → Dict[str, List[Tuple[int, str]]]
Given a diff, return a dictionary with the added and deleted lines. The dictionary has two keys, “added” and “deleted”, each containing the corresponding added or deleted lines. For both keys, the value is a list of tuples (int, str), corresponding to (line number in the file, line content).
Parameters: diff (str) – diff of the commit
Returns: Dictionary
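The following is a minimal sketch of how a unified diff can be split into this structure. It is a re-implementation for illustration, not PyDriller's code, and `parse_unified_diff` is a hypothetical name. Every hunk header (“@@ -old,count +new,count @@”) resets two counters: deleted lines are numbered in the old file, added lines in the new file, and context lines advance both counters. Lines before the first hunk (file headers) and “\ No newline at end of file” markers are skipped.

```python
import re
from typing import Dict, List, Tuple

# Matches hunk headers such as "@@ -12,5 +12,7 @@" (counts may be omitted).
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_unified_diff(diff: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return {"added": [...], "deleted": [...]} as (line number, text) tuples."""
    result: Dict[str, List[Tuple[int, str]]] = {"added": [], "deleted": []}
    old_no = new_no = 0
    in_hunk = False
    for line in diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            # Each hunk restarts both counters at its first line.
            old_no, new_no = int(header.group(1)), int(header.group(2))
            in_hunk = True
        elif not in_hunk or line.startswith("\\"):
            # File headers before the first hunk, or "\ No newline" markers.
            continue
        elif line.startswith("-"):
            result["deleted"].append((old_no, line[1:]))
            old_no += 1
        elif line.startswith("+"):
            result["added"].append((new_no, line[1:]))
            new_no += 1
        else:
            # Context line: present in both versions.
            old_no += 1
            new_no += 1
    return result
```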
-
repo
GitPython object Repo.
Returns: Repo
-
reset() → None
Reset the state of the repo, checking out the main branch and discarding local changes (-f option).
-
total_commits() → int
Calculate the total number of commits.
Returns: the total number of commits
-
RepositoryMining
This module includes 1 class, RepositoryMining, the main class of PyDriller.
-
class pydriller.repository_mining.RepositoryMining(path_to_repo: Union[str, List[str]], single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_branch: str = None, only_modifications_with_file_types: List[str] = None, only_no_merge: bool = False, only_authors: List[str] = None, only_commits: List[str] = None, only_releases: bool = False, filepath: str = None)
This is the main class of PyDriller, responsible for running the study.
-
__init__(path_to_repo: Union[str, List[str]], single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_branch: str = None, only_modifications_with_file_types: List[str] = None, only_no_merge: bool = False, only_authors: List[str] = None, only_commits: List[str] = None, only_releases: bool = False, filepath: str = None)
Initialize a RepositoryMining object. The only required parameter is “path_to_repo”: to analyze a single repo, pass the absolute path to the repo; to analyze several repos, pass a list of absolute paths.
Furthermore, PyDriller supports both local and remote repositories. If you pass a path to a repo, PyDriller runs the study on that repo. If you pass a URL, PyDriller clones the repo into a temporary folder, runs the study, and then deletes the temporary folder.
Parameters: - path_to_repo (Union[str,List[str]]) – absolute path (or list of absolute paths) to the repository(ies) to analyze
- single (str) – hash of a single commit to analyze
- since (datetime) – starting date
- to (datetime) – ending date
- from_commit (str) – starting commit (only if since is None)
- to_commit (str) – ending commit (only if to is None)
- from_tag (str) – starting tag (only if since and from_commit are None)
- to_tag (str) – ending tag (only if to and to_commit are None)
- reversed_order (bool) – whether the commits should be analyzed in reversed order
- only_in_branch (str) – only commits in this branch will be analyzed
- only_modifications_with_file_types (List[str]) – only commits that modify at least one file of these types (e.g. “.java”) will be analyzed
- only_no_merge (bool) – if True, merges will not be analyzed
- only_authors (List[str]) – only commits of these authors will be analyzed (the check is done on the username, NOT the email)
- only_commits (List[str]) – only these commits will be analyzed
- only_releases (bool) – only tagged commits (releases) will be analyzed
- filepath (str) – only commits that modified this file will be analyzed
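The choice between a local path and a remote repository can be made with a simple prefix check. The sketch below shows how such a check might look; it is an assumption, and PyDriller's own rule may differ. It also derives a folder name for the temporary clone and normalizes path_to_repo, which accepts either a string or a list. `is_remote`, `repo_name_from_url`, and `as_repo_list` are hypothetical helpers.

```python
import re
from typing import List, Union


def is_remote(path_or_url: str) -> bool:
    """Treat HTTP(S) and SSH-style Git URLs as remote; anything else as a local path."""
    return path_or_url.startswith(("https://", "http://", "git@", "ssh://"))


def repo_name_from_url(url: str) -> str:
    """Derive a folder name, e.g. '.../ishepard/pydriller.git' -> 'pydriller'."""
    # Split on both "/" and ":" so SSH URLs like git@host:user/repo.git work too.
    name = re.split(r"[/:]", url.rstrip("/"))[-1]
    return name[: -len(".git")] if name.endswith(".git") else name


def as_repo_list(path_to_repo: Union[str, List[str]]) -> List[str]:
    """Accept a single path/URL or a list of them, as path_to_repo does."""
    return [path_to_repo] if isinstance(path_to_repo, str) else list(path_to_repo)
```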
-
__module__ = 'pydriller.repository_mining'
-
traverse_commits() → Generator[pydriller.domain.commit.Commit, None, None]
Analyze all the specified commits (all of them by default), returning a generator of commits.
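Because commits are yielded lazily, filters can be applied one commit at a time without building the full list first. The following simplified sketch shows that idea. It uses a hypothetical `CommitInfo` record in place of PyDriller's Commit and covers only three of the documented filters: only_authors, only_no_merge, and only_modifications_with_file_types. It is not PyDriller's implementation, and `filtered_commits` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class CommitInfo:
    """Hypothetical stand-in for a commit: only the fields the filters need."""
    hash: str
    author_name: str
    parents: List[str]
    filenames: List[str] = field(default_factory=list)


def filtered_commits(
    commits: Iterable[CommitInfo],
    only_authors: Optional[List[str]] = None,
    only_no_merge: bool = False,
    only_modifications_with_file_types: Optional[List[str]] = None,
) -> Iterator[CommitInfo]:
    """Yield the commits that pass every active filter, one at a time."""
    types = only_modifications_with_file_types
    for commit in commits:
        # Authors are matched by name, not by email.
        if only_authors is not None and commit.author_name not in only_authors:
            continue
        # A merge commit has more than one parent.
        if only_no_merge and len(commit.parents) > 1:
            continue
        # Keep the commit if at least one modified file has a wanted extension.
        if types is not None and not any(
            name.endswith(ext) for name in commit.filenames for ext in types
        ):
            continue
        yield commit
```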
-
Commit
This module contains all the classes regarding a specific commit, such as Commit, Modification, ModificationType and Method.
-
class pydriller.domain.commit.Commit(commit: git.objects.commit.Commit, project_path: pathlib.Path, main_branch: str)
Class representing a commit. It contains all the important information, such as hash, author, dates, and modified files.
-
__init__(commit: git.objects.commit.Commit, project_path: pathlib.Path, main_branch: str) → None
Create a commit object.
Parameters: - commit – GitPython Commit object
- project_path – path to the project (temporary folder in case of a remote repository)
- main_branch – main branch of the repo
-
__module__ = 'pydriller.domain.commit'
-
author
Return the author of the commit as a Developer object.
Returns: author
-
author_date
Return the authored datetime.
Returns: datetime author_datetime
-
author_timezone
Author timezone, expressed as an offset from UTC in seconds.
Returns: int timezone
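Converting the integer offset into a tzinfo makes the sign convention explicit. This sketch assumes the value follows GitPython's convention of seconds west of UTC, so a commit made at +02:00 reports -7200. `offset_to_tzinfo` is a hypothetical helper.

```python
from datetime import timedelta, timezone


def offset_to_tzinfo(seconds_west_of_utc: int) -> timezone:
    """Turn a Git timezone offset (assumed: seconds west of UTC) into a datetime.timezone."""
    # datetime.timezone expects the offset east of UTC, hence the sign flip.
    return timezone(timedelta(seconds=-seconds_west_of_utc))
```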
-
branches
Return the set of branches that contain the commit.
Returns: set(str) branches
-
committer
Return the committer of the commit as a Developer object.
Returns: committer
-
committer_date
Return the committed datetime.
Returns: datetime committer_datetime
-
committer_timezone
Committer timezone, expressed as an offset from UTC in seconds.
Returns: int timezone
-
hash
Return the SHA of the commit.
Returns: str hash
-
in_main_branch
Return True if the commit is in the main branch, False otherwise.
Returns: bool in_main_branch
-
merge
Return True if the commit is a merge, False otherwise.
Returns: bool merge
-
modifications
Return the list of modified files.
Returns: List[Modification] modifications
-
msg
Return the commit message.
Returns: str commit_message
-
parents
Return the list of the parents' SHAs.
Returns: List[str] parents
-
project_name
Return the project name.
Returns: project name
-
-
class pydriller.domain.commit.Method(func)
This class represents a method in a class. It contains various information extracted through Lizard.
-
__init__(func)
Initialize a method object. The information is computed with Lizard, which parses the source code of every modified file in a commit and extracts information about the methods the file contains (provided the file is written in one of the supported programming languages).
-
__module__ = 'pydriller.domain.commit'
-
-
class pydriller.domain.commit.Modification(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, diff_and_sc: Dict[str, str])
This class contains information about a modified file in a commit.
-
__init__(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, diff_and_sc: Dict[str, str])
Initialize a modification. A modification carries information about the changed file. Normally, you should not need to create one yourself.
-
__module__ = 'pydriller.domain.commit'
-
added
Return the total number of added lines in the file.
Returns: int lines_added
-
complexity
Calculate the Cyclomatic Complexity of the file.
Returns: Cyclomatic Complexity of the file
-
filename
Return the filename. Given a path-like string (e.g. “/Users/dspadini/pydriller/myfile.py”), it returns only the filename (e.g. “myfile.py”).
Returns: str filename
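The same result can be obtained with the standard library. The following minimal sketch uses pathlib and is not necessarily how PyDriller computes the value. PurePosixPath is used because Git separates path components with forward slashes. `filename_of` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def filename_of(path: str) -> str:
    """Return the last path component, e.g. 'myfile.py'."""
    return PurePosixPath(path).name
```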
-
methods
Return the list of methods in the file. Every method contains various information, such as complexity, LOC, name, and number of parameters.
Returns: list of methods
-
new_path
New path of the file. Can be None if the file is deleted.
Returns: str new_path
-
nloc
Calculate the LOC of the file.
Returns: LOC of the file
-
old_path
Old path of the file. Can be None if the file is added.
Returns: str old_path
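Together, old_path and new_path hint at the kind of change a Modification represents. The following sketch derives a change kind from the two paths alone. `ChangeKind` is a local stand-in for ModificationType and `change_kind` is a hypothetical helper. PyDriller itself receives the change type as a constructor argument rather than inferring it this way.

```python
from enum import Enum
from typing import Optional


class ChangeKind(Enum):
    """Local stand-in for ModificationType, limited to what paths can reveal."""
    ADD = "add"
    DELETE = "delete"
    RENAME = "rename"
    MODIFY = "modify"


def change_kind(old_path: Optional[str], new_path: Optional[str]) -> ChangeKind:
    if old_path is None and new_path is None:
        raise ValueError("a modification needs at least one path")
    if old_path is None:
        return ChangeKind.ADD  # no version before the commit
    if new_path is None:
        return ChangeKind.DELETE  # no version after the commit
    if old_path != new_path:
        return ChangeKind.RENAME
    return ChangeKind.MODIFY
```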
-
removed
Return the total number of deleted lines in the file.
Returns: int lines_deleted
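Both counts (added and removed) can be read from the file's unified diff. Inside a hunk, lines starting with “+” are additions and lines starting with “-” are deletions. The following minimal sketch works under that assumption. It ignores everything before the first hunk header so that the “---”/“+++” file headers are not counted. A deleted line such as “-- comment” still appears as “--- comment” and is counted correctly. `count_changes` is a hypothetical helper, not PyDriller's code.

```python
from typing import Tuple


def count_changes(diff: str) -> Tuple[int, int]:
    """Return (added, removed) line counts for one file's unified diff."""
    added = removed = 0
    in_hunk = False
    for line in diff.splitlines():
        if line.startswith("@@"):
            # File headers ("---"/"+++") only appear before the first hunk.
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            removed += 1
    return added, removed
```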
-
token_count
Calculate the token count of functions.
Returns: token count
-
Developer
This module includes only 1 class, Developer, representing a developer.
-
class pydriller.domain.developer.Developer(name: str, email: str)
This class represents a developer. It stores the developer's name and email.
-
__init__(name: str, email: str)
Class to identify a developer.
Parameters: - name (str) – name and surname of the developer
- email (str) – email of the developer
-
__module__ = 'pydriller.domain.developer'
-