API Reference¶
GitRepository¶
-
class
pydriller.git_repository.
GitRepository
(path: str)¶ -
__init__
(path: str)¶ Init the Git Repository.
Parameters: path (str) – path to the repository
-
__module__
= 'pydriller.git_repository'¶
-
checkout
(_hash: str) → None¶ Check out the repo at the specified commit. BE CAREFUL: this changes the state of the repo, so it should not be used with more than one thread.
Parameters: _hash – commit hash to checkout
-
files
() → List[str]¶ Obtain the list of the files (excluding .git directory).
Returns: List[str], the list of the files
-
get_commit
(commit_id: str) → pydriller.domain.commit.Commit¶ Get the specified commit.
Parameters: commit_id (str) – hash of the commit to analyze Returns: Commit
-
get_commit_from_gitpython
(commit: git.objects.commit.Commit) → pydriller.domain.commit.Commit¶ Build a PyDriller commit object from a GitPython commit object. This method is internal to PyDriller; users generally will not need it.
Parameters: commit (GitCommit) – GitPython commit Returns: Commit commit: PyDriller commit
-
get_commit_from_tag
(tag: str) → pydriller.domain.commit.Commit¶ Obtain the tagged commit.
Parameters: tag (str) – the tag Returns: Commit commit: the commit the tag referred to
-
get_commits_last_modified_lines
(commit: pydriller.domain.commit.Commit, modification: pydriller.domain.commit.Modification = None) → Set[str]¶ Given a Commit object, return the set of commits that last “touched” the lines modified in the files included in the commit. It applies SZZ. The algorithm works as follows (for every file in the commit):
1- obtain the diff
2- obtain the list of deleted lines
3- blame the file and obtain the commits where those lines were added
A single Modification can also be passed as a parameter; in this case only that file will be analyzed.
Parameters: - commit (Commit) – the commit to analyze
- modification (Modification) – single modification to analyze
Returns: the set containing all the bug-inducing commits
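Steps 2 and 3 of the algorithm above can be sketched without a real repository. The helper below is a hypothetical illustration, not PyDriller's implementation: it assumes the deleted lines are already available as (line number, text) pairs and that a blame of the pre-change file is given as a plain dict from line number to commit hash.

```python
from typing import Dict, Iterable, Set, Tuple


def commits_that_last_touched(
    deleted_lines: Iterable[Tuple[int, str]],
    blame: Dict[int, str],
) -> Set[str]:
    """Map each deleted line to the commit that last modified it.

    deleted_lines: (line number in the old file, text) pairs (step 2).
    blame: hypothetical stand-in for a blame of the old file, mapping
           line number -> hash of the commit that last changed it (step 3).
    """
    touched = set()
    for line_number, _text in deleted_lines:
        commit_hash = blame.get(line_number)
        if commit_hash is not None:  # ignore lines the blame does not cover
            touched.add(commit_hash)
    return touched


# Made-up data: three deleted lines, two of them last changed by the same commit.
deleted = [(2, "x = 1"), (3, "y = 2"), (7, "return x")]
blame = {1: "aaa111", 2: "bbb222", 3: "bbb222", 7: "ccc333"}
result = commits_that_last_touched(deleted, blame)
```

The result is a set, so a commit that introduced several of the deleted lines appears only once, which matches the Set[str] return type documented above.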
-
get_head
() → pydriller.domain.commit.Commit¶ Get the head commit.
Returns: Commit of the head commit
-
get_list_commits
(branch: str = None) → List[pydriller.domain.commit.Commit]¶ Return the list of all the commits in the repo.
Returns: List[Commit], the list of all the commits in the repo
-
git
¶ GitPython object Git.
Returns: Git
-
parse_diff
(diff: str) → Dict[str, List[Tuple[int, str]]]¶ Given a diff, returns a dictionary with the added and deleted lines. The dictionary has 2 keys: “added” and “deleted”, each containing the corresponding added or deleted lines. For both keys, the value is a list of Tuple (int, str), corresponding to (number of line in the file, actual line).
Parameters: diff (str) – diff of the commit Returns: Dictionary
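To make the shape of the returned dictionary concrete, here is a minimal standard-library sketch that builds the same structure from a unified diff. The function name parse_diff_sketch and the regular expression are assumptions for illustration; PyDriller's own parser may differ in details.

```python
import re
from typing import Dict, List, Tuple

# Matches hunk headers such as "@@ -10,4 +12,6 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_diff_sketch(diff: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return {"added": [(line number, text)], "deleted": [(line number, text)]}."""
    result: Dict[str, List[Tuple[int, str]]] = {"added": [], "deleted": []}
    old_line = new_line = 0
    for line in diff.split("\n"):
        header = HUNK_HEADER.match(line)
        if header:
            # Restart both counters at the positions given by the hunk header.
            old_line = int(header.group(1))
            new_line = int(header.group(2))
        elif line.startswith("+") and not line.startswith("+++"):
            result["added"].append((new_line, line[1:]))
            new_line += 1
        elif line.startswith("-") and not line.startswith("---"):
            result["deleted"].append((old_line, line[1:]))
            old_line += 1
        elif line.startswith(" "):
            # Context line: present in both versions of the file.
            old_line += 1
            new_line += 1
    return result


sample = "@@ -1,3 +1,3 @@\n a\n-b\n+B\n c"
parsed = parse_diff_sketch(sample)
```

Added lines are numbered by their position in the new file and deleted lines by their position in the old file. As a simplification, this sketch skips any line starting with "+++" or "---", so it would miss changed lines whose own text begins with "++" or "--".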
-
repo
¶ GitPython object Repo.
Returns: Repo
-
reset
() → None¶ Reset the state of the repo, checking out the main branch and discarding local changes (-f option).
-
total_commits
() → int¶ Calculate total number of commits.
Returns: the total number of commits
-
RepositoryMining¶
-
class
pydriller.repository_mining.
RepositoryMining
(path_to_repo: Union[str, List[str]], single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_branch: str = None, only_modifications_with_file_types: List[str] = None, only_no_merge: bool = False, only_authors: List[str] = None, only_commits: List[str] = None)¶ -
__init__
(path_to_repo: Union[str, List[str]], single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_branch: str = None, only_modifications_with_file_types: List[str] = None, only_no_merge: bool = False, only_authors: List[str] = None, only_commits: List[str] = None)¶ Init a repository mining. The only required parameter is “path_to_repo”: to analyze a single repo, pass the absolute path to the repo; if you need to analyze more repos, pass a list of absolute paths.
Furthermore, PyDriller supports local and remote repositories: if you pass a path to a repo, PyDriller will run the study on that repo; if you pass a URL, PyDriller will clone the repo into a temporary folder, run the study, and delete the temporary folder.
Parameters: - path_to_repo (Union[str,List[str]]) – absolute path (or list of absolute paths) to the repository(ies) to analyze
- single (str) – hash of a single commit to analyze
- since (datetime) – starting date
- to (datetime) – ending date
- from_commit (str) – starting commit (only if since is None)
- to_commit (str) – ending commit (only if to is None)
- from_tag (str) – starting the analysis from the specified tag (only if since and from_commit are None)
- to_tag (str) – ending the analysis at the specified tag (only if to and to_commit are None)
- reversed_order (bool) – whether the commits should be analyzed in reversed order
- only_in_branch (str) – only commits in this branch will be analyzed
- only_modifications_with_file_types (List[str]) – only modifications with these file types will be analyzed
- only_no_merge (bool) – if True, merges will not be analyzed
- only_authors (List[str]) – only commits of these authors will be analyzed (the check is done on the username, NOT the email)
- only_commits (List[str]) – only these commits will be analyzed
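The "only if ... is None" notes on the starting filters describe a precedence: since first, then from_commit, then from_tag. The helper below is a hypothetical sketch of that rule as worded in the parameter list; it is not PyDriller code, and the library's actual handling of conflicting filters may differ.

```python
from datetime import datetime
from typing import Optional, Tuple


def effective_start(
    since: Optional[datetime] = None,
    from_commit: Optional[str] = None,
    from_tag: Optional[str] = None,
) -> Optional[Tuple[str, object]]:
    """Return the (name, value) of the starting filter that would apply."""
    if since is not None:
        return ("since", since)
    if from_commit is not None:  # considered only if since is None
        return ("from_commit", from_commit)
    if from_tag is not None:  # considered only if since and from_commit are None
        return ("from_tag", from_tag)
    return None  # no starting filter was given


chosen = effective_start(from_commit="abc123", from_tag="v1.0")
```

The same ordering applies to the ending filters as documented: to, then to_commit, then to_tag.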
-
__module__
= 'pydriller.repository_mining'¶
-
traverse_commits
() → Generator[[pydriller.domain.commit.Commit, None], None]¶ Analyze all the specified commits (all of them by default), returning a generator of commits.
-
Commit¶
-
class
pydriller.domain.commit.
Commit
(commit: git.objects.commit.Commit, project_path: pathlib.Path, main_branch: str)¶ -
__init__
(commit: git.objects.commit.Commit, project_path: pathlib.Path, main_branch: str) → None¶ Create a commit object.
Parameters: - commit (GitCommit) – GitPython Commit object
- project_path – path to the project (temporary folder in case of a remote repository)
- main_branch – main branch of the repo
-
__module__
= 'pydriller.domain.commit'¶
-
author
¶ Return the author of the commit as a Developer object.
Returns: author
-
author_date
¶ Return the authored datetime.
Returns: datetime author_datetime
-
author_timezone
¶ Author timezone, expressed as an offset from UTC in seconds.
Returns: int timezone
-
branches
¶ Return the set of branches that contain the commit.
Returns: set(str) branches
-
committer
¶ Return the committer of the commit as a Developer object.
Returns: committer
-
committer_date
¶ Return the committed datetime.
Returns: datetime committer_datetime
-
committer_timezone
¶ Committer timezone, expressed as an offset from UTC in seconds.
Returns: int timezone
-
hash
¶ Return the SHA of the commit.
Returns: str hash
-
in_main_branch
¶ Return True if the commit is in the main branch, False otherwise.
Returns: bool in_main_branch
-
merge
¶ Return True if the commit is a merge, False otherwise.
Returns: bool merge
-
modifications
¶ Return a list of modified files.
Returns: List[Modification] modifications
-
msg
¶ Return commit message.
Returns: str commit_message
-
parents
¶ Return the list of parents SHAs.
Returns: List[str] parents
-
project_name
¶ Return the project name.
Returns: project name
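As a brief illustration of how these attributes fit together, the sketch below condenses a commit into a small dictionary. summarize_commit is a hypothetical helper, and the SimpleNamespace object is a stand-in carrying only the attributes documented above, so the example runs without a repository.

```python
from types import SimpleNamespace


def summarize_commit(commit) -> dict:
    """Collect a few documented Commit attributes into a plain dict."""
    return {
        "hash": commit.hash,
        # First line of the message, or an empty string if there is none.
        "subject": commit.msg.splitlines()[0] if commit.msg else "",
        "is_merge": commit.merge,
        "in_main_branch": commit.in_main_branch,
        "parent_count": len(commit.parents),
        "files_changed": len(commit.modifications),
    }


# Stand-in for a Commit object with made-up values (demo only).
fake_commit = SimpleNamespace(
    hash="abc123",
    msg="Fix parser crash\n\nLonger explanation of the fix.",
    merge=False,
    in_main_branch=True,
    parents=["def456"],
    modifications=["<modification 1>", "<modification 2>"],
)
summary = summarize_commit(fake_commit)
```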
-
-
class
pydriller.domain.commit.
Method
(func)¶ -
__init__
(func)¶ Initialize a method object. This is calculated using Lizard: it parses the source code of all the modifications in a commit, extracting information about the methods contained in the file (if the file is source code written in one of the supported programming languages).
-
__module__
= 'pydriller.domain.commit'¶
-
-
class
pydriller.domain.commit.
Modification
(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, diff_and_sc: Dict[str, str])¶ -
__init__
(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, diff_and_sc: Dict[str, str])¶ Initialize a modification. A modification carries information about the changed file. Normally, you shouldn’t initialize a new one.
-
__module__
= 'pydriller.domain.commit'¶
-
added
¶ Return the total number of added lines in the file.
Returns: int lines_added
-
complexity
¶ Calculate the Cyclomatic Complexity of the file.
Returns: Cyclomatic Complexity of the file
-
filename
¶ Return the filename. Given a path-like string (e.g. “/Users/dspadini/pydriller/myfile.py”), return only the filename (e.g. “myfile.py”).
Returns: str filename
-
methods
¶ Return the list of methods in the file. Every method contains various information like complexity, loc, name, number of parameters, etc.
Returns: list of methods
-
new_path
¶ New path of the file. Can be None if the file is deleted.
Returns: str new_path
-
nloc
¶ Calculate the LOC of the file.
Returns: LOC of the file
-
old_path
¶ Old path of the file. Can be None if the file is added.
Returns: str old_path
-
removed
¶ Return the total number of deleted lines in the file.
Returns: int lines_deleted
-
token_count
¶ Calculate the token count of functions.
Returns: token count
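The added, removed, old_path and new_path attributes are enough to compute per-file churn. The sketch below is a hypothetical helper, not part of PyDriller; it falls back to old_path when new_path is None (a deleted file), and it uses SimpleNamespace stand-ins with made-up values so it runs without a repository.

```python
from collections import defaultdict
from types import SimpleNamespace


def churn_by_file(modifications) -> dict:
    """Sum added and removed lines per file path."""
    totals = defaultdict(lambda: {"added": 0, "removed": 0})
    for mod in modifications:
        # new_path is None for deleted files, old_path is None for added ones.
        path = mod.new_path or mod.old_path
        totals[path]["added"] += mod.added
        totals[path]["removed"] += mod.removed
    return dict(totals)


# Stand-ins for Modification objects (demo only).
mods = [
    SimpleNamespace(old_path=None, new_path="src/new.py", added=10, removed=0),
    SimpleNamespace(old_path="src/old.py", new_path=None, added=0, removed=4),
    SimpleNamespace(old_path="src/app.py", new_path="src/app.py", added=3, removed=2),
]
churn = churn_by_file(mods)
```

With real data, the input would typically be the modifications list of one or more commits.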
-