API Reference

GitRepository

class pydriller.git_repository.GitRepository(path: str)
__init__(path: str)

Init the Git Repository.

Parameters:path (str) – path to the repository
__module__ = 'pydriller.git_repository'
checkout(_hash: str) → None

Checkout the repo at the specified commit. BE CAREFUL: this changes the state of the repo, so it should not be used with more than one thread.

Parameters:_hash – commit hash to checkout
files() → List[str]

Obtain the list of the files (excluding .git directory).

Returns:List[str], the list of the files
get_commit(commit_id: str) → pydriller.domain.commit.Commit

Get the specified commit.

Parameters:commit_id (str) – hash of the commit to analyze
Returns:Commit
get_commit_from_gitpython(commit: git.objects.commit.Commit) → pydriller.domain.commit.Commit

Build a PyDriller commit object from a GitPython commit object. This method is internal to PyDriller; users will generally not need it.

Parameters:commit (GitCommit) – GitPython commit
Returns:Commit commit: PyDriller commit
get_commit_from_tag(tag: str) → pydriller.domain.commit.Commit

Obtain the tagged commit.

Parameters:tag (str) – the tag
Returns:Commit commit: the commit the tag referred to
get_commits_last_modified_lines(commit: pydriller.domain.commit.Commit, modification: pydriller.domain.commit.Modification = None) → Set[str]

Given a Commit object, return the set of commits that last “touched” the lines modified in the files included in the commit. It applies SZZ. For every file in the commit, the algorithm works as follows:

1- obtain the diff

2- obtain the list of deleted lines

3- blame the file and obtain the commits where those lines were added

A single Modification can also be passed as a parameter; in that case only that file is analyzed.

Parameters:
  • commit (Commit) – the commit to analyze
  • modification (Modification) – single modification to analyze
Returns:

the set containing all the bug-inducing commits
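
Steps 2 and 3 of the algorithm can be sketched without a real repository. This is a minimal illustration, not PyDriller's implementation: blame_by_line is a hypothetical mapping from line number to the hash of the commit that last changed that line (roughly what git blame reports for the revision before the analyzed commit), and the deleted lines use the (line number, line) tuples that parse_diff returns under “deleted”.

```python
from typing import Dict, List, Set, Tuple


def commits_last_touching(deleted: List[Tuple[int, str]],
                          blame: Dict[int, str]) -> Set[str]:
    """Collect the commits that last changed the deleted lines.

    `blame` is a hypothetical stand-in for `git blame` output on the
    revision that precedes the commit under analysis.
    """
    commits = set()
    for line_number, _content in deleted:
        # Lines without blame information are skipped.
        if line_number in blame:
            commits.add(blame[line_number])
    return commits


# Toy data: the analyzed commit deleted lines 2 and 3 of the old file.
deleted_lines = [(2, "x = compute()"), (3, "return x")]
blame_by_line = {1: "aaa111", 2: "bbb222", 3: "bbb222", 4: "ccc333"}

candidates = commits_last_touching(deleted_lines, blame_by_line)
print(sorted(candidates))
```

Because the result is a set, a commit that introduced several of the deleted lines is reported only once.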

get_head() → pydriller.domain.commit.Commit

Get the head commit.

Returns:Commit of the head commit
get_list_commits(branch: str = None) → List[pydriller.domain.commit.Commit]

Return the list of all the commits in the repo.

Returns:List[Commit], the list of all the commits in the repo
git
parse_diff(diff: str) → Dict[str, List[Tuple[int, str]]]

Given a diff, returns a dictionary with the added and deleted lines. The dictionary has 2 keys: “added” and “deleted”, each containing the corresponding added or deleted lines. For both keys, the value is a list of Tuple (int, str), corresponding to (number of line in the file, actual line).

Parameters:diff (str) – diff of the commit
Returns:Dictionary
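
To make the shape of the returned dictionary concrete, here is a small stand-alone parser for a unified diff. It is a sketch under stated assumptions, not the code PyDriller runs: the name parse_unified_diff is hypothetical, the diff is assumed to cover a single file, and deleted lines are numbered against the old file while added lines are numbered against the new file.

```python
import re
from typing import Dict, List, Tuple

# Matches hunk headers such as "@@ -1,3 +1,4 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_unified_diff(diff: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return {"added": [...], "deleted": [...]} of (line number, line)."""
    lines: Dict[str, List[Tuple[int, str]]] = {"added": [], "deleted": []}
    old_no = new_no = 0
    in_hunk = False
    for line in diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            # Restart both counters at the positions given by the header.
            old_no, new_no = int(header.group(1)), int(header.group(2))
            in_hunk = True
        elif not in_hunk:
            continue  # ignore anything before the first hunk
        elif line.startswith("+"):
            lines["added"].append((new_no, line[1:]))
            new_no += 1
        elif line.startswith("-"):
            lines["deleted"].append((old_no, line[1:]))
            old_no += 1
        elif line.startswith(" ") or line == "":
            # Context line: present in both versions of the file.
            old_no += 1
            new_no += 1
    return lines


sample = "\n".join([
    "@@ -1,3 +1,3 @@",
    " import os",
    "-print('old')",
    "+print('new')",
    " os.getcwd()",
])
parsed = parse_unified_diff(sample)
print(parsed)
```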
repo
reset() → None

Reset the state of the repo, checking out the main branch and discarding local changes (-f option).
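
Since checkout changes the state of the repo, a cautious pattern is to pair it with reset so the working tree is always restored. The helper below is a hedged sketch: files_at_commit is a hypothetical name, and it uses only the constructor, checkout, files and reset as documented in this section.

```python
from typing import List


def files_at_commit(repo_path: str, commit_hash: str) -> List[str]:
    """List the files of the repository as of `commit_hash`."""
    # Imported lazily so this module loads even where PyDriller is absent.
    from pydriller.git_repository import GitRepository

    repo = GitRepository(repo_path)
    repo.checkout(commit_hash)
    try:
        return repo.files()
    finally:
        # Always go back to the main branch, even if files() raises.
        repo.reset()
```

As with checkout itself, this helper should not be called from more than one thread on the same repository.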

total_commits() → int

Calculate total number of commits.

Returns:the total number of commits

RepositoryMining

class pydriller.repository_mining.RepositoryMining(path_to_repo: Union[str, List[str]], single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_branch: str = None, only_modifications_with_file_types: List[str] = None, only_no_merge: bool = False)
__init__(path_to_repo: Union[str, List[str]], single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_branch: str = None, only_modifications_with_file_types: List[str] = None, only_no_merge: bool = False)

Init a repository mining. The only required parameter is “path_to_repo”: to analyze a single repo, pass the absolute path to the repo; to analyze several repos, pass a list of absolute paths.

Furthermore, PyDriller supports local and remote repositories: if you pass a path to a repo, PyDriller will run the study on that repo; if you pass a URL, PyDriller will clone the repo into a temporary folder, run the study, and delete the temporary folder.

Parameters:
  • path_to_repo (Union[str,List[str]]) – absolute path (or list of absolute paths) to the repository(ies) to analyze
  • single (str) – hash of a single commit to analyze
  • since (datetime) – starting date
  • to (datetime) – ending date
  • from_commit (str) – starting commit (only if since is None)
  • to_commit (str) – ending commit (only if to is None)
  • from_tag (str) – start the analysis from the specified tag (only if since and from_commit are None)
  • to_tag (str) – end the analysis at the specified tag (only if to and to_commit are None)
  • reversed_order (bool) – whether the commits should be analyzed in reversed order
  • only_in_branch (str) – only commits in this branch will be analyzed
  • only_modifications_with_file_types (List[str]) – only modifications of files with these types will be analyzed
  • only_no_merge (bool) – if True, merges will not be analyzed
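
The “only if ... is None” notes above imply a precedence among the three ways of choosing a starting point: since, then from_commit, then from_tag. The function below only restates that reading of the parameter list; effective_start is a hypothetical helper, not part of PyDriller, and the same order would apply to to, to_commit and to_tag.

```python
from datetime import datetime
from typing import Optional, Tuple, Union

Start = Tuple[str, Union[datetime, str]]


def effective_start(since: Optional[datetime] = None,
                    from_commit: Optional[str] = None,
                    from_tag: Optional[str] = None) -> Optional[Start]:
    """Return the (parameter name, value) pair that takes effect."""
    if since is not None:
        return ("since", since)
    if from_commit is not None:
        return ("from_commit", from_commit)
    if from_tag is not None:
        return ("from_tag", from_tag)
    return None  # no lower bound: start from the first commit


# A date wins over a tag when both are given.
print(effective_start(since=datetime(2018, 1, 1), from_tag="v1.0"))
```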
__module__ = 'pydriller.repository_mining'
traverse_commits() → Generator[[pydriller.domain.commit.Commit, None], None]

Analyze all the specified commits (all of them by default), returning a generator of commits.
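
A typical use of the generator is a plain for loop over the commits. The sketch below is illustrative only: summarize_commits is a hypothetical helper, and it assumes that the Developer object exposes its name as a name attribute, matching the constructor documented further down.

```python
from datetime import datetime
from typing import Iterator, Tuple


def summarize_commits(path_to_repo: str,
                      since: datetime) -> Iterator[Tuple[str, str, int]]:
    """Yield (hash, author name, number of modified files) per commit."""
    # Imported lazily so this module loads even where PyDriller is absent.
    from pydriller.repository_mining import RepositoryMining

    # Skip merges and everything before `since`.
    mining = RepositoryMining(path_to_repo, since=since, only_no_merge=True)
    for commit in mining.traverse_commits():
        yield commit.hash, commit.author.name, len(commit.modifications)
```

Calling list(summarize_commits("/path/to/repo", datetime(2018, 1, 1))) would collect one tuple per non-merge commit; the path here is a placeholder.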

Commit

class pydriller.domain.commit.Commit(commit: git.objects.commit.Commit, project_path: pathlib.Path, main_branch: str)
__init__(commit: git.objects.commit.Commit, project_path: pathlib.Path, main_branch: str) → None

Create a commit object.

Parameters:
  • commit (GitCommit) – GitPython Commit object
  • project_path – path to the project (temporary folder in case of a remote repository)
  • project_name – name of the project
  • main_branch – main branch of the repo
__module__ = 'pydriller.domain.commit'
author

Return the author of the commit as a Developer object.

Returns:author
author_date

Return the authored datetime.

Returns:datetime author_datetime
author_timezone

Author timezone, expressed as an offset in seconds from UTC.

Returns:int timezone
branches

Return the set of branches that contain the commit.

Returns:set(str) branches
committer

Return the committer of the commit as a Developer object.

Returns:committer
committer_date

Return the committed datetime.

Returns:datetime committer_datetime
committer_timezone

Committer timezone, expressed as an offset in seconds from UTC.

Returns:int timezone
hash

Return the SHA of the commit.

Returns:str hash
in_main_branch

Return True if the commit is in the main branch, False otherwise.

Returns:bool in_main_branch
merge

Return True if the commit is a merge, False otherwise.

Returns:bool merge
modifications

Return a list of modified files.

Returns:List[Modification] modifications
msg

Return commit message.

Returns:str commit_message
parents

Return the list of parent SHAs.

Returns:List[str] parents
project_name

Return the project name.

Returns:project name
class pydriller.domain.commit.Method(func)
__init__(func)

Initialize a method object. This is calculated using Lizard: it parses the source code of all the modifications in a commit, extracting information about the methods contained in each file (if the file is source code written in one of the supported programming languages).

__module__ = 'pydriller.domain.commit'
class pydriller.domain.commit.Modification(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, diff_and_sc: Dict[str, str])
__init__(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, diff_and_sc: Dict[str, str])

Initialize a modification. A modification carries information about the changed file. Normally, you shouldn’t initialize a new one.

__module__ = 'pydriller.domain.commit'
added

Return the total number of added lines in the file.

Returns:int lines_added
complexity

Calculate the Cyclomatic Complexity of the file.

Returns:Cyclomatic Complexity of the file
filename

Return the filename. Given a path-like string (e.g. “/Users/dspadini/pydriller/myfile.py”), returns only the filename (e.g. “myfile.py”).

Returns:str filename
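
The same reduction from a full path to a bare filename can be reproduced with the standard library. This is an equivalent sketch, not necessarily how the property is implemented; filename_of is a hypothetical name.

```python
from pathlib import PurePosixPath


def filename_of(path: str) -> str:
    """Return the last component of a POSIX-style path."""
    return PurePosixPath(path).name


print(filename_of("/Users/dspadini/pydriller/myfile.py"))  # prints myfile.py
```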
methods

Return the list of methods in the file. Every method carries information such as complexity, LOC, name, and number of parameters.

Returns:list of methods
new_path
nloc

Calculate the LOC of the file.

Returns:LOC of the file
old_path
removed

Return the total number of deleted lines in the file.

Returns:int lines_deleted
token_count

Calculate the token count of functions.

Returns:token count
class pydriller.domain.commit.ModificationType

An enumeration.

ADD = (1,)
COPY = (2,)
DELETE = (4,)
MODIFY = 5
RENAME = (3,)
__module__ = 'pydriller.domain.commit'
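
The mixed values in the listing above (one-element tuples for most members, a plain integer for MODIFY) are what Python produces when an enum member is assigned with a trailing comma. The stand-in enum below is reconstructed from the listed values as an assumption, not copied from PyDriller's source; it shows why comparing against the member itself is more robust than comparing against its raw value.

```python
from enum import Enum


class ModificationType(Enum):
    # Stand-in reconstructed from the listed values; a trailing comma
    # turns the assigned value into a one-element tuple.
    ADD = 1,
    COPY = 2,
    RENAME = 3,
    DELETE = 4,
    MODIFY = 5


def is_deletion(change_type: ModificationType) -> bool:
    # Compare members by identity instead of relying on raw values.
    return change_type is ModificationType.DELETE


print(ModificationType.ADD.value)
print(ModificationType.MODIFY.value)
```

In practice this means a check such as change_type is ModificationType.DELETE keeps working regardless of how the underlying values are spelled.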

Developer

class pydriller.domain.developer.Developer(name: str, email: str)
__init__(name: str, email: str)

Class to identify a developer.

Parameters:
  • name (str) – name and surname of the developer
  • email (str) – email of the developer
__module__ = 'pydriller.domain.developer'