API Reference

GitRepository

class pydriller.git_repository.GitRepository(path: str)
__init__(path: str)

Init the Git Repository.

Parameters:path (str) – path to the repository
__module__ = 'pydriller.git_repository'
checkout(_hash: str) → None

Check out the repo at the specified commit. BE CAREFUL: this changes the state of the repo, so it should not be used with more than one thread.

Parameters:_hash (str) – hash of the commit to check out
files() → typing.List[str]

Obtain the list of files in the repo (excluding the .git directory).

Returns:List[str], the list of the files
get_change_sets() → typing.List[pydriller.domain.commit.ChangeSet]

Return the list of all the commits in the repo.

Returns:List[ChangeSet], the list of all the commits in the repo
get_commit(commit_id: str) → pydriller.domain.commit.Commit

Get the specified commit.

Parameters:commit_id (str) – hash of the commit to analyze
Returns:Commit
get_commit_from_tag(tag: str) → pydriller.domain.commit.Commit

Obtain the tagged commit.

Parameters:tag (str) – the tag
Returns:Commit, the commit the tag refers to
get_head() → pydriller.domain.commit.ChangeSet

Get the head commit.

Returns:ChangeSet of the head commit
parse_diff(diff: str) → typing.Dict[str, typing.List[typing.Tuple[int, str]]]

Given a diff, return a dictionary with the added and deleted lines. The dictionary has two keys, “added” and “deleted”. The value for each key is a list of tuples (int, str), where the int is the line number in the file and the str is the content of the line.

Parameters:diff (str) – diff of the commit
Returns:Dictionary
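
The shape of the returned dictionary can be illustrated with a small stand-alone parser. This is a minimal sketch, not the library's implementation: the function name parse_unified_diff is hypothetical, and it assumes a single-file unified diff whose hunks start with the usual "@@ -a,b +c,d @@" header.

```python
import re
from typing import Dict, List, Tuple

# Matches a unified-diff hunk header such as "@@ -10,2 +10,3 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_unified_diff(diff: str) -> Dict[str, List[Tuple[int, str]]]:
    """Hypothetical helper: collect (line number, text) for added and deleted lines."""
    result: Dict[str, List[Tuple[int, str]]] = {"added": [], "deleted": []}
    old_line = new_line = 0
    in_hunk = False
    for line in diff.split("\n"):
        header = HUNK_HEADER.match(line)
        if header:
            # Each hunk header resets both line counters.
            old_line, new_line = int(header.group(1)), int(header.group(2))
            in_hunk = True
        elif not in_hunk:
            # Skip file headers ("--- a/x", "+++ b/x") before the first hunk.
            continue
        elif line.startswith("+"):
            result["added"].append((new_line, line[1:]))
            new_line += 1
        elif line.startswith("-"):
            result["deleted"].append((old_line, line[1:]))
            old_line += 1
        elif line.startswith(" "):
            # Context lines advance both counters.
            old_line += 1
            new_line += 1
    return result
```

Added lines are numbered against the new version of the file and deleted lines against the old version, which is why the sketch keeps two counters.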
reset() → None

Reset the state of the repo, checking out the main branch and discarding local changes (-f option).
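
Because checkout changes the working tree, it is usually paired with reset. The following is a minimal sketch of that pairing; run_at_commit is a hypothetical helper, and repo is assumed to be any object exposing the checkout and reset methods documented above.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_at_commit(repo, commit_hash: str, action: Callable[[object], T]) -> T:
    """Hypothetical helper: run `action` with the repo checked out at `commit_hash`.

    The repo is reset afterwards even if `action` raises.
    """
    repo.checkout(commit_hash)
    try:
        return action(repo)
    finally:
        # Return to the main branch and discard local changes.
        repo.reset()
```

For example, run_at_commit(repo, some_hash, lambda r: r.files()) would list the files as they were at that commit. Since the working tree is shared state, calls like this should not run concurrently on the same repository.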

total_commits() → int

Calculate total number of commits.

Returns:the total number of commits
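
The methods above can be combined to go from the light-weight head ChangeSet to the full Commit. This is a sketch under assumptions: describe_head is a hypothetical helper, and it assumes that a ChangeSet exposes its constructor arguments as the attributes id and date.

```python
def describe_head(repo) -> str:
    """Hypothetical helper: one-line summary of the head commit of `repo`."""
    head = repo.get_head()             # ChangeSet: hash and date only (assumed attributes)
    commit = repo.get_commit(head.id)  # full Commit, including the message
    first_line = commit.msg.splitlines()[0] if commit.msg else ""
    return (
        f"{head.id[:7]} {head.date:%Y-%m-%d} {first_line} "
        f"({repo.total_commits()} commits)"
    )
```

Here repo would be a GitRepository created with the path to a local clone.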

RepositoryMining

class pydriller.repository_mining.RepositoryMining(path_to_repo: str, single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_main_branch: bool = False, only_in_branches: typing.List[str] = None, only_modifications_with_file_types: typing.List[str] = None, only_no_merge: bool = False, num_threads: int = 1)
__init__(path_to_repo: str, single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_main_branch: bool = False, only_in_branches: typing.List[str] = None, only_modifications_with_file_types: typing.List[str] = None, only_no_merge: bool = False, num_threads: int = 1)

Init a repository mining.

Parameters:
  • path_to_repo (str) – absolute path to the repository to analyze
  • single (str) – hash of a single commit to analyze
  • since (datetime) – starting date
  • to (datetime) – ending date
  • from_commit (str) – starting commit (only if since is None)
  • to_commit (str) – ending commit (only if to is None)
  • from_tag (str) – start the analysis from the specified tag (only if since and from_commit are None)
  • to_tag (str) – end the analysis at the specified tag (only if to and to_commit are None)
  • reversed_order (bool) – whether the commits should be analyzed in reversed order
  • only_in_main_branch (bool) – whether only commits in main branch should be analyzed
  • only_in_branches (List[str]) – only commits in these branches will be analyzed
  • only_modifications_with_file_types (List[str]) – only modifications to files of these types will be analyzed
  • only_no_merge (bool) – if True, merges will not be analyzed
__module__ = 'pydriller.repository_mining'
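
The "only if ... is None" notes in the parameter list imply a precedence among the range filters: a date wins over a commit, and a commit wins over a tag. A minimal sketch of that rule follows; pick_bound is a hypothetical function written for illustration and is not part of the library.

```python
from typing import Any, Optional, Tuple


def pick_bound(by_date: Any = None, by_commit: Optional[str] = None,
               by_tag: Optional[str] = None) -> Optional[Tuple[str, Any]]:
    """Hypothetical helper: return which bound applies, as (kind, value)."""
    if by_date is not None:
        return ("date", by_date)
    if by_commit is not None:
        return ("commit", by_commit)
    if by_tag is not None:
        return ("tag", by_tag)
    # No bound given: the analysis is not limited on this side.
    return None
```

The same rule would apply to the start of the range (since, from_commit, from_tag) and to its end (to, to_commit, to_tag).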
traverse_commits() → typing.Generator[[pydriller.domain.commit.Commit, NoneType], NoneType]

Analyze all the specified commits (all of them by default), returning a generator of commits.
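
A typical use is to iterate over the generator and read the properties documented in the Commit section. The sketch below assumes the signatures listed on this page; summarize_commits and mine are hypothetical helper names, and the date and filter values are arbitrary.

```python
from datetime import datetime
from typing import Iterable, List


def summarize_commits(commits: Iterable) -> List[str]:
    """Hypothetical helper: one line per modified file, across all commits."""
    lines = []
    for commit in commits:
        for mod in commit.modifications:
            lines.append(
                f"{commit.hash[:7]} {commit.author.name}: "
                f"{mod.filename} (+{mod.added} -{mod.removed})"
            )
    return lines


def mine(path_to_repo: str) -> List[str]:
    """Hypothetical helper: summarize non-merge commits since 2018-01-01."""
    # Imported lazily so summarize_commits works without PyDriller installed.
    from pydriller.repository_mining import RepositoryMining

    miner = RepositoryMining(
        path_to_repo,
        since=datetime(2018, 1, 1),  # arbitrary example date
        only_no_merge=True,
    )
    return summarize_commits(miner.traverse_commits())
```

Because traverse_commits returns a generator, commits are produced one at a time and the loop can stop early without processing the whole history.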

Commit

class pydriller.domain.commit.ChangeSet(id: str, date: datetime.datetime)
__init__(id: str, date: datetime.datetime)

Light-weight version of the commit, storing only the hash and the date. Used to filter out commits before asking for more expensive information (such as the diff and source code).

Parameters:
  • id (str) – hash of the commit
  • date (datetime) – date of the commit
__module__ = 'pydriller.domain.commit'
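
The filtering role of ChangeSet can be sketched as follows. ids_in_range is a hypothetical helper, and it assumes that each ChangeSet exposes its constructor arguments as the attributes id and date.

```python
from datetime import datetime
from typing import Iterable, List, Optional


def ids_in_range(change_sets: Iterable, since: Optional[datetime] = None,
                 to: Optional[datetime] = None) -> List[str]:
    """Hypothetical helper: hashes of the change sets dated within [since, to]."""
    selected = []
    for change_set in change_sets:
        # Note: comparing timezone-aware and naive datetimes raises TypeError,
        # so `since` and `to` must match the awareness of the commit dates.
        if since is not None and change_set.date < since:
            continue
        if to is not None and change_set.date > to:
            continue
        selected.append(change_set.id)
    return selected
```

Only the hashes that survive the filter then need to be passed to GitRepository.get_commit, which avoids loading diffs and source code for commits that are discarded anyway.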
class pydriller.domain.commit.Commit(commit: git.objects.commit.Commit, path: str, main_branch: str) → None
__init__(commit: git.objects.commit.Commit, path: str, main_branch: str) → None

Create a commit object.

__module__ = 'pydriller.domain.commit'
author

Return the author of the commit as a Developer object.

Returns:author
author_date

Return the authored datetime.

Returns:datetime author_datetime
author_timezone

Author timezone, expressed as an offset from UTC in seconds.

Returns:int timezone
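
The integer can be turned into a standard tzinfo object. This is a sketch under an assumption that should be checked against your data: that the value follows the GitPython convention of seconds west of UTC, so a commit authored at UTC+2 carries -7200. The helper name tz_from_offset is hypothetical.

```python
from datetime import timedelta, timezone


def tz_from_offset(seconds_west_of_utc: int) -> timezone:
    """Hypothetical helper: build a tzinfo from a seconds-west-of-UTC offset."""
    # The sign is flipped because datetime.timezone counts seconds east of UTC.
    return timezone(timedelta(seconds=-seconds_west_of_utc))
```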
branches

Return the set of branches that contain the commit.

Returns:set(str) branches
committer

Return the committer of the commit as a Developer object.

Returns:committer
committer_date

Return the committed datetime.

Returns:datetime committer_datetime
committer_timezone

Committer timezone, expressed as an offset from UTC in seconds.

Returns:int timezone
hash

Return the SHA of the commit.

Returns:str hash
in_main_branch

Return True if the commit is in the main branch, False otherwise.

Returns:bool in_main_branch
merge

Return True if the commit is a merge, False otherwise.

Returns:bool merge
modifications

Return a list of modified files.

Returns:List[Modification] modifications
msg

Return commit message.

Returns:str commit_message
parents

Return the list of parent SHAs.

Returns:List[str] parents
class pydriller.domain.commit.Modification(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, parents: typing.List[str], hash: str, path: str = None, modifications_list=None)
__init__(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, parents: typing.List[str], hash: str, path: str = None, modifications_list=None)

Initialize a modification. A modification carries information about the changed file. Normally, you should not need to create one yourself.

__module__ = 'pydriller.domain.commit'
added

Return the total number of added lines in the file.

Returns:int lines_added
diff

Return the diff of the file in the current commit.

Returns:str diff
filename

Return the filename. Given a path-like string (e.g. “/Users/dspadini/pydriller/myfile.py”), return only the filename (e.g. “myfile.py”).

Returns:str filename
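
The behavior described matches what the standard library does with a path. The sketch below uses os.path.basename for illustration; whether the property is implemented this way is an assumption, and filename_of is a hypothetical name.

```python
import os


def filename_of(path: str) -> str:
    """Hypothetical helper: return the last component of a path-like string."""
    return os.path.basename(path)


name = filename_of("/Users/dspadini/pydriller/myfile.py")
```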
removed

Return the total number of deleted lines in the file.

Returns:int lines_deleted
source_code

Return the source code of the file at the current commit.

Returns:str source_code
class pydriller.domain.commit.ModificationType

An enumeration.

ADD = (1,)
COPY = (2,)
DELETE = (4,)
MODIFY = 5
RENAME = (3,)
__module__ = 'pydriller.domain.commit'
__new__(value)
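
Note that most members listed above have one-element tuples as values, so it is safer to compare against the enum members than against integers. The sketch below uses a stand-in enum that mirrors the listed values so that it runs without PyDriller. It also assumes that a Modification exposes its constructor argument as the attribute change_type; count_by_change_type is a hypothetical helper.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class ModificationTypeStandIn(Enum):
    """Stand-in mirroring the values listed above (not the library class)."""
    ADD = (1,)
    COPY = (2,)
    RENAME = (3,)
    DELETE = (4,)
    MODIFY = 5


def count_by_change_type(modifications: Iterable) -> Counter:
    """Hypothetical helper: number of modifications per change type."""
    # Enum members are hashable, so they can be used directly as Counter keys.
    return Counter(mod.change_type for mod in modifications)


# Comparing by value is fragile: ADD's value is the tuple (1,), not the int 1.
add_is_int_one = ModificationTypeStandIn.ADD.value == 1
```

With the real class, a check such as mod.change_type == ModificationType.ADD avoids depending on the underlying values.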

Developer

class pydriller.domain.developer.Developer(name: str, email: str)
__init__(name: str, email: str)

Class to identify a developer.

Parameters:
  • name (str) – name and surname of the developer
  • email (str) – email of the developer
__module__ = 'pydriller.domain.developer'
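
Developer objects are handy for grouping commits by person. The following is a minimal sketch: commits_per_author is a hypothetical helper, and it assumes that a Developer exposes its constructor arguments as the attributes name and email.

```python
from collections import Counter
from typing import Iterable


def commits_per_author(commits: Iterable) -> Counter:
    """Hypothetical helper: count commits per author, keyed by email address."""
    # Emails are lower-cased so "Ada@x.org" and "ada@x.org" count as one author.
    return Counter(commit.author.email.lower() for commit in commits)
```

Keying on the email rather than the name is a design choice: the same person often commits under several spellings of their name, while the email address tends to be more stable.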