API Reference

GitRepository

class pydriller.git_repository.GitRepository(path: str)
__init__(path: str)

Initialize the Git repository.

Parameters: path (str) – path to the repository
checkout(_hash: str) → None

Check out the repository at the specified commit. BE CAREFUL: this changes the state of the repository, so it should not be used with more than one thread.

Parameters: _hash – hash of the commit to check out
files() → typing.List[str]

Return the list of files in the repository (excluding the .git directory).

Returns: List[str], the list of files
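As a rough illustration of what "excluding the .git directory" means, the following sketch walks a directory tree with the standard library and prunes .git so it is never entered. It is not PyDriller's implementation: the helper name list_files is hypothetical, and whether the library returns absolute or relative paths is not stated here (this sketch joins each file name onto repo_path).

```python
import os
from typing import List


def list_files(repo_path: str) -> List[str]:
    """Hypothetical helper: every file under repo_path, skipping .git."""
    files = []
    for root, dirs, filenames in os.walk(repo_path):
        # Removing ".git" from dirs in place stops os.walk from descending into it.
        if ".git" in dirs:
            dirs.remove(".git")
        for name in filenames:
            files.append(os.path.join(root, name))
    return files
```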
get_change_sets() → typing.List[pydriller.domain.commit.ChangeSet]

Return the list of all the commits in the repo.

Returns: List[ChangeSet], the list of all the commits in the repo
get_commit(commit_id: str) → pydriller.domain.commit.Commit

Get the specified commit.

Parameters: commit_id – hash of the commit to analyze
Returns: Commit
get_commit_from_tag(tag: str) → pydriller.domain.commit.Commit

Return the commit that the given tag points to.

Parameters: tag (str) – the tag
Returns: Commit, the commit the tag refers to
get_head() → pydriller.domain.commit.ChangeSet

Get the head commit.

Returns: ChangeSet of the head commit
parse_diff(diff: str) → typing.Dict[str, typing.List[typing.Tuple[int, str]]]

Given a diff, return a dictionary with the added and deleted lines. The dictionary has two keys, “added” and “deleted”, each holding the corresponding lines. For both keys, the value is a list of (int, str) tuples: (line number in the file, line content).

Parameters: diff (str) – diff of the commit
Returns: Dict[str, List[Tuple[int, str]]]
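To make the shape of the returned dictionary concrete, here is a minimal sketch of a unified-diff parser that produces the same structure. It is not PyDriller's own code: the name parse_diff_sketch is hypothetical, it assumes the diff covers a single file, and it assumes deleted lines are numbered against the old file and added lines against the new file, as the hunk headers (@@ -old,count +new,count @@) suggest.

```python
import re
from typing import Dict, List, Tuple

# Hunk header, e.g. "@@ -12,3 +12,4 @@"; the ",count" parts are optional.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_diff_sketch(diff: str) -> Dict[str, List[Tuple[int, str]]]:
    """Collect (line number, line) pairs for added and deleted lines."""
    result: Dict[str, List[Tuple[int, str]]] = {"added": [], "deleted": []}
    old_line = new_line = 0
    in_hunk = False
    for line in diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            # Each hunk restarts both counters at the positions it announces.
            old_line, new_line = int(header.group(1)), int(header.group(2))
            in_hunk = True
        elif not in_hunk or line.startswith("\\"):
            # Skip file headers before the first hunk and the
            # "\ No newline at end of file" marker.
            continue
        elif line.startswith("-"):
            # Deleted lines are numbered against the old file.
            result["deleted"].append((old_line, line[1:]))
            old_line += 1
        elif line.startswith("+"):
            # Added lines are numbered against the new file.
            result["added"].append((new_line, line[1:]))
            new_line += 1
        else:
            # Context lines exist in both versions, so both counters advance.
            old_line += 1
            new_line += 1
    return result
```

For example, a hunk that replaces line 2 of a three-line file yields one entry in each list, both numbered 2.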
reset() → None

Reset the state of the repo, checking out the main branch and discarding local changes (-f option).

total_commits() → int

Calculate the total number of commits.

Returns: the total number of commits

RepositoryMining

class pydriller.repository_mining.RepositoryMining(path_to_repo: str, single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_main_branch: bool = False, only_in_branches: typing.List[str] = None, only_modifications_with_file_types: typing.List[str] = None, only_no_merge: bool = False, num_threads: int = 1)
__init__(path_to_repo: str, single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_main_branch: bool = False, only_in_branches: typing.List[str] = None, only_modifications_with_file_types: typing.List[str] = None, only_no_merge: bool = False, num_threads: int = 1)

Initialize a RepositoryMining instance.

Parameters:
  • path_to_repo (str) – absolute path to the repository to analyze
  • single (str) – hash of a single commit to analyze
  • since (datetime) – starting date
  • to (datetime) – ending date
  • from_commit (str) – starting commit (only if since is None)
  • to_commit (str) – ending commit (only if to is None)
  • from_tag (str) – start the analysis from the specified tag (only if since and from_commit are None)
  • to_tag (str) – end the analysis at the specified tag (only if to and to_commit are None)
  • reversed_order (bool) – whether the commits should be analyzed in reversed order
  • only_in_main_branch (bool) – whether only commits in main branch should be analyzed
  • only_in_branches (List[str]) – only commits in these branches will be analyzed
  • only_modifications_with_file_types (List[str]) – only modifications of files with these types will be analyzed
  • only_no_merge (bool) – if True, merges will not be analyzed
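The parenthetical notes above ("only if since is None", and so on) define a precedence between the three ways of bounding each end of the range: a date, then a commit, then a tag. The sketch below models that precedence with a hypothetical helper, resolve_bound; it assumes lower-priority options are simply ignored, whereas the real library might instead reject conflicting arguments.

```python
from datetime import datetime
from typing import Optional, Tuple, Union

# Hypothetical representation of a resolved bound: (kind, value).
Bound = Tuple[str, Union[datetime, str]]


def resolve_bound(date: Optional[datetime], commit: Optional[str],
                  tag: Optional[str]) -> Optional[Bound]:
    """Pick the bound with the highest precedence: date, then commit, then tag."""
    if date is not None:
        return ("date", date)
    if commit is not None:
        return ("commit", commit)
    if tag is not None:
        return ("tag", tag)
    return None  # No bound given: the range is open on this side.
```

Under this model, the start of the range is resolve_bound(since, from_commit, from_tag) and the end is resolve_bound(to, to_commit, to_tag).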
traverse_commits() → typing.Generator[pydriller.domain.commit.Commit, None, None]

Analyze the specified commits (all commits by default) and return a generator of commits.

Commit

class pydriller.domain.commit.ChangeSet(id: str, date: datetime.datetime)
__init__(id: str, date: datetime.datetime)

Lightweight version of a commit that stores only the hash and the date. It is used to filter out commits before requesting more expensive information (such as the diff and the source code).

Parameters:
  • id (str) – hash of the commit
  • date (datetime) – date of the commit
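The idea behind ChangeSet is a two-phase traversal: filter on the cheap hash-and-date record first, and only then load the full commit. Below is a minimal sketch of that idea, assuming a hypothetical ChangeSetLike record and a caller-supplied load function standing in for the expensive lookup (for instance something like get_commit). Treating the date boundaries as inclusive is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ChangeSetLike:
    """Hypothetical stand-in for ChangeSet: only a hash and a date."""
    id: str
    date: datetime


def lazy_commits(change_sets: Iterable[ChangeSetLike],
                 load: Callable[[str], T],
                 since: Optional[datetime] = None,
                 to: Optional[datetime] = None) -> Iterator[T]:
    for cs in change_sets:
        # Phase 1: cheap date checks on the lightweight record.
        if since is not None and cs.date < since:
            continue
        if to is not None and cs.date > to:
            continue
        # Phase 2: only commits that pass the filter pay for the full load.
        yield load(cs.id)
```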
class pydriller.domain.commit.Commit(hash: str, author: pydriller.domain.developer.Developer, committer: pydriller.domain.developer.Developer, author_date: datetime.datetime, committer_date: datetime.datetime, author_timezone: int, committer_timezone: int, msg: str, parents: typing.List[str], merge: bool = False, branches: set = set(), is_commit_in_main_branch: bool = False) → None
__init__(hash: str, author: pydriller.domain.developer.Developer, committer: pydriller.domain.developer.Developer, author_date: datetime.datetime, committer_date: datetime.datetime, author_timezone: int, committer_timezone: int, msg: str, parents: typing.List[str], merge: bool = False, branches: set = set(), is_commit_in_main_branch: bool = False) → None

Create a commit object.

Parameters:
  • hash (str) – hash of the commit
  • author (Developer) – author of the commit
  • committer (Developer) – committer of the commit
  • author_date (datetime) – date when the commit was authored
  • committer_date (datetime) – date when the commit was committed
  • author_timezone (int) – author's offset from UTC, in seconds west of UTC
  • committer_timezone (int) – committer's offset from UTC, in seconds west of UTC
  • msg (str) – message of the commit
  • parents (List[str]) – list of hashes of the parent commits
  • merge (bool) – True if the commit is a merge commit
  • branches (set) – branches that include the commit
  • is_commit_in_main_branch (bool) – True if the commit is in the main branch
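The timezone fields store the offset in seconds west of UTC, which has the opposite sign of the usual ±HH:MM notation: a commit made at UTC+02:00 is stored as -7200. The hypothetical helper below converts such a value into a datetime.timezone using only the standard library.

```python
from datetime import datetime, timedelta, timezone


def tz_from_seconds_west(offset_west: int) -> timezone:
    """Convert a 'seconds west of UTC' value into a datetime.timezone.

    East of UTC is negative in this convention, so the sign is flipped.
    """
    return timezone(timedelta(seconds=-offset_west))
```

For example, datetime(2018, 1, 1, 12, tzinfo=tz_from_seconds_west(-7200)).isoformat() gives '2018-01-01T12:00:00+02:00'.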
add_modifications(old_path: str, new_path: str, change: pydriller.domain.modification.ModificationType, diff: str, sc: str)

Add a modification to the commit.

Parameters:
  • old_path (str) – old path of the file (None if the file is added)
  • new_path (str) – new path of the file (None if the file is deleted)
  • change (ModificationType) – type of the change
  • diff (str) – diff of the change
  • sc (str) – source code of the file (None if the file is deleted)

Modification

class pydriller.domain.modification.Modification(old_path: str, new_path: str, change_type: pydriller.domain.modification.ModificationType, diff: str, source_code: str)
__init__(old_path: str, new_path: str, change_type: pydriller.domain.modification.ModificationType, diff: str, source_code: str)

Initialize a modification. A modification carries information about a changed file.

Parameters:
  • old_path (str) – old path of the file (None if the file is added)
  • new_path (str) – new path of the file (None if the file is deleted)
  • change_type (ModificationType) – type of the change
  • diff (str) – diff of the change
  • source_code (str) – source code of the file (None if the file is deleted)
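Because old_path is None for added files and new_path is None for deleted ones, code that needs a single path for a modification has to fall back from one to the other. The sketch below does that and then filters by file extension, roughly in the spirit of only_modifications_with_file_types. ModificationLike, current_path and with_file_types are hypothetical names, and the extension format (with a leading dot, as returned by os.path.splitext) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ModificationLike:
    """Hypothetical stand-in carrying only the two path fields."""
    old_path: Optional[str]
    new_path: Optional[str]


def current_path(mod: ModificationLike) -> Optional[str]:
    # Prefer the new path; fall back to the old one for deleted files.
    return mod.new_path if mod.new_path is not None else mod.old_path


def with_file_types(mods: Iterable[ModificationLike],
                    types: List[str]) -> List[ModificationLike]:
    selected = []
    for mod in mods:
        path = current_path(mod)
        # Keep the modification only if its extension (e.g. ".py") is listed.
        if path is not None and os.path.splitext(path)[1] in types:
            selected.append(mod)
    return selected
```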
class pydriller.domain.modification.ModificationType

An enumeration of the types of change a modification can represent.

ADD = (1,)
COPY = (2,)
DELETE = (4,)
MODIFY = 5
RENAME = (3,)
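ADD, COPY, DELETE and RENAME are rendered as one-element tuples, while MODIFY is a plain integer. That is what Python produces when an Enum member is assigned with a trailing comma (ADD = 1,). The class below, ChangeKind, is a hypothetical stand-in that only reproduces this pattern; the practical takeaway is to compare members directly or look them up by name rather than by integer value.

```python
from enum import Enum


class ChangeKind(Enum):
    # A trailing comma makes the right-hand side a one-element tuple,
    # so these members get the values (1,), (2,), (3,) and (4,).
    ADD = 1,
    COPY = 2,
    RENAME = 3,
    DELETE = 4,
    # No trailing comma: the value is the integer 5.
    MODIFY = 5
```

Here ChangeKind(1) raises ValueError, while ChangeKind["ADD"] and ChangeKind((1,)) both return ChangeKind.ADD.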

Developer

class pydriller.domain.developer.Developer(name: str, email: str)
__init__(name: str, email: str)

Class to identify a developer.

Parameters:
  • name (str) – name and surname of the developer
  • email (str) – email of the developer
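Since a developer is identified by a name and an email, a natural way to aggregate commits per developer is to key on that pair. The frozen dataclass below, DeveloperKey, is a hypothetical stand-in (this reference does not say how Developer implements equality or hashing). It also shows a limitation of the approach: the same person committing under two email addresses is counted as two developers.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DeveloperKey:
    """Hypothetical stand-in: a developer identified by name and email."""
    name: str
    email: str


def commits_per_developer(authors: Iterable[DeveloperKey]) -> Counter:
    # frozen=True makes instances hashable and equal when both fields match,
    # so repeated (name, email) pairs are counted together.
    return Counter(authors)
```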