API Reference

GitRepository

This module includes 1 class, GitRepository, representing a repository in Git.

class pydriller.git_repository.GitRepository(path: str, conf=None)

Class representing a repository in Git. It contains most of the logic of PyDriller: obtaining the list of commits, checkout, reset, etc.

__del__()
__init__(path: str, conf=None)

Initialize the Git repository.

Parameters:path (str) – path to the repository
__module__ = 'pydriller.git_repository'
checkout(_hash: str) → None

Check out the repo at the specified commit. BE CAREFUL: this changes the state of the repo, so it should not be used with more than 1 thread.

Parameters:_hash – commit hash to checkout
clear()

According to GitPython’s documentation, GitPython sometimes leaks resources, especially on Windows. Hence, the cache needs to be cleared manually.

files() → List[str]

Obtain the list of the files (excluding .git directory).

Returns:List[str], the list of the files
get_commit(commit_id: str) → pydriller.domain.commit.Commit

Get the specified commit.

Parameters:commit_id (str) – hash of the commit to analyze
Returns:Commit
get_commit_from_gitpython(commit: git.objects.commit.Commit) → pydriller.domain.commit.Commit

Build a PyDriller commit object from a GitPython commit object. This method is internal to PyDriller; users generally will not need it.

Parameters:commit (GitCommit) – GitPython commit
Returns:Commit commit: PyDriller commit
get_commit_from_tag(tag: str) → pydriller.domain.commit.Commit

Obtain the tagged commit.

Parameters:tag (str) – the tag
Returns:Commit commit: the commit the tag referred to
get_commits_last_modified_lines(commit: pydriller.domain.commit.Commit, modification: pydriller.domain.commit.Modification = None, hashes_to_ignore_path: str = None) → Dict[str, Set[str]]

Given the Commit object, returns the set of commits that last “touched” the lines that are modified in the files included in the commit. It applies SZZ.

The algorithm works as follows (for every file in the commit):

1- obtain the diff

2- obtain the list of deleted lines

3- blame the file and obtain the commits where those lines were added

A single Modification can also be passed as a parameter; in that case only that file is analyzed.

Parameters:
  • commit (Commit) – the commit to analyze
  • modification (Modification) – single modification to analyze
  • hashes_to_ignore_path (str) – path to a file containing hashes of commits to ignore.
Returns:

a dictionary mapping each modified file to the set of hashes of the bug-inducing commits
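
The three steps above can be sketched without touching a real repository. This is a minimal illustration, not PyDriller's implementation: `deleted_line_numbers` and `last_touching_commits` are hypothetical helpers, the diff is assumed to cover a single file, and the blame result is supplied as a plain dictionary (old line number to commit hash) instead of being computed by `git blame`.

```python
import re
from typing import Dict, List, Set

# Matches a unified-diff hunk header such as "@@ -1,4 +1,4 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def deleted_line_numbers(diff: str) -> List[int]:
    """Steps 1-2: collect the old-file line numbers of deleted lines."""
    deleted = []
    old_lineno = None  # stays None until the first hunk header is seen
    for line in diff.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            old_lineno = int(match.group(1))
        elif old_lineno is None or line.startswith(("+", "\\")):
            continue  # file headers, added lines, "\ No newline" markers
        elif line.startswith("-"):
            deleted.append(old_lineno)
            old_lineno += 1
        else:
            old_lineno += 1  # context line, present in the old file
    return deleted


def last_touching_commits(diff: str, blame: Dict[int, str]) -> Set[str]:
    """Step 3: look up which commit introduced each deleted line."""
    return {blame[n] for n in deleted_line_numbers(diff) if n in blame}


diff = """\
--- a/app.py
+++ b/app.py
@@ -1,4 +1,4 @@
 import os
-x = 1
+x = 2
 print(x)
-print("done")
+print("finished")
"""
# Hypothetical blame of the previous version of app.py.
blame = {1: "aaa111", 2: "bbb222", 3: "aaa111", 4: "ccc333"}
suspects = last_touching_commits(diff, blame)
print(sorted(suspects))
```

Only lines 2 and 4 of the old file were deleted, so only the commits that introduced those two lines are reported.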

get_commits_modified_file(filepath: str) → List[str]

Given a filepath, returns all the commits that modified this file (following renames).

Parameters:filepath (str) – path to the file
Returns:the list of commit hashes
get_head() → pydriller.domain.commit.Commit

Get the head commit.

Returns:Commit of the head commit
get_list_commits(rev='HEAD', **kwargs) → Generator[[pydriller.domain.commit.Commit, None], None]

Return a generator over all the commits in the repo.

Returns:Generator[Commit], the generator of all the commits in the repo
get_tagged_commits()

Obtain the hash of all the tagged commits.

Returns:list of tagged commits (can be empty if there are no tags)
git

GitPython object Git.

Returns:Git
repo

GitPython object Repo.

Returns:Repo
reset() → None

Reset the state of the repo, checking out the main branch and discarding local changes (-f option).

total_commits() → int

Calculate total number of commits.

Returns:the total number of commits

RepositoryMining

This module includes 1 class, RepositoryMining, the main class of PyDriller.

class pydriller.repository_mining.RepositoryMining(path_to_repo: Union[str, List[str]], single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, include_refs: bool = False, include_remotes: bool = False, reversed_order: bool = False, only_in_branch: str = None, only_modifications_with_file_types: List[str] = None, only_no_merge: bool = False, only_authors: List[str] = None, only_commits: List[str] = None, only_releases: bool = False, filepath: str = None, histogram_diff: bool = False, skip_whitespaces: bool = False, clone_repo_to: str = None, order: str = None)

This is the main class of PyDriller, responsible for running the study.

__init__(path_to_repo: Union[str, List[str]], single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, include_refs: bool = False, include_remotes: bool = False, reversed_order: bool = False, only_in_branch: str = None, only_modifications_with_file_types: List[str] = None, only_no_merge: bool = False, only_authors: List[str] = None, only_commits: List[str] = None, only_releases: bool = False, filepath: str = None, histogram_diff: bool = False, skip_whitespaces: bool = False, clone_repo_to: str = None, order: str = None)

Initialize a repository mining. The only required parameter is “path_to_repo”: to analyze a single repo, pass the absolute path to the repo; to analyze several repos, pass a list of absolute paths.

Furthermore, PyDriller supports local and remote repositories: if you pass a path to a repo, PyDriller will run the study on that repo; if you pass a URL, PyDriller will clone the repo into a temporary folder, run the study, and delete the temporary folder.

Parameters:
  • path_to_repo (Union[str,List[str]]) – absolute path (or list of absolute paths) to the repository(ies) to analyze
  • single (str) – hash of a single commit to analyze
  • since (datetime) – starting date
  • to (datetime) – ending date
  • from_commit (str) – starting commit (only if since is None)
  • to_commit (str) – ending commit (only if to is None)
  • from_tag (str) – start the analysis from the specified tag (only if since and from_commit are None)
  • to_tag (str) – end the analysis at the specified tag (only if to and to_commit are None)
  • include_refs (bool) – whether to include refs and HEAD in commit analysis
  • include_remotes (bool) – whether to include remote commits in analysis
  • reversed_order (bool) – whether the commits should be analyzed in reversed order (DEPRECATED)
  • only_in_branch (str) – only commits in this branch will be analyzed
  • only_modifications_with_file_types (List[str]) – only modifications with these file types will be analyzed
  • only_no_merge (bool) – if True, merges will not be analyzed
  • only_authors (List[str]) – only commits of these authors will be analyzed (the check is done on the username, NOT the email)
  • only_commits (List[str]) – only these commits will be analyzed
  • filepath (str) – only commits that modified this file will be analyzed
  • order (str) – order of commits. It can be one of: ‘date-order’, ‘author-date-order’, ‘topo-order’, or ‘reverse’. Default is reverse.
__module__ = 'pydriller.repository_mining'
traverse_commits() → Generator[[pydriller.domain.commit.Commit, None], None]

Analyze all the specified commits (all of them by default), returning a generator of commits.
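
A typical study iterates over the generator returned by traverse_commits() and reads attributes of each Commit. The sketch below is illustrative: `summarize` and `mine` are hypothetical helper names, not part of PyDriller. `summarize` relies only on the `hash`, `msg`, `author.name`, and `modifications` attributes documented in this reference, so the demonstration at the bottom uses a stand-in object instead of a real repository.

```python
from types import SimpleNamespace
from typing import Iterable, List


def summarize(commits: Iterable) -> List[str]:
    """Build one summary line per commit from documented Commit attributes."""
    lines = []
    for commit in commits:
        first_line = commit.msg.splitlines()[0] if commit.msg else ""
        lines.append(
            f"{commit.hash[:7]} {commit.author.name}: {first_line} "
            f"({len(commit.modifications)} files)"
        )
    return lines


def mine(path_to_repo: str, **filters) -> List[str]:
    """Run a study on one repository; requires PyDriller to be installed."""
    from pydriller import RepositoryMining  # imported lazily on purpose

    return summarize(RepositoryMining(path_to_repo, **filters).traverse_commits())


# Stand-in commit (hypothetical data) so the sketch runs without a repository.
fake = SimpleNamespace(
    hash="0123456789abcdef",
    msg="Fix bug\n\nLonger description",
    author=SimpleNamespace(name="Ada"),
    modifications=["a.py", "b.py"],
)
summary = summarize([fake])
print(summary)
```

With PyDriller installed, a call such as `mine("path/to/repo", only_no_merge=True)` would summarize every non-merge commit of that repository; the path here is a placeholder.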

Commit

This module contains all the classes regarding a specific commit, such as Commit, Modification, ModificationType and Method.

class pydriller.domain.commit.Commit(commit: git.objects.commit.Commit, conf)

Class representing a Commit. Contains all the important information such as hash, author, dates, and modified files.

__init__(commit: git.objects.commit.Commit, conf) → None

Create a commit object.

Parameters:
  • commit – GitPython Commit object
  • conf – Configuration class
__module__ = 'pydriller.domain.commit'
author

Return the author of the commit as a Developer object.

Returns:author
author_date

Return the authored datetime.

Returns:datetime author_datetime
author_timezone

Author timezone, expressed as an offset in seconds from UTC.

Returns:int timezone
branches

Return the set of branches that contain the commit.

Returns:set(str) branches
committer

Return the committer of the commit as a Developer object.

Returns:committer
committer_date

Return the committed datetime.

Returns:datetime committer_datetime
committer_timezone

Committer timezone, expressed as an offset in seconds from UTC.

Returns:int timezone
dmm_unit_complexity

Return the Delta Maintainability Model (DMM) metric value for the unit complexity property.

It represents the proportion (between 0.0 and 1.0) of maintainability improving change, when considering the cyclomatic complexity of the modified methods.

It rewards (value close to 1.0) modifications to low-risk (low complexity) methods, or splitting risky (highly complex) ones. It penalizes (value close to 0.0) working on methods that remain complex or get more complex.

Returns:The DMM value (between 0.0 and 1.0) for method complexity in this commit, or None if none of the programming languages in the commit are supported.
dmm_unit_interfacing

Return the Delta Maintainability Model (DMM) metric value for the unit interfacing property.

It represents the proportion (between 0.0 and 1.0) of maintainability improving change, when considering the interface (number of parameters) of the modified methods.

It rewards (value close to 1.0) modifications to low-risk (with few parameters) methods, or splitting risky (with many parameters) ones. It penalizes (value close to 0.0) working on methods that continue to have or are extended with too many parameters.

Returns:The DMM value (between 0.0 and 1.0) for method interfacing in this commit, or None if none of the programming languages in the commit are supported.
dmm_unit_size

Return the Delta Maintainability Model (DMM) metric value for the unit size property.

It represents the proportion (between 0.0 and 1.0) of maintainability improving change, when considering the lengths of the modified methods.

It rewards (value close to 1.0) modifications to low-risk (small) methods, or splitting risky (large) ones. It penalizes (value close to 0.0) working on methods that remain large or get larger.

Returns:The DMM value (between 0.0 and 1.0) for method size in this commit, or None if none of the programming languages in the commit are supported.
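
One way to read the three DMM properties: the change in each modified method lands in either a low-risk or a high-risk bucket, and the metric is the share of that change that moves the code in a good direction. The function below is a sketch under that reading and may differ from PyDriller's exact code. `good_change_proportion` and its arguments are illustrative names, and here None means "no change to measure", which is an assumption of this sketch.

```python
from typing import Optional


def good_change_proportion(low_risk_delta: int, high_risk_delta: int) -> Optional[float]:
    """Share of the change that improves maintainability.

    low_risk_delta: net amount added to (positive) or removed from
        (negative) low-risk methods.
    high_risk_delta: the same quantity for high-risk methods.
    """
    good = bad = 0
    if low_risk_delta >= 0:
        good += low_risk_delta       # growing low-risk code is rewarded
    else:
        bad += -low_risk_delta       # shrinking low-risk code is penalized
    if high_risk_delta >= 0:
        bad += high_risk_delta       # growing high-risk code is penalized
    else:
        good += -high_risk_delta     # shrinking high-risk code is rewarded
    total = good + bad
    return good / total if total else None


print(good_change_proportion(30, 10))   # 0.75
print(good_change_proportion(0, 10))    # 0.0
print(good_change_proportion(-5, -15))  # 0.75
```

The last call shows why splitting or shrinking a risky method scores well: 15 units removed from high-risk code count as good change, against 5 units removed from low-risk code.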
hash

Return the SHA of the commit.

Returns:str hash
in_main_branch

Return True if the commit is in the main branch, False otherwise.

Returns:bool in_main_branch
merge

Return True if the commit is a merge, False otherwise.

Returns:bool merge
modifications

Return a list of modified files.

Returns:List[Modification] modifications
msg

Return commit message.

Returns:str commit_message
parents

Return the list of parent SHAs.

Returns:List[str] parents
project_name

Return the project name.

Returns:project name
class pydriller.domain.commit.DMMProperty

Maintainability properties of the Delta Maintainability Model.

UNIT_COMPLEXITY = 2
UNIT_INTERFACING = 3
UNIT_SIZE = 1
__module__ = 'pydriller.domain.commit'
class pydriller.domain.commit.Method(func)

This class represents a method in a class. Contains various information extracted through Lizard.

UNIT_COMPLEXITY_LOW_RISK_THRESHOLD = 5

Threshold used in the Delta Maintainability Model to establish whether a method is low risk in terms of its cyclomatic complexity. The procedure to obtain the threshold is described in the PyDriller documentation.

UNIT_INTERFACING_LOW_RISK_THRESHOLD = 2

Threshold used in the Delta Maintainability Model to establish whether a method is low risk in terms of its interface. The procedure to obtain the threshold is described in the PyDriller documentation.

UNIT_SIZE_LOW_RISK_THRESHOLD = 15

Threshold used in the Delta Maintainability Model to establish whether a method is low risk in terms of its size. The procedure to obtain the threshold is described in the PyDriller documentation.

__init__(func)

Initialize a method object. This is calculated using Lizard: it parses the source code of all the modifications in a commit, extracting information about the methods contained in each file (if the file is source code written in one of the supported programming languages).

__module__ = 'pydriller.domain.commit'
is_low_risk(dmm_prop: pydriller.domain.commit.DMMProperty) → bool

Predicate indicating whether this method is low risk in terms of the given property.

Parameters:dmm_prop – Property according to which this method is considered risky.
Returns:True if and only if the method is considered low-risk w.r.t. this property.
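
The predicate can be pictured as a comparison against the three thresholds listed above. The sketch below reuses the documented constant values; the `MethodSketch` class, its `nloc`, `complexity`, and `parameters` fields, and the choice of `<=` as the comparison are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DMMProperty(Enum):
    UNIT_SIZE = 1
    UNIT_COMPLEXITY = 2
    UNIT_INTERFACING = 3


# Threshold values as listed in this reference.
LOW_RISK_THRESHOLDS = {
    DMMProperty.UNIT_SIZE: 15,
    DMMProperty.UNIT_COMPLEXITY: 5,
    DMMProperty.UNIT_INTERFACING: 2,
}


@dataclass
class MethodSketch:
    """Hypothetical stand-in for a method analyzed by Lizard."""
    nloc: int
    complexity: int
    parameters: List[str] = field(default_factory=list)

    def is_low_risk(self, dmm_prop: DMMProperty) -> bool:
        # Pick the measurement that corresponds to the requested property.
        measured = {
            DMMProperty.UNIT_SIZE: self.nloc,
            DMMProperty.UNIT_COMPLEXITY: self.complexity,
            DMMProperty.UNIT_INTERFACING: len(self.parameters),
        }[dmm_prop]
        return measured <= LOW_RISK_THRESHOLDS[dmm_prop]


method = MethodSketch(nloc=12, complexity=7, parameters=["a", "b"])
for prop in DMMProperty:
    print(prop.name, method.is_low_risk(prop))
```

In this example the method is low risk for size and interfacing but not for complexity, since 7 exceeds the threshold of 5.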
class pydriller.domain.commit.Modification(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, diff_and_sc: Dict[str, str])

This class contains information regarding a modified file in a commit.

__init__(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, diff_and_sc: Dict[str, str])

Initialize a modification. A modification carries information about the changed file. Normally, you shouldn’t initialize a new one.

__module__ = 'pydriller.domain.commit'
added

Return the total number of added lines in the file.

Returns:int lines_added
changed_methods

Return the list of methods that were changed. This analysis is more expensive because Lizard runs twice: once for the methods before the change and once for the methods after it.

Returns:list of methods
complexity

Calculate the Cyclomatic Complexity of the file.

Returns:Cyclomatic Complexity of the file
diff_parsed

Return a dictionary with the added and deleted lines. The dictionary has 2 keys, “added” and “deleted”. For both keys, the value is a list of (int, str) tuples, each holding the line number in the file and the content of that line.

Returns:Dictionary
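
The shape of that dictionary is easiest to see on a small diff. The parser below is a simplified sketch of the idea, not PyDriller's code: `parse_diff` is a hypothetical name, and it assumes a unified diff for a single file whose text starts at the first hunk header.

```python
import re
from typing import Dict, List, Tuple

# Matches a unified-diff hunk header such as "@@ -1,3 +1,3 @@".
HUNK = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_diff(diff: str) -> Dict[str, List[Tuple[int, str]]]:
    """Split a unified diff into numbered added and deleted lines."""
    parsed = {"added": [], "deleted": []}
    old_no = new_no = 0
    for line in diff.splitlines():
        header = HUNK.match(line)
        if header:
            # Restart both counters at the positions named in the header.
            old_no, new_no = int(header.group(1)), int(header.group(2))
        elif line.startswith("+"):
            parsed["added"].append((new_no, line[1:]))
            new_no += 1
        elif line.startswith("-"):
            parsed["deleted"].append((old_no, line[1:]))
            old_no += 1
        elif not line.startswith("\\"):
            # Context line: present in both versions of the file.
            old_no += 1
            new_no += 1
    return parsed


diff = """\
@@ -1,3 +1,3 @@
 a = 1
-b = 2
+b = 3
 c = a + b
"""
result = parse_diff(diff)
print(result)
```

Added lines are numbered against the new version of the file and deleted lines against the old one, which is why both entries in this example carry line number 2.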
filename

Return the filename. Given a path-like string (e.g. “/Users/dspadini/pydriller/myfile.py”), return only the filename (e.g. “myfile.py”).

Returns:str filename
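
The described behaviour corresponds to what the standard library's `basename` does for POSIX-style paths. The snippet below only illustrates that reduction with the path from the description; whether PyDriller uses this exact call is not stated here.

```python
import posixpath

path = "/Users/dspadini/pydriller/myfile.py"
# Keep only the final path component.
filename = posixpath.basename(path)
print(filename)  # myfile.py
```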
language_supported

Return whether the language used in the modification can be analyzed by PyDriller. Languages are derived from the file extension. Supported languages are those supported by Lizard.

Returns:True iff language of this Modification can be analyzed.
methods

Return the list of methods in the file. Every method contains various information like complexity, loc, name, number of parameters, etc.

Returns:list of methods
methods_before

Return the list of methods in the file before the change happened. Each method will have all specific info, e.g. complexity, loc, name, etc.

Returns:list of methods
new_path

New path of the file. Can be None if the file is deleted.

Returns:str new_path
nloc

Calculate the LOC of the file.

Returns:LOC of the file
old_path

Old path of the file. Can be None if the file is added.

Returns:str old_path
removed

Return the total number of deleted lines in the file.

Returns:int lines_deleted
token_count

Calculate the token count of functions.

Returns:token count
class pydriller.domain.commit.ModificationType

Type of Modification. Can be ADD, COPY, RENAME, DELETE, MODIFY or UNKNOWN.

ADD = 1
COPY = 2
DELETE = 4
MODIFY = 5
RENAME = 3
UNKNOWN = 6
__module__ = 'pydriller.domain.commit'

Developer

This module includes only 1 class, Developer, representing a developer.

class pydriller.domain.developer.Developer(name: str, email: str)

This class represents a developer. We save the email and the name.

__init__(name: str, email: str)

Class to identify a developer.

Parameters:
  • name (str) – name and surname of the developer
  • email (str) – email of the developer
__module__ = 'pydriller.domain.developer'

Process Metrics

This module contains the abstract class to implement process metrics.

class pydriller.metrics.process.process_metric.ProcessMetric(path_to_repo: str, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None)

Abstract class to implement process metrics

__init__(path_to_repo: str, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None)
Parameters:
  • path_to_repo (str) – path to a single repo
  • since (datetime) – starting date
  • to (datetime) – ending date
  • from_commit (str) – starting commit (only if since is None)
  • to_commit (str) – ending commit (only if to is None)
__module__ = 'pydriller.metrics.process.process_metric'
count()

Implement the main functionality of the metric
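
A concrete metric would subclass the abstract class and put its logic in count(). The sketch below shows only that pattern with a stand-in base class: `MetricSketch`, `FilesChangeCount`, and the in-memory commit history are all hypothetical, whereas the real constructor takes the repository path and the bounds documented above.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict, List


class MetricSketch(ABC):
    """Stand-in for an abstract process metric (not PyDriller's class)."""

    def __init__(self, commits: List[List[str]]):
        # Each commit is reduced to the list of file paths it modified.
        self.commits = commits

    @abstractmethod
    def count(self):
        """Implement the main functionality of the metric."""


class FilesChangeCount(MetricSketch):
    """Hypothetical metric: how many commits touched each file."""

    def count(self) -> Dict[str, int]:
        touched = Counter()
        for files in self.commits:
            touched.update(set(files))  # count a file once per commit
        return dict(touched)


history = [["a.py", "b.py"], ["a.py"], ["c.py", "a.py"]]
result = FilesChangeCount(history).count()
print(result)
```

Because count() is abstract, instantiating the base class directly raises a TypeError; only subclasses that implement it can be used.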