API Reference
GitRepository
This module includes one class, GitRepository, representing a repository in Git.
-
class pydriller.git_repository.GitRepository(path: str, conf=None)
Class representing a repository in Git. It contains most of the logic of PyDriller: obtaining the list of commits, checkout, reset, etc.
-
__del__()
-
__init__(path: str, conf=None)
Initialize the GitRepository.
Parameters: path (str) – path to the repository
-
__module__ = 'pydriller.git_repository'
-
checkout(_hash: str) → None
Check out the repo at the specified commit. BE CAREFUL: this changes the state of the repo, hence it should not be used with more than one thread.
Parameters: _hash – commit hash to check out
-
clear()
According to GitPython’s documentation, GitPython sometimes leaks resources, especially for Windows users. Hence, the cache needs to be cleared manually.
-
files() → List[str]
Obtain the list of the files (excluding the .git directory).
Returns: List[str], the list of the files
-
get_commit(commit_id: str) → pydriller.domain.commit.Commit
Get the specified commit.
Parameters: commit_id (str) – hash of the commit to analyze
Returns: Commit
-
get_commit_from_gitpython(commit: git.objects.commit.Commit) → pydriller.domain.commit.Commit
Build a PyDriller commit object from a GitPython commit object. This is internal to PyDriller; users generally will not need it.
Parameters: commit (GitCommit) – GitPython commit
Returns: Commit – PyDriller commit
-
get_commit_from_tag(tag: str) → pydriller.domain.commit.Commit
Obtain the tagged commit.
Parameters: tag (str) – the tag
Returns: Commit – the commit the tag refers to
-
get_commits_last_modified_lines(commit: pydriller.domain.commit.Commit, modification: pydriller.domain.commit.Modification = None, hashes_to_ignore_path: str = None) → Dict[str, Set[str]]
Given the Commit object, returns the commits that last “touched” the lines modified in the files included in the commit, grouped by file. It applies SZZ.
The algorithm works as follows (for every file in the commit):
1- obtain the diff
2- obtain the list of deleted lines
3- blame the file and obtain the commits where those lines were added
A single Modification can also be passed as a parameter; in this case, only that file is analyzed.
Parameters: - commit (Commit) – the commit to analyze
- modification (Modification) – single modification to analyze
- hashes_to_ignore_path (str) – path to a file containing hashes of commits to ignore
Returns: a dictionary mapping each modified file path to the set of hashes of the bug-inducing commits
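The blame step above can be sketched in plain Python. This is not PyDriller’s implementation: it assumes the deleted lines are already available in the shape of `diff_parsed["deleted"]`, and that blame output has been reduced to a hypothetical list of commit hashes, one per line of the file as it was in the parent commit (where the deleted lines still exist). Only blank lines are skipped here; the real filtering may differ. `last_modifying_commits` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple


def last_modifying_commits(
    deleted_by_file: Dict[str, List[Tuple[int, str]]],
    blame_by_file: Dict[str, List[str]],
) -> Dict[str, Set[str]]:
    """Map each file to the commits that last touched its deleted lines.

    deleted_by_file: path -> [(line number in the old file, line text), ...],
        the same shape as Modification.diff_parsed["deleted"].
    blame_by_file: path -> commit hashes, one per line of the file as it was
        in the parent commit (index 0 is line 1). Hypothetical input format.
    """
    result: Dict[str, Set[str]] = {}
    for path, deleted_lines in deleted_by_file.items():
        blame = blame_by_file[path]
        for line_no, text in deleted_lines:
            if not text.strip():
                continue  # blank lines say nothing about who introduced a bug
            result.setdefault(path, set()).add(blame[line_no - 1])
    return result


# Two meaningful lines were deleted from a.py; they were last touched by c1 and c4.
deleted = {"a.py": [(1, "x = 1"), (3, "   "), (4, "return x")]}
blame = {"a.py": ["c1", "c2", "c3", "c4"]}
print(sorted(last_modifying_commits(deleted, blame)["a.py"]))  # ['c1', 'c4']
```

The result is keyed by file path, matching the `Dict[str, Set[str]]` return type in the signature.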
-
get_commits_modified_file(filepath: str) → List[str]
Given a filepath, returns all the commits that modified this file (following renames).
Parameters: filepath (str) – path to the file
Returns: the list of the commits’ hashes
-
get_head() → pydriller.domain.commit.Commit
Get the head commit.
Returns: Commit of the head commit
-
get_list_commits(rev='HEAD', **kwargs) → Generator[[pydriller.domain.commit.Commit, None], None]
Return a generator of all the commits in the repo.
Returns: Generator[Commit], the generator of all the commits in the repo
-
get_tagged_commits()
Obtain the hashes of all the tagged commits.
Returns: list of the hashes of the tagged commits (empty if there are no tags)
-
git
GitPython object Git.
Returns: Git
-
repo
GitPython object Repo.
Returns: Repo
-
reset() → None
Reset the state of the repo, checking out the main branch and discarding local changes (-f option).
-
total_commits() → int
Calculate the total number of commits.
Returns: the total number of commits
-
RepositoryMining
This module includes one class, RepositoryMining, the main class of PyDriller.
-
class pydriller.repository_mining.RepositoryMining(path_to_repo: Union[str, List[str]], single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, include_refs: bool = False, include_remotes: bool = False, reversed_order: bool = False, only_in_branch: str = None, only_modifications_with_file_types: List[str] = None, only_no_merge: bool = False, only_authors: List[str] = None, only_commits: List[str] = None, only_releases: bool = False, filepath: str = None, histogram_diff: bool = False, skip_whitespaces: bool = False, clone_repo_to: str = None, order: str = None)
This is the main class of PyDriller, responsible for running the study.
-
__init__(path_to_repo: Union[str, List[str]], single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, include_refs: bool = False, include_remotes: bool = False, reversed_order: bool = False, only_in_branch: str = None, only_modifications_with_file_types: List[str] = None, only_no_merge: bool = False, only_authors: List[str] = None, only_commits: List[str] = None, only_releases: bool = False, filepath: str = None, histogram_diff: bool = False, skip_whitespaces: bool = False, clone_repo_to: str = None, order: str = None)
Initialize a repository mining. The only required parameter is “path_to_repo”: to analyze a single repo, pass the absolute path to the repo; to analyze more repos, pass a list of absolute paths.
Furthermore, PyDriller supports local and remote repositories: if you pass a path to a repo, PyDriller runs the study on that repo; if you pass a URL, PyDriller clones the repo into a temporary folder, runs the study, and deletes the temporary folder.
Parameters: - path_to_repo (Union[str,List[str]]) – absolute path (or list of absolute paths) to the repository(ies) to analyze
- single (str) – hash of a single commit to analyze
- since (datetime) – starting date
- to (datetime) – ending date
- from_commit (str) – starting commit (only if since is None)
- to_commit (str) – ending commit (only if to is None)
- from_tag (str) – start the analysis from the specified tag (only if since and from_commit are None)
- to_tag (str) – end the analysis at the specified tag (only if to and to_commit are None)
- include_refs (bool) – whether to include refs and HEAD in commit analysis
- include_remotes (bool) – whether to include remote commits in analysis
- reversed_order (bool) – whether the commits should be analyzed in reversed order (DEPRECATED)
- only_in_branch (str) – only commits in this branch will be analyzed
- only_modifications_with_file_types (List[str]) – only modifications to files of these types will be analyzed
- only_no_merge (bool) – if True, merges will not be analyzed
- only_authors (List[str]) – only commits of these authors will be analyzed (the check is done on the username, NOT the email)
- only_commits (List[str]) – only these commits will be analyzed
- filepath (str) – only commits that modified this file will be analyzed
- order (str) – order of commits. It can be one of: ‘date-order’, ‘author-date-order’, ‘topo-order’, or ‘reverse’. Default is reverse.
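The selection semantics of a few of the filters above (since, to, only_no_merge, only_authors) can be illustrated on in-memory data. This is a sketch, not how PyDriller applies them: `CommitRecord` and `select_commits` are hypothetical names, date bounds are treated as inclusive, and comparing against the committer date is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, List, Optional


@dataclass
class CommitRecord:
    """Hypothetical stand-in for the fields of a Commit used below."""
    hash: str
    author_name: str
    committer_date: datetime
    parents: List[str]


def select_commits(
    commits: Iterable[CommitRecord],
    since: Optional[datetime] = None,
    to: Optional[datetime] = None,
    only_no_merge: bool = False,
    only_authors: Optional[List[str]] = None,
) -> Iterator[CommitRecord]:
    """Yield the commits that pass every filter that was set."""
    for c in commits:
        if since is not None and c.committer_date < since:
            continue
        if to is not None and c.committer_date > to:
            continue
        if only_no_merge and len(c.parents) > 1:  # merges have 2+ parents
            continue
        if only_authors is not None and c.author_name not in only_authors:
            continue  # compared by name, not email
        yield c


log = [
    CommitRecord("a1", "alice", datetime(2020, 1, 10), ["p0"]),
    CommitRecord("b2", "bob", datetime(2020, 2, 10), ["a1", "x9"]),
    CommitRecord("c3", "alice", datetime(2020, 3, 10), ["b2"]),
]
print([c.hash for c in select_commits(log, since=datetime(2020, 2, 1), only_no_merge=True)])  # ['c3']
```

Filters combine with AND: a commit is yielded only if every filter that was set accepts it.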
-
__module__ = 'pydriller.repository_mining'
-
traverse_commits() → Generator[[pydriller.domain.commit.Commit, None], None]
Analyze all the specified commits (all of them by default), returning a generator of commits.
-
Commit
This module contains all the classes regarding a specific commit, such as Commit, Modification, ModificationType and Method.
-
class pydriller.domain.commit.Commit(commit: git.objects.commit.Commit, conf)
Class representing a Commit. It contains all the important information such as hash, author, dates, and modified files.
-
__init__(commit: git.objects.commit.Commit, conf) → None
Create a commit object.
Parameters: - commit – GitPython Commit object
- conf – Configuration class
-
__module__ = 'pydriller.domain.commit'
-
author
Return the author of the commit as a Developer object.
Returns: author
-
author_date
Return the authored datetime.
Returns: datetime author_datetime
-
author_timezone
Author timezone, expressed as an offset from UTC in seconds.
Returns: int timezone
-
branches
Return the set of branches that contain the commit.
Returns: set(str) branches
-
committer
Return the committer of the commit as a Developer object.
Returns: committer
-
committer_date
Return the committed datetime.
Returns: datetime committer_datetime
-
committer_timezone
Committer timezone, expressed as an offset from UTC in seconds.
Returns: int timezone
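GitPython documents commit timezone offsets as seconds west of UTC, so a commit made at UTC+02:00 carries an offset of -7200. Assuming the author and committer timezone values above are that raw offset, they can be turned into a standard `datetime.timezone` or a `+HH:MM` string as sketched below; `offset_to_tz` and `format_offset` are hypothetical helpers.

```python
from datetime import timedelta, timezone


def offset_to_tz(seconds_west_of_utc: int) -> timezone:
    """Convert a GitPython-style offset (seconds WEST of UTC) to a timezone."""
    # Flip the sign: datetime.timezone expects the offset EAST of UTC.
    return timezone(timedelta(seconds=-seconds_west_of_utc))


def format_offset(seconds_west_of_utc: int) -> str:
    """Render the offset as +HH:MM or -HH:MM (east of UTC is positive)."""
    east = -seconds_west_of_utc
    sign = "+" if east >= 0 else "-"
    hours, rest = divmod(abs(east), 3600)
    return f"{sign}{hours:02d}:{rest // 60:02d}"


print(format_offset(-7200))  # +02:00
print(format_offset(18000))  # -05:00
```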
-
dmm_unit_complexity
Return the Delta Maintainability Model (DMM) metric value for the unit complexity property.
It represents the proportion (between 0.0 and 1.0) of maintainability-improving change, when considering the cyclomatic complexity of the modified methods.
It rewards (value close to 1.0) modifications to low-risk (low complexity) methods, or splitting risky (highly complex) ones. It penalizes (value close to 0.0) working on methods that remain complex or get more complex.
Returns: The DMM value (between 0.0 and 1.0) for method complexity in this commit, or None if none of the programming languages in the commit are supported.
-
dmm_unit_interfacing
Return the Delta Maintainability Model (DMM) metric value for the unit interfacing property.
It represents the proportion (between 0.0 and 1.0) of maintainability-improving change, when considering the interface (number of parameters) of the modified methods.
It rewards (value close to 1.0) modifications to low-risk (with few parameters) methods, or splitting risky (with many parameters) ones. It penalizes (value close to 0.0) working on methods that continue to have, or are extended with, too many parameters.
Returns: The DMM value (between 0.0 and 1.0) for method interfacing in this commit, or None if none of the programming languages in the commit are supported.
-
dmm_unit_size
Return the Delta Maintainability Model (DMM) metric value for the unit size property.
It represents the proportion (between 0.0 and 1.0) of maintainability-improving change, when considering the lengths of the modified methods.
It rewards (value close to 1.0) modifications to low-risk (small) methods, or splitting risky (large) ones. It penalizes (value close to 0.0) working on methods that remain large or get larger.
Returns: The DMM value (between 0.0 and 1.0) for method size in this commit, or None if none of the programming languages in the commit are supported.
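The three DMM properties share the same proportion. A minimal sketch, assuming each method is reduced to a hypothetical `(nloc, is_low_risk)` pair for one property, and that the value is driven by how many lines move into or out of low-risk and high-risk methods, as the descriptions above suggest: growth of low-risk code and shrinkage of high-risk code count as good change, the opposite movements as bad change. It handles a single file; how PyDriller aggregates across the files of a commit is not shown here.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical simplification of a method: (nloc, is_low_risk for one property).
MethodInfo = Tuple[int, bool]


def risk_profile(methods: Iterable[MethodInfo]) -> Tuple[int, int]:
    """Return (lines in low-risk methods, lines in high-risk methods)."""
    low = high = 0
    for nloc, low_risk in methods:
        if low_risk:
            low += nloc
        else:
            high += nloc
    return low, high


def dmm_proportion(before: Iterable[MethodInfo], after: Iterable[MethodInfo]) -> Optional[float]:
    """Proportion of good change between two versions of a file, or None if nothing moved."""
    low_before, high_before = risk_profile(before)
    low_after, high_after = risk_profile(after)
    low_delta = low_after - low_before
    high_delta = high_after - high_before
    # More low-risk code or less high-risk code is good change; the reverse is bad change.
    good = max(low_delta, 0) + max(-high_delta, 0)
    bad = max(-low_delta, 0) + max(high_delta, 0)
    total = good + bad
    return None if total == 0 else good / total


print(dmm_proportion([(30, False)], [(15, True), (15, True)]))  # 1.0
print(dmm_proportion([(20, False)], [(25, False)]))  # 0.0
```

Splitting a 30-line high-risk method into two 15-line low-risk methods scores 1.0, while growing a high-risk method scores 0.0, which matches the reward and penalty described above.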
-
hash
Return the SHA of the commit.
Returns: str hash
-
in_main_branch
Return True if the commit is in the main branch, False otherwise.
Returns: bool in_main_branch
-
merge
Return True if the commit is a merge, False otherwise.
Returns: bool merge
-
modifications
Return a list of modified files.
Returns: List[Modification] modifications
-
msg
Return the commit message.
Returns: str commit_message
-
parents
Return the list of the parents’ SHAs.
Returns: List[str] parents
-
project_name
Return the project name.
Returns: project name
-
-
class pydriller.domain.commit.DMMProperty
Maintainability properties of the Delta Maintainability Model.
-
UNIT_COMPLEXITY = 2
-
UNIT_INTERFACING = 3
-
UNIT_SIZE = 1
-
__module__ = 'pydriller.domain.commit'
-
-
class pydriller.domain.commit.Method(func)
This class represents a method in a class. It contains various information extracted through Lizard.
-
UNIT_COMPLEXITY_LOW_RISK_THRESHOLD = 5
Threshold used in the Delta Maintainability Model to establish whether a method is low risk in terms of its cyclomatic complexity. The procedure to obtain the threshold is described in the PyDriller documentation.
-
UNIT_INTERFACING_LOW_RISK_THRESHOLD = 2
Threshold used in the Delta Maintainability Model to establish whether a method is low risk in terms of its interface. The procedure to obtain the threshold is described in the PyDriller documentation.
-
UNIT_SIZE_LOW_RISK_THRESHOLD = 15
Threshold used in the Delta Maintainability Model to establish whether a method is low risk in terms of its size. The procedure to obtain the threshold is described in the PyDriller documentation.
-
__init__(func)
Initialize a method object. This is calculated using Lizard: it parses the source code of all the modifications in a commit, extracting information about the methods contained in each file (if the file is source code written in one of the supported programming languages).
-
__module__ = 'pydriller.domain.commit'
-
is_low_risk(dmm_prop: pydriller.domain.commit.DMMProperty) → bool
Predicate indicating whether this method is low risk in terms of the given property.
Parameters: dmm_prop – Property according to which this method is considered risky.
Returns: True if and only if the method is considered low-risk w.r.t. this property.
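The predicate can be sketched with the `DMMProperty` values and the three thresholds listed above. `MethodSketch` is a hypothetical stand-in for `Method` with only the fields the check needs, and treating a value equal to the threshold as low risk is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DMMProperty(Enum):
    # Values mirror the DMMProperty constants documented above.
    UNIT_SIZE = 1
    UNIT_COMPLEXITY = 2
    UNIT_INTERFACING = 3


# Thresholds mirror the Method constants documented above.
UNIT_SIZE_LOW_RISK_THRESHOLD = 15
UNIT_COMPLEXITY_LOW_RISK_THRESHOLD = 5
UNIT_INTERFACING_LOW_RISK_THRESHOLD = 2


@dataclass
class MethodSketch:
    """Hypothetical stand-in for Method with only the fields the check needs."""
    nloc: int
    complexity: int
    parameters: List[str] = field(default_factory=list)

    def is_low_risk(self, prop: DMMProperty) -> bool:
        # Assumption: a value equal to the threshold still counts as low risk.
        if prop is DMMProperty.UNIT_SIZE:
            return self.nloc <= UNIT_SIZE_LOW_RISK_THRESHOLD
        if prop is DMMProperty.UNIT_COMPLEXITY:
            return self.complexity <= UNIT_COMPLEXITY_LOW_RISK_THRESHOLD
        return len(self.parameters) <= UNIT_INTERFACING_LOW_RISK_THRESHOLD


m = MethodSketch(nloc=15, complexity=6, parameters=["a", "b", "c"])
print([m.is_low_risk(p) for p in DMMProperty])  # [True, False, False]
```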
-
-
class pydriller.domain.commit.Modification(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, diff_and_sc: Dict[str, str])
This class contains information regarding a modified file in a commit.
-
__init__(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, diff_and_sc: Dict[str, str])
Initialize a modification. A modification carries information regarding the changed file. Normally, you shouldn’t initialize a new one.
-
__module__ = 'pydriller.domain.commit'
-
added
Return the total number of added lines in the file.
Returns: int lines_added
-
changed_methods
Return the list of methods that were changed. This analysis is more complex because Lizard runs twice: once on the methods before the change and once on the methods after it.
Returns: list of methods
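One way to decide which methods changed, sketched under assumptions: each method is reduced to a hypothetical `MethodSpan` with an inclusive line range, and a method counts as changed if an added line falls inside it in the new version, or a deleted line falls inside it in the old version. Line numbers follow the `(line number, line)` tuples used by `diff_parsed`. This illustrates why both the before and after method lists are needed; it is not presented as PyDriller’s exact code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class MethodSpan:
    """Hypothetical: a method reduced to its name and line range (inclusive)."""
    name: str
    start_line: int
    end_line: int


def changed_methods(
    methods_before: Iterable[MethodSpan],
    methods_after: Iterable[MethodSpan],
    diff_parsed: Dict[str, List[Tuple[int, str]]],
) -> Set[MethodSpan]:
    before = list(methods_before)
    after = list(methods_after)
    changed: Set[MethodSpan] = set()
    # Added lines are numbered in the new file: match them against the new methods.
    for line_no, _ in diff_parsed["added"]:
        changed.update(m for m in after if m.start_line <= line_no <= m.end_line)
    # Deleted lines are numbered in the old file: match them against the old methods.
    for line_no, _ in diff_parsed["deleted"]:
        changed.update(m for m in before if m.start_line <= line_no <= m.end_line)
    return changed


old_methods = [MethodSpan("f", 1, 3), MethodSpan("g", 5, 8)]
new_methods = [MethodSpan("f", 1, 4), MethodSpan("g", 6, 9), MethodSpan("h", 11, 12)]
diff = {"added": [(2, "    y = 2")], "deleted": [(7, "    return z")]}
print(sorted(m.name for m in changed_methods(old_methods, new_methods, diff)))  # ['f', 'g']
```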
-
complexity
Calculate the Cyclomatic Complexity of the file.
Returns: Cyclomatic Complexity of the file
-
diff_parsed
Return a dictionary with the added and deleted lines. The dictionary has 2 keys, “added” and “deleted”, each containing the corresponding added or deleted lines. For both keys, the value is a list of tuples (int, str), corresponding to (line number in the file, actual line).
Returns: Dictionary
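A minimal sketch of how a unified diff can be turned into this dictionary, assuming standard `@@ -old_start,len +new_start,len @@` hunk headers: deleted lines are numbered in the old file, added lines in the new one, and context lines advance both counters. `parse_diff` is a hypothetical helper, not PyDriller’s code.

```python
import re
from typing import Dict, List, Tuple

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_diff(diff: str) -> Dict[str, List[Tuple[int, str]]]:
    parsed: Dict[str, List[Tuple[int, str]]] = {"added": [], "deleted": []}
    old_line = new_line = 0
    in_hunk = False
    for line in diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            # Each hunk restarts the counters at the positions given in its header.
            old_line, new_line = int(header.group(1)), int(header.group(2))
            in_hunk = True
        elif not in_hunk or line.startswith("\\"):
            # Skip file headers before the first hunk and "\ No newline at end of file".
            continue
        elif line.startswith("-"):
            parsed["deleted"].append((old_line, line[1:]))
            old_line += 1
        elif line.startswith("+"):
            parsed["added"].append((new_line, line[1:]))
            new_line += 1
        else:
            # Context line: present in both versions.
            old_line += 1
            new_line += 1
    return parsed


diff = "--- a/f.txt\n+++ b/f.txt\n@@ -1,3 +1,3 @@\n a\n-b\n+B\n c"
print(parse_diff(diff))  # {'added': [(2, 'B')], 'deleted': [(2, 'b')]}
```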
-
filename
Return the filename. Given a path-like string (e.g. “/Users/dspadini/pydriller/myfile.py”), it returns only the filename (e.g. “myfile.py”).
Returns: str filename
-
language_supported
Return whether the language used in the modification can be analyzed by PyDriller. Languages are derived from the file extension. Supported languages are those supported by Lizard.
Returns: True if and only if the language of this Modification can be analyzed.
-
methods
Return the list of methods in the file. Every method contains various information like complexity, loc, name, number of parameters, etc.
Returns: list of methods
-
methods_before
Return the list of methods in the file before the change happened. Each method will have all specific info, e.g. complexity, loc, name, etc.
Returns: list of methods
-
new_path
New path of the file. Can be None if the file is deleted.
Returns: str new_path
-
nloc
Calculate the LOC of the file.
Returns: LOC of the file
-
old_path
Old path of the file. Can be None if the file is added.
Returns: str old_path
-
removed
Return the total number of deleted lines in the file.
Returns: int lines_deleted
-
token_count
Calculate the token count of functions.
Returns: token count
-
Developer
This module includes only one class, Developer, representing a developer.
-
class pydriller.domain.developer.Developer(name: str, email: str)
This class represents a developer. It stores the name and the email.
-
__init__(name: str, email: str)
Class to identify a developer.
Parameters: - name (str) – name and surname of the developer
- email (str) – email of the developer
-
__module__ = 'pydriller.domain.developer'
-
Process Metrics
This module contains the abstract class to implement process metrics.
-
class pydriller.metrics.process.process_metric.ProcessMetric(path_to_repo: str, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None)
Abstract class to implement process metrics.
-
__init__(path_to_repo: str, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None)
Parameters: - path_to_repo (str) – path to a single repo
- since (datetime) – starting date
- to (datetime) – ending date
- from_commit (str) – starting commit (only if since is None)
- to_commit (str) – ending commit (only if to is None)
-
__module__ = 'pydriller.metrics.process.process_metric'
-
count()
Implement the main functionality of the metric.
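The abstract-class pattern can be sketched without a repository. In PyDriller, the base class receives `path_to_repo` and the date or commit bounds; here, as an assumption that keeps the example self-contained, the base class takes an in-memory list of hypothetical `(hash, modified paths)` records. `ProcessMetricSketch` and `CommitsPerFile` are illustrative names, not classes shipped by PyDriller.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical commit record: (hash, paths of the files modified by the commit).
CommitRecord = Tuple[str, List[str]]


class ProcessMetricSketch(ABC):
    """Stand-in for ProcessMetric: holds the selected commits, subclasses implement count()."""

    def __init__(self, commits: List[CommitRecord]) -> None:
        self.commits = commits

    @abstractmethod
    def count(self):
        """Compute the metric over the selected commits."""


class CommitsPerFile(ProcessMetricSketch):
    """Illustrative metric: how many of the selected commits touched each file."""

    def count(self) -> Dict[str, int]:
        counter: Counter = Counter()
        for _hash, paths in self.commits:
            counter.update(set(paths))  # count each file at most once per commit
        return dict(counter)


metric = CommitsPerFile([("a1", ["x.py", "y.py"]), ("b2", ["x.py"])])
print(sorted(metric.count().items()))  # [('x.py', 2), ('y.py', 1)]
```

Because `count` is abstract, the base class cannot be instantiated directly; each concrete metric only has to supply its own `count`.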
-