API Reference¶
Git¶
This module includes 1 class, Git, representing a repository in Git.
-
class
pydriller.git.
Git
(path: str, conf=None)¶ Class representing a repository in Git. It contains most of the logic of PyDriller: obtaining the list of commits, checkout, reset, etc.
-
__del__
()¶
-
__init__
(path: str, conf=None)¶ Init the Git Repository.
Parameters: path (str) – path to the repository
-
__module__
= 'pydriller.git'¶
-
checkout
(_hash: str) → None¶ Check out the repo at the specified commit. BE CAREFUL: this changes the state of the repo, so it should not be used with more than 1 thread.
Parameters: _hash – commit hash to checkout
-
clear
()¶ According to GitPython’s documentation, sometimes it leaks resources. This holds especially for Windows users. Hence, we need to clear the cache manually.
-
files
() → List[str]¶ Obtain the list of the files (excluding .git directory).
Returns: List[str], the list of the files
-
get_commit
(commit_id: str) → pydriller.domain.commit.Commit¶ Get the specified commit.
Parameters: commit_id (str) – hash of the commit to analyze Returns: Commit
-
get_commit_from_gitpython
(commit: git.objects.commit.Commit) → pydriller.domain.commit.Commit¶ Build a PyDriller commit object from a GitPython commit object. This method is internal to PyDriller; users will generally not need it.
Parameters: commit (GitCommit) – GitPython commit Returns: Commit commit: PyDriller commit
-
get_commit_from_tag
(tag: str) → pydriller.domain.commit.Commit¶ Obtain the tagged commit.
Parameters: tag (str) – the tag Returns: Commit commit: the commit the tag referred to
-
get_commits_last_modified_lines
(commit: pydriller.domain.commit.Commit, modification: Optional[pydriller.domain.commit.ModifiedFile] = None, hashes_to_ignore_path: Optional[str] = None) → Dict[str, Set[str]]¶ Given the Commit object, returns the set of commits that last “touched” the lines that are modified in the files included in the commit. It applies SZZ.
The algorithm works as follows (for every file in the commit):
1- obtain the diff
2- obtain the list of deleted lines
3- blame the file and obtain the commits where those lines were added
A single ModifiedFile can also be passed as a parameter; in this case, only that file is analyzed.
Parameters: - commit (Commit) – the commit to analyze
- modification (ModifiedFile) – single modified file to analyze
- hashes_to_ignore_path (str) – path to a file containing hashes of commits to ignore.
Returns: Dict commits: a dictionary having as keys the files of the commit, and as values the commits that last touched those files.
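The three steps above can be sketched on toy data. This is a minimal illustration, not PyDriller's implementation: the function name `last_touching_commits`, the shape of `deleted_lines`, and the `blame` mapping (line number to commit hash) are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple


def last_touching_commits(
    deleted_lines: Dict[str, List[Tuple[int, str]]],
    blame: Dict[str, Dict[int, str]],
    hashes_to_ignore: Iterable[str] = (),
) -> Dict[str, Set[str]]:
    """Map each file to the commits that last touched its deleted lines.

    deleted_lines: per file, (line number in the old file, text) pairs (step 2).
    blame: per file, line number -> hash of the commit that added it (step 3).
    Both inputs are hypothetical stand-ins for what a diff and a blame provide.
    """
    ignored = set(hashes_to_ignore)
    result: Dict[str, Set[str]] = {}
    for path, lines in deleted_lines.items():
        for line_no, _text in lines:
            commit_hash = blame[path][line_no]
            if commit_hash in ignored:
                continue  # skip commits the caller asked to ignore
            result.setdefault(path, set()).add(commit_hash)
    return result
```

The result has the same shape as the documented return value: file paths as keys, sets of commit hashes as values.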
-
get_commits_modified_file
(filepath: str, include_deleted_files=False) → List[str]¶ Given a filepath, returns all the commits that modified this file (following renames).
Parameters: - filepath (str) – path to the file
- include_deleted_files (bool) – if True, include commits that modify a deleted file
Returns: the list of commit hashes
-
get_head
() → pydriller.domain.commit.Commit¶ Get the head commit.
Returns: Commit of the head commit
-
get_list_commits
(rev='HEAD', **kwargs) → Generator[pydriller.domain.commit.Commit, None, None]¶ Return a generator of all the commits in the repo.
Returns: Generator[Commit], the generator of all the commits in the repo
-
get_tagged_commits
()¶ Obtain the hash of all the tagged commits.
Returns: list of tagged commits (can be empty if there are no tags)
-
repo
¶ GitPython object Repo.
Returns: Repo
-
reset
() → None¶ Reset the state of the repo, checking out the main branch and discarding local changes (-f option).
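Since checkout changes the state of the repo, it can help to pair it with reset. The sketch below is a hypothetical helper (`checked_out` and `files_at` are not part of PyDriller); it relies only on the checkout, files, and reset methods documented in this class and assumes single-threaded use.

```python
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def checked_out(git_repo, commit_hash: str) -> Iterator[None]:
    """Check out commit_hash, then always reset the repo afterwards.

    git_repo is expected to behave like pydriller.git.Git, i.e. to expose
    checkout(_hash) and reset(). Hypothetical helper, not PyDriller API.
    """
    git_repo.checkout(commit_hash)
    try:
        yield
    finally:
        # reset() checks out the main branch and discards local changes.
        git_repo.reset()


def files_at(git_repo, commit_hash: str) -> List[str]:
    """Return the list of files in the working tree as of commit_hash."""
    with checked_out(git_repo, commit_hash):
        return git_repo.files()
```

With PyDriller installed, `git_repo` would be an instance created with `from pydriller.git import Git` and `Git("path/to/repo")`.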
-
total_commits
() → int¶ Calculate total number of commits.
Returns: the total number of commits
-
Repository¶
This module includes 1 class, Repository, main class of PyDriller.
-
exception
pydriller.repository.
MalformedUrl
(message)¶
-
__init__
(message)¶ Initialize self. See help(type(self)) for accurate signature.
-
__module__
= 'pydriller.repository'¶
-
-
class
pydriller.repository.
Repository
(path_to_repo: Union[str, List[str]], single: Optional[str] = None, since: Optional[datetime.datetime] = None, since_as_filter: Optional[datetime.datetime] = None, to: Optional[datetime.datetime] = None, from_commit: Optional[str] = None, to_commit: Optional[str] = None, from_tag: Optional[str] = None, to_tag: Optional[str] = None, include_refs: bool = False, include_remotes: bool = False, num_workers: int = 1, only_in_branch: Optional[str] = None, only_modifications_with_file_types: Optional[List[str]] = None, only_no_merge: bool = False, only_authors: Optional[List[str]] = None, only_commits: Optional[List[str]] = None, only_releases: bool = False, filepath: Optional[str] = None, include_deleted_files: bool = False, histogram_diff: bool = False, skip_whitespaces: bool = False, clone_repo_to: Optional[str] = None, order: Optional[str] = None)¶ This is the main class of PyDriller, responsible for running the study.
-
__init__
(path_to_repo: Union[str, List[str]], single: Optional[str] = None, since: Optional[datetime.datetime] = None, since_as_filter: Optional[datetime.datetime] = None, to: Optional[datetime.datetime] = None, from_commit: Optional[str] = None, to_commit: Optional[str] = None, from_tag: Optional[str] = None, to_tag: Optional[str] = None, include_refs: bool = False, include_remotes: bool = False, num_workers: int = 1, only_in_branch: Optional[str] = None, only_modifications_with_file_types: Optional[List[str]] = None, only_no_merge: bool = False, only_authors: Optional[List[str]] = None, only_commits: Optional[List[str]] = None, only_releases: bool = False, filepath: Optional[str] = None, include_deleted_files: bool = False, histogram_diff: bool = False, skip_whitespaces: bool = False, clone_repo_to: Optional[str] = None, order: Optional[str] = None)¶ Init a repository. The only required parameter is “path_to_repo”: to analyze a single repo, pass the absolute path to the repo; if you need to analyze more repos, pass a list of absolute paths.
Furthermore, PyDriller supports local and remote repositories: if you pass a path to a repo, PyDriller will run the study on that repo; if you pass a URL, PyDriller will clone the repo in a temporary folder, run the study, and delete the temporary folder.
Parameters: - path_to_repo (Union[str,List[str]]) – absolute path (or list of absolute paths) to the repository(ies) to analyze
- single (str) – hash of a single commit to analyze
- since (datetime) – starting date
- since_as_filter (datetime) – starting date (scans all commits, does not stop at first commit with date < since_as_filter)
- to (datetime) – ending date
- from_commit (str) – starting commit (only if since is None)
- to_commit (str) – ending commit (only if to is None)
- from_tag (str) – start the analysis from the specified tag (only if since and from_commit are None)
- to_tag (str) – end the analysis at the specified tag (only if to and to_commit are None)
- include_refs (bool) – whether to include refs and HEAD in commit analysis
- include_remotes (bool) – whether to include remote commits in analysis
- num_workers (int) – number of workers (i.e., threads). Please note, if num_workers > 1 the commits order is not maintained.
- only_in_branch (str) – only commits in this branch will be analyzed
- only_modifications_with_file_types (List[str]) – only modifications to files of these types will be analyzed
- only_no_merge (bool) – if True, merges will not be analyzed
- only_authors (List[str]) – only commits of these authors will be analyzed (the check is done on the username, NOT the email)
- only_commits (List[str]) – only these commits will be analyzed
- only_releases (bool) – analyze only tagged commits
- histogram_diff (bool) – add the “--histogram” option when asking for the diff
- skip_whitespaces (bool) – add the “-w” option when asking for the diff
- clone_repo_to (str) – if the repo under analysis is remote, clone the repo to the specified directory
- filepath (str) – only commits that modified this file will be analyzed
- include_deleted_files (bool) – include commits modifying a deleted file (useful when analyzing a deleted filepath)
- order (str) – order of commits. It can be one of: ‘date-order’, ‘author-date-order’, ‘topo-order’, or ‘reverse’. If order=None, PyDriller returns the commits from the oldest to the newest.
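The parameter list above says that since, from_commit, and from_tag are alternatives for the start of the range, and to, to_commit, and to_tag for its end. A minimal sketch of enforcing that before building a Repository follows; `range_filters` and `open_repository` are hypothetical helpers, not PyDriller API.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def range_filters(
    since: Optional[datetime] = None,
    from_commit: Optional[str] = None,
    from_tag: Optional[str] = None,
    to: Optional[datetime] = None,
    to_commit: Optional[str] = None,
    to_tag: Optional[str] = None,
) -> Dict[str, Any]:
    """Build Repository keyword arguments with at most one start and one end."""
    starts = {"since": since, "from_commit": from_commit, "from_tag": from_tag}
    ends = {"to": to, "to_commit": to_commit, "to_tag": to_tag}
    kwargs: Dict[str, Any] = {}
    for group in (starts, ends):
        given = {name: value for name, value in group.items() if value is not None}
        if len(given) > 1:
            raise ValueError("use only one of: " + ", ".join(group))
        kwargs.update(given)
    return kwargs


def open_repository(path_to_repo: str, **filters: Any):
    """Create a Repository restricted to the validated range."""
    # Imported lazily so range_filters stays usable without PyDriller installed.
    from pydriller.repository import Repository

    return Repository(path_to_repo, **range_filters(**filters))
```

For example, `open_repository("path/to/repo", since=datetime(2020, 1, 1), to_tag="v1.0")` would pass only `since` and `to_tag` on to Repository.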
-
__module__
= 'pydriller.repository'¶
-
traverse_commits
() → Generator[pydriller.domain.commit.Commit, None, None]¶ Analyze all the specified commits (all of them by default), returning a generator of commits.
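A short sketch of consuming the generator: the helper below sums changed lines per author. The function names are illustrative; it uses only the author and lines attributes of Commit and the name saved by Developer, as documented in this reference.

```python
from collections import Counter
from typing import Iterable


def lines_per_author(commits: Iterable) -> Counter:
    """Sum changed lines (insertions + deletions) per author name.

    Each item only needs the `author.name` and `lines` attributes
    documented for pydriller.domain.commit.Commit.
    """
    totals: Counter = Counter()
    for commit in commits:
        totals[commit.author.name] += commit.lines
    return totals


def lines_per_author_in_repo(path_to_repo: str) -> Counter:
    """Run the summary over every commit of a local or remote repo."""
    # Imported lazily so the helper above stays usable without PyDriller.
    from pydriller.repository import Repository

    return lines_per_author(Repository(path_to_repo).traverse_commits())
```

Because traverse_commits returns a generator, commits are processed one at a time rather than loaded all at once.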
-
Commit¶
This module contains all the classes regarding a specific commit, such as Commit, ModifiedFile, ModificationType and Method.
-
class
pydriller.domain.commit.
Commit
(commit: git.objects.commit.Commit, conf)¶ Class representing a Commit. Contains all the important information such as hash, author, dates, and modified files.
-
__init__
(commit: git.objects.commit.Commit, conf) → None¶ Create a commit object.
Parameters: - commit – GitPython Commit object
- conf – Configuration class
-
__module__
= 'pydriller.domain.commit'¶
-
author
¶ Return the author of the commit as a Developer object.
Returns: author
-
author_date
¶ Return the authored datetime.
Returns: datetime author_datetime
-
author_timezone
¶ Author timezone, expressed as an offset in seconds from UTC.
Returns: int timezone
-
branches
¶ Return the set of branches that contain the commit.
Returns: set(str) branches
-
committer
¶ Return the committer of the commit as a Developer object.
Returns: committer
-
committer_date
¶ Return the committed datetime.
Returns: datetime committer_datetime
-
committer_timezone
¶ Committer timezone, expressed as an offset in seconds from UTC.
Returns: int timezone
-
deletions
¶ Return the number of deleted lines in the commit (as shown from --shortstat).
Returns: int deletion lines
-
dmm_unit_complexity
¶ Return the Delta Maintainability Model (DMM) metric value for the unit complexity property.
It represents the proportion (between 0.0 and 1.0) of maintainability improving change, when considering the cyclomatic complexity of the modified methods.
It rewards (value close to 1.0) modifications to low-risk (low complexity) methods, or splitting risky (highly complex) ones. It penalizes (value close to 0.0) working on methods that remain complex or get more complex.
Returns: The DMM value (between 0.0 and 1.0) for method complexity in this commit, or None if none of the programming languages in the commit are supported.
-
dmm_unit_interfacing
¶ Return the Delta Maintainability Model (DMM) metric value for the unit interfacing property.
It represents the proportion (between 0.0 and 1.0) of maintainability improving change, when considering the interface (number of parameters) of the modified methods.
It rewards (value close to 1.0) modifications to low-risk (with few parameters) methods, or splitting risky (with many parameters) ones. It penalizes (value close to 0.0) working on methods that continue to have or are extended with too many parameters.
Returns: The DMM value (between 0.0 and 1.0) for method interfacing in this commit, or None if none of the programming languages in the commit are supported.
-
dmm_unit_size
¶ Return the Delta Maintainability Model (DMM) metric value for the unit size property.
It represents the proportion (between 0.0 and 1.0) of maintainability improving change, when considering the lengths of the modified methods.
It rewards (value close to 1.0) modifications to low-risk (small) methods, or splitting risky (large) ones. It penalizes (value close to 0.0) working on methods that remain large or get larger.
Returns: The DMM value (between 0.0 and 1.0) for method size in this commit, or None if none of the programming languages in the commit are supported.
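One way to read "proportion of maintainability improving change" for the DMM properties is sketched below. It assumes the change is summarized as two deltas, the net change in low-risk code and the net change in high-risk code; the function name and input shape are illustrative and not taken from PyDriller.

```python
from typing import Optional


def good_change_proportion(low_risk_delta: int, high_risk_delta: int) -> Optional[float]:
    """Share of the change that improves maintainability, in [0.0, 1.0].

    Growth in low-risk code and shrinkage of high-risk code count as good;
    the opposite movements count as bad. Returns None when nothing changed.
    """
    good = max(low_risk_delta, 0) + max(-high_risk_delta, 0)
    bad = max(-low_risk_delta, 0) + max(high_risk_delta, 0)
    total = good + bad
    return good / total if total else None
```

Under these assumptions, a commit that only adds low-risk code scores 1.0, and one that only adds high-risk code scores 0.0.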
-
files
¶ Return the number of modified files of the commit (as shown from --shortstat).
Returns: int modified files number
-
hash
¶ Return the SHA of the commit.
Returns: str hash
-
in_main_branch
¶ Return True if the commit is in the main branch, False otherwise.
Returns: bool in_main_branch
-
insertions
¶ Return the number of added lines in the commit (as shown from --shortstat).
Returns: int insertion lines
-
lines
¶ Return the number of modified lines in the commit (as shown from --shortstat).
Returns: int insertion + deletion lines
-
merge
¶ Return True if the commit is a merge, False otherwise.
Returns: bool merge
-
modified_files
¶ Return a list of modified files. The list is empty if the commit is a merge commit. For more info on this, see https://haacked.com/archive/2014/02/21/reviewing-merge-commits/ or https://github.com/ishepard/pydriller/issues/89#issuecomment-590243707
Returns: List[ModifiedFile] modified files
-
msg
¶ Return commit message.
Returns: str commit_message
-
parents
¶ Return the list of parents SHAs.
Returns: List[str] parents
-
project_name
¶ Return the project name.
Returns: project name
-
project_path
¶ Return the absolute path of the project.
Returns: project path
-
-
class
pydriller.domain.commit.
DMMProperty
¶ Maintainability properties of the Delta Maintainability Model.
-
UNIT_COMPLEXITY
= 2¶
-
UNIT_INTERFACING
= 3¶
-
UNIT_SIZE
= 1¶
-
__module__
= 'pydriller.domain.commit'¶
-
-
class
pydriller.domain.commit.
Method
(func: Any)¶ This class represents a method in a class. Contains various information extracted through Lizard.
-
UNIT_COMPLEXITY_LOW_RISK_THRESHOLD
= 5¶ Threshold used in the Delta Maintainability Model to establish whether a method is low risk in terms of its cyclomatic complexity. The procedure to obtain the threshold is described in the PyDriller documentation.
-
UNIT_INTERFACING_LOW_RISK_THRESHOLD
= 2¶ Threshold used in the Delta Maintainability Model to establish whether a method is low risk in terms of its interface. The procedure to obtain the threshold is described in the PyDriller documentation.
-
UNIT_SIZE_LOW_RISK_THRESHOLD
= 15¶ Threshold used in the Delta Maintainability Model to establish whether a method is low risk in terms of its size. The procedure to obtain the threshold is described in the PyDriller documentation.
-
__init__
(func: Any) → None¶ Initialize a method object. This is calculated using Lizard: it parses the source code of all the modifications in a commit, extracting information of the methods contained in the file (if the file is a source code written in one of the supported programming languages).
-
__module__
= 'pydriller.domain.commit'¶
-
is_low_risk
(dmm_prop: pydriller.domain.commit.DMMProperty) → bool¶ Predicate indicating whether this method is low risk in terms of the given property.
Parameters: dmm_prop – Property according to which this method is considered risky. Returns: True if and only if the method is considered low-risk w.r.t. this property.
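The thresholds and the predicate can be sketched as a standalone function. The enum values and thresholds are copied from this reference; the inclusive comparison and the choice of measurements (lines of code, cyclomatic complexity, number of parameters) are assumptions based on the DMM descriptions, and this `is_low_risk` is a free function written for illustration, not the Method API.

```python
from enum import Enum


class DMMProperty(Enum):
    """Local copy of the documented enum values, for a self-contained sketch."""

    UNIT_SIZE = 1
    UNIT_COMPLEXITY = 2
    UNIT_INTERFACING = 3


# Thresholds copied from the Method class attributes listed above.
LOW_RISK_THRESHOLDS = {
    DMMProperty.UNIT_SIZE: 15,
    DMMProperty.UNIT_COMPLEXITY: 5,
    DMMProperty.UNIT_INTERFACING: 2,
}


def is_low_risk(nloc: int, complexity: int, n_parameters: int, dmm_prop: DMMProperty) -> bool:
    """Return True if the measurement for dmm_prop is within its threshold.

    Assumes an inclusive comparison (value <= threshold).
    """
    measured = {
        DMMProperty.UNIT_SIZE: nloc,
        DMMProperty.UNIT_COMPLEXITY: complexity,
        DMMProperty.UNIT_INTERFACING: n_parameters,
    }
    return measured[dmm_prop] <= LOW_RISK_THRESHOLDS[dmm_prop]
```

A method can be low risk for one property and risky for another, which is why the property is passed explicitly.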
-
-
class
pydriller.domain.commit.
ModificationType
¶ Type of Modification. Can be ADD, COPY, RENAME, DELETE, MODIFY or UNKNOWN.
-
ADD
= 1¶
-
COPY
= 2¶
-
DELETE
= 4¶
-
MODIFY
= 5¶
-
RENAME
= 3¶
-
UNKNOWN
= 6¶
-
__module__
= 'pydriller.domain.commit'¶
-
-
class
pydriller.domain.commit.
ModifiedFile
(diff: git.diff.Diff)¶ This class contains information regarding a modified file in a commit.
-
__init__
(diff: git.diff.Diff)¶ Initialize a modified file. A modified file carries information regarding the changed file. Normally, you shouldn’t initialize a new one.
-
__module__
= 'pydriller.domain.commit'¶
-
added_lines
¶ Return the total number of added lines in the file.
Returns: int lines_added
-
change_type
¶
-
changed_methods
¶ Return the list of methods that were changed. This analysis is more complex because Lizard runs twice: once for the methods before the change and once for the methods after it.
Returns: list of methods
-
complexity
¶ Calculate the Cyclomatic Complexity of the file.
Returns: Cyclomatic Complexity of the file
-
content
¶
-
content_before
¶
-
deleted_lines
¶ Return the total number of deleted lines in the file.
Returns: int lines_deleted
-
diff
¶
-
diff_parsed
¶ Returns a dictionary with the added and deleted lines. The dictionary has 2 keys: “added” and “deleted”, each containing the corresponding added or deleted lines. For both keys, the value is a list of Tuple (int, str), corresponding to (number of line in the file, actual line).
Returns: Dictionary
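To make the structure concrete, the sketch below builds a dictionary of the same shape from a unified diff. It is an illustrative parser, not PyDriller's code: `parse_diff` is a hypothetical name, and the input is assumed to contain only hunks (no "---" or "+++" file header lines).

```python
import re
from typing import Dict, List, Tuple

# Matches hunk headers such as "@@ -10,3 +12,4 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_diff(diff: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return {"added": [(line number, text)], "deleted": [(line number, text)]}.

    Added lines are numbered in the new file, deleted lines in the old file.
    """
    parsed: Dict[str, List[Tuple[int, str]]] = {"added": [], "deleted": []}
    old_line = new_line = 0
    for line in diff.split("\n"):
        match = HUNK_HEADER.match(line)
        if match:
            # Each hunk restarts the counters at its declared start lines.
            old_line, new_line = int(match.group(1)), int(match.group(2))
        elif line.startswith("+"):
            parsed["added"].append((new_line, line[1:]))
            new_line += 1
        elif line.startswith("-"):
            parsed["deleted"].append((old_line, line[1:]))
            old_line += 1
        elif line.startswith(" "):
            # Context lines exist in both versions and advance both counters.
            old_line += 1
            new_line += 1
    return parsed
```

With PyDriller, the same structure is available directly from the diff_parsed attribute of a modified file, so a parser like this is only needed to understand the format.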
-
filename
¶ Return the filename. Given a path-like-string (e.g. “/Users/dspadini/pydriller/myfile.py”) returns only the filename (e.g. “myfile.py”)
Returns: str filename
-
language_supported
¶ Return whether the language used in the modified file can be analyzed by PyDriller. Languages are derived from the file extension. Supported languages are those supported by Lizard.
Returns: True iff the language of this modified file can be analyzed.
-
methods
¶ Return the list of methods in the file. Every method contains various information like complexity, loc, name, number of parameters, etc.
Returns: list of methods
-
methods_before
¶ Return the list of methods in the file before the change happened. Each method will have all specific info, e.g. complexity, loc, name, etc.
Returns: list of methods
-
new_path
¶ New path of the file. Can be None if the file is deleted.
Returns: str new_path
-
nloc
¶ Calculate the LOC of the file.
Returns: LOC of the file
-
old_path
¶ Old path of the file. Can be None if the file is added.
Returns: str old_path
-
source_code
¶
-
source_code_before
¶
-
token_count
¶ Calculate the token count of functions.
Returns: token count
-
Developer¶
This module includes only 1 class, Developer, representing a developer.
-
class
pydriller.domain.developer.
Developer
(name: Optional[str] = None, email: Optional[str] = None)¶ This class represents a developer. We save the email and the name.
-
__init__
(name: Optional[str] = None, email: Optional[str] = None)¶ Class to identify a developer.
Parameters: - name (str) – name and surname of the developer
- email (str) – email of the developer
-
__module__
= 'pydriller.domain.developer'¶
-
Process Metrics¶
This module contains the abstract class to implement process metrics.
-
class
pydriller.metrics.process.process_metric.
ProcessMetric
(path_to_repo: str, since: Optional[datetime.datetime] = None, to: Optional[datetime.datetime] = None, from_commit: Optional[str] = None, to_commit: Optional[str] = None)¶ Abstract class to implement process metrics
-
__init__
(path_to_repo: str, since: Optional[datetime.datetime] = None, to: Optional[datetime.datetime] = None, from_commit: Optional[str] = None, to_commit: Optional[str] = None)¶
Parameters: - path_to_repo (str) – path to a single repo
- since (datetime) – starting date
- to (datetime) – ending date
- from_commit (str) – starting commit (only if since is None)
- to_commit (str) – ending commit (only if to is None)
-
__module__
= 'pydriller.metrics.process.process_metric'¶
-
count
()¶ Implement the main functionality of the metric
-