API Reference
GitRepository
-
class
pydriller.git_repository.
GitRepository
(path: str)¶ -
__init__
(path: str)¶ Init the Git Repository.
Parameters: path (str) – path to the repository
-
__module__
= 'pydriller.git_repository'¶
-
checkout
(_hash: str) → None¶ Check out the repo at the specified commit. BE CAREFUL: this changes the state of the repo, so it should not be used with more than one thread.
Parameters: _hash – commit hash to checkout
-
files
() → typing.List[str]¶ Obtain the list of files in the repo (excluding the .git directory).
Returns: List[str], the list of the files
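A short sketch of one thing the returned list can be used for: counting files per extension. The helper name count_by_extension is hypothetical and not part of PyDriller; it only assumes that files() returns a list of path strings, as documented above.

```python
from collections import Counter
import os


def count_by_extension(file_paths):
    """Count paths per file extension ('' for files without one).

    `file_paths` is assumed to be a list of path strings such as the one
    returned by GitRepository.files(); this helper is hypothetical.
    """
    return Counter(os.path.splitext(path)[1] for path in file_paths)
```

With a repository object at hand, the call would look like count_by_extension(repo.files()).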
-
get_change_sets
() → typing.List[pydriller.domain.commit.ChangeSet]¶ Return the list of all the commits in the repo.
Returns: List[ChangeSet], the list of all the commits in the repo
-
get_commit
(commit_id: str) → pydriller.domain.commit.Commit¶ Get the specified commit.
Parameters: commit_id (str) – hash of the commit to analyze
Returns: Commit
-
get_commit_from_tag
(tag: str) → pydriller.domain.commit.Commit¶ Obtain the tagged commit.
Parameters: tag (str) – the tag
Returns: Commit, the commit the tag refers to
-
get_head
() → pydriller.domain.commit.ChangeSet¶ Get the head commit.
Returns: ChangeSet of the head commit
-
parse_diff
(diff: str) → typing.Dict[str, typing.List[typing.Tuple[int, str]]]¶ Given a diff, return a dictionary with the added and deleted lines. The dictionary has 2 keys, “added” and “deleted”, each containing the corresponding added or deleted lines. For both keys, the value is a list of Tuple (int, str), corresponding to (line number in the file, actual line).
Parameters: diff (str) – diff of the commit
Returns: Dictionary
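To make the shape of the returned dictionary concrete, here is a minimal sketch that builds the same structure from a unified diff. It is not PyDriller's implementation: the function name parse_unified_diff and the regular expression are assumptions, and the sketch assumes standard "@@ -a,b +c,d @@" hunk headers.

```python
import re
from typing import Dict, List, Tuple

# Matches a unified-diff hunk header and captures the starting line
# numbers of the old and new file.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_unified_diff(diff: str) -> Dict[str, List[Tuple[int, str]]]:
    """Hypothetical sketch: collect (line number, text) for changed lines."""
    result: Dict[str, List[Tuple[int, str]]] = {"added": [], "deleted": []}
    old_line = new_line = 0
    in_hunk = False
    for line in diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            old_line = int(header.group(1))
            new_line = int(header.group(2))
            in_hunk = True
        elif not in_hunk:
            continue  # skip anything before the first hunk (file headers)
        elif line.startswith("+"):
            result["added"].append((new_line, line[1:]))
            new_line += 1
        elif line.startswith("-"):
            result["deleted"].append((old_line, line[1:]))
            old_line += 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present in both the old and the new file.
            old_line += 1
            new_line += 1
    return result
```

Deleted lines are numbered against the old file and added lines against the new file, which is one reasonable reading of "line number in the file".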
-
reset
() → None¶ Reset the state of the repo, checking out the main branch and discarding local changes (-f option).
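Because checkout changes the state of the repo and reset restores it, the two calls pair naturally. A minimal sketch, assuming only the checkout and reset methods documented above; the helper name checked_out is hypothetical and not part of PyDriller.

```python
from contextlib import contextmanager


@contextmanager
def checked_out(repo, commit_hash):
    """Temporarily check out commit_hash, then reset the repository.

    `repo` is assumed to expose checkout(_hash) and reset() as documented
    for GitRepository; this helper itself is hypothetical.
    """
    repo.checkout(commit_hash)
    try:
        yield repo
    finally:
        # Runs even if the body raises, so the repo is not left
        # at the checked-out commit.
        repo.reset()
```

Inside a "with checked_out(repo, commit_hash):" block, calls such as repo.files() would see the tree at that commit. As with checkout itself, this should stay single-threaded.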
-
total_commits
() → int¶ Calculate total number of commits.
Returns: the total number of commits
-
RepositoryMining
-
class
pydriller.repository_mining.
RepositoryMining
(path_to_repo: str, single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_main_branch: bool = False, only_in_branches: typing.List[str] = None, only_modifications_with_file_types: typing.List[str] = None, only_no_merge: bool = False, num_threads: int = 1)¶ -
__init__
(path_to_repo: str, single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_main_branch: bool = False, only_in_branches: typing.List[str] = None, only_modifications_with_file_types: typing.List[str] = None, only_no_merge: bool = False, num_threads: int = 1)¶ Init a repository mining.
Parameters: - path_to_repo (str) – absolute path to the repository to analyze
- single (str) – hash of a single commit to analyze
- since (datetime) – starting date
- to (datetime) – ending date
- from_commit (str) – starting commit (only if since is None)
- to_commit (str) – ending commit (only if to is None)
- from_tag (str) – start the analysis from the specified tag (only if since and from_commit are None)
- to_tag (str) – end the analysis at the specified tag (only if to and to_commit are None)
- reversed_order (bool) – whether the commits should be analyzed in reversed order
- only_in_main_branch (bool) – whether only commits in main branch should be analyzed
- only_in_branches (List[str]) – only commits in these branches will be analyzed
- only_modifications_with_file_types (List[str]) – only modifications to files of these types will be analyzed
- only_no_merge (bool) – if True, merges will not be analyzed
- num_threads (int) – number of threads to use (default 1)
-
__module__
= 'pydriller.repository_mining'¶
-
traverse_commits
() → typing.Generator[[pydriller.domain.commit.Commit, NoneType], NoneType]¶ Analyze all the specified commits (all of them by default), returning a generator of commits.
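A small sketch of consuming the generator. The function summarize is hypothetical; it relies only on the hash, msg and merge properties documented for Commit, and accepts any iterable of objects exposing them.

```python
def summarize(commits):
    """Yield (short hash, first message line) for each non-merge commit.

    `commits` is assumed to be an iterable of Commit-like objects, for
    example the generator returned by traverse_commits().
    """
    for commit in commits:
        if commit.merge:
            continue  # skip merge commits
        first_line = commit.msg.splitlines()[0] if commit.msg else ""
        yield commit.hash[:7], first_line
```

With PyDriller installed, summarize(RepositoryMining(path_to_repo).traverse_commits()) would feed it real commits, where path_to_repo is a placeholder for the absolute path of a local repository. Passing only_no_merge=True to RepositoryMining should make the merge check redundant.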
-
Commit
-
class
pydriller.domain.commit.
ChangeSet
(id: str, date: datetime.datetime)¶ -
__init__
(id: str, date: datetime.datetime)¶ Light-weight version of the commit, storing only the hash and the date. Used to filter out commits before requesting more complex information (such as the diff and source code).
Parameters: - id (str) – hash of the commit
- date (datetime) – date of the commit
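The filtering role described above can be sketched as follows. The function change_sets_between is hypothetical, and it assumes each ChangeSet exposes its constructor arguments as the attributes id and date, which this page does not state explicitly.

```python
from datetime import datetime
from typing import Iterable, List, Optional


def change_sets_between(change_sets: Iterable,
                        since: Optional[datetime] = None,
                        to: Optional[datetime] = None) -> List[str]:
    """Return the ids of change sets whose date lies in [since, to].

    Assumes objects with `id` and `date` attributes (an assumption about
    ChangeSet). All datetimes must be consistently naive or consistently
    timezone-aware, otherwise Python raises TypeError on comparison.
    """
    selected = []
    for change_set in change_sets:
        if since is not None and change_set.date < since:
            continue
        if to is not None and change_set.date > to:
            continue
        selected.append(change_set.id)
    return selected
```

Only the selected hashes would then need a full get_commit call, which keeps the expensive work limited to commits inside the window.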
-
__module__
= 'pydriller.domain.commit'¶
-
-
class
pydriller.domain.commit.
Commit
(commit: git.objects.commit.Commit, path: str, main_branch: str) → None¶ -
__init__
(commit: git.objects.commit.Commit, path: str, main_branch: str) → None¶ Create a commit object.
-
__module__
= 'pydriller.domain.commit'¶
-
author
¶ Return the author of the commit as a Developer object.
Returns: author
-
author_date
¶ Return the authored datetime.
Returns: datetime author_datetime
-
author_timezone
¶ Author timezone, expressed as an offset from UTC in seconds.
Returns: int timezone
-
branches
¶ Return the set of branches that contain the commit.
Returns: set(str) branches
-
committer
¶ Return the committer of the commit as a Developer object.
Returns: committer
-
committer_date
¶ Return the committed datetime.
Returns: datetime committer_datetime
-
committer_timezone
¶ Committer timezone, expressed as an offset from UTC in seconds.
Returns: int timezone
-
hash
¶ Return the SHA of the commit.
Returns: str hash
-
in_main_branch
¶ Return True if the commit is in the main branch, False otherwise.
Returns: bool in_main_branch
-
merge
¶ Return True if the commit is a merge, False otherwise.
Returns: bool merge
-
modifications
¶ Return a list of modified files.
Returns: List[Modification] modifications
-
msg
¶ Return commit message.
Returns: str commit_message
-
parents
¶ Return the list of parent SHAs.
Returns: List[str] parents
-
-
class
pydriller.domain.commit.
Modification
(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, parents: typing.List[str], hash: str, path: str = None, modifications_list=None)¶ -
__init__
(old_path: str, new_path: str, change_type: pydriller.domain.commit.ModificationType, parents: typing.List[str], hash: str, path: str = None, modifications_list=None)¶ Initialize a modification. A modification carries information about the changed file. Normally, you shouldn’t need to create one yourself.
-
__module__
= 'pydriller.domain.commit'¶
-
added
¶ Return the total number of added lines in the file.
Returns: int lines_added
-
diff
¶ Return the diff of the file in the current commit.
Returns: str diff
-
filename
¶ Return the filename. Given a path-like string (e.g. “/Users/dspadini/pydriller/myfile.py”), return only the filename (e.g. “myfile.py”).
Returns: str filename
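The path-to-filename behaviour described above matches what the standard library's basename does. The snippet below illustrates that equivalence and is not necessarily how PyDriller computes the value.

```python
import posixpath

# The example path from the description above.
path = "/Users/dspadini/pydriller/myfile.py"

# basename keeps only the final path component.
filename = posixpath.basename(path)
print(filename)  # myfile.py
```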
-
removed
¶ Return the total number of deleted lines in the file.
Returns: int lines_deleted
-
source_code
¶ Return the source code of the file at the current commit.
Returns: str source_code
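The added, removed and filename properties combine naturally into a per-file churn summary. A minimal sketch; the function churn_by_file is hypothetical and accepts any iterable of objects exposing those three properties, such as the list returned by Commit.modifications.

```python
def churn_by_file(modifications):
    """Map each filename to its total (added, removed) line counts.

    `modifications` is assumed to be an iterable of Modification-like
    objects with `filename`, `added` and `removed`. Files that share a
    filename in different directories are merged, because `filename`
    drops the directory part.
    """
    totals = {}
    for mod in modifications:
        added, removed = totals.get(mod.filename, (0, 0))
        totals[mod.filename] = (added + mod.added, removed + mod.removed)
    return totals
```

For a single commit, the call would be churn_by_file(commit.modifications).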
-