API Reference
GitRepository
- class pydriller.git_repository.GitRepository(path: str)
- __init__(path: str)
  Initialize the Git repository.
  Parameters: path – path to the repository
- __module__ = 'pydriller.git_repository'
- checkout(_hash: str) → None
  Check out the repo at the specified commit. BE CAREFUL: this changes the state of the repo, so it should not be used with more than one thread.
  Parameters: _hash – hash of the commit to check out
- files() → typing.List[str]
  Obtain the list of the files (excluding the .git directory).
  Returns: List[str], the list of the files
- get_change_sets() → typing.List[pydriller.domain.commit.ChangeSet]
  Return the list of all the commits in the repo.
  Returns: List[ChangeSet], the list of all the commits in the repo
- get_commit(commit_id: str) → pydriller.domain.commit.Commit
  Get the specified commit.
  Parameters: commit_id – hash of the commit to analyze
  Returns: Commit
- get_commit_from_tag(tag: str) → pydriller.domain.commit.Commit
  Obtain the tagged commit.
  Parameters: tag (str) – the tag
  Returns: Commit, the commit the tag refers to
- get_head() → pydriller.domain.commit.ChangeSet
  Get the head commit.
  Returns: ChangeSet of the head commit
- parse_diff(diff: str) → typing.Dict[str, typing.List[typing.Tuple[int, str]]]
  Given a diff, return a dictionary with the added and deleted lines. The dictionary has two keys, “added” and “deleted”, each holding the corresponding lines. For both keys, the value is a list of (int, str) tuples: (line number in the file, content of the line).
  Parameters: diff (str) – diff of the commit
  Returns: Dictionary
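To make the shape of that dictionary concrete, here is a minimal standalone sketch of how such a parser could work. It is not PyDriller's implementation: the name `parse_diff_sketch` is hypothetical, the input is assumed to be a unified diff for a single file, and numbering deleted lines by their position in the old file (and added lines by their position in the new file) is an assumption of this sketch.

```python
import re
from typing import Dict, List, Tuple

# Matches a unified-diff hunk header such as "@@ -12,5 +12,6 @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def parse_diff_sketch(diff: str) -> Dict[str, List[Tuple[int, str]]]:
    """Illustrative stand-in that mimics the documented return shape."""
    result: Dict[str, List[Tuple[int, str]]] = {"added": [], "deleted": []}
    old_line = new_line = 0
    in_hunk = False
    for line in diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            # The header gives the first line number of the hunk in each version.
            old_line, new_line = int(header.group(1)), int(header.group(2))
            in_hunk = True
        elif not in_hunk or line.startswith("\\"):
            # Skip anything before the first hunk and "\ No newline" markers.
            continue
        elif line.startswith("+"):
            result["added"].append((new_line, line[1:]))
            new_line += 1
        elif line.startswith("-"):
            result["deleted"].append((old_line, line[1:]))
            old_line += 1
        else:
            # Context line: present in both the old and the new file.
            old_line += 1
            new_line += 1
    return result
```

A diff that replaces the second line of a three-line file would yield one entry under each key, both carrying line number 2.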
- reset() → None
  Reset the state of the repo, checking out the main branch and discarding local changes (-f option).
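Because checkout changes the state of the repository on disk, it is usually worth pairing it with reset. The sketch below shows one way to do that; the helper `files_at_commit` is hypothetical, and `repo` is assumed to be any object offering the checkout(), files() and reset() methods documented in this section.

```python
from typing import List


def files_at_commit(repo, commit_hash: str) -> List[str]:
    """List the files present at one commit, then restore the repo.

    `repo` is assumed to expose checkout(), files() and reset()
    as documented for GitRepository; this helper is illustrative only.
    """
    repo.checkout(commit_hash)
    try:
        return repo.files()
    finally:
        # Always go back to the main branch, even if files() raises.
        repo.reset()
```

With the class above, `repo` would be something like `GitRepository('/path/to/repo')`, where the path is a placeholder. Since the working tree is shared, the warning about using a single thread applies to this helper as well.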
- total_commits() → int
  Calculate the total number of commits.
  Returns: the total number of commits
RepositoryMining
- class pydriller.repository_mining.RepositoryMining(path_to_repo: str, single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_main_branch: bool = False, only_in_branches: typing.List[str] = None, only_modifications_with_file_types: typing.List[str] = None, only_no_merge: bool = False, num_threads: int = 1)
- __init__(path_to_repo: str, single: str = None, since: datetime.datetime = None, to: datetime.datetime = None, from_commit: str = None, to_commit: str = None, from_tag: str = None, to_tag: str = None, reversed_order: bool = False, only_in_main_branch: bool = False, only_in_branches: typing.List[str] = None, only_modifications_with_file_types: typing.List[str] = None, only_no_merge: bool = False, num_threads: int = 1)
  Initialize a repository mining.
  Parameters:
  - path_to_repo (str) – absolute path to the repository to analyze
  - single (str) – hash of a single commit to analyze
  - since (datetime) – starting date
  - to (datetime) – ending date
  - from_commit (str) – starting commit (only if since is None)
  - to_commit (str) – ending commit (only if to is None)
  - from_tag (str) – start the analysis from the specified tag (only if since and from_commit are None)
  - to_tag (str) – end the analysis at the specified tag (only if to and to_commit are None)
  - reversed_order (bool) – whether the commits should be analyzed in reversed order
  - only_in_main_branch (bool) – whether only commits in the main branch should be analyzed
  - only_in_branches (List[str]) – only commits in these branches will be analyzed
  - only_modifications_with_file_types (List[str]) – only modifications to files of these types will be analyzed
  - only_no_merge (bool) – if True, merge commits will not be analyzed
  - num_threads (int) – number of threads to use (default 1)
- __module__ = 'pydriller.repository_mining'
- traverse_commits() → typing.Generator[pydriller.domain.commit.Commit, None, None]
  Analyze all the specified commits (all of them by default), returning a generator of commits.
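Since traverse_commits returns a generator, commits can be consumed lazily and the loop can stop early. The sketch below illustrates this; the helper `collect_messages` is hypothetical, and it assumes each yielded Commit exposes `hash` and `msg` attributes named after the constructor parameters listed in the Commit section.

```python
from typing import List, Tuple


def collect_messages(miner, limit: int = 10) -> List[Tuple[str, str]]:
    """Return (hash, msg) pairs for at most `limit` commits.

    `miner` is assumed to expose traverse_commits() as documented above.
    The `hash` and `msg` attribute names are an assumption of this sketch.
    """
    collected: List[Tuple[str, str]] = []
    for commit in miner.traverse_commits():
        if len(collected) >= limit:
            break  # stop early; the rest of the generator is never consumed
        collected.append((commit.hash, commit.msg))
    return collected
```

With the class above, this could be called as `collect_messages(RepositoryMining('/abs/path/to/repo', only_no_merge=True))`, where the path is a placeholder.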
Commit
- class pydriller.domain.commit.ChangeSet(id: str, date: datetime.datetime)
- __init__(id: str, date: datetime.datetime)
  Light-weight version of the commit, storing only the hash and the date. Used to filter out commits before asking for more complex information (such as the diff and the source code).
  Parameters:
  - id (str) – hash of the commit
  - date – date of the commit
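The filtering role of ChangeSet can be sketched as follows: select change sets by date first, then fetch full commits only for the ones that pass. The helper `commits_since` is hypothetical, `repo` is assumed to offer get_change_sets() and get_commit() as documented for GitRepository, and the `id` and `date` attribute names are assumed to mirror the constructor parameters.

```python
from datetime import datetime
from typing import List


def commits_since(repo, cutoff: datetime) -> List:
    """Fetch full commits only for change sets dated on or after `cutoff`.

    Assumes each change set exposes `id` and `date` attributes.
    """
    # Cheap pass: compare dates on the light-weight objects only.
    recent_ids = [cs.id for cs in repo.get_change_sets() if cs.date >= cutoff]
    # Expensive pass: load the full commit for the survivors.
    return [repo.get_commit(commit_id) for commit_id in recent_ids]
```

Note that `cutoff` must match the change-set dates in time zone awareness, because Python refuses to compare naive and aware datetime objects.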
- __module__ = 'pydriller.domain.commit'
- class pydriller.domain.commit.Commit(hash: str, author: pydriller.domain.developer.Developer, committer: pydriller.domain.developer.Developer, author_date: datetime.datetime, committer_date: datetime.datetime, author_timezone: int, committer_timezone: int, msg: str, parents: typing.List[str], merge: bool = False, branches: set = set(), is_commit_in_main_branch: bool = False) → None
- __init__(hash: str, author: pydriller.domain.developer.Developer, committer: pydriller.domain.developer.Developer, author_date: datetime.datetime, committer_date: datetime.datetime, author_timezone: int, committer_timezone: int, msg: str, parents: typing.List[str], merge: bool = False, branches: set = set(), is_commit_in_main_branch: bool = False) → None
  Create a commit object.
  Parameters:
  - hash (str) – hash of the commit
  - author (Developer) – author of the commit
  - committer (Developer) – committer of the commit
  - author_date (datetime) – date when the author committed
  - committer_date (datetime) – date when the committer committed
  - author_timezone (int) – author's time zone offset, in seconds west of UTC
  - committer_timezone (int) – committer's time zone offset, in seconds west of UTC
  - msg (str) – message of the commit
  - parents (List[str]) – list of hashes of the parent commits
  - merge (bool) – True if the commit is a merge commit
  - branches (set) – branches that include the commit
  - is_commit_in_main_branch (bool) – True if the commit is in the main branch
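The time zone parameters count seconds west of UTC, which is the opposite sign of the usual UTC offset. A minimal sketch of the conversion, using only the standard library, is shown below; the helper `to_aware` is hypothetical and assumes you start from a naive local timestamp.

```python
from datetime import datetime, timedelta, timezone


def to_aware(naive: datetime, seconds_west: int) -> datetime:
    """Attach a fixed-offset tzinfo built from a seconds-west-of-UTC value."""
    # West of UTC means behind UTC, so the sign flips:
    # 18000 seconds west corresponds to UTC-05:00.
    offset = timezone(timedelta(seconds=-seconds_west))
    return naive.replace(tzinfo=offset)
```

For example, a value of 18000 yields an offset of minus five hours, while -3600 yields plus one hour.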
- __module__ = 'pydriller.domain.commit'
- add_modifications(old_path: str, new_path: str, change: pydriller.domain.modification.ModificationType, diff: str, sc: str)
  Add a modification to the commit.
  Parameters:
  - old_path (str) – old path of the file (can be None if the file is added)
  - new_path (str) – new path of the file (can be None if the file is deleted)
  - change (ModificationType) – type of the change
  - diff (str) – diff of the change
  - sc (str) – source code of the file (can be None if the file is deleted)
Modification
- class pydriller.domain.modification.Modification(old_path: str, new_path: str, change_type: pydriller.domain.modification.ModificationType, diff: str, source_code: str)
- __init__(old_path: str, new_path: str, change_type: pydriller.domain.modification.ModificationType, diff: str, source_code: str)
  Initialize a modification. A modification carries information about the changed file.
  Parameters:
  - old_path – old path of the file (can be None if the file is added)
  - new_path – new path of the file (can be None if the file is deleted)
  - change_type – type of the change
  - diff – diff of the change
  - source_code – source code of the file (can be None if the file is deleted)
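The rules for missing paths listed above imply a simple way to tell what happened to a file from the two paths alone. The sketch below is illustrative only: `describe_paths` and its string labels are hypothetical and do not use the library's ModificationType, whose members are not listed in this reference.

```python
from typing import Optional


def describe_paths(old_path: Optional[str], new_path: Optional[str]) -> str:
    """Label a change from its paths alone (labels are this sketch's own)."""
    if old_path is None and new_path is None:
        raise ValueError("at least one of old_path and new_path must be set")
    if old_path is None:
        return "added"  # no old path: the file did not exist before
    if new_path is None:
        return "deleted"  # no new path: the file no longer exists
    return "renamed" if old_path != new_path else "modified"
```

In real code the `change_type` value should be preferred, since it is the library's own record of the kind of change.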
- __module__ = 'pydriller.domain.modification'