.. highlight:: python ================== Process Metrics ================== Process metrics capture aspects of the development process rather than aspects about the code itself. From release 1.11 PyDriller can calculate ``change_set``, ``code churn``, ``commits count``, ``contributors count``, ``contributors experience``, ``history complexity``, ``hunks count``, ``lines count`` and ``minor contributors``. Everything in just one line! The metrics can be run between two commits (setting up the parameters ``from_commit`` and ``to_commit``) or between two dates (setting up the parameters ``since`` and ``to``) Below an example of how call the metrics. Change Set ========== This metric measures the of files committed together. The class ``ChangeSet`` has two methods: * ``max()`` to count the *maximum* number of files committed together; * ``avg()`` to count the *average* number of files committed together. **Note:** The average value is rounded off to the nearest integer. For example:: from pydriller.metrics.process.change_set import ChangeSet metric = ChangeSet(path_to_repo='path/to/the/repo', from_commit='from commit hash', to_commit='to commit hash') maximum = metric.max() average = metric.avg() print('Maximum number of files committed together: {}'.format(maximum)) print('Average number of files committed together: {}'.format(average)) will print the maximum and average number of files committed together in the evolution period ``[from_commit, to_commit]``. **Note:** differently from the other metrics below, the scope of this metrics is the evolution period rather than the single files. It is possible to specify the dates as follows:: from datetime import datetime from pydriller.metrics.process.change_set import ChangeSet metric = ChangeSet(path_to_repo='path/to/the/repo', since=datetime(2019, 1, 1), to=datetime(2019, 12, 31)) maximum = metric.max() average = metric.avg() print('Maximum number of files committed together: {}'.format(maximum)) print('Average number of files committed together: {}'.format(average)) The code above will print the maximum and average number of files committed together between the ``1st January 2019`` and ``31st December 2019``. Code Churn ========== This metric measures the code churns of a file. Depending on the parametrization, a code churn is the sum of either (a) (added lines - removed lines) or (b) (added lines + removed lines) across the analyzed commits. The class ``CodeChurn`` has four methods: * ``count()`` to count the *total* size of code churns of a file; * ``max()`` to count the *maximum* size of a code churn of a file; * ``avg()`` to count the *average* size of a code churn of a file. **Note:** The average value is rounded off to the nearest integer; * ``get_added_and_removed_lines()`` to retrieve the *exact* number of lines added and removed for each file as a tuple (added_lines, removed_lines). For example:: from pydriller.metrics.process.code_churn import CodeChurn metric = CodeChurn(path_to_repo='path/to/the/repo', from_commit='from commit hash', to_commit='to commit hash') files_count = metric.count() files_max = metric.max() files_avg = metric.avg() added_removed_lines = metric.get_added_and_removed_lines() print('Total code churn for each file: {}'.format(files_count)) print('Maximum code churn for each file: {}'.format(files_max)) print('Average code churn for each file: {}'.format(files_avg)) print('Lines added and removed for each file: {}'.format(added_removed_lines)) will print the total, maximum, and average number of code churns for each modified file, along with the number of lines added and removed, in the evolution period ``[from_commit, to_commit]``. The calculation variant (a) or (b) can be configured by setting the ``CodeChurn`` init parameter: * ``add_deleted_lines_to_churn`` To retrieve the added and removed lines for each file directly, the ``get_added_and_removed_lines()`` method can be used, which returns a dictionary with file paths as keys and a tuple (added_lines, removed_lines) as values. Commits Count ============= This metric measures the number of commits made to a file. The class ``CommitCount`` has one method: * ``count()`` to count the number of commits to a file. For example:: from pydriller.metrics.process.commits_count import CommitsCount metric = CommitsCount(path_to_repo='path/to/the/repo', from_commit='from commit hash', to_commit='to commit hash') files = metric.count() print('Files: {}'.format(files)) will print the number of commits for each modified file in the evolution period ``[from_commit, to_commit]``. Contributors Count ================== This metric measures the number of developers that contributed to a file. The class ``ContributorsCount`` has two methods: * ``count()`` to count the number of contributors who modified a file; * ``count_minor()`` to count the number of *minor* contributors who modified a file, i.e., those that contributed less than 5% to the file. For example:: from pydriller.metrics.process.contributors_count import ContributorsCount metric = ContributorsCount(path_to_repo='path/to/the/repo', from_commit='from commit hash', to_commit='to commit hash') count = metric.count() minor = metric.count_minor() print('Number of contributors per file: {}'.format(count)) print('Number of "minor" contributors per file: {}'.format(minor)) will print the number of developers that contributed to each of the modified file in the evolution period ``[from_commit, to_commit]`` and the number of developers that contributed less than 5% to each of the modified file in the evolution period ``[from_commit, to_commit]``. Contributors Experience ======================== This metric measures the percetage of the lines authored by the highest contributor of a file. The class ``ContributorExperience`` has one method: * ``count()`` to count the number of lines authored by the highest contributor of a file; For example:: from pydriller.metrics.process.contributors_experience import ContributorsExperience metric = ContributorsExperience(path_to_repo='path/to/the/repo', from_commit='from commit hash', to_commit='to commit hash') files = metric.count() print('Files: {}'.format(files)) will print the percentage of the lines authored by the highest contributor for each of the modified file in the evolution period ``[from_commit, to_commit]``. Hunks Count =========== This metric measures the number of hunks made to a file. As a hunk is a continuous block of changes in a ``diff``, this number assesses how fragmented the commit file is (i.e. lots of changes all over the file versus one big change). The class ``HunksCount`` has one method: * ``count()`` to count the median number of hunks of a file. For example:: from pydriller.metrics.process.hunks_count import HunksCount metric = HunksCount(path_to_repo='path/to/the/repo', from_commit='from commit hash', to_commit='to commit hash') files = metric.count() print('Files: {}'.format(files)) will print the median number of hunks for each of the modified file in the evolution period ``[from_commit, to_commit]``. Lines Count =========== This metric measures the number of added and removed lines in a file. The class ``LinesCount`` has seven methods: * ``count()`` to count the total number of added and removed lines for each modified file; * ``count_added()``, ``max_added()`` and ``avg_added()`` to count the total, maximum and average number of added lines for each modified file; * ``count_removed()``, ``max_removed()`` and ``avg_removed()`` to count the total, maximum and average number of removed lines for each modified file. **Note:** The average values are rounded off to the nearest integer. For example:: from pydriller.metrics.process.lines_count import LinesCount metric = LinesCount(path_to_repo='path/to/the/repo', from_commit='from commit hash', to_commit='to commit hash') added_count = metric.count_added() added_max = metric.max_added() added_avg = metric.avg_added() print('Total lines added per file: {}'.format(added_count)) print('Maximum lines added per file: {}'.format(added_max)) print('Average lines added per file: {}'.format(added_avg)) will print the total, maximum and average number of lines added for each modified file in the evolution period ``[from_commit, to_commit]``. While:: from pydriller.metrics.process.lines_count import LinesCount metric = LinesCount(path_to_repo='path/to/the/repo', from_commit='from commit hash', to_commit='to commit hash') removed_count = metric.count_removed() removed_max = metric.max_removed() removed_avg = metric.avg_removed() print('Total lines removed per file: {}'.format(removed_count)) print('Maximum lines removed per file: {}'.format(removed_max)) print('Average lines removed per file: {}'.format(removed_avg)) will print the total, maximum and average number of lines removed for each modified file in the evolution period ``[from_commit, to_commit]``.