Process Metrics

Process metrics capture aspects of the development process rather than aspects of the code itself. Since release 1.11, PyDriller can calculate change set, code churn, commits count, contributors count, contributors experience, history complexity, hunks count, lines count and minor contributors. Everything in just one line!

The metrics can be run between two commits (by setting the parameters from_commit and to_commit) or between two dates (by setting the parameters since and to).

Below are examples of how to call the metrics.

Change Set

This metric measures the number of files committed together.

The class ChangeSet has two methods:

max() to count the maximum number of files committed together;
avg() to count the average number of files committed together. Note: The average value is rounded off to the nearest integer.

For example:

from pydriller.metrics.process.change_set import ChangeSet
metric = ChangeSet(path_to_repo='path/to/the/repo',
                   from_commit='from commit hash',
                   to_commit='to commit hash')

maximum = metric.max()
average = metric.avg()
print('Maximum number of files committed together: {}'.format(maximum))
print('Average number of files committed together: {}'.format(average))

will print the maximum and average number of files committed together in the evolution period [from_commit, to_commit].

Note: unlike the other metrics below, the scope of this metric is the evolution period rather than a single file.

It is possible to specify the dates as follows:

from datetime import datetime
from pydriller.metrics.process.change_set import ChangeSet
metric = ChangeSet(path_to_repo='path/to/the/repo',
                   since=datetime(2019, 1, 1),
                   to=datetime(2019, 12, 31))

maximum = metric.max()
average = metric.avg()
print('Maximum number of files committed together: {}'.format(maximum))
print('Average number of files committed together: {}'.format(average))

The code above will print the maximum and average number of files committed together between the 1st January 2019 and 31st December 2019.
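To make the two values concrete, here is a minimal sketch of the arithmetic behind max() and avg(). It is not PyDriller's implementation: the commit data and the helper name change_set_stats are hypothetical and only illustrate what "files committed together" means.

```python
# Hypothetical data: each inner list holds the files touched by one commit.
commits = [
    ['a.py', 'b.py'],
    ['a.py'],
    ['a.py', 'b.py', 'c.py', 'd.py'],
]


def change_set_stats(commits):
    """Return (maximum, rounded average) of files committed together."""
    sizes = [len(files) for files in commits]
    if not sizes:
        return 0, 0
    return max(sizes), round(sum(sizes) / len(sizes))


maximum, average = change_set_stats(commits)
print('Maximum number of files committed together: {}'.format(maximum))
print('Average number of files committed together: {}'.format(average))
```

With these three commits the sizes are 2, 1 and 4, so the maximum is 4 and the average (7 / 3) rounds to 2.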

Code Churn

This metric measures the code churn of a file.

Depending on the parametrization, a code churn is the sum of either

(a) (added lines - removed lines) or

(b) (added lines + removed lines)

across the analyzed commits.
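The difference between the two variants is easiest to see on a small example. The sketch below uses hypothetical per-commit numbers for a single file and plain Python arithmetic; it does not call PyDriller.

```python
# Hypothetical (added_lines, removed_lines) pairs, one per analyzed commit.
changes = [(10, 2), (3, 8), (5, 0)]

# Variant 1: sum of (added lines - removed lines), i.e. the net growth.
net_churn = sum(added - removed for added, removed in changes)

# Variant 2: sum of (added lines + removed lines), i.e. all lines touched.
total_churn = sum(added + removed for added, removed in changes)

print('Net churn: {}'.format(net_churn))      # 8
print('Total churn: {}'.format(total_churn))  # 28
```

The first variant can be small, or even negative, for a file that is rewritten heavily but keeps its size, while the second variant grows with every changed line.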

The class CodeChurn has four methods:

count() to count the total size of code churns of a file;
max() to count the maximum size of a code churn of a file;
avg() to count the average size of a code churn of a file. Note: The average value is rounded off to the nearest integer;
get_added_and_removed_lines() to retrieve the exact number of lines added and removed for each file as a tuple (added_lines, removed_lines).

For example:

from pydriller.metrics.process.code_churn import CodeChurn
metric = CodeChurn(path_to_repo='path/to/the/repo',
                   from_commit='from commit hash',
                   to_commit='to commit hash')
files_count = metric.count()
files_max = metric.max()
files_avg = metric.avg()
added_removed_lines = metric.get_added_and_removed_lines()

print('Total code churn for each file: {}'.format(files_count))
print('Maximum code churn for each file: {}'.format(files_max))
print('Average code churn for each file: {}'.format(files_avg))
print('Lines added and removed for each file: {}'.format(added_removed_lines))

will print the total, maximum, and average code churn for each modified file, along with the number of lines added and removed, in the evolution period [from_commit, to_commit].

The calculation variant (a) or (b) can be configured by setting the CodeChurn init parameter add_deleted_lines_to_churn.

To retrieve the added and removed lines for each file directly, the get_added_and_removed_lines() method can be used, which returns a dictionary with file paths as keys and a tuple (added_lines, removed_lines) as values.
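As an illustration of how such a dictionary might be consumed, the sketch below ranks files by the total number of lines touched. The sample data is hypothetical and merely mimics the shape described above (file path as key, (added_lines, removed_lines) as value); in real use it would come from metric.get_added_and_removed_lines().

```python
# Hypothetical result: file path -> (added_lines, removed_lines).
added_removed_lines = {
    'src/app.py': (120, 45),
    'src/utils.py': (30, 30),
    'README.md': (8, 1),
}

# Rank files by total lines touched (added + removed), largest first.
ranked = sorted(added_removed_lines.items(),
                key=lambda item: item[1][0] + item[1][1],
                reverse=True)

for path, (added, removed) in ranked:
    print('{}: +{} -{}'.format(path, added, removed))
```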

Commits Count

This metric measures the number of commits made to a file.

The class CommitsCount has one method:

count() to count the number of commits to a file.

For example:

from pydriller.metrics.process.commits_count import CommitsCount
metric = CommitsCount(path_to_repo='path/to/the/repo',
                      from_commit='from commit hash',
                      to_commit='to commit hash')
files = metric.count()
print('Files: {}'.format(files))

will print the number of commits for each modified file in the evolution period [from_commit, to_commit].

Contributors Count

This metric measures the number of developers that contributed to a file.

The class ContributorsCount has two methods:

count() to count the number of contributors who modified a file;
count_minor() to count the number of minor contributors who modified a file, i.e., those that contributed less than 5% to the file.

For example:

from pydriller.metrics.process.contributors_count import ContributorsCount
metric = ContributorsCount(path_to_repo='path/to/the/repo',
                           from_commit='from commit hash',
                           to_commit='to commit hash')
count = metric.count()
minor = metric.count_minor()
print('Number of contributors per file: {}'.format(count))
print('Number of "minor" contributors per file: {}'.format(minor))

will print, for each modified file in the evolution period [from_commit, to_commit], the number of developers that contributed to it and the number of developers that contributed less than 5% to it.
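The 5% rule can be sketched as follows. This is not PyDriller's implementation: it assumes that a contribution is measured in lines authored, and both the sample data and the helper name count_contributors are hypothetical.

```python
# Hypothetical lines authored per contributor for a single file.
lines_by_author = {'alice': 930, 'bob': 40, 'carol': 30}


def count_contributors(lines_by_author, minor_threshold=0.05):
    """Return (number of contributors, number of minor contributors).

    A contributor is "minor" when their share of the lines is below
    minor_threshold (assumption: share is measured in authored lines).
    """
    total = sum(lines_by_author.values())
    if total == 0:
        return len(lines_by_author), 0
    minor = sum(1 for lines in lines_by_author.values()
                if lines / total < minor_threshold)
    return len(lines_by_author), minor


count, minor = count_contributors(lines_by_author)
print('Number of contributors: {}'.format(count))
print('Number of "minor" contributors: {}'.format(minor))
```

Here bob (4%) and carol (3%) fall below the threshold, so the file has three contributors, two of them minor.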

Contributors Experience

This metric measures the percentage of the lines authored by the highest contributor of a file.

The class ContributorsExperience has one method:

count() to calculate the percentage of lines authored by the highest contributor of a file.

For example:

from pydriller.metrics.process.contributors_experience import ContributorsExperience
metric = ContributorsExperience(path_to_repo='path/to/the/repo',
                                from_commit='from commit hash',
                                to_commit='to commit hash')
files = metric.count()
print('Files: {}'.format(files))

will print the percentage of the lines authored by the highest contributor for each modified file in the evolution period [from_commit, to_commit].
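The percentage itself is a simple ratio. The sketch below shows the idea on hypothetical data; the helper name top_contributor_share and the rounding to two decimals are assumptions made for illustration, not PyDriller's behavior.

```python
# Hypothetical lines authored per contributor for a single file.
lines_by_author = {'alice': 150, 'bob': 40, 'carol': 10}


def top_contributor_share(lines_by_author):
    """Return the percentage of lines authored by the top contributor."""
    total = sum(lines_by_author.values())
    if total == 0:
        return 0.0
    return round(100 * max(lines_by_author.values()) / total, 2)


share = top_contributor_share(lines_by_author)
print('Share of the highest contributor: {}%'.format(share))
```

In this example alice authored 150 of 200 lines, so the value is 75.0.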

Hunks Count

This metric measures the number of hunks made to a file. As a hunk is a continuous block of changes in a diff, this number assesses how fragmented the changes to the file are (i.e. lots of changes all over the file versus one big change).

The class HunksCount has one method:

count() to count the median number of hunks of a file.

For example:

from pydriller.metrics.process.hunks_count import HunksCount
metric = HunksCount(path_to_repo='path/to/the/repo',
                    from_commit='from commit hash',
                    to_commit='to commit hash')
files = metric.count()
print('Files: {}'.format(files))

will print the median number of hunks for each modified file in the evolution period [from_commit, to_commit].
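To illustrate what is being counted, the sketch below counts hunks in a few hypothetical unified diffs of one file and takes the median across commits. It is a simplification that counts hunk headers (lines starting with '@@'); PyDriller's own counting logic may differ, and the helper name count_hunks is hypothetical.

```python
from statistics import median

# Hypothetical unified diffs of the same file, one per commit.
diffs = [
    '\n'.join(['@@ -1,3 +1,4 @@', ' import os', '+import sys', ' import re']),
    '\n'.join(['@@ -1,2 +1,2 @@', '-a = 1', '+a = 2',
               '@@ -10,2 +10,3 @@', ' b = 3', '+c = 4',
               '@@ -20 +21 @@', '-x = 0', '+x = 1']),
    '\n'.join(['@@ -5,2 +5,2 @@', '-old', '+new',
               '@@ -40,1 +40,2 @@', ' keep', '+extra']),
]


def count_hunks(diff):
    """Count hunk headers, i.e. lines starting with '@@'."""
    return sum(1 for line in diff.splitlines() if line.startswith('@@'))


hunks_per_commit = [count_hunks(diff) for diff in diffs]
median_hunks = median(hunks_per_commit)
print('Hunks per commit: {}'.format(hunks_per_commit))
print('Median number of hunks: {}'.format(median_hunks))
```

The three commits contain 1, 3 and 2 hunks, so the median is 2. A high median suggests that changes to the file tend to be scattered rather than concentrated in one block.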

Lines Count

This metric measures the number of added and removed lines in a file. The class LinesCount has seven methods:

count() to count the total number of added and removed lines for each modified file;
count_added(), max_added() and avg_added() to count the total, maximum and average number of added lines for each modified file;
count_removed(), max_removed() and avg_removed() to count the total, maximum and average number of removed lines for each modified file.

Note: The average values are rounded off to the nearest integer.

For example:

from pydriller.metrics.process.lines_count import LinesCount
metric = LinesCount(path_to_repo='path/to/the/repo',
                    from_commit='from commit hash',
                    to_commit='to commit hash')

added_count = metric.count_added()
added_max = metric.max_added()
added_avg = metric.avg_added()
print('Total lines added per file: {}'.format(added_count))
print('Maximum lines added per file: {}'.format(added_max))
print('Average lines added per file: {}'.format(added_avg))

will print the total, maximum and average number of lines added for each modified file in the evolution period [from_commit, to_commit].

While:

from pydriller.metrics.process.lines_count import LinesCount
metric = LinesCount(path_to_repo='path/to/the/repo',
                    from_commit='from commit hash',
                    to_commit='to commit hash')

removed_count = metric.count_removed()
removed_max = metric.max_removed()
removed_avg = metric.avg_removed()
print('Total lines removed per file: {}'.format(removed_count))
print('Maximum lines removed per file: {}'.format(removed_max))
print('Average lines removed per file: {}'.format(removed_avg))

will print the total, maximum and average number of lines removed for each modified file in the evolution period [from_commit, to_commit].
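The added and removed totals can also be combined, for example to estimate the net growth of each file. The sketch below assumes that count_added() and count_removed() return dictionaries keyed by file path; the sample values are hypothetical and stand in for the real results.

```python
# Hypothetical results, standing in for metric.count_added() and
# metric.count_removed(): file path -> number of lines.
added_count = {'src/app.py': 120, 'README.md': 8}
removed_count = {'src/app.py': 45, 'README.md': 10}

# Net growth per file: lines added minus lines removed.
net_growth = {path: added_count[path] - removed_count.get(path, 0)
              for path in added_count}

for path, growth in net_growth.items():
    print('{}: {:+d} lines'.format(path, growth))
```

A negative value, as for README.md here, means the file shrank over the evolution period.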