.. highlight:: python

==================
Process Metrics
==================

Process metrics capture aspects of the development process rather than aspects of the code itself.
Since release 1.11, PyDriller can calculate ``change_set``, ``code churn``, ``commits count``, ``contributors count``, ``contributors experience``, ``history complexity``, ``hunks count``, ``lines count`` and ``minor contributors``. Each metric can be computed in a single line of code.

The metrics can be computed between two commits (by setting the parameters ``from_commit`` and ``to_commit``) or between two dates (by setting the parameters ``since`` and ``to``).

Below are examples of how to call each metric.


Change Set
==========

This metric measures the number of files committed together.

The class ``ChangeSet`` has two methods:

* ``max()`` to count the *maximum* number of files committed together;
* ``avg()`` to count the *average* number of files committed together. **Note:** The average value is rounded off to the nearest integer.

For example::

    from pydriller.metrics.process.change_set import ChangeSet
    metric = ChangeSet(path_to_repo='path/to/the/repo',
                       from_commit='from commit hash',
                       to_commit='to commit hash')
    
    maximum = metric.max()
    average = metric.avg()
    print('Maximum number of files committed together: {}'.format(maximum))
    print('Average number of files committed together: {}'.format(average))

will print the maximum and average number of files committed together in the evolution period ``[from_commit, to_commit]``. 

**Note:** unlike the other metrics below, the scope of this metric is the whole evolution period rather than a single file.
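To make the two values concrete, here is a minimal sketch of the arithmetic on hand-written data. It does not call PyDriller: the ``commits`` list and both helper functions are hypothetical, and using Python's built-in ``round()`` for "nearest integer" is an assumption.

```python
# Hypothetical input: the set of files touched by each commit in the period.
commits = [
    {"a.py", "b.py"},
    {"a.py"},
    {"a.py", "b.py", "c.py", "d.py"},
]


def change_set_max(commits):
    """Largest number of files touched by a single commit."""
    return max((len(files) for files in commits), default=0)


def change_set_avg(commits):
    """Mean number of files per commit, rounded with Python's round()."""
    if not commits:
        return 0
    return round(sum(len(files) for files in commits) / len(commits))


maximum = change_set_max(commits)  # commit sizes are 2, 1 and 4
average = change_set_avg(commits)  # 7 / 3 rounded to an integer
print('Maximum number of files committed together: {}'.format(maximum))
print('Average number of files committed together: {}'.format(average))
```

Both results are single numbers for the whole period, which is why this metric is not reported per file.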


It is possible to specify the dates as follows::

    from datetime import datetime
    from pydriller.metrics.process.change_set import ChangeSet
    metric = ChangeSet(path_to_repo='path/to/the/repo',
                       since=datetime(2019, 1, 1),
                       to=datetime(2019, 12, 31))
    
    maximum = metric.max()
    average = metric.avg()
    print('Maximum number of files committed together: {}'.format(maximum))
    print('Average number of files committed together: {}'.format(average))

The code above will print the maximum and average number of files committed together between ``1st January 2019`` and ``31st December 2019``.


Code Churn
==========

This metric measures the code churn of a file.

Depending on the parametrization, a code churn is the sum of either 
    
    (a) (added lines - removed lines) or 
    (b) (added lines + removed lines)
    
across the analyzed commits.
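The difference between the two variants can be sketched on hand-written numbers. This is only an illustration of the arithmetic, not PyDriller's implementation: the ``changes`` list, the ``churn_sizes`` helper and its ``add_removed`` flag are hypothetical names.

```python
# Hypothetical per-commit (added, removed) line counts for one file.
changes = [(10, 2), (3, 8), (5, 0)]


def churn_sizes(changes, add_removed=False):
    """Per-commit churn: variant (a) by default, variant (b) if add_removed."""
    if add_removed:
        return [added + removed for added, removed in changes]
    return [added - removed for added, removed in changes]


net = churn_sizes(changes)                      # variant (a)
gross = churn_sizes(changes, add_removed=True)  # variant (b)

total_net = sum(net)      # total churn under variant (a)
total_gross = sum(gross)  # total churn under variant (b)
print('Variant (a): per commit {}, total {}'.format(net, total_net))
print('Variant (b): per commit {}, total {}'.format(gross, total_gross))
```

Variant (a) can be negative for a commit that removes more lines than it adds, while variant (b) only grows with the amount of change.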

The class ``CodeChurn`` has four methods:

* ``count()`` to count the *total* size of code churns of a file;
* ``max()`` to count the *maximum* size of a code churn of a file;
* ``avg()`` to count the *average* size of a code churn of a file. **Note:** The average value is rounded off to the nearest integer;
* ``get_added_and_removed_lines()`` to retrieve the *exact* number of lines added and removed for each file as a tuple (added_lines, removed_lines).

For example::

    from pydriller.metrics.process.code_churn import CodeChurn
    metric = CodeChurn(path_to_repo='path/to/the/repo',
                       from_commit='from commit hash',
                       to_commit='to commit hash')
    files_count = metric.count()
    files_max = metric.max()
    files_avg = metric.avg()
    added_removed_lines = metric.get_added_and_removed_lines()
    
    print('Total code churn for each file: {}'.format(files_count))
    print('Maximum code churn for each file: {}'.format(files_max))
    print('Average code churn for each file: {}'.format(files_avg))
    print('Lines added and removed for each file: {}'.format(added_removed_lines))

will print the total, maximum, and average number of code churns for each modified file, along with the number of lines added and removed, in the evolution period ``[from_commit, to_commit]``.

The calculation variant, (a) or (b), is selected with the ``CodeChurn`` init parameter ``add_deleted_lines_to_churn``.

The ``get_added_and_removed_lines()`` method returns a dictionary with file paths as keys and a tuple (added_lines, removed_lines) as values.


Commits Count
=============

This metric measures the number of commits made to a file.

The class ``CommitsCount`` has one method:

* ``count()`` to count the number of commits to a file.

For example::

    from pydriller.metrics.process.commits_count import CommitsCount
    metric = CommitsCount(path_to_repo='path/to/the/repo',
                          from_commit='from commit hash',
                          to_commit='to commit hash')
    files = metric.count()
    print('Files: {}'.format(files))

will print the number of commits for each modified file in the evolution period ``[from_commit, to_commit]``. 
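As an illustration of what a per-file commit count means, the sketch below tallies commits from hand-written data. It does not use PyDriller; the ``commits`` list and the ``commits_per_file`` helper are hypothetical.

```python
from collections import Counter

# Hypothetical input: the files touched by each commit in the period.
commits = [
    ["a.py", "b.py"],
    ["a.py"],
    ["b.py", "c.py"],
    ["a.py"],
]


def commits_per_file(commits):
    """Map each file path to the number of commits that touched it."""
    counts = Counter()
    for files in commits:
        # A set ensures a file is counted at most once per commit.
        counts.update(set(files))
    return dict(counts)


files = commits_per_file(commits)
print('Files: {}'.format(files))
```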


Contributors Count
==================

This metric measures the number of developers that contributed to a file.

The class ``ContributorsCount`` has two methods:

* ``count()`` to count the number of contributors who modified a file;
* ``count_minor()`` to count the number of *minor* contributors who modified a file, i.e., those that contributed less than 5% to the file.

For example::

    from pydriller.metrics.process.contributors_count import ContributorsCount
    metric = ContributorsCount(path_to_repo='path/to/the/repo',
                               from_commit='from commit hash',
                               to_commit='to commit hash')
    count = metric.count()
    minor = metric.count_minor()
    print('Number of contributors per file: {}'.format(count))
    print('Number of "minor" contributors per file: {}'.format(minor))

will print, for each file modified in the evolution period ``[from_commit, to_commit]``, the number of developers that contributed to it and the number of developers that contributed less than 5% to it.
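The 5% rule can be sketched as follows. This is a simplified illustration rather than PyDriller's code: it assumes that a contribution is measured in lines, and the ``lines_by_author`` data and the ``count_minor`` helper are hypothetical.

```python
# Hypothetical input: lines contributed to one file by each developer.
lines_by_author = {"alice": 90, "bob": 7, "carol": 3}


def count_minor(lines_by_author, threshold=0.05):
    """Number of developers whose share of the lines is below the threshold."""
    total = sum(lines_by_author.values())
    if total == 0:
        return 0
    return sum(1 for lines in lines_by_author.values()
               if lines / total < threshold)


count = len(lines_by_author)          # all contributors to the file
minor = count_minor(lines_by_author)  # only carol is below 5%
print('Number of contributors: {}'.format(count))
print('Number of "minor" contributors: {}'.format(minor))
```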


Contributors Experience
========================

This metric measures the percentage of the lines authored by the highest contributor of a file.

The class ``ContributorsExperience`` has one method:

* ``count()`` to compute the percentage of lines authored by the highest contributor of a file.

For example::

    from pydriller.metrics.process.contributors_experience import ContributorsExperience
    metric = ContributorsExperience(path_to_repo='path/to/the/repo',
                                    from_commit='from commit hash',
                                    to_commit='to commit hash')
    files = metric.count()
    print('Files: {}'.format(files))

will print the percentage of the lines authored by the highest contributor for each file modified in the evolution period ``[from_commit, to_commit]``.
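The percentage can be illustrated with a small sketch on hand-written data. It is not PyDriller's implementation: the ``lines_by_author`` dictionary and the ``top_contributor_share`` helper are hypothetical, and rounding to two decimals is an arbitrary choice made here.

```python
# Hypothetical input: lines authored in one file by each developer.
lines_by_author = {"alice": 60, "bob": 30, "carol": 10}


def top_contributor_share(lines_by_author):
    """Percentage of the lines authored by the developer with the most lines."""
    total = sum(lines_by_author.values())
    if total == 0:
        return 0.0
    return round(100 * max(lines_by_author.values()) / total, 2)


share = top_contributor_share(lines_by_author)
print('Share of the highest contributor: {}%'.format(share))
```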


Hunks Count
===========

This metric measures the number of hunks made to a file.
As a hunk is a continuous block of changes in a ``diff``, this number assesses how fragmented the changes to a file are (i.e., many small changes scattered across the file versus one big change).

The class ``HunksCount`` has one method:

* ``count()`` to count the median number of hunks of a file.

For example::

    from pydriller.metrics.process.hunks_count import HunksCount
    metric = HunksCount(path_to_repo='path/to/the/repo',
                        from_commit='from commit hash',
                        to_commit='to commit hash')
    files = metric.count()
    print('Files: {}'.format(files))

will print the median number of hunks for each file modified in the evolution period ``[from_commit, to_commit]``.
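To show what is being counted, the sketch below counts contiguous runs of added or removed lines in a unified diff and takes the median over several commits. It is a simplified assumption of how a hunk can be detected, not PyDriller's code; ``count_hunks`` and the sample diffs are hypothetical.

```python
from statistics import median


def count_hunks(diff_text):
    """Count contiguous runs of added/removed lines in a unified diff.

    Simplification: lines starting with '+++' or '---' are treated as file
    headers, so a changed line whose content starts with '++' or '--' is missed.
    """
    hunks = 0
    in_hunk = False
    for line in diff_text.splitlines():
        is_header = line.startswith(("+++", "---"))
        is_change = line.startswith(("+", "-")) and not is_header
        if is_change and not in_hunk:
            hunks += 1
        in_hunk = is_change
    return hunks


# Hypothetical diffs of the same file in different commits.
diff_one = "@@ -1,3 +1,3 @@\n context\n-old\n+new\n context\n"
diff_two = ("@@ -1,5 +1,6 @@\n context\n+added\n context\n context\n"
            "-removed\n+replaced\n")

hunks_per_commit = [count_hunks(d) for d in (diff_one, diff_two, diff_two)]
median_hunks = median(hunks_per_commit)
print('Hunks per commit: {}'.format(hunks_per_commit))
print('Median number of hunks: {}'.format(median_hunks))
```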


Lines Count
===========

This metric measures the number of added and removed lines in a file.

The class ``LinesCount`` has seven methods:

* ``count()`` to count the total number of added and removed lines for each modified file;
* ``count_added()``, ``max_added()`` and ``avg_added()`` to count the total, maximum and average number of added lines for each modified file;
* ``count_removed()``, ``max_removed()`` and ``avg_removed()`` to count the total, maximum and average number of removed lines for each modified file.

**Note:** The average values are rounded off to the nearest integer.

For example::

    from pydriller.metrics.process.lines_count import LinesCount
    metric = LinesCount(path_to_repo='path/to/the/repo',
                        from_commit='from commit hash',
                        to_commit='to commit hash')
    
    added_count = metric.count_added()
    added_max = metric.max_added()
    added_avg = metric.avg_added()
    print('Total lines added per file: {}'.format(added_count))
    print('Maximum lines added per file: {}'.format(added_max))
    print('Average lines added per file: {}'.format(added_avg))

will print the total, maximum and average number of lines added for each modified file in the evolution period ``[from_commit, to_commit]``. 

While::

    from pydriller.metrics.process.lines_count import LinesCount
    metric = LinesCount(path_to_repo='path/to/the/repo',
                        from_commit='from commit hash',
                        to_commit='to commit hash')
    
    removed_count = metric.count_removed()
    removed_max = metric.max_removed()
    removed_avg = metric.avg_removed()
    print('Total lines removed per file: {}'.format(removed_count))
    print('Maximum lines removed per file: {}'.format(removed_max))
    print('Average lines removed per file: {}'.format(removed_avg))

will print the total, maximum and average number of lines removed for each modified file in the evolution period ``[from_commit, to_commit]``.
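The relationship between the total, maximum and average values can be sketched on hand-written data. This does not call PyDriller: the ``added_per_commit`` dictionary and the ``summarize`` helper are hypothetical, and using Python's built-in ``round()`` for the average is an assumption.

```python
# Hypothetical input: lines added to each file, one entry per commit.
added_per_commit = {
    "a.py": [10, 3, 5],
    "b.py": [4],
}


def summarize(values):
    """Total, maximum and rounded average of a non-empty list of counts."""
    return {
        "count": sum(values),
        "max": max(values),
        "avg": round(sum(values) / len(values)),
    }


summary = {path: summarize(values) for path, values in added_per_commit.items()}
for path, stats in summary.items():
    print('{}: total {count}, max {max}, avg {avg}'.format(path, **stats))
```

The same helper applies unchanged to removed lines.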