Keeping your Python code DRY using Clone Digger

DRY (Don't Repeat Yourself) is one of the cornerstones of clean, agile code. This applies especially to production code, but also to test code, where finding good abstractions greatly increases development speed and lowers maintenance cost.

When working with Python I have found a very nice tool called Clone Digger that helps find repetitions in code. It is easy to set up and work with, and the report is clear and easy to read.

Install it with pip:

pip install clonedigger

Once installed, the clonedigger command is available from the command line. To run it, use the following command:

clonedigger package

Here, package is the directory containing the Python code. Clone Digger takes a list of packages separated by spaces. Running it with --help shows the available options and configuration.

Clone Digger will work its way through all files in each package and compare them, both within each package and across the packages listed in the same invocation. If some code cannot be modified, such as auto-generated code, it needs to be removed from the packages passed to the tool so that it is not analysed.
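One way to keep generated code out of the analysis without touching the original packages is to run Clone Digger on filtered copies. The sketch below is only an illustration: the file patterns and the helper names (stage_packages, build_command, run_clonedigger) are my own assumptions, and the only thing taken from Clone Digger itself is the "clonedigger package ..." invocation shown above.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical patterns for generated files; adjust them to your project.
GENERATED_PATTERNS = ("*_pb2.py", "*_generated.py")


def stage_packages(packages, staging_root, patterns=GENERATED_PATTERNS):
    """Copy each package into staging_root, skipping files that match patterns."""
    staging_root = Path(staging_root)
    staged = []
    for package in packages:
        source = Path(package)
        target = staging_root / source.name
        # ignore_patterns filters out matching names at every directory level.
        shutil.copytree(source, target, ignore=shutil.ignore_patterns(*patterns))
        staged.append(target)
    return staged


def build_command(staged_packages):
    """Build the invocation: the command followed by the package directories."""
    return ["clonedigger", *[str(path) for path in staged_packages]]


def run_clonedigger(packages, staging_root):
    """Stage the packages and run clonedigger on the filtered copies."""
    staged = stage_packages(packages, staging_root)
    return subprocess.run(build_command(staged), check=True)
```

Calling run_clonedigger(["package"], "some_empty_staging_dir") would then analyse only the copied files. The staging directory must not already contain a directory with the same name as a package, since copytree refuses to overwrite an existing target by default.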

It will generate a report like the one below and write it to output.html (the file name is configurable).

[Image: Clone Digger output]

Working with tools like this makes it easy to spot where new abstractions should be added to remove code duplication, in both test and production code. Wherever repetition is found that does not serve an important documenting function (which is sometimes the case in tests), it is easy to see where and how a method should be added to pull the functionality together. It also helps in naming such methods. There are of course some false positives, especially in test code, where method bodies are often descriptive rather than DRY. But even here it helps to find common assertions that can be pulled up and made into new descriptive methods. This becomes especially valuable when working with given-when-then in tests.
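As an illustration of pulling common assertions up into a descriptive method, here is a minimal sketch using unittest. The Cart class and all method names are hypothetical and exist only to make the example runnable; the point is that the two assertions that would otherwise be repeated in every test now live in one named helper, and the tests read as given-when-then.

```python
import unittest


class Cart:
    """Hypothetical class under test, included only to make the example runnable."""

    def __init__(self):
        self.items = []

    def add(self, name, price):
        self.items.append((name, price))

    def total(self):
        return sum(price for _, price in self.items)


class CartTest(unittest.TestCase):
    # given: shared setup that was previously repeated in each test
    def given_cart_with(self, *items):
        cart = Cart()
        for name, price in items:
            cart.add(name, price)
        return cart

    # then: the assertions that were previously duplicated across tests
    def assert_cart_holds(self, cart, expected_count, expected_total):
        self.assertEqual(len(cart.items), expected_count)
        self.assertEqual(cart.total(), expected_total)

    def test_empty_cart(self):
        cart = self.given_cart_with()
        self.assert_cart_holds(cart, expected_count=0, expected_total=0)

    def test_adding_an_item(self):
        cart = self.given_cart_with(("book", 10))
        cart.add("pen", 2)  # when
        self.assert_cart_holds(cart, expected_count=2, expected_total=12)
```

Whether a helper like assert_cart_holds is worth extracting is a judgment call: if the repeated lines document something important about a specific test, leaving them inline may still be the better choice.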