Clean code

A couple of years back I wrote an article exploring a way to apply Robert C. Martin's guidelines on how to create clean functions, taken from his book Clean Code. It created quite a stir on DZone, where I also published it.

A comment from Steven Jeuris on my blog about a week ago made me want to revisit the subject.

Since I wrote the original article I have learnt a lot about creating clean functions that do one thing. And as I replied to his comment, I would not design the code in the original article the same way today.

The key to applying the guidelines in the book is pragmatism. There are a number of benefits to working as the book suggests, and I will cover some of them below. But the guidelines cannot be followed religiously. It is, after all, software we are creating, not religion.

Steven Jeuris writes, in his blog post on the subject, that a perfectly good way to make the code in a function readable is to create blocks with comments. To me this is an anti-pattern. There are few reasons to ever create blocks in a function; if they are required, it generally indicates that the function is too long. Comments are almost always a bad idea. If a comment seems required, the block should instead be extracted into a function whose name explains what the block does, as in the sketch below.
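To illustrate the difference, here is a minimal, hypothetical example; the function names and the tax calculation are mine, not taken from either article.

    # A block inside a function, explained by a comment:
    def order_total(order_lines, tax_rate):
        # calculate the total including tax
        total = sum(price * quantity for price, quantity in order_lines)
        return total * (1 + tax_rate)

    # The same logic with the block extracted into functions whose names
    # say what they do; the comment becomes redundant:
    def order_total_including_tax(order_lines, tax_rate):
        return order_subtotal(order_lines) * (1 + tax_rate)

    def order_subtotal(order_lines):
        return sum(price * quantity for price, quantity in order_lines)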

Steven Jeuris further writes that the new, small functions litter the namespace of the class. I can see where he is coming from, but again I have to disagree. If the namespace of a class is littered with explanatory function names, the class is probably too big and needs to be refactored. To me it sounds very much like it does more than one thing.

There is also an interesting side effect to creating smaller, more atomic functions that do one thing. It becomes easier to find commonalities between the other, higher level, functions within the same class, or even between related classes. This actually makes it easier to keep the code DRY (Don't Repeat Yourself).

Another very important factor to keep in mind when working like this is which patterns the rest of the team is used to. The size of functions and the way to work with them are not one-size-fits-all. If a team is used to procedural code, the step to extract-till-you-drop style functions will be really confusing. If, on the other hand, the team is used to atomic, do-one-thing-only functions, it is hell to work with large functions littered with comments and artificial blocks.

In any case, I know that if I did the same exercise today as I did when writing my original article, the result would look different; the original definitely takes a good thing way too far.

In the project I am currently working on I have a lot of code to write. It is a greenfield project delivering a new platform, and I have no frameworks to borrow from or extend, so I have to write the code for every single feature by hand.

I am not complaining. I like writing code. But it takes time, especially since it's only me on the project at the moment.

Thankfully it's all written in Python, so it's not too verbose, though even Python is verbose when you have to write this amount of repetitive code.

Which is how we come to the value of clean code. The code is slowly being transformed through refactoring. The effect is that there is no repetition in the production code; it is completely DRY. In the tests there is a little repetition, kept deliberately so that the tests document the production code.

The effect of this is that although the upfront cost, before any abstractions had been created, was slightly higher, the cost of adding new features is now very small. A function that initially required 40 lines of code, excluding tests, now requires 5 to 10 lines. The test abstractions are so easy to use that writing tests takes next to no time, even though some repetition (sequentially calling the same methods in each test method to spell out the steps required to use the function) is needed.

The conclusion is that the returns on clean code are rather immediate. Many seem to think that the returns are slow and far in the future. My experience is the opposite: the returns come almost immediately, and they are currently cutting my development time radically. And that is on a code base with only a couple of thousand lines of production code.

Most of us have at least heard about the DRY principle. But how is it applied in a pragmatic way in software development?

I will take you through my thoughts as I work on an application, to illustrate how, when and why I apply it, as well as where it may be better not to.

The code is in Python. It is not a fully working application, since the project isn't open source.

The example application is a simple stock management application. It has a warehouse. The warehouse has sections, and each section contains SKUs (Stock Keeping Units). It also has a catalogue with categories and products. Each product is bound to one or more SKUs.

I am using SQLAlchemy as the ORM, which also maintains the database schema using declarative_base. In the tests I am using the fixture library to create data fixtures that can be injected into the database when executing the tests. Since the application is so simple, SQLite is a sufficient database engine.
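The actual model code is not shown here, but a minimal sketch of what the domain classes might look like, assuming declarative_base and illustrative column names:

    from sqlalchemy import Column, ForeignKey, Integer, String
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relationship

    Base = declarative_base()

    class Warehouse(Base):
        __tablename__ = 'warehouses'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        sections = relationship('Section', backref='warehouse')

    class Section(Base):
        __tablename__ = 'sections'
        id = Column(Integer, primary_key=True)
        name = Column(String)
        warehouse_id = Column(Integer, ForeignKey('warehouses.id'))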

The first requirement is to create a function that will find a warehouse using a warehouse ID. The first step is to start the database and inject data into it. This is done in the setUp() method of the test, like so:
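The original test is not reproduced here, but a minimal sketch, assuming an in-memory SQLite engine, the SQLAlchemyFixture class from the fixture library, and the model sketch above, might read:

    import unittest

    from fixture import DataSet, SQLAlchemyFixture
    from sqlalchemy import create_engine

    # Base, Warehouse and Section as sketched above; the data values are
    # illustrative.
    class WarehouseFixture(DataSet):
        class main:
            id = 1
            name = 'Main warehouse'

    class SectionFixture(DataSet):
        class inbound:
            id = 1
            name = 'Inbound'
            warehouse_id = WarehouseFixture.main.id

    class WarehouseDaoTest(unittest.TestCase):

        def setUp(self):
            # Create the schema in an in-memory database and load the
            # fixtures into it.
            self.engine = create_engine('sqlite:///:memory:')
            Base.metadata.create_all(self.engine)
            self.db_fixture = SQLAlchemyFixture(
                env={'WarehouseFixture': Warehouse, 'SectionFixture': Section},
                engine=self.engine)
            self.data = self.db_fixture.data(WarehouseFixture, SectionFixture)
            self.data.setup()

        def tearDown(self):
            self.data.teardown()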

WarehouseFixture and SectionFixture are fixture.DataSet classes; Warehouse and Section are domain classes.

This makes it possible to query data through the DAO and then assert on the results using the fixtures.
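For example, a test of the find function can assert directly against the fixture values; the DAO interface below is an assumption, not the original code:

    # Inside WarehouseDaoTest above; WarehouseDao and find_warehouse are
    # hypothetical names.
    def test_find_warehouse_by_id(self):
        dao = WarehouseDao(self.engine)
        warehouse = dao.find_warehouse(WarehouseFixture.main.id)
        self.assertEqual(WarehouseFixture.main.name, warehouse.name)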

After completing the other tiers needed in the application to expose this as a JSON front end, I need to create some functional tests. They will need to inject data into the database in the same way that the DAO test did. The functional tests will then validate the JSON response from the server using the fixtures.

I could, and I know some would, copy and paste the function calls above into a new setUp() method in the functional test. But doing so would disconnect the functional tests from the fixtures and set-up code used in the DAO tests. Since the DAO tests and fixtures will evolve with the data model and the database, we don't want the functional tests to lose that connection. Instead I pull some functions up from the DAO test case into stand-alone functions. These can then be used from the functional tests as well as the DAO tests.
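Under the same assumptions as the sketch above, the pulled-up, stand-alone functions might look like this; both setUp() methods then call them:

    def create_test_engine():
        # One in-memory database per test keeps the tests isolated.
        engine = create_engine('sqlite:///:memory:')
        Base.metadata.create_all(engine)
        return engine

    def setup_warehouse_data(engine):
        db_fixture = SQLAlchemyFixture(
            env={'WarehouseFixture': Warehouse, 'SectionFixture': Section},
            engine=engine)
        data = db_fixture.data(WarehouseFixture, SectionFixture)
        data.setup()
        return data

    def teardown_warehouse_data(data):
        data.teardown()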

We now have generic functions that can be called in different, unrelated, tests. And the code is DRY.

Next it is time to add some new DAO tests. This DAO will live in a new module which handles the catalogue. The previous refactoring did not require decoupling from the warehouse (both the DAO tests and the functional tests test the warehouse), so it took place within the warehouse dataaccess test module. Since this step creates a new, decoupled module, the test modules should not be coupled either. To enable this we create a utility module called data_fixtures with a new class called DataFixtureTestCase, like so:
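Again a sketch rather than the original module; it assumes the fixture library and that subclasses provide the engine, env and fixtures properties described below:

    # data_fixtures.py
    import unittest

    from fixture import SQLAlchemyFixture

    class DataFixtureTestCase(unittest.TestCase):

        # Subclasses must provide these before calling setupDatabase().
        engine = None    # an SQLAlchemy engine
        env = None       # maps DataSet names to domain classes
        fixtures = None  # the DataSet classes to load

        def setupDatabase(self):
            assert self.engine is not None, 'engine must be set'
            assert self.env is not None, 'env must be set'
            assert self.fixtures is not None, 'fixtures must be set'
            self.db_fixture = SQLAlchemyFixture(env=self.env,
                                                engine=self.engine)
            self.data = self.db_fixture.data(*self.fixtures)
            self.data.setup()

        def teardownDatabase(self):
            self.data.teardown()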

Since Python supports multiple inheritance it is possible to subclass both this class and AsyncHTTPTestCase when creating functional tests. In the test case's setUp() function, setupDatabase() is called to set up the database, and teardownDatabase() is called in tearDown() to tear it down. setupDatabase() also ensures that the required properties have been set before setting up the database.

To keep the functional tests connected to dataaccess, the functional test module imports the engine, env and fixtures properties from the dataaccess test module, as sketched below.
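Put together, a functional test might be wired up like this; the module names and the application factory make_app are hypothetical:

    from tornado.testing import AsyncHTTPTestCase

    from data_fixtures import DataFixtureTestCase
    # Hypothetical module names: reuse the DAO test module's properties
    # rather than redefining them.
    from warehouse.dataaccess_test import (engine as dao_engine,
                                           env as dao_env,
                                           fixtures as dao_fixtures)
    from warehouse.app import make_app

    class WarehouseFunctionalTest(DataFixtureTestCase, AsyncHTTPTestCase):

        engine = dao_engine
        env = dao_env
        fixtures = dao_fixtures

        def get_app(self):
            # make_app is an assumed factory returning the Tornado app.
            return make_app(dao_engine)

        def setUp(self):
            AsyncHTTPTestCase.setUp(self)
            self.setupDatabase()

        def tearDown(self):
            self.teardownDatabase()
            AsyncHTTPTestCase.tearDown(self)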

The above refactorings help keep the code DRY whilst keeping it easy to understand. They also create a place where common assertions bound to the data fixtures can live.

It should be noted that when working with tests, DRY is not always your friend. If the tests lose any of their expressiveness as "executing documentation" (the tests are the first stop for understanding production code), expressiveness wins any day. This is the reverse of production code, which must be kept DRY at all times.

I have recently been working on an e-commerce project. We needed to create an extension to a third party e-commerce application whose source code is open to customers and partners. It is a pretty comprehensive and complex application.

The e-commerce domain is a pretty complex one. A system designed for a global market, with national tax rules, shipping rules and other variations, plus the requirement to adjust prices according to customers and running campaigns, makes for a very complex system. On top of this it needs to integrate with, and in large parts understand, the logistics and stock keeping domain, provide customer service interfaces, and handle everything else that comes with running a reputable global web store.

When I started planning the extension I read the documentation and the JavaDocs and looked at the classes and database schemas. This gave me the initial impression that the code would be domain driven; they have borrowed some of the patterns from domain driven design. It looked promising.

As I dug deeper I started looking at the tests. In my experience a well crafted system can often be understood by reading the tests. Unfortunately this was not the case. There were not many tests and the tests that were there did not provide me with the information I needed.

The next step was to start on the first of my user stories: to retrieve a product catalogue and return it to my client. Looking at the domain this seemed pretty simple. I could clearly see which objects I needed to work with. But I could not find any repositories, factories or services that would give me access to those objects.

I talked with my contact at the third party vendor. I specifically asked for repositories, factories and services. I also mentioned that I had got the impression that the application had a domain driven design. He talked with some of the developers and came back to me sounding a bit embarrassed. He had asked about domain driven design. The development team had told him that, well, in part, perhaps, a little; it was kind of a work in progress. No, I would have to use their service tier. And to understand how the service tier worked I would have to look at their controller tier. And to understand the controllers I had to look at the view beans. And then there was all the configuration hidden in the Spring XML. They also said that it might prove quite time consuming and difficult to do what I intended to do, since most of the logic was actually part of the view, which was exactly what I needed to replace. The system wasn't really designed to do what I had set out to do.

Ouch. It was a rather awkward moment for my contact at the vendor. He didn't like having to give me that message. He is a pretty technical guy and he knows when the code sucks.

I spent the next three weeks just untangling how they put together their request from the view to the search server so that I could retrieve products matching a product catalogue. It took me a disproportionate amount of time just to figure out how they configured the database settings.

So how come I am telling this story?

With the knowledge and understanding I now have of the code base I can read the history of this application. I can clearly see the different struggles they have had since they set out to knock up a quick web store in Java. How they made the decision to sacrifice code cleanliness and testing to maximise speed. How they had to scramble to meet new customer feature requests, never quite having time to tidy up the mess. At some point, some time was invested in cleaning up part of the mess and releasing the source code to customers, to allow customers and partners to make modifications that would not be supported by the vendor.

This is not what I should see when working with a code base. This is not what the code was written for. It was written to solve the problems of online shopping on a global scale, and when I read the code that is what it should tell me: how they solved that problem.

My guess is that no one in the team that started this project knew quite what they were sacrificing. They didn't have the experience of a well working, clean code base. They had not worked with the confidence that a test driven, automated code base gives.

This is unfortunate. What if one of the initial senior architects or developers had pushed through clean, test driven code? I think a lot would have changed. As the system grew more complex and it became apparent that there was too much logic in the view, the team would gradually have implemented domain driven design. With a full regression test suite to support the change, the risk would have been minimal. Further, it could have been done in verticals that would have made sense to the reader. As new architectural challenges were encountered, for example scalability, the system would have been robust and tested enough to meet them in the best possible way, with ease and next to no risk.

For me it would have been easy to understand what I was looking at. My contact at the vendor, and probably the developers replying to my questions, could proudly have said that yes, it's a fully tested, domain driven architecture; look here and here and you will find the repositories, and why don't you read through test suites x, y and z to see how we use them in our view tier. Perhaps I wouldn't even have had to ask.

So my lesson from this story is that the clean code and the tests were needed from the start. They should have been there in order for the code base to evolve. And they would have saved money and time, for them and for me. From the start.

When writing my article AMQP Hello World as an Exploratory test I realised that the test is written in a different way than many of the tests I read. I felt that I should explain why I chose a different design when writing tests, since I think it would benefit many projects.

The code below is taken from the example provided in the AMQP article and can be found in my GitHub repository.

My main driver when writing tests is readability. A test should read pretty much like a well written requirement specification. It will detail the requirements for set up before testing can start. It will then detail the given state of the application, move on to what happens, and then verify that expectations were met. Before finishing, the application needs to exit gracefully.

When I work with tests in Java I tend to use JUnit 4. JUnit 4 does not rely on naming conventions such as setUp, testXyz and tearDown. Everything is driven by annotations, which frees up the namespace and allows the developer to name the methods according to what they do.

Below is the set up method for my test:
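The original method is not reproduced here; below is a sketch of what it might look like, assuming the RabbitMQ Java client. The class skeleton, queue name and method names are illustrative, not the code from the article.

    import static org.junit.Assert.assertEquals;

    import java.io.IOException;

    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.QueueingConsumer;

    public class HelloWorldExploratoryTest {

        private static final String QUEUE_NAME = "hello";

        private Connection connection;
        private Channel channel;
        private QueueingConsumer consumer;

        @Before
        public void connectToBrokerAndOpenChannel() throws Exception {
            // Connect to a local broker and declare the queue under test.
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");
            connection = factory.newConnection();
            channel = connection.createChannel();
            channel.queueDeclare(QUEUE_NAME, false, false, false, null);
        }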

The method name clearly states what the method does. The annotation clearly states which role the method plays in the test life cycle. This makes the test code easier to read and debug.

When working with the actual test methods I use names that clearly identify what the test verifies. Below is a test method from the example. The method name points out what the method tests. The content is then written in a given-when-then pattern to detail the steps that are required.
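Continuing the class sketched above, a test method in the given-when-then style might read as follows; the method and helper names are mine, and the sketch assumes the (since deprecated) QueueingConsumer from the RabbitMQ Java client:

        @Test
        public void messagePublishedToQueueIsReceivedByConsumer() throws Exception {
            givenAConsumerSubscribedToTheQueue();
            whenAMessageIsPublished("Hello World");
            thenTheConsumerReceives("Hello World");
        }

        private void givenAConsumerSubscribedToTheQueue() throws IOException {
            consumer = new QueueingConsumer(channel);
            channel.basicConsume(QUEUE_NAME, true, consumer);
        }

        private void whenAMessageIsPublished(String message) throws IOException {
            channel.basicPublish("", QUEUE_NAME, null, message.getBytes());
        }

        private void thenTheConsumerReceives(String expected) throws Exception {
            // Wait up to a second for the delivery before asserting.
            QueueingConsumer.Delivery delivery = consumer.nextDelivery(1000);
            assertEquals(expected, new String(delivery.getBody()));
        }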

This way of working has a couple of quite useful side effects. When adding more tests, either as new test methods or as new test classes, it is easy to spot similarities which can be shared between tests. This in turn helps enable the creation of a domain language for the tests of the application. It is important, though, not to forget that tests should first and foremost be readable. Creating too much abstraction makes it harder to follow the test code.

The given-when-then methods clearly define the steps taken to perform the test but do not slow the reader down with large amounts of implementation detail. This makes the tests a good source of documentation. It furthermore makes it very easy to see whether a test failed due to the state being changed before or during the operation under test.

The tear down method is, just like the set up method, named according to what it does rather than its role in the test life cycle; that role is defined by its annotation.
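Again as a sketch with an illustrative name, closing the class from the set up example:

        @After
        public void closeChannelAndConnection() throws Exception {
            channel.close();
            connection.close();
        }
    }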

I find that this style of test design makes the tests much easier to read, maintain and understand. They also work much better as documentation when trying to understand generalised, well written code.