Monthly Archives: November 2011

A couple of years back I wrote an article exploring a way to apply Robert C. Martin's guidelines on how to create clean functions, taken from his book Clean Code. It created quite a stir on DZone, where I also published it.

A comment from Steven Jeuris on my blog about a week ago made me want to revisit the subject.

Since I wrote the original article I have learnt a lot about creating clean functions that do one thing. And as I replied to his comment, I would not design the code in the original article the same way now.

The key to applying the guidelines in the book is pragmatism. There are a number of benefits to working the way the book suggests, and I will cover some of them below. But the guidelines cannot be followed religiously. It is, after all, software we are creating, not religion.

Steven Jeuris writes, in his blog post on the subject, that a perfectly good way to make the code in a function readable is to create blocks with comments. To me this is an anti-pattern. There are few reasons to ever create blocks in a function; if they are required, it generally indicates that the function is too long. Comments are almost always a bad idea. If a comment is required, the block should instead be extracted into a function bearing a name that explains what the block does.
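As a small sketch of what I mean (illustrative names, not code from the original article), a commented block can be lifted into a function whose name carries the comment's meaning:

```python
# Before: a comment labels a block inside a longer function.
def total_price_commented(prices):
    total = sum(prices)
    # apply bulk discount
    if len(prices) > 10:
        total *= 0.9
    return total

# After: the block becomes a function whose name replaces the comment.
def apply_bulk_discount(total, item_count):
    """Give 10% off when more than ten items are bought."""
    return total * 0.9 if item_count > 10 else total

def total_price(prices):
    return apply_bulk_discount(sum(prices), len(prices))
```

The calling function now reads as a sentence, and the discount rule has a home of its own.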

Steven Jeuris further writes that lots of new small functions litter the namespace of the class. I can see where he is coming from, but again I have to disagree. If the namespace of a class is littered with explanatory function names, it sounds like the class is too big and needs to be refactored. To me it sounds very much like it does more than one thing.

There is also an interesting side effect to creating smaller, more atomic functions that do one thing. It becomes easier to find commonalities between the other, higher-level functions within the same class, or even between related classes. This actually makes it easier to keep the code DRY (Don't Repeat Yourself).

Another very important factor to keep in mind when working like this is what patterns the rest of the team is used to. The size of functions and the way to work with them is not one-size-fits-all. If a team is used to procedural code, the step to extract-till-you-drop style functions will be really confusing. If, on the other hand, the team is used to atomic, do-one-thing-only functions, it is hell to work with larger functions littered with comments and artificial blocks.

In any case, I know that if I did the same exercise today as when I wrote my original article on the subject, the result would look different, and the original definitely takes a good thing way too far.

I have recently finished designing a REST protocol. There are a few things that need careful consideration when exposing a domain through such an interface. I will gather some of the conclusions here for future reference.

The first thing to keep in mind is what an HTTP-based REST protocol is and what it is not. It's not a domain, it's a protocol. When modelling the domain it's easy to forget that its representation is only a snapshot view onto it. The REST protocol provides this snapshot view. This makes a tiered architecture fitting, with clear boundaries between presentation (i.e. the REST API), the domain, and any other technical aspects such as data storage.

The next thing to keep in mind is: what is REST over HTTP? It's a way to access resources, or documents. That's what we do when using the web. We don't access domain objects. Remembering this helps when thinking about the interface as well. Most of the time when we create enterprise software we create abstract documents representing real-life objects. For example, when creating an e-commerce system that sells shoes, it's not the shoes that are transitioned through the workflow; it's documents that represent the shoes or their whereabouts. With this frame of mind it's much simpler to define the REST interface onto the domain.

Then there are the three levels of REST: resources, verbs and hypermedia controls, as described by Martin Fowler in his article on the Richardson Maturity Model.

The first level is resources. This is where the document thinking comes in. When thinking about the domain model it is usually pretty simple to see how different states of it can map onto a document representation. Such a representation becomes a resource. It could just as well be the result of an operation as the state of an object.

The second level is the HTTP verbs. I used GET, PUT, POST and DELETE in my API.

GET fetches the representation of a resource. A key aspect of GET is that it can be called any number of times with the exact same result. It has no side effects.

POST and PUT are used to change things. POST is used when a resource is created. One obvious use is creating a new object that can later be addressed using its ID. This returns a 201 Created response with a Location header that points to where the new resource is located. POST is also used if a resource is changed in such a manner that its path changes.

An example is a set of products that are accessed based on a category name. When the category name changes, the product paths change too. This kind of side effect requires a POST rather than a PUT. The response is either 200, if it contains a body, or 204 if it does not. In either case the Location header needs to be present to point the client to the new location of the resource.

POST is also used for submitting data to a computation. If the API exposes a currency converter I would use POST as the method, with the amount and the from and to currencies in the body. Here the response code is 200 with the result in the body, but no Location header is provided.
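A minimal sketch of such a converter endpoint, with made-up rates and field names (none of this is from the actual API):

```python
# Hypothetical exchange rates, for illustration only.
RATES = {("EUR", "USD"): 1.35}

def post_convert(body):
    """POST to a computation: respond 200 with the result in the
    body and no Location header, since nothing was created."""
    rate = RATES[(body["from"], body["to"])]
    return 200, {}, {"amount": body["amount"] * rate}
```

The handler returns a (status, headers, body) triple; note the empty headers, as there is no new resource to point at.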

PUT is used to modify a resource. What is important here is that resource paths are not allowed to change as a side effect of the modification. It can expose ways to change properties on objects, or to replace a whole object, as long as the path does not change. The response is usually 204, but if a response body is returned it is 200.

DELETE is simply used to remove a resource. The response is almost exclusively 204, since a removed resource has little in the way of content to offer.
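The verb semantics and response codes described above can be sketched with a toy in-memory store (hypothetical paths and helpers, not the real API):

```python
# Toy in-memory resource store used to illustrate the verb semantics.
resources = {}
next_id = 1

def post(body):
    """Create a resource: 201 Created plus a Location header."""
    global next_id
    path = "/products/%d" % next_id
    next_id += 1
    resources[path] = body
    return 201, {"Location": path}, None

def get(path):
    """Safe to repeat: same result every time, no side effects."""
    if path in resources:
        return 200, {}, resources[path]
    return 404, {}, None

def put(path, body):
    """Modify in place; the path must not change. Usually 204."""
    resources[path] = body
    return 204, {}, None

def delete(path):
    """Remove the resource: 204, since there is no content to offer."""
    resources.pop(path, None)
    return 204, {}, None
```

Each handler returns a (status, headers, body) triple, which makes the difference between the verbs easy to see at a glance.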

The third level is hypermedia controls, also referred to as HATEOAS (Hypermedia as the Engine of Application State). Hypermedia controls are links to actions that can be performed on the document. My thinking on what links to provide is to point forward from the current state. The client is responsible for keeping track of its history, as well as whether that history is still valid to traverse. History validity is based on Expires headers, so don't forget to provide them in each response.

When defining links I have used the following JSON syntax:
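The original snippet appears to be missing here; a hypothetical reconstruction matching the description that follows (verb as key, resource path as value, with made-up paths) could look like:

```json
{
    "links": {
        "GET": "/products/42",
        "PUT": "/products/42",
        "DELETE": "/products/42"
    }
}
```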

This exposes what actions are available on the current resource in an easy manner, using the HTTP verb/method as the key and the path as the value.

I am sure I will have good reason to change and rethink the above points as I design other APIs in the future, but for this application it has worked really well. And each of the above points is important to think about when defining a REST API, whether the conclusions end up the same or not.

In the project I am currently working on I have a lot of code to write. It's a greenfield project delivering a new platform and I have no frameworks to borrow from or extend, so I have to write the code for every single feature by hand.

I am not complaining. I like writing code. But it takes time, especially since it's only me on the project at the moment.

Thankfully it's all written in Python so it's not too verbose, though even Python is verbose when you have to write this amount of repetitive code.

Which is how we come to the value of clean code. The code is slowly transformed through refactorings. The effect is that there is no repetition in the production code; it is completely DRY. In the tests there is little repetition, and what remains is there to make the tests document the production code.

The effect of this is that although the upfront cost, before any abstractions had been created, was slightly higher, the cost of adding new features now is very small. A feature that initially required 40 lines of code, excluding tests, now requires 5 to 10 lines. The test abstractions are so easy to use that writing tests takes next to no time, even though some repetition (sequentially calling the same methods in each test method to spell out the steps required to use the feature) is needed.
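As a hedged illustration of the kind of abstraction I mean (invented names, not the platform's actual code), a small shared base can turn a repetitive 40-line feature into a few declarative lines:

```python
class Feature:
    """Shared plumbing that every feature used to repeat by hand."""

    def __init__(self, name, validator, handler):
        self.name = name
        self._validate = validator
        self._handle = handler

    def run(self, payload):
        # Every feature validates its input, then handles it.
        self._validate(payload)
        return self._handle(payload)

def non_empty(payload):
    if not payload:
        raise ValueError("empty payload")

# Defining a new feature is now a couple of lines instead of forty.
echo = Feature("echo", non_empty, lambda payload: payload)
```

The repeated validate-then-handle sequence lives in one place, and each new feature only states what varies.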

The conclusion is that the returns of clean code are rather immediate. Many seem to think that the returns are slow and far in the future. I know it to be the opposite. The returns are almost immediate and are currently cutting development time radically. And that is in a code base with only a couple of thousand lines of production code.