Legacy Code is important. Not just to us, as programmers, and the companies we happen to be working for. It has much larger implications then that. A couple of years ago I heard in a podcast that software has a 98% penetration rate on our everyday lives (I can unfortunately not find the reference now). The figure was for the northern hemisphere but the south is catching up fast. This means that our legacy as programmers, our legacy code, has an impact on pretty much every part of the society we live in. It affects us all every day of our lives. At home, at work, going in-between and pretty much everywhere else. I think we often forget this when we hack away at the task at hand. I certainly often do. We should take credit for all the great things we are able to create. But we also have to take responsibility for the bad. It is up to us to ensure that we create great software.
Because of this I think it's important to analyse how we deal with legacy code. Can we make sure that the code we leave behind does not become a burdon, a liability, but an asset? If we can, then how can we do this? And if we can't, then how can we ensure that cost is as low as possible?
Before I dig in to this subject I will try to define what I mean with legacy code. It's a term that is used by many and in many different contexts. Looking around online I find Wikipedias definition:
"Legacy code is source code that relates to a no-longer supported or manufactured operating system or other computer technology."
Another well known definition is Michael M. Feathers's from his book Working Effectively with Legacy Code:
"To me, legacy code is simply code without tests."
These two are in a way two ends of a spectra, where Wikipedias definition is rather weak compared to how we use the term in our profession and Feathers definition is rather harsh, considering that most programmers don't test drive there work and therefor create legacy as soon as they sit down at the keyboard.
There are a couple of other interesting pages on the subject that I have stepped over in my search for a good definition, one being the C2 wiki page, which I found the most inspirational, the other is /*Programmers*/ which discusses the subject in quite some detail.
My definition is:
Any part of a system, or whole system, that is hard or impossible to change when the requirements for it changes in such a way that the code needs changing to continue to serve it's users in the best possible way.
This means that code becomes legacy as soon as the context in which the code functions in changes and the code cannot keep up with the change in an easy way. What I mean with the context is anything that effects the code, this includes users requirements, regulations and the technical environment as well as anything else that affects the codes function. Easy to change is when the difficulty of change is proportionate to the perceived impact of the change. Some things are harder to change, for example a storage solution. Others are simpler, for example UI widgets or fields.
This makes all legacy code a bad legacy. That is how we use the term in our every day work. Therefor we also have to try to minimise the legacy code we create.
I have found two possible ways to deal with this. One is test first or TDD. The other is to ensure that each component, each function, is small enough and decoupled enough to be thrown away in it's entirety when it needs changing. I'll look closer at each in turn.
Test first or TDD is a way to work with the low level code as well as with higher level requirements by writing tests first. We have all heard the debate about this. I think this way of working is the only way to ensure that all aspects of importance in the code are tested in such a way that we can ensure no regression happens when we change them. This is, as far as I know, the only way that we can keep a system agile and easy to change. In other words the only way that we can keep a system from becoming legacy. Higher level tests are bound to requirements, lower level tests are bound to implementation on a class and method level.
What I have also found is that in such cases, sometimes when the tests are to hard to change to fit new requirements it is possible to throw them and the accompanying code away. The tests provides the documentation needed to improve the subsystem to work according to the new and old requirements. And the high level tests will catch functional regression. For this to work properly everyone has to realise that all code is design, every single line, and that everyone is responsible for the design.
I know that not everyone think they can do Test first. And perhaps it is not required as long as all requirements are covered in functional, BDD style, tests. In my experience as a Test first person this is not true but perhaps I fail on the more general level since I have the safety net further down and closer to the implementation.
The other way to avoid costly legacy is to make sure that each piece of legacy is so small that it is disposable when requirements change. I realised this quite recently when understanding Micro Service Architectures. The way I understand this is that the concept of a "class" is pretty much a service. Sure, the classes we normally create are much smaller but the higher level classes that we aggregate to provide a function of the system, the "public" interface, can be defines as a service. They are then bound together using a message or service buss instead of the normal object messaging used in classic class oriented systems. This means that it's an external orchestrator that provides the system of functions to the outside world. Tests are still essential and I believe they should be written first. But test in this context are functional tests which verifies the services functions as provided to the buss, and other services. Here BDD and similar methods are excellent. And the boundaries are made very clear. It also means that one service can be written in one language and another in another language. When a service needs to change it is quite possible to throw it away and create a new one. Chances are it's even better when reworked a second or third time.
The risk with this approach is of cause that the orchestration becomes so complex that the legacy is introduced on higher level instead. It is however often easier to mange complexity on higher levels then further down.
There are of cause situations when code does not have to be all squeaky clean. PoCs, pilots and other early incarnations of systems are often put together in a hurry and can often be thrown away as a learning experience. The problem with legacy arises when we risk loosing important functionality. So here it is our responsibility not to give in to pressure from others to let system with to much and badly controlled technical debt go live. This is not easy but I think we have to in order to build a reputation as professionals.
I hope to get your comments on the subject so please share your thoughts below or in your own blog with a ping back.