Most of those legacy systems started with a clean design, because the problem they originally tried to solve was smaller and well defined. However, as business competition and organizational change continuously demand new enhancements and features within a relatively short timeframe, these new features are typically implemented in a completely separate module so that existing working code isn't touched. Common functionality coded in the original system gets copied into the new module. This is a very common syndrome that I call "reuse via copy and paste".
Here is an earlier blog post about some common mindsets that lead to bad code.
Basically, the idea of "not touching existing working code" encourages a "copy and paste" culture which, over time, causes a lot of code duplication across many places. Once a bug fix is developed, you need to make sure the fix is applied to every copy. Once you enhance a feature, you need to roll the enhancement into every copy too. When there are 20 different places in the code doing the same thing (but each slightly differently), you start losing track of which copy is ultimately responsible for a given piece of logic. Now the code is very hard to maintain and also hard to understand.
Because you cannot understand the code, you become more scared of changing it (since it is working). This further encourages you to put new features in a completely separate module, which worsens the situation. The cycle repeats.
Over time, the code becomes so unmaintainable that adding any new feature takes a long time and usually breaks existing code in many places. The development team doesn't feel productive and morale is low. In my past career, I was brought in to help in this situation.
At a high level, here are the key steps ...
1: Identify your target architecture
Define a "to-be" architecture that can support the business objectives for the next 5 years. It is important to purposely ignore the current legacy system at this stage, because otherwise you won't be able to think "outside the box".
Be careful not to give the impression that this exercise is going to suggest throwing the existing system away and starting everything from scratch. It is important to understand that the "to-be architecture" is primarily a thought exercise for defining our target. We should clearly separate our "vision" from the "execution", which we shouldn't be worrying about at this stage.
The long-term architecture establishes a vision of where we want the ultimate architecture to be and serves as our long-term target. A core vs non-core analysis is also necessary to decide which components should be built and which should be bought.
It is also important to get a sense of possible future changes and to build enough flexibility into the architecture that it can adapt to those changes when they happen. Knowing what you don't know is very important.
A top-down approach is typically used to design the to-be architecture. The level of detail is determined by how well the requirements are known and how likely they are to change in the future. Since the to-be architecture mainly serves as a guiding target, I usually won't get too deep into implementation details at this stage.
The next step is to get down on the ground and understand where you are now.
2: Understand your existing system
To get up to speed quickly, my first step is to talk to people who understand the current code base as well as its pain points. I'd also skim through existing documents and presentation slides to get a basic understanding of the existing architecture.
If people who are knowledgeable about how the legacy system works are still available, conducting a formal architecture review can be a very efficient way to get started on understanding the legacy system.
If these people have already left, a different, reverse engineering process is needed.
3: Define your action plan
At this point, you have a clear picture of where you are and where you want to be. The next step is to figure out how to move from here to there. In fact, this stage is the hardest because many factors need to be taken into consideration:
- Business priorities and important milestone dates
- Risk factors and opportunity costs
- Organization skill set distribution and culture
Parallel development of a green-field project
A small team of the best developers forms a parallel effort (alongside the legacy system) to create the architecture from scratch. The latest, best-of-breed technologies are typically used, so that most of the infrastructure pieces can either be bought or fulfilled by open source technologies. The team focuses on just rewriting the core business logic. The green-field system is typically easier to understand and more efficient.
After the green-field system is sufficiently tested, the existing legacy system is swapped out (or kept as a contingency backup). Careful planning of data migration and traffic migration is important to make sure the transition is smooth.
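One way to make the traffic migration gradual is to route a fixed percentage of users to the new system and keep the rest on the legacy one. The following is only a minimal sketch of that idea, not a description of any particular project: the function names, the bucket count of 100 and the "new"/"legacy" labels are all assumptions made for illustration.

```python
import hashlib


def rollout_bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99 (hypothetical helper)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_backend(user_id: str, new_system_percent: int) -> str:
    """Send roughly new_system_percent of users to the new system, the rest to legacy.

    The same user always lands in the same bucket, so raising the percentage
    only moves additional users over; nobody flips back and forth.
    """
    return "new" if rollout_bucket(user_id) < new_system_percent else "legacy"
```

Setting the percentage to 0 keeps everyone on the legacy system, which is what makes it usable as the contingency backup during the transition.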
One problem with this approach is development cost, because during the transition period you need to maintain two teams of developers working on two systems. New feature requirements may keep coming in, and you may need to implement the same thing twice, once in each system.
Another problem is the morale of the developers who maintain the legacy system. They know the system is going away in the future, so they may need to find another job. You may end up losing the people who are knowledgeable about your legacy system even faster.
Another approach is to refactor the current code base and incrementally bring it back into good shape. This involves repartitioning the responsibilities of existing components and breaking down complex components or long methods.
When I run into code that I am not able to understand, which may be dead code that never gets exercised or logic that is hidden behind many levels of indirection, what I typically do is add trace statements to the code and rerun the system to see when this code is executed and who is calling it (by observing the stack trace). I will also put a wrapper around the code that I don't understand, then shrink the wrapper's perimeter to the point where I can safely swap out the component I don't understand.
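As a rough illustration of the trace-statement idea, here is a small Python sketch of a wrapper that logs every call together with the stack of callers. The decorator name, the logger name and the legacy function are all hypothetical; a real code base would use whatever logging facility it already has.

```python
import functools
import logging
import traceback

logger = logging.getLogger("legacy_trace")  # hypothetical logger name


def traced(func):
    """Log each call to func, including the stack of callers that led to it."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # format_stack() ends with this wrapper's own frame; drop it so the
        # last entry shown is the code that actually called the function.
        callers = "".join(traceback.format_stack()[:-1])
        logger.info(
            "calling %s args=%r kwargs=%r\n%s",
            func.__qualname__, args, kwargs, callers,
        )
        return func(*args, **kwargs)

    return wrapper


# Hypothetical legacy function whose purpose and callers are unclear.
@traced
def legacy_compute_discount(order_total_cents):
    if order_total_cents > 10000:
        return order_total_cents - 1000
    return order_total_cents
```

If the log stays empty after the system has been exercised thoroughly, the function is a candidate for dead code. If it does show up, the logged stack tells you who depends on it, and the wrapper is the natural place to later swap in a replacement.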
It is also quite common that a legacy system lacks unit tests, so a fair amount of effort may need to be spent writing unit tests around the components.
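When nobody is sure what the correct behavior is, one option is to write tests that simply record what the code does today, so that a later refactoring can be checked against it. The sketch below assumes a made-up legacy function, and the expected values were obtained by running that function, not taken from any specification.

```python
import unittest


# Hypothetical legacy function; stands in for a component nobody fully understands.
def legacy_shipping_fee(weight_kg, express):
    fee = 5 + 2 * weight_kg
    if express:
        fee = fee * 2
    if weight_kg > 20:
        fee += 15
    return fee


class ShippingFeeCharacterizationTest(unittest.TestCase):
    """Pin down what the code does today, not what a spec says it should do."""

    # (weight_kg, express) -> fee observed by running the current code
    OBSERVED = {
        (1, False): 7,
        (1, True): 14,
        (20, False): 45,
        (21, False): 62,
        (21, True): 109,
    }

    def test_current_behaviour_is_preserved(self):
        for (weight, express), expected in self.OBSERVED.items():
            with self.subTest(weight=weight, express=express):
                self.assertEqual(legacy_shipping_fee(weight, express), expected)
```

Saved in a test module, this can be run with `python -m unittest`. The inputs deliberately sit on both sides of the branches (express or not, at and just above 20 kg), since those are the places where a refactoring is most likely to change behavior by accident.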
One problem with the refactoring approach is that it is not easy to get management buy-in, because they don't see any new features coming out of the engineering effort being spent.