Thursday, October 25, 2007

Understand Legacy Code

The first step to support Legacy System is to understand the code. One approach is to read through all the lines of code. Quite often, this is quite ineffective, you may be wasting a lot of time reading dead code.

What I typically do is quickly scan through the package structure of the code, the class name and public methods. If the code structure is good, then I can get some rough idea about where the logic lies. However, I rely more on analyzing the run time behavior of the code to determine its logic flow.

A technique that I use frequently is to just run the system and determine the execution path. At the first step, I need to determine the entry points of the system.

Locate the entry points

Usually, the entry points are ...
  • main() method where the system boots
  • background maintenance tasks
  • event-driven callbacks
Finding out these entry points is non-trivial, especially if the documentation of the overall system is poor. I have built a tool that allows me to instrument those classes that I am interested so that when I start the system, it will print out all the entry points when they execute. Entry point is the lowest method of the stack trace that you are interested. The tool is based on AspectJ which allows me to inject some print statements when certain methods are executed.

After I get an idea about where the entry points are, the next step is to drill down into each execution point to study what it is doing.

Drill down into entry points

The first thing I usually do is to read through the code trying to understand what it is doing. Eclipse IDE is very effective to navigate through the code to see which method is calling which method. The most frequently used Eclipse feature is ...
  • "open declaration" which jumps to the method being called
  • "references" which list all the methods that is calling this methods
  • "Calling hierachy" which is a tree of callers which calls this method
  • "Type hierachy" which is a tree of Class hierachy
After that, I will add trace statements into the code to verify my understanding. Consider Trace statement is an enhanced logging mechanism which include the thread id, the calling stack trace and a message. Trace statements is temporary and its purposes is help me to understanding the code. After that, I will remove my trace statement. Since I don't want anyone to use the trace statements in the production environment, I wrote a "Trace" class so that I can easily remove it when I am done.

Document the understanding

Draw a UML diagram to describe my understanding. And also take notes about any observed issues and what can be improved.

No comments: