Productively operating on legacy code

Jul 5, 2022

What is legacy code? Is it the code we changed a couple of months ago, or is it the part of the codebase where nobody wants to go? There is no definite answer, but one thing is sure: software changes daily! There are several reasons why: adding a feature, fixing a bug, improving the design or optimizing resource usage. Each of these changes affects structure, functionality or resource usage.

It's easy to go down the "if it works, don't touch it" path, but this is not the right attitude. If you want a good system that is easy to extend and adapt, you need to embrace the system as it is and tackle the problems nobody else wants. To make your life easier you need tests that will help you on this journey. If there are no tests, this is the best time to start writing them! Either way, when you need to make a change in the codebase you should follow these simple steps:

identify change points
find test points
break dependencies
write tests
make changes and refactor

Breaking changes

How hard a simple change turns out to be depends on the situation. The main factor to consider is how well you understand the code. Besides this, you should also consider the time between making a change and getting real feedback about it (lag time). The faster the feedback, the faster the change process.

Dependencies can be problematic, but we can break them. In object-oriented programming we can break dependencies by introducing an interface. When we introduce more interfaces to break dependencies, rebuild time can go up slightly, but the benefit of extraction can be extraordinary.

To break dependencies, the best collaborators are fakes (stubs and mocks). Besides these “logical” helpers you also have “physical” helpers at hand: xUnit, IDEs, test coverage…
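As a rough illustration of breaking a dependency with an interface and a fake, here is a minimal sketch. All names (`MailSender`, `FakeMailSender`, `InvoiceNotifier`) are hypothetical and not taken from any real codebase.

```python
from typing import Protocol


class MailSender(Protocol):
    """Hypothetical interface extracted from a concrete mail client."""

    def send(self, to: str, body: str) -> None: ...


class FakeMailSender:
    """Fake used in tests: records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, to: str, body: str) -> None:
        self.sent.append((to, body))


class InvoiceNotifier:
    """Depends on the interface, not on a concrete mail client."""

    def __init__(self, sender: MailSender) -> None:
        self._sender = sender

    def notify(self, customer: str, amount: float) -> None:
        self._sender.send(customer, f"Invoice total: {amount:.2f}")


fake = FakeMailSender()
InvoiceNotifier(fake).notify("ann@example.com", 12.5)
```

Because `InvoiceNotifier` only knows the interface, the test can inspect `fake.sent` without any mail ever leaving the machine.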

Test-driven development (TDD)

Adding a new feature to a legacy system might look intimidating, but with tests in place you can do it safely. TDD is one of the most powerful feature-addition techniques. The algorithm looks like this:

write failing test
get it to compile
make it pass
remove duplication
repeat

Note that TDD is not tied to object-oriented programming. Use it as much as possible: it gives you confidence that the code works as expected even if you don't understand it completely, and that a new feature does not break existing functionality.

Programming by difference

This technique allows us to make changes quickly, and we can use tests to move to a cleaner design. It relies on inheritance, and the options look like this:

Add new behavior (add new method to derived class)
Replace behavior (override base class method)
Re-use behavior (do nothing, defaulting to inherited behavior)
Only changes in behavior require changes in program code
is_a relationship

Although this technique can violate the Liskov substitution principle (for example, when a subclass overrides a concrete method), it’s good to think about how far classes are from normalized form every once in a while. Programming by difference lets you introduce variations quickly. When it’s done, we can use our tests to pin down the new behavior and move to more appropriate structure(s) when we need to.

There are multiple different techniques which can help us when we have to make a change. Let's look at them one by one.

Sprout method

When we need to add a feature to the system and we can formulate it completely as new code, we write the code in a new method and call it from the places where the new functionality needs to reside.

The advantage is that the new code is under test. The disadvantage is that you are not getting the existing method under test, whether because of a hard-to-break dependency or for some other reason, and it may never get under test at all.
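A minimal sketch of a sprouted method, assuming a hypothetical `TransactionGate` class whose `post_entries` method must start skipping duplicate entries:

```python
class TransactionGate:
    """Hypothetical legacy class used only for illustration."""

    def __init__(self):
        self.posted = []

    def post_entries(self, entries):
        # The legacy method gains a single call to the sprouted method.
        entries = self.unique_entries(entries)
        for entry in entries:
            # Legacy posting logic, left untouched.
            self.posted.append(entry)

    def unique_entries(self, entries):
        """Sprouted method: entries not yet posted, without duplicates."""
        result = []
        for entry in entries:
            if entry not in self.posted and entry not in result:
                result.append(entry)
        return result
```

`unique_entries` can be developed test-first on its own, even if `post_entries` is still hard to test.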

Sprout class

When you need to make changes to a class, but there is no way that you are going to be able to create objects of that class in a test in a reasonable amount of time, the best way would be to create another class to hold changes and use it from the source class.

The main advantage is that it allows you to move forward with the work with the confidence that there are no invasive changes done. The disadvantage is conceptual complexity.

Wrap method

There is another way to add behavior which can be useful as well. Wrap method is a great way to introduce seams (places in your code where you can plug in different functionality) while adding a new feature.

The advantages are that it gets new, tested functionality into the application when we can't easily write tests for the calling code, and that it explicitly makes the new functionality independent of existing functionality. On the downside, it can lead to poor names.
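One possible shape of wrap method, sketched with a hypothetical `Employee` class: the old body of `pay` is renamed, and a new `pay` calls both the new behavior and the old one.

```python
class Employee:
    """Hypothetical class; only the wrapping structure matters here."""

    def __init__(self, name, hours, rate):
        self.name = name
        self.hours = hours
        self.rate = rate
        self.payment_log = []

    def pay(self):
        # New wrapper keeps the old public name, so callers stay unchanged.
        self._log_payment()
        return self._dispatch_payment()

    def _dispatch_payment(self):
        # Former body of pay(), renamed but otherwise untouched.
        return self.hours * self.rate

    def _log_payment(self):
        # New behavior, testable on its own.
        self.payment_log.append(f"paid {self.name}")
```

The name `_dispatch_payment` also shows the downside mentioned above: the renamed method often ends up with a less natural name than the original.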

Wrap class

By using wrap class we can add behavior to a system without adding it to an existing class. This technique is essentially the decorator pattern.
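A small sketch of wrap class under the same assumptions, with hypothetical `Contractor` and `LoggingContractor` classes. The wrapper exposes the same method as the wrapped object and adds logging around it.

```python
class Contractor:
    """Hypothetical existing class that we do not want to modify."""

    def __init__(self, name, amount):
        self.name = name
        self.amount = amount

    def pay(self):
        return self.amount


class LoggingContractor:
    """Wrapper (decorator): same pay() method, plus new logging behavior."""

    def __init__(self, contractor, log):
        self._contractor = contractor
        self._log = log

    def pay(self):
        self._log.append(f"paying {self._contractor.name}")
        return self._contractor.pay()


log = []
wrapped = LoggingContractor(Contractor("Bob", 300), log)
paid = wrapped.pay()
```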

Characterization tests

There are two ways to find out what a method does: (a) dig into documentation and system requirements and (b) write tests and work it out from the results. The first is time consuming and may lead to a different understanding, as not all of the requirements are preserved. The second approach can be used to characterize the actual behavior of the code. These tests can be used as supporting tests for better understanding, but can also be used as real tests to validate system behavior, once we actually understand what the method is supposed to do.

The algorithm for writing this kind of test goes like this:

call the method we want to understand from a test
write an assertion that you know will fail
let the failure tell you what the actual behavior is
change the test so that it expects the behavior the code produces
repeat
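The loop above could end up with tests like the following sketch. `format_name` is a hypothetical stand-in for a legacy function; the expected values were not known up front but copied from the failure messages of deliberately wrong assertions.

```python
import unittest


def format_name(first, last):
    """Stand-in for legacy code whose exact behavior is undocumented."""
    return (last.strip().upper() + ", " + first.strip()).strip(", ")


class FormatNameCharacterization(unittest.TestCase):
    def test_full_name(self):
        # First written with a value known to be wrong; the failure
        # reported the actual result, which was then pasted in here.
        self.assertEqual("SMITH, Ann", format_name(" Ann", "smith "))

    def test_missing_last_name(self):
        # Documents what the code does today, not what it should do.
        self.assertEqual("Ann", format_name("Ann", ""))
```

These tests do not claim that the behavior is right. They only pin it down so that a later refactoring cannot change it unnoticed.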

Characterization classes

This approach extends the previous one to the class level: we write characterization tests to figure out what a whole class does.

Skin and wrap API

If your application is mostly API calls, we can create interfaces that mirror the API as closely as possible and then create wrappers around the API. To minimize the chance of mistakes, it's advised to preserve signatures. The end result is that the wrappers delegate to the real API in production code and to fakes during testing.

Responsibility-based extraction

With the help of refactoring tools we can use this technique to identify responsibilities in the code and extract method(s) for them. This approach can be applied to API hell as well, especially when the API is too complicated to skin and wrap.

Scratch refactoring

One of the best ways to learn how legacy code works is to do some refactoring: extract methods, move things around, do whatever you want. This is throwaway code that helps you understand what a method or a class does, and there is no need to write tests for it. But this approach carries a risk of its own: it may lead you to think that the system is doing something that it isn't, so be careful!

Sensing variable

When we are refactoring we want to preserve current behavior, but that does not mean we cannot add any code. Sometimes it helps to add a variable to a class and use it to sense conditions in the method we want to refactor. Once we are done with refactoring, we can get rid of the introduced variable.
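A sketch of a sensing variable, assuming a hypothetical `DomBuilder` class. The `node_added` flag exists only so that tests can observe which branch ran while `process_node` is being refactored.

```python
class DomBuilder:
    """Hypothetical class with a method we want to refactor."""

    def __init__(self):
        self.nodes = []
        # Sensing variable: temporary, to be deleted after the refactoring.
        self.node_added = False

    def process_node(self, node, is_base_child):
        if is_base_child and node.get("type") != "text":
            self.nodes.append(node)
            self.node_added = True
```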

Preserve signatures

Refactoring is an error-prone process, because during it we can misspell things, use the wrong data type, use one variable instead of another… (just to name a few). To make fewer mistakes when breaking dependencies, we can cut/copy and paste entire method signatures exactly as they are. Then we can work on the newly extracted method(s).

Lean on the compiler

The primary purpose of the compiler is to translate source code into another form. In statically typed languages, the compiler can help even more: you can use its type checking to identify the changes you need to make. For example, alter a declaration so that it causes compile errors, then visit each error and make the change there. The benefit of this technique is that you are letting the compiler guide you toward the changes you need to make.

Adapt parameter

Sometimes it can be hard to create the parameter for the method under test, and the outcome of the test relies on the "state" of the parameter passed. We can solve this in two ways: by extracting an interface or by adapting the parameter. By adapting the parameter we introduce a new interface that wraps the parameter, or more specifically the behavior we need from it. Then we can use it to update production code and add tests for it (with fakes).
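A minimal sketch of adapt parameter. It assumes a hypothetical framework request object that exposes a dict-like `params` attribute; `ParameterSource`, `RequestParameterSource`, `FakeParameterSource` and `populate` are illustrative names only.

```python
from typing import Optional, Protocol


class ParameterSource(Protocol):
    """Narrow interface covering only what the method really needs."""

    def get_parameter_for_name(self, name: str) -> Optional[str]: ...


class RequestParameterSource:
    """Production adapter around the hard-to-create request object.

    Assumption: the request exposes a dict-like `params` attribute.
    """

    def __init__(self, request):
        self._request = request

    def get_parameter_for_name(self, name):
        return self._request.params.get(name)


class FakeParameterSource:
    """Test fake backed by a plain dict."""

    def __init__(self, values):
        self._values = values

    def get_parameter_for_name(self, name):
        return self._values.get(name)


def populate(source: ParameterSource) -> str:
    # Method under test now takes the adapter instead of the raw request.
    page = source.get_parameter_for_name("page") or "1"
    return f"showing page {page}"
```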

Break out method object

Long methods are hard to work with. A simple way to make one easier to test and maintain is to move the long method to a new class. Local variables in the old method can become instance variables in the new class, which often makes it easier to break dependencies and move the code to a cleaner state.
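The idea might look roughly like this, with hypothetical `Report` and `ReportRenderer` classes. The former locals of `render` (`lines`, `total`) became instance variables, so the long body could be split into small methods.

```python
class Report:
    """Hypothetical class whose render() used to be one long method."""

    def __init__(self, rows):
        self.rows = rows

    def render(self):
        # The old method now only delegates to the method object.
        return ReportRenderer(self.rows).run()


class ReportRenderer:
    """Method object: former locals of render() are instance variables."""

    def __init__(self, rows):
        self.rows = rows
        self.lines = []
        self.total = 0

    def run(self):
        self._add_rows()
        self._add_total()
        return "\n".join(self.lines)

    def _add_rows(self):
        for name, amount in self.rows:
            self.lines.append(f"{name}: {amount}")
            self.total += amount

    def _add_total(self):
        self.lines.append(f"total: {self.total}")
```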

Encapsulate global references

This technique is extremely powerful when you are trying to break problematic dependencies on globals. The process is simple:

identify globals for encapsulation
create a class with globals moved to it
comment out original declaration of globals and declare a global instance of the new class
exchange unresolved references with the name of global instance
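The steps above can be sketched as follows. The globals `frame_rate` and `frame_count`, the class `VideoState` and the function `elapsed_seconds` are all hypothetical.

```python
# Original declarations, commented out as in step three:
# frame_rate = 30
# frame_count = 0


class VideoState:
    """Holds the former globals as instance attributes."""

    def __init__(self, frame_rate=30, frame_count=0):
        self.frame_rate = frame_rate
        self.frame_count = frame_count


# A single global instance replaces the loose globals.
video_state = VideoState()


def elapsed_seconds():
    # Former references to the bare globals now go through the instance.
    return video_state.frame_count / video_state.frame_rate
```

From here it is a small step to pass a `VideoState` into the functions that need it, which removes the global dependency entirely.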

Expose static method

Sometimes we are dealing with classes that cannot be instantiated in a test, but which have some methods that do not use instance data or other instance methods. We can turn those methods into static methods and get them under test.

Extract and override call

Sometimes there are dependencies which are localized, but which get in the way of testing. We can break such a dependency by extracting it into a method of its own and overriding that method in tests, which prevents side effects there. The steps we need to take are:

identify the call and extract it into a new method
replace the call with a call to the new method
override the new method in a testing subclass
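A sketch of these steps, assuming a hypothetical `PageLayout` class whose `rebind_styles` method used to call a hard-to-test style manager directly:

```python
class PageLayout:
    """Hypothetical class with a localized, troublesome dependency."""

    def rebind_styles(self):
        # The problematic call now goes through an overridable method.
        styles = self.form_styles("base")
        return [style.upper() for style in styles]

    def form_styles(self, template):
        # Stand-in for the real dependency, which cannot run in a test.
        raise RuntimeError("needs the real style manager")


class TestingPageLayout(PageLayout):
    """Testing subclass: overrides only the extracted call."""

    def form_styles(self, template):
        return ["bold", "italic"]
```

The rest of `rebind_styles` runs unchanged in the test, while the dependency itself is never touched.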

Extract and override factory method

Hard-coded initialization work in a constructor can be very hard to work around in testing. If the creation happened somewhere else, we could introduce some separation more easily. One option is to extract and override a factory method: the object creation sequence is moved out of the constructor into a factory method of the same class, which a testing subclass can then override.

Extract and override getter

Extract and override factory method has one flaw: it does not work in languages where a constructor cannot dispatch to an overridden method (e.g. C++). In C++, a virtual function called from a base class's constructor never resolves to the derived class's override. The idea for overcoming this is to introduce a getter for the instance variable that you want to replace with a fake object. Then refactor the code to use this getter everywhere, subclass the class, and override the getter to provide a fake object for testing.

Extract interface

This technique is one of the safest techniques for dependency breaking, as it is supported by the compiler. The steps are pretty straightforward:

create new interface
make the class implement the newly added interface
replace uses of the class with the interface
add methods to the interface wherever the compiler reports an error

Extract implementer

Extract interface is a handy technique, but it comes with a downside: naming. Instead of extracting a superclass/interface, we can go the opposite way and extract a subclass/implementer. We then turn the source class into an interface and move all non-public methods and variables to the implementer.

Introduce instance delegator

If there are static methods in the project, chances are that you can work with them without major problems, unless they contain something that is difficult to depend on in a test. To overcome this problem, replace the static call with a method call on an object, where the instance method simply delegates to the static method.
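A sketch of an instance delegator with hypothetical names: `BankingServices.update_account_balance` stands in for a static method that cannot run in a test, and `update_balance` is the new instance method that delegates to it.

```python
class BankingServices:
    """Hypothetical class with a static method that is hard to test."""

    @staticmethod
    def update_account_balance(user_id, amount):
        # Stand-in for code that talks to a real database.
        raise RuntimeError("talks to the real database")

    def update_balance(self, user_id, amount):
        # Instance delegator: forwards to the static method in production.
        BankingServices.update_account_balance(user_id, amount)


class FakeBankingServices(BankingServices):
    """Test double that overrides the delegator instead of the static."""

    def __init__(self):
        self.updates = []

    def update_balance(self, user_id, amount):
        self.updates.append((user_id, amount))


def pay_bonus(services, user_id):
    # Client code now calls the method on an object it is given.
    services.update_balance(user_id, 100)
```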

Introduce static setter

How to test global mutable data? How to test singleton? These things are hard to get under test, but there are some ways. The best would be not to have globals and/or singletons, but if there is an explicit need you can use the following approach:

for globals: you can replace them with an object if they are sitting outside a class or are purely public static variables
for singletons: add a static setter (which will be used to replace the instance), and make constructor protected. Then you can create a subclass of singleton, create fresh object, and pass it to the setter
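For the singleton case, a minimal sketch could look like this. `PermitRepository`, `FakePermitRepository` and `is_valid` are hypothetical, and since Python has no protected constructors, that part of the recipe is not shown.

```python
class PermitRepository:
    """Hypothetical singleton that normally queries a real database."""

    _instance = None

    @staticmethod
    def get_instance():
        if PermitRepository._instance is None:
            PermitRepository._instance = PermitRepository()
        return PermitRepository._instance

    @staticmethod
    def set_testing_instance(instance):
        # Static setter: lets a test replace the singleton instance.
        PermitRepository._instance = instance

    def find_permit(self, permit_id):
        raise RuntimeError("queries the real database")


class FakePermitRepository(PermitRepository):
    """Subclass created fresh in each test and passed to the setter."""

    def find_permit(self, permit_id):
        return {"id": permit_id, "valid": True}


def is_valid(permit_id):
    # Client code keeps using the singleton accessor.
    return PermitRepository.get_instance().find_permit(permit_id)["valid"]
```

A test would call `PermitRepository.set_testing_instance(FakePermitRepository())` in its setup and reset the instance to `None` in its teardown, so that tests do not leak state into each other.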

Parametrize constructor

There are cases where a dependency is hidden inside a constructor. This situation is really hard to test, as the dependency is tightly coupled to the constructor body. The best way to resolve this is to externalize the dependency by passing it into the constructor.
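A sketch of parametrize constructor with hypothetical `SmtpMailer`, `MailChecker` and `RecordingMailer` classes. The default argument keeps existing callers working, which is one possible design choice and not the only one.

```python
class SmtpMailer:
    """Hypothetical dependency that would open a network connection."""

    def send(self, to, body):
        raise RuntimeError("would open a network connection")


class MailChecker:
    def __init__(self, mailer=None):
        # Before the change the constructor hard-coded SmtpMailer().
        # Now the dependency can be passed in; the default preserves
        # the old behavior for existing callers.
        self.mailer = mailer if mailer is not None else SmtpMailer()

    def remind(self, to):
        self.mailer.send(to, "reminder")


class RecordingMailer:
    """Fake passed in by tests."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))
```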

Construction of object with annoying parameter

The best way to see whether some constructor parameters cause side effects upon object creation is to write a test which calls the no-args constructor. If this is not possible, use the constructor with the fewest parameters and set them all to null. By doing so we can inspect possible side effects. If we identify a tightly coupled or heavyweight object, we can introduce an interface and create a fake class for testing.

Parametrize method

This technique is the counterpart of parametrize constructor: an object that used to be created inside a method is instead passed to the method as an argument.

Pull up feature

Sometimes you need to work with a cluster of methods which don't reference any of the bad dependencies, directly or indirectly. In this situation, the cluster of methods can be pulled up into an abstract superclass, and a testing subclass of it can be instantiated in tests.

Push down dependency

Some classes have only a few problematic dependencies, which can be separated from the rest of the code. To do this, we make the class abstract and push the problematic parts down into a newly created subclass of this, now abstract, class.

Subclass and override method

This is one of the core techniques for breaking dependencies in object-oriented programs. The idea behind this approach is to use inheritance in the context of a test to nullify the behavior that you don't care about, or get access to behavior that you do care about.

To subclass and override method, do the following:

identify the smallest set of methods you want to separate
make each of the methods overridable and adjust their visibility
create a new subclass that overrides the methods

Supersede instance variable

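Object creation in a constructor can be problematic, but in most cases we can use one of the techniques listed above. However, in some languages (e.g. C++) overriding a creation method that is called from the constructor is not possible. In that case we can add a method that replaces the instance after construction. The steps look like this: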

identify the instance variable you want to supersede
create new supersede method (e.g. supersedeInstanceA, where InstanceA is the name of the variable you want to supersede)
in this method write the code to destroy previous instance and set the new value

Extract method

The idea behind this refactoring is to systematically break up large existing methods into smaller ones. When this is done, our code becomes easier to understand.

Other methods which can help you significantly:

deletion of unused code
notes and/or sketching
explaining method/class/part of a system to a colleague and checking your mutual understanding
using unified naming
using heuristics to distinguish responsibilities (grouping methods, check for hidden methods, internal relationships, primary responsibility…)
pair programming