Knowing When to Let Go

Joel Spolsky once referred to the act of rewriting code from scratch as “the single worst strategic mistake that any software company can make”.

In this article I aim to investigate whether this judgement applies as a sweeping statement. Are there situations where the correct choice is to start over? Is there a replacement strategy that should be adopted instead?

Motivation for a rewrite

The downward slope of code quality

Code is rarely seen as flawed at conception. Otherwise, the sensible choice would be to immediately fix any design problems — nipping any issues in the bud.
Furthermore, code does not physically degrade. If the requirements for a project stay fixed, a perfectly constructed solution would remain so throughout its lifetime.

Instead, there are many factors related to shifting scope that may cause even the greatest designs to wither. In “Programs, Life Cycles, and Laws of Software Evolution”, M. M. Lehman describes three classes of software. Condensed, these are:

  • Programs that are strictly specified, such as calculators for mathematical functions.
  • Programs that solve a real-world problem, such as a machine competing against chess grandmasters.
  • Programs that automate or ease existing human activities.

The final category comprises the majority of large software projects within the commercial sector. Unfortunately, this category is also the most demanding when taking the cost of keeping a solution relevant into account.

A telling, albeit dated, statistic is presented in the paper:

“Of the total U.S. expenditure for 1977, some 70 percent was spent on program maintenance and only about 30 percent on program development. This ratio is generally accepted by the software community as characteristic of the state of the art.”

The root cause is supposedly intrinsic to the field:

“A program that is used and that is an implementation of its specification […] undergoes continual change or becomes progressively less useful. [This] continues until it is judged more cost effective to replace the system with a recreated version.”

“As an evolving program is continually changed, its complexity, reflecting deteriorating structure, increases unless work is done to maintain or reduce it.”

It is worth noting that the Extreme Programming movement attempts to counteract this final truism with its guidance of “Refactor whenever and wherever possible.”

Recognising rock-bottom

Often, a degraded design will result in application growing-pains.
There are several warning signs that programmers may recognise before considering a codebase rewrite:

Time taken for new developers to get up to speed.

A recently hired developer may be unable to produce useful code even after a reasonable period at the company. If this is due to the application’s unwieldy design, it is usually beneficial to devote resources to fixing the flaws. The effort expended will be repaid in the future: developer turnover is a stark reality, and every new hire pays the same onboarding cost.

Deployment of the code cannot be automated.

Testing-and-deployment automation can provide enormous productivity gains. Allowing developers to avoid carrying out menial tasks leaves them with more time to add product value, sidestepping unproductive context-switches.

Automation may be impossible due to assumptions made earlier on in the system’s design. Alternatively, it may become difficult due to fragmentation of system components over time.

Test suites take too long.

Alternatively, writing good tests is hard because of code interdependencies.

This flag is self-descriptive and common, resulting in many developers searching the internet for quick fixes.

Arguments against a rewrite

Joel initially argues that programmers may be over-eager to begin rewriting a codebase. In his own words:

“The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming:
It’s harder to read code than to write it.”

People spend the majority of their reading time absorbing prose, where concepts can be visualised at the rate that the eyes scan a page.
Code is often much more concept-dense. It is easy to assume that any lack of understanding after study is due to an overcomplicated implementation, rather than to the inherent difficulty of reading code.

Additionally, he believes that developers are keener to produce new code than to fix existing code. Creation is often seen as more appealing:

“Programmers are, in their hearts, architects, and the first thing they want to do when they get to a site is to bulldoze the place flat and build something grand. We’re not excited by incremental renovation: tinkering, improving, planting flower beds.”

Problems caused by rewriting

Overzealous rewriting can lead to a multitude of issues, even when assuming that no new bugs are introduced.

Firstly, rewriting, like most software development, will likely take longer than initially expected. This becomes more important when combined with the second issue: rewriting adds no value to a product.

Ideally, customers should not notice that a rewrite has occurred. Clients may become disenchanted with the lack of new product features. In the worst case, by the time that the rewrite is complete, many customers may have switched to a competitor’s system.

Thirdly, a subtle disadvantage to the rewrite process is that it invalidates any bug-tracking progress. There is no guarantee that any bug reports or failing test-cases produced will be useful to the maintainers of the replacement system. In an open source project, where these contributions are often made by the system’s users, the community may not be particularly pleased if hard work is casually thrown away. This sentiment is reflected by the infamous “Cascade of Attention-Deficit Teenagers” rant.

Finally, unless the system is entirely documented, it is possible that functionality may be lost. This may range from missing undocumented features to removing hidden side-effects that customers had discovered and come to rely on.

However, you cannot make the assumption that no new bugs will be introduced.

Joel states:

“The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they’ve been fixed.”

“It’s important to remember that when you start from scratch there is absolutely no reason to believe that you are going to do a better job than you did the first time”

Any churn within development teams will result in new programmers, who are not necessarily more experienced than the previous authors, being tasked with a rewrite. This could lead to an underwhelming rewrite that suffers from many of the problems that were present before.

Arguments for a rewrite

If the article concluded here, you would be forgiven for assuming that rewriting code is a foolish idea, spawned by a limited attention span. On the contrary, there are several benefits to starting from scratch.

Discarding troublesome code

Many companies, especially startups, have code lying forgotten that was created during the project’s bootstrapping phase.

This code is often written by less technically adept team-members, such as the CEO, when the growing company did not require their main vocation. In this example, the original maintainer now owns a different sector of the business. As a result, this orphaned contribution can cause political issues for the development team.

With no ownership, maintenance of the module is likely to fall behind acceptable levels. In addition, junior developers may feel intimidated when discovering the initial author via VCS.

In this case it is often a good idea to rewrite the troublesome code. The replacement can follow present-day best practices, and responsibility for maintenance returns to the full-time development team. Politically, “We needed a new module to handle 2014 business logic” sounds much better in a developer’s mind than telling a CEO that they should stick to their day job.

Avoiding bugs introduced during significant modification

Empirical studies have investigated the factors shaping code evolution. “An Analysis of Errors in a Reuse-Oriented Development Environment” provides reasoning supporting potential redesign.

It studies evolution through the reuse of components. Three types of reuse are defined:

  • Verbatim reuse: Parameters to a function differ but the original component is not modified.
  • Reuse with some modification: Altering less than 25% of the original component to provide new functionality.
  • Reuse with extensive modification: Increasing functionality, but altering more than 25% of the original component.
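
To make the three categories concrete, here is a minimal sketch of how a component might be sorted into them. It rests on assumptions of my own: the share of the component that was altered is measured in changed lines, and a component altered by exactly 25% (a case the list above leaves open) is counted as extensively modified. The names ReuseKind and classify_reuse are hypothetical and do not come from the paper.

```python
from enum import Enum


class ReuseKind(Enum):
    VERBATIM = "verbatim reuse"
    SLIGHT = "reuse with some modification"
    EXTENSIVE = "reuse with extensive modification"


def classify_reuse(changed_lines: int, total_lines: int) -> ReuseKind:
    """Classify a reused component by how much of it was altered.

    Assumption: "altered" is approximated by the fraction of changed lines.
    Assumption: exactly 25% counts as extensive modification.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    if not 0 <= changed_lines <= total_lines:
        raise ValueError("changed_lines must be between 0 and total_lines")

    if changed_lines == 0:
        # Only the parameters differ; the component itself is untouched.
        return ReuseKind.VERBATIM

    fraction_changed = changed_lines / total_lines
    if fraction_changed < 0.25:
        return ReuseKind.SLIGHT
    return ReuseKind.EXTENSIVE
```

For instance, a 200-line component with 30 changed lines would fall into the middle category, while one with 120 changed lines would count as extensively modified.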

The paper has several interesting findings. Unsurprisingly, parametric reuse of modules is often the most successful:

“There is a clear benefit from reuse in terms of reduced error density when the reuse is verbatim or via slight modification. However, reuse through slight modification only shows about a 59% reduction in total error density, while verbatim reuse results in more than a 90% reduction compared to newly developed code.”

Unfortunately, heavily modified components lose significant reliability:

“Reuse via extensive modification appears to yield no advantage over new code development.”

It would appear that modules requiring significant modification during maintenance or integration into new components are no more reliable than when starting from scratch. This is at odds with Joel’s praise of “battle-hardened” legacy code.

Furthermore, bugs introduced when modifying extensively are often harder to fix than those caused by rewriting:

“[Extensive modification] results in errors that typically were more difficult to isolate and correct than the errors in newly developed code. In terms of the rework due to the errors in these components, it appears that this mode of development is more costly than new development.”

When coupled with the effects of developer familiarity on bug isolation, differences in sign-off time between rewrite and reshape can be even more significant. Starting from scratch may result in well-tested code being deployed long before the cause of any modification-induced regressions would have been discovered.

Conclusion

The examined articles suggest that rewriting code is often detrimental. Some also suggest that extensively modifying to avoid a rewrite is detrimental.

If this is the case, are we doomed to fail? Is there a better way?

I think that it is important to note the distinction between throwing away significant working code and rewriting. An incremental rewrite can improve a codebase without causing long periods of stagnation.

Developers should fix or, in the worst case, introduce a strict interface around the software component. Tests must be used to verify existing behaviour and prevent regressions.
Functions within the system undergoing an incremental rewrite should be replaced gradually, ideally by having them invoke the methods of the new modular implementation(s).

This process will eventually result in the troublesome component’s implementation being rewritten. Crucially, the catastrophic consequences of throwing away all of the code at once are avoided.
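
The process described above can be sketched in a few lines of Python. Everything here is hypothetical and chosen only for illustration: the pricing rule, the function names (legacy_total_cents, new_total_cents, order_total_cents) and the recorded inputs are my own assumptions, not taken from any real system. The sketch shows a stable interface whose body is swapped from the old implementation to the new one, guarded by a test that treats the legacy behaviour as the specification.

```python
def legacy_total_cents(quantity: int, unit_cents: int) -> int:
    """Hypothetical legacy code: correct in practice, but awkward to maintain."""
    total = 0
    for _ in range(quantity):  # a negative quantity silently yields 0
        total += unit_cents
    if quantity >= 10:
        total = total - total // 10  # 10% bulk discount, rounded in integer cents
    return total


def new_total_cents(quantity: int, unit_cents: int) -> int:
    """Hypothetical replacement with the same observable behaviour."""
    quantity = max(quantity, 0)  # preserve the undocumented negative-quantity quirk
    subtotal = quantity * unit_cents
    discount = subtotal // 10 if quantity >= 10 else 0
    return subtotal - discount


def order_total_cents(quantity: int, unit_cents: int) -> int:
    """The strict interface callers depend on; only its body changes."""
    return new_total_cents(quantity, unit_cents)


# Inputs chosen to cover the discount boundary and the edge cases.
RECORDED_CASES = [(0, 250), (1, 250), (5, 199), (10, 199), (25, 1), (-3, 100)]


def check_behaviour_preserved(candidate) -> None:
    """Regression check: the legacy output is treated as the specification."""
    for quantity, unit_cents in RECORDED_CASES:
        expected = legacy_total_cents(quantity, unit_cents)
        actual = candidate(quantity, unit_cents)
        assert actual == expected, (quantity, unit_cents, expected, actual)


check_behaviour_preserved(order_total_cents)
```

Note that no new features appear in the replacement, and that the negative-quantity quirk is deliberately kept: it is exactly the kind of hidden behaviour that a from-scratch rewrite would be likely to lose. The legacy function can be deleted once the check has passed in production for long enough to be trusted.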

Once targets for incremental rewrite have been identified, the process is akin to refactoring with a few additional rules:

  • Before replacing components, always be certain that the code is bad — avoid cognitive bias.
  • No new features should be introduced. They serve only as a distraction and increase the chance of error.
  • Study the problem’s domain to avoid mistakes made during the previous implementation. Find the sweet-spot between over-complication (YAGNI) and under-specification.

Rewriting at a smaller scale will help developers avoid biting off more than they can chew.

Importantly, development on the system as a whole can carry on and the product continues to present value to its consumers.

There is no silver hammer for fixing organically-grown software behemoths. However, avoiding an all-or-nothing approach can help maintain development velocity and decrease potential risk. With reduced risk comes higher proportional payoff for exploration.
Hopefully, exploration yields a well-constructed next-generation system, perfectly suited to the task at hand — for now.