A few steps away from disaster

 

You are in the office, it is 8 pm, Friday…

…your screen glares with stormy Jenkins weather reports, and you nervously watch another build fail for the nth time. Tests keep failing, their descriptions are vague, and the BVT (build verification test) needs another 3 hours to complete. Most of the APIs were hurriedly documented: there are gaps, some entries are just copy-pastes, and there is rarely a meaningful comment in the code. The guy who did the latest Git merge between the branches you are working on broke some of the work and made the commit history even less readable; another guy, the original owner of the troublesome module, quit a few months ago, and you are struggling to find a single piece of information that consistently describes how it works.

What seemed like the implementation of a simple feature turned out to be more like awakening a monster. A monster made of accumulated technical debt: bits and pieces of poor software engineering decisions that appeared to work, until today.

And now you are likely to spend the entire weekend coding your way through this mess. This might end in one of two ways: another sloppy, superficial patch that deepens the technical debt even further, or a Monday disaster. That is, a meeting where you, your teammates and your managers finally acknowledge the lamentable state of the project. Everyone tries to estimate the impact on deadlines and work out an approach to solve the issue. If it is not already too late, the project might, after some time, get back on track. If it is too late, the project dies.

I have been fortunate to work as a software engineer for one of the biggest global software companies, experiencing the pros and cons of working in a corporate environment. Everything I write here is based on my own experiences and observations, and while the story above is slightly exaggerated to draw your attention, it touches on a few important aspects of the software engineering process. These aspects often lack attention and seem to disappear in a bag of “more serious” development activities but, when ignored, they can build up over time and bring any project closer to the scenario above.

Software documentation

Not writing documentation is obviously bad; however, it can also be written in numerous bad ways.

First of all, there is poor content quality. When commenting code or writing documentation is widely considered a boring duty, it often does not get proper attention throughout the development cycle. It is common to see it postponed until the very last minute of a project and then written hastily, together with other clean-up activities. When as little effort as possible is put into this process, the only thing we can expect is inconsistent, flawed and unhelpful documentation. If such a habit exists within a development team and is silently accepted, every subsequent attempt by developers to build on the poorly documented codebase will lead to frustration. And frustration rarely produces an incentive to improve the situation.

Developers also tend to underestimate the value of documentation while the other co-creators are still around. They can easily obtain the required explanation from them; however, the situation gets much more complicated when these co-creators leave the company or change teams. I have seen people who were quitting being desperately asked for last-minute documentation contributions in their final week of work. Well, it certainly was not of the highest quality 😉 Another problem appeared when someone changed teams and his former teammates were forced to fight with the new management for his time. It was the cause of many delays and clashes.

Another important aspect of documenting software is the way we store the documentation. While this issue does not affect code comments, it is really important to have a consistent documentation storage policy within a project. I have seen a large, mature project documented using:

  • files stored on lab’s FTP server,
  • email conversations,
  • database entries within a collaboration suite server,
  • wiki articles,
  • plan items and tasks within a software life-cycle management platform.

Years pass, some storage methods go offline, there are various migrations, and the people responsible for maintaining the content change. While company policy required the primary design document to be accessible, all the subsequent improvements, bug fixes and the many tiny tools used somewhere in the process were documented across all of these places. This poses a real threat to the project: daily development routines require you to walk around the office in search of people “who know where to look” and to dig through various sources, putting the pieces together just to gather any meaningful documentation. The project grows and matures, and people rotate in and out. The lack of a consistent, long-term storage policy for the documentation might result in a completely unmaintainable project, where every development effort is a pain and a threat to the stability of the platform.

Software testing

Once again, not writing test suites is bad, but using them carelessly can be dangerous.

Software development work often comes with tight deadlines and a lot of stress, just like in the opening story of this article, and with them comes the temptation to “temporarily disable” a troublesome test case. A developer might comment out the whole test body or just slightly modify its behaviour. Then he promises himself to fix it as soon as he gets out of the blind alley. It rarely happens… First of all, if you need to disable a test case in order to implement a feature, it means you understand neither of them. You do not understand the purpose and scope of the test case, nor the impact of your incoming patch. This is a shortcut to introducing a really bad regression into the project. These workarounds pile up over time, and the careless developer rarely finds the time to fix them early enough. Secondly, if such a practice happens only exceptionally and involves just a few developers, it might be easy to curb. However, if it is commonly applied by the whole team, the project is doomed.
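If a test really has to be switched off for a while, it helps when the test runner keeps reporting that fact instead of the test silently vanishing behind comment markers. Below is a minimal sketch of that idea using Python's standard unittest module. The function apply_discount, the test names and the ticket ID PROJ-123 are hypothetical, made up purely for illustration.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical function under test (not from any real project)."""
    return price - price * percent / 100


class DiscountTests(unittest.TestCase):
    def test_regular_discount(self):
        self.assertEqual(apply_discount(200, 10), 180)

    # Instead of commenting the body out, skip explicitly and give a reason.
    # The skip and its reason are recorded by the runner on every run.
    # "PROJ-123" is a hypothetical ticket ID.
    @unittest.skip("PROJ-123: fails after rounding change, owner to investigate")
    def test_rounding(self):
        self.assertEqual(apply_discount(99.99, 15), 84.99)


def run_suite():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiscountTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_suite()
    # Each entry in result.skipped is a (test, reason) pair.
    print("skipped:", [reason for _, reason in result.skipped])
```

This does not make disabling a test a good idea; it only keeps the debt visible, so that the list of skipped tests can be reviewed and worked off rather than forgotten.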

Another problem with test suites is their runtime. The purpose of most testing techniques is to deliver meaningful feedback on an incoming patch. To empower the software development process and help build a better product, this feedback has to be quick. All the beneficial effects of rich test suites and good code coverage are cancelled out by a prolonged suite runtime. In a project I worked on, the main test suite guarding the way into the main development branch had an execution time of around 10 minutes. For this rather large project, it struck a good balance between the complexity of the testing techniques and their runtime. However, another test suite, a BVT, had a runtime of over 4 hours. While this kind of testing has a slightly different purpose, its sluggishness ended up causing a serious breakdown in the project. During the final iteration of one critical project phase, the build cluster got swamped with the workload caused by this BVT suite. Developers could hardly obtain any meaningful feedback from it, and all the continuous integration systems were congested. The decision was made to disable it. A few weeks later, after the suite was re-enabled, serious regression issues were found and a significant delay was introduced into the plan.
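The balance between test coverage and runtime can be made explicit by giving the merge gate a time budget. The following is only a rough sketch under assumptions of my own: the function split_by_budget, the test names and the measured durations are all hypothetical, and the greedy "shortest first" rule ignores how valuable each test is.

```python
def split_by_budget(durations, budget_seconds):
    """Split tests into a fast gate suite and a deferred, slower suite.

    durations: dict mapping a test name to its measured runtime in seconds.
    Tests are added to the gate shortest first, for as long as the
    accumulated runtime stays within budget_seconds.
    """
    gate, deferred, total = [], [], 0.0
    for name, seconds in sorted(durations.items(), key=lambda item: item[1]):
        if total + seconds <= budget_seconds:
            gate.append(name)
            total += seconds
        else:
            deferred.append(name)
    return gate, deferred, total


if __name__ == "__main__":
    # Hypothetical measurements, in seconds.
    measured = {
        "test_parser": 2.0,
        "test_api": 30.0,
        "test_db_migration": 240.0,
        "test_full_install": 3600.0,
    }
    gate, deferred, total = split_by_budget(measured, budget_seconds=600)
    print("gate:", gate, "runtime:", total)
    print("deferred:", deferred)
```

The deferred tests still have to run somewhere, for example nightly; the point is only that the suite blocking every patch stays quick while the slow suite runs on a schedule the build cluster can sustain.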

Code review

Finally, the powerful concept of code review. It is another useful practice, provided it happens in reality and not only on paper.

If your team has a policy of formal code review, requiring e.g. a senior software engineer or component owner to go through your modifications, this is right and really beneficial for the project. What I have seen cause problems is a policy of more informal peer review, where you are only obliged to ask any member of your team to approve your changes. The risky behaviour here is that pairs of befriended developers start to form and make deals between themselves. They begin to assign each other as reviewers of their modifications and tend to silently accept everything, without any deeper consideration. This is partly due to the usually high workload of an average developer, and partly due to plain laziness. Once again, when no proper attention is paid to this aspect of the software engineering process, under a false impression of saving time, great risk is introduced into the project. If two pairs of eyes can hypothetically fish out twice as many flaws as one, then a rubber-stamp review comes close to doubling the chances of committing faulty code.

Conclusion

The aspects discussed here are part of a broader discussion about technical debt and good software engineering practices. Technical debt forms an inseparable part of any engineering project and cannot be avoided, so it has to be kept under control. People responsible for managing a project should carefully weigh short-term benefits that deepen technical debt against well-thought-out, long-term engineering decisions. The same applies to regular software developers, whose actions may reflect the overall strategy of the project or be out of control, forming micro risks which add up. As a consequence of poor engineering decisions, a project might approach a state of technical death. Such a state is rarely recoverable, and rewriting any bigger project from scratch is barely feasible.

Related reading

Things You Should Never Do, Part I, Joel Spolsky
Paying Down Your Technical Debt, Jeff Atwood