/* DELETE THIS COMMENT */

Among software engineers, deleting code is a celebrated act, as it typically represents the discovery of a more elegant expression of a solution. There are famous anecdotes about the engineer who confounded his managers by reporting his productivity as “-2000 lines” (Bill Atkinson of Apple Inc.). There are also quotes such as the one attributed to Ken Thompson of Bell Labs: “One of my most productive days was throwing away 1000 lines of code”. Removing code usually means refactoring an existing solution to reduce unnecessary complexity, duplication, or other manifestations of software bloat. Deleting code removes an opportunity for confusion, and the fastest-executing code is the code that does not exist. Removing code is indeed a pleasure, earned as the reward for diligent refactoring or for intelligently removing complexity. I contend that deleting comments (or avoiding the need to write them in the first place) can be similarly rewarding.

This article draws on the writings of various computer scientists and software engineers on this topic, as well as my own experience. I will explore what drives us to comment code. I will argue that in many cases comments create an opportunity to mislead the reader. I will also argue that writing a comment may represent a failure to refactor for clarity or to enforce invariants by design.

I will aim to convince the reader of two things. First, in many cases comments can and should be omitted in favour of self-documenting code that enforces its invariants by design. Second, comments introduce an opportunity for redundancy and repetition in the codebase. I will conclude that comments should be avoided wherever possible, by refactoring code and by introducing automated tests as executable documentation (or some other form of executable API samples). I will concede that some comments may remain necessary, such as when a known defect in a third-party dependency forces unusual usage, and I will consider how best to handle these cases.

# Why do we write comments?

The basic argument for writing comments is that source code serves a dual purpose. It is used to generate an executable program via a compiler or interpreter, and it conveys the program to human readers (i.e., other software engineers). The source code represents the current behaviour of the program; it is the construction site on which all software engineers work. On a larger team it is unlikely that every engineer will be familiar with all aspects of the source code. Engineers will therefore need to read, understand and modify code that they did not write. Source code is likely to be read far more often than it is written, since potentially many engineers who did not write it will read it. Any steps taken to make it easier for human readers to understand will therefore likely pay off over the long term. The argument is that high-level natural language comments describing the intent behind a module, class or method make the source code more easily digestible for humans, while the compiler or interpreter is indifferent.

The received wisdom is that comments form a hierarchy of quality, from least to most useful. The least useful describe what the code does, essentially restating the source code in natural language, such as:

```cpp
++i; // increment i
```

This class of comments is considered poor form as it obviously adds little to the comprehensibility of the program; it literally describes what the source code statement does, adding no information.

The next category, considered somewhat more useful, consists of comments detailing how the code does what it does. Consider the following listing, which computes an approximation of the reciprocal of the square root of a number:

```c
// inverse square root with 1st order approximation of Newton's method
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;
    i  = 0x5f3759df - ( i >> 1 );
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );

    return y;
}
```

The implementation shown above is lifted directly from the Quake 3 source code [2] (with the more colourful comments elided); clearly the comment (which is not present in the original source) adds some contextual information that assists a reader, who may otherwise be confused about what the program is trying to accomplish.

The final category, typically considered the most useful, consists of comments that describe why. They signal the higher-level intent behind the code they annotate. For example:

```cpp
// The following code computes the reciprocal of a square root, to a
// first order approximation using Newton's method.
//
// This method is required for performance reasons for lighting
// calculations, as 1.f/sqrt(num) is too slow on the target platform
// ([target hardware, target compiler]).
```

This high-level intent is not expressed by the implementation itself, so this class of comment adds information that cannot otherwise be determined by reading the source code. Other examples include explaining why a more complicated algorithm was favoured over a simpler one, or why a specific but non-obvious order of operations is required due to some subtlety.

Of these three categories, instances of the first two are almost always candidates for removal via refactoring, as the following section discusses. This is sometimes true of the final category as well. Comments that explain why also carry the additional danger of silently becoming out of date. Because they convey concepts not expressed in the code, there is no way to tell from the source alone that they have gone stale. Developers may then introduce defects based on the resulting misunderstandings. Source code is rarely the authoritative statement of high-level decisions that are unrelated to the code, such as business domain decisions. Including this information in comments is therefore a failure of “don’t repeat yourself” (DRY), as described by Hunt and Thomas in The Pragmatic Programmer [8], and carries the same risks as any other duplication.

Yourdon describes how comments can actually obscure the intent of the source code in Techniques of Program Structure and Design [3]. Several of the lessons in The Elements of Programming Style [6] by Kernighan and Plauger relate to the dangers of comments. They are expressed by the maxims “ensure that the code and comments agree”, “don’t comment bad code, rewrite it”, and “don’t echo the code with comments”. The next section explores the idea that “The proper use of comments is to compensate for our failure to express ourself in code” [1].

# Comments May Represent Failure To Refactor

Consider a system that conveys invariants to API consumers via comments, such as the following method signature:

```cpp
/*
* Method - UpdateThermostat
*
* Arguments - temperature. This value represents the temperature to
* set the thermostat to. This value MUST be a valid value in kelvin,
* where 0.f is absolute zero.
*/
void UpdateThermostat(float temperature);
```

This comment appears useful, as it conveys information that the method signature does not. However, the signature itself could be refactored to enforce its invariants. Introducing a type that abstracts the notion of a temperature allows the invariants to be enforced at construction:

```cpp
#include <stdexcept>

class Temperature {
private:
    double valueKelvin;
    explicit Temperature(double kelvin) : valueKelvin(kelvin) { }

public:
    static Temperature FromKelvin(double kelvin) {
        // Rejects values below absolute zero (and NaN), so every Temperature is valid.
        if (!(kelvin >= 0.0)) {
            throw std::out_of_range("temperature below absolute zero");
        }
        return Temperature(kelvin);
    }

    static Temperature FromCelsius(double celsius) {
        return FromKelvin(celsius + 273.15);
    }

    double GetAsKelvin() const { return valueKelvin; }
    double GetAsCelsius() const { return valueKelvin - 273.15; }
};
```

With this type as part of the API, the method signature becomes:

```cpp
/*
* Method - UpdateThermostat
*
* Arguments - temperature. This value represents the
* temperature to set the thermostat to.
*/
void UpdateThermostat(Temperature temperature);
```

UpdateThermostat can then rely on the invariants enforced by the Temperature class, removing the need to validate the parameter. This also avoids duplicated validation, a DRY violation, wherever temperatures are used. The refactoring further decouples UpdateThermostat from the underlying data type used to represent the temperature. Crucially, the method no longer needs to state its invariants in a comment header, as they are enforced by the type system. This shows that choosing a better design brings many benefits, and that the presence of such comments can indicate that a better design is available. After the refactoring, the comment header adds no information beyond the method signature and is a candidate for removal.

Another common example of an invariant stated via comments is the implicit sequence of valid operations on a class. Consider the following example:

```cpp
#include <string>

struct Credentials;  // defined elsewhere

/*
* Class - Database
*
* This class abstracts a database to allow queries to be executed.
*
* Users MUST call Connect() before executing any queries. It is an
* error to call ExecuteQuery() before Connect() has been called.
*/
class Database {
public:
    bool Connect(const Credentials& credentials);
    bool ExecuteQuery(const std::string& query);
};
```

Again, this comment appears useful, as it adds semantic information that the API itself does not present to the consumer. However, it also represents a failure to refactor the design. The Database class has two responsibilities: connecting to a database, and executing queries. These responsibilities could be separated into dedicated classes, allowing the design itself to enforce the invariant described in the comment. Consider the following refactoring:

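```cpp
#include <string>
struct Credentials;  // defined elsewhere

class DatabaseConnection {
    friend class Database;  // only Database can open a connection
    DatabaseConnection() = default;
public:
    bool ExecuteQuery(const std::string& query);
};

class Database {
public:
    DatabaseConnection Connect(const Credentials& credentials);
};
```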

Extracting the responsibility for performing queries from the type responsible for making connections means a query can only be performed on the result of a connection. Because ExecuteQuery() is only available on that result, the invariant that Connect() is called before ExecuteQuery() holds by design.
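To make the enforcement concrete, here is a minimal, self-contained sketch of the refactored design with bodies filled in. The `Credentials` field, the exception thrown for an empty user name, and the in-memory query log are hypothetical stand-ins for illustration, not part of any real database API. The key point is that `DatabaseConnection` has a private constructor. The only way to obtain one, and so the only way to call `ExecuteQuery()`, is through `Database::Connect()`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical credentials type; a real system would carry more than a user name.
struct Credentials {
    std::string user;
};

class DatabaseConnection {
public:
    // Illustrative stand-in: records the query rather than sending it anywhere.
    bool ExecuteQuery(const std::string& query) {
        if (query.empty()) {
            return false;
        }
        executed_.push_back(query);
        return true;
    }

    const std::vector<std::string>& ExecutedQueries() const { return executed_; }
    const std::string& User() const { return user_; }

private:
    // Only Database can construct a connection, so holding a
    // DatabaseConnection implies that Connect() succeeded.
    friend class Database;
    explicit DatabaseConnection(std::string user) : user_(std::move(user)) {}

    std::string user_;
    std::vector<std::string> executed_;
};

class Database {
public:
    // Throws on failure, so callers never receive an unconnected object.
    DatabaseConnection Connect(const Credentials& credentials) {
        if (credentials.user.empty()) {
            throw std::invalid_argument("a user name is required to connect");
        }
        return DatabaseConnection(credentials.user);
    }
};
```

Typical usage is `DatabaseConnection conn = db.Connect(Credentials{"alice"});` followed by `conn.ExecuteQuery("SELECT 1");`. A declaration such as `DatabaseConnection conn;` does not compile, because no accessible constructor exists. That compile error is exactly the invariant the original comment could only request. Throwing on failure is one choice among several; returning an optional connection would enforce the same ordering.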

Refactoring to eliminate comments can be applied at the statement level, as well as at the level of API design and the interaction between components. Consider the following code from a commerce application:

```cpp
// calculate total cost by applying 20% sales tax rate
result = x + (x * 0.20);
```

Again, the comment does clarify the intent of the code. However, the code also contains a magic number and non-descriptive variable names. Addressing these other deficiencies reduces the utility of the comment:

```cpp
// calculate total cost by applying 20% sales tax rate
const double TaxRate = 0.20;
totalCost = preTaxCost + (preTaxCost * TaxRate);
```

This can be refactored for further clarity by extracting a method:

```cpp
// calculate total cost by applying 20% sales tax rate
totalCost = ApplyTaxRate(preTaxCost);
```

The initial comment now seems absurd in this new context. In addition to stating what is obvious from the code, it conveys implementation details that may become misleading should the tax rate change and the ApplyTaxRate method be updated. This illustrates how seemingly innocuous comments can contain DRY violations. The information in the comment can become stale, so that it no longer conveys the intent of the source code it attempts to describe. As the pithy quote attributed to Ron Jeffries states, “Code never lies, comments sometimes do”. The earlier example from the Quake 3 source would be a good candidate for this simple refactoring, where an appropriate method name could remove the need for a comment to explain the obscure implementation. David Heinemeier Hansson advocates writing “exceedingly clear”, verbose code in [4]. In Refactoring [7], Fowler captures the essence of refactoring to eliminate comments: “When you feel the need to write a comment, first try to refactor the code so that any comment becomes superfluous”.
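As a sketch of that refactoring, the listing below moves the Quake 3 technique into a function whose name carries most of what the comment used to say. The name `ApproximateInverseSqrt` is hypothetical. The sketch also deliberately departs from the original listing: it copies bits with `std::memcpy` into a `std::uint32_t` instead of casting pointers. `long` is 64 bits on many modern platforms, and reading a `float` through a `long*` is undefined behaviour under the aliasing rules of standard C and C++. This is one reasonable modernisation, not the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Approximates 1/sqrt(number) for number > 0: a bit-level initial guess
// (the Quake 3 constant) refined by one Newton-Raphson iteration.
float ApproximateInverseSqrt(float number) {
    static_assert(std::numeric_limits<float>::is_iec559 &&
                      sizeof(float) == sizeof(std::uint32_t),
                  "requires 32-bit IEEE 754 floats");
    assert(number > 0.0f);

    // Copy the bits rather than casting pointers, which would break aliasing rules.
    std::uint32_t bits;
    std::memcpy(&bits, &number, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);

    float estimate;
    std::memcpy(&estimate, &bits, sizeof estimate);

    // Newton-Raphson step for f(y) = 1/y^2 - number.
    const float halfNumber = number * 0.5f;
    return estimate * (1.5f - halfNumber * estimate * estimate);
}
```

With one iteration the relative error stays within roughly 0.2%, so `ApproximateInverseSqrt(4.0f)` returns a value close to, but not exactly, `0.5f`.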

# Avoiding Redundancy – Executable Documentation Via Automation

Given that source code comments may exhibit problems such as redundancy and repetition, it is worth considering what alternatives can fill the role that comments play in providing a high-level description of program behaviour. A suite of automated unit tests can convey the intended usage of the components in the program, and integration tests can illustrate the interactions between components. Well-named tests that capture usage scenarios can serve the same purpose as comments in describing the invariants of the component under test. A significant benefit of automated tests is that they are more likely to be maintained. They will (hopefully!) fail if aspects of the program change, such as invariants or expected behaviour. Documenting behaviour through tests also requires that the software be designed in a test-friendly (i.e., loosely coupled and highly cohesive) manner. This further clarifies the API and reduces the need for comments, as the responsibilities and capabilities of components should be clear. Automated testing is therefore a good alternative to comments for documentation consumed by the software developers working on the project.
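As an illustration of tests acting as documentation, the sketch below pairs the `ApplyTaxRate` function from the previous section with two well-named test functions. The 20% rate, the function body, and the test names are assumptions for illustration. A real project would more likely use a test framework such as Google Test than bare `assert` calls. The test names state the behaviour that the original comment described. Unlike the comment, they fail loudly if the rate changes.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical rate; in a real system this would come from the canonical tax rules.
constexpr double TaxRate = 0.20;

double ApplyTaxRate(double preTaxCost) {
    return preTaxCost + (preTaxCost * TaxRate);
}

// Each test name states one rule, so a failing test reads as a broken rule.
void TotalCostIncludesTwentyPercentSalesTax() {
    assert(std::fabs(ApplyTaxRate(100.0) - 120.0) < 1e-9);
}

void ZeroPreTaxCostHasNoTax() {
    assert(ApplyTaxRate(0.0) == 0.0);
}
```

The comparison uses a tolerance because the rate is a binary floating-point value. Production code handling money would more likely use integer minor units.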

External consumers of the program who are not primary developers, such as downstream users of a software library, may still need comments. For example, a business client may insist on class or method level comment headers. In this case, there may be a “natural” canonical source of documentation or specification of the domain rules for the software. Comments should ideally refer to that source rather than repeat it. For library consumers, another possible documentation method is a set of API examples (ideally automatically tested) covering the library's various use cases. These demonstrate functionality without having to describe current behaviour in natural language. If the examples are tested automatically, regressions and API breakages can be detected. A comment or document describing how to use the library for a use case, by contrast, may break silently.

# Comments Are A Useful Crutch And May Sometimes Be Required

This article has presented arguments for the benefits of refactoring to remove comments, in terms of software clarity and enforcement of invariants. Automated tests and API examples have been suggested as mechanisms that provide benefits similar to comments in describing software behaviour at a higher level. They have the added benefit that automated tests are inherently self-validating, and so should be more resistant to becoming stale.

Other categories of comments can potentially be replaced with more appropriate tools. For example, the classic TODO or FIXME comment could be replaced by an entry in a bug tracker describing the undesirable behaviour of the current implementation or design. That entry can also track possible remedies for the problem. This gives these annotations a more formal life cycle, so that the defects can be categorised, searched for, resolved or waived. By adopting this approach, technical debt can be tracked more transparently and made visible to non-technical team members, rather than being captured informally in source comments. Similarly, NOTE and BUGFIX comments may represent a failure to write descriptive commit messages in the project’s version control system (VCS). Common VCS features such as blame can then be used to investigate the history of the source code.

Despite the arguments presented here, it would be remiss to suggest that it is practically possible to remove all source code comments, or that this is even a reasonable or worthwhile goal. Sometimes comments are necessary to explain a strange or unexpected interaction with a third party dependency, such as a workaround for a known defect; sometimes they are required to explain why an obvious and well known algorithm for solving a problem hasn’t been implemented. Perhaps an iterative solution for a naturally recursive problem has been selected due to stack size limitations. These types of comments strive to explain the why, but critically it is a why which is answerable in the domain of the implementation, and expressible solely in the source code, such that there is no risky repetition of business rules.
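As a sketch of a comment that survives this scrutiny, the function below computes the depth of a tree iteratively with an explicit stack rather than by recursion. The `Node` type, the `MaxDepth` name, and the stack-size constraint on a target platform are hypothetical. The comment explains a why that belongs to the implementation. It also records what would allow the comment to be deleted.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <utility>
#include <vector>

// Hypothetical tree node; children are owned elsewhere (e.g. by an arena),
// so destroying a Node does not recurse either.
struct Node {
    std::vector<const Node*> children;
};

std::size_t MaxDepth(const Node& root) {
    // Iterative rather than recursive: input trees may be deep enough to
    // overflow the small thread stacks on the (hypothetical) target platform.
    // Delete this comment, and the explicit stack, if that limit is lifted.
    std::size_t maxDepth = 0;
    std::stack<std::pair<const Node*, std::size_t>> pending;
    pending.push({&root, 1});
    while (!pending.empty()) {
        const Node* node = pending.top().first;
        const std::size_t depth = pending.top().second;
        pending.pop();
        if (depth > maxDepth) {
            maxDepth = depth;
        }
        for (const Node* child : node->children) {
            pending.push({child, depth + 1});
        }
    }
    return maxDepth;
}
```

The explicit stack lives on the heap, so a chain of a hundred thousand nodes is handled without risking a stack overflow. A recursive version might overflow on a small thread stack.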

Agile coach Tim Ottinger argues in his short article on the subject [5] that comments express a deficiency in the code: the code has failed to meet the natural expectations of the reader. Comments should therefore be considered “apology notes”. These apologies may be for a failure to refactor to remove problems, or for not being able to use a well-known algorithm. They may be for requiring a hand optimisation due to an earlier decision to favour a non-optimal design, or simply for having written hard-to-understand code. When reaching for a comment as the tool to solve a problem in your source code, ask whether the comment is explaining away a deficiency in your design. Can the solution be implemented in such a way that the comment becomes unnecessary? Would renaming variables or extracting methods help clarify intent? Would splitting up a module to better encapsulate its responsibilities avoid a breakable invariant by design? Is your comment replicating information from a canonical source that may be subject to change? Time pressures or other constraints may require that the comment be written. Alternatively, the comment may really be required to convey information in the domain of the implementation that would otherwise be impossible to ascertain and could not be conveyed more appropriately in another medium. When adding these types of comments, consider what would be required to remove the comment, be it a bug fix from a third party or the redesign of a poor interface, and write this in the comment. Then, when the underlying issue is fixed, delete the comment.

# References

[1] Martin, Robert C. “Clean code: a handbook of agile software craftsmanship”. Pearson Education, 2008.

[2] Quake 3 Arena. Source Code Listing. URL: https://github.com/id-Software/Quake-III-Arena/blob/dbe4ddb10315479fc00086f08e25d968b4b43c49/code/game/q_math.c#L552 (last accessed: 21/02/14)

[3] Yourdon, Edward. “Techniques of program structure and design”. Prentice Hall PTR, 1986.

[4] Hansson, David H. “Clarity over brevity in variable and method names”. URL: http://signalvnoise.com/posts/3250-clarity-over-brevity-in-variable-and-method-names (last accessed: 21/02/14)

[5] Ottinger, Tim. “Apologies In Code”. URL: http://butunclebob.com/ArticleS.TimOttinger.ApologizeIncode (last accessed: 21/02/14)

[6] Kernighan, Brian W., and Phillip James Plauger. “The elements of programming style”. McGraw-Hill, 1978.

[7] Fowler, Martin. “Refactoring: improving the design of existing code”. Addison-Wesley Professional, 1999.

[8] Hunt, Andrew, and David Thomas. “The pragmatic programmer: from journeyman to master”. Addison-Wesley Professional, 2000.
