This is a response to Scott Murray’s article, which can be found at blog.inf.ed.ac.uk/sapm/2014/02/21/delete-this-comment/.
I enjoyed the article so much that it made me rethink one of my ongoing projects. Its main idea is that, in many cases, comments within code reflect poor design decisions. The author includes some very illustrative examples and shows how refactoring can make the comments unnecessary while improving the design of the software. The article also discusses other strategies and technologies that can help drive down the number of comments. While I agree with the main point and most of its supporting ideas and arguments, I would like to discuss the cases where comments are really needed.
The simplest situations that call for comments are those where a third-party dependency results in strange behaviour, and the original article already recognizes them. I will give examples of other situations where comments are desirable but don’t necessarily reflect design issues within the code or its dependencies. These support the claim that, in some cases, comments are not only needed but also pose no risk, so we should not seek to replace them with other techniques.
If you read the original article, you will note that we are making some assumptions here. Depending on the environment used, the perfect design sometimes cannot be implemented. Comments are welcome in these situations, since they explain the unexpected to the reader. We will ignore these situations in the rest of the discussion.
Suppose you are writing a function to analyse a large set of data. The methodology is not well known, and there is no theoretical background you can point to that exactly reflects this situation. Let’s use an example. There is a matrix called features which holds vectors of features; for a given pair of row_index and column_index, you are trying to estimate the expected value of the feature at the corresponding position in the matrix (note you already know its value). To do this, you want to use the information from the nearest neighbours of the vector at row_index, but the high-level decision is that you don’t want to use the attribute information of column_index to compute those neighbours. In MATLAB, the similarity metric could easily manipulate the input data to achieve this goal:
features(:, [1:column_index-1 column_index+1:end])
which is interpreted as “from features, select all rows and all columns except column_index”.
The hiding of this column is dictated by the methodology. Instead of trying to come up with a clever name for an intermediate variable that will hold the result of the expression above (think features_without_column_to_estimate), a comment explaining the goal of the code would be more appropriate. Keep in mind that this decision is made once, before the code is implemented, and although it might never be changed (or even read again), it is good to have an explanation for posterity. If the methodology changes, it is rarely the case that someone will try to change this code; more often than not, a new function will be created to evaluate the estimates, should there be a new and possibly better idea.
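To make this concrete, here is a minimal sketch of how such a comment might sit next to the expression. Everything beyond features, row_index and column_index is an assumption made for illustration: the toy data, the neighbour count k, the hypothetical helper sq_dist_to_row, the choice of squared Euclidean distance, and the use of the neighbours' mean as the estimate.

```matlab
% Hypothetical toy data: 4 vectors (rows) with 3 features (columns).
features = [1 2 3;
            1 2 9;
            5 6 7;
            1 3 3];
row_index = 1;
column_index = 3;
k = 2;  % number of neighbours to use (assumed)

% Hypothetical helper: squared Euclidean distance from one row to every row.
sq_dist_to_row = @(data, row) sum(bsxfun(@minus, data, data(row, :)) .^ 2, 2);

% Methodology: the feature we are estimating must not influence which
% neighbours are chosen, so column_index is hidden from the similarity metric.
distances = sq_dist_to_row(features(:, [1:column_index-1 column_index+1:end]), row_index);
distances(row_index) = Inf;  % a vector is not its own neighbour

% Keep the k closest rows.
[~, order] = sort(distances);
neighbours = order(1:k);

% Assumed estimator: mean of the hidden feature over the neighbours.
estimate = mean(features(neighbours, column_index));
```

With this toy data the neighbours are rows 2 and 4, and the estimate is 6. The two-line comment carries the reason for hiding the column, which no variable name of reasonable length could convey.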
The given example illustrates that in some cases comments might be the best way to document the code (as opposed to automated tools or other techniques). In these cases the users of the code will usually not care about its inner workings or its methodology. If for some reason someone does need to understand the code (be it for debugging or for methodology improvements), comments can explain its goal and intent; they do not reflect bad design decisions and should not be replaced by other, more complex, tools.