Incentives Can Be Poisonous, So How Do We Fix Them?

In “Incentives Poison Extreme Programming Values”, Craig Innes explored the dangerous effect external incentives can have on the decision to use an Agile methodology, using the third year SDP project as an example of such a scenario. In this response, I intend to expand on his article by explaining why these incentives existed and how their goals can be met in a less damaging way. I will then generalize principles from this example that can apply to a variety of situations (work environment, assignment structure) in which authorities need to monitor or incentivize work, yet do not want to disrupt the Agile process.

Milestones, friendlies, and stack ranking: why were they used?

In order to understand how to improve SDP, we have to understand why the course implemented milestones, competitive friendlies, and zero-sum marking. Only then can we redesign the course to achieve the same ends without interfering with the Agile process.

The milestones in SDP primarily served as a way to measure the progress of student teams, but also as explicit short-term goals to aid team planning. Measuring the progress of teams not only allowed for intervention if progress was slow, but also allowed for a relatively standardized set of grading against an objective measure of success. The fixed goals also allowed teams of students who were new to major project planning to have some idea of what capabilities they should have at certain dates.

The competitive nature of friendlies was another incentive to ensure that teams took the opportunity seriously. If doing poorly meant a worse seeding in the final tournament, teams would be incentivized to work at performing well at these events.

Finally, the competitive marking at milestones ensured that individuals did not freeride on the group’s efforts, while also preventing the group from conspiring to give each other high marks. The limited number of high marks ensured that the group could not grant everyone an undeserved high score, and the resulting competition incentivized denying freeriders a high mark.

In summary:

  • Milestones and competitive friendlies were meant to ensure that teams took their responsibilities seriously, and provide fixed, short-term goals for members to target.
  • Competitive marking was meant to both ensure individual effort, and defend against widespread mark-fixing within groups.

How do we fix it?

Milestones pose a serious problem, as they are critical to measuring and maintaining the team’s progress. Yet the same measurement can be achieved, as Craig himself suggested in his reply to a comment, by having groups set their own milestones. Each group has a mentor, a student who has previously taken SDP and who signs off on major decisions (such as allocation of marks), who could help the members of a group decide on a reasonable milestone to target next. Teams could then adjust their milestones to measure progress which is relevant to them – some teams may want the traditional feature-by-feature approach, while others might adopt a more Agile strategy of frequent, iterative releases. Either way, this approach would grant much more flexibility to teams, while retaining a grounded approach thanks to the mentor’s input.

The solution for competitive friendlies seems obvious enough – make them noncompetitive, or even end them altogether! Tournament seeding can be random, and teams will still have an incentive to take seriously any opportunity to simulate the conditions of the final tournament. Friendlies can either be organized by the course organizers as mandatory or encouraged matches with no stakes, or it can be left up to the teams to organize their own friendlies.

Finally, zero-sum marking is both understandable (the organizers want to avoid collusion and freeriding) and deeply perverse (similar to Microsoft’s “Stack Ranking”, it creates an environment of mistrust and hostility, and incentivizes hoarding code and “low hanging fruit”). This problem is much harder to patch than the previous ones, as it is a symptom of a larger dissonance between the intent and practice of SDP. The easiest patch is to simply remove the limitation on high marks, and give more authority to mentors and the course organizers to review and modify the marks given. This does generate more work for those in charge of the course, but allows teams to focus directly on rewarding members as best makes sense to them.

Conclusion

While SDP’s milestones, competitive friendlies, and stack ranking seriously interfered with Agile approaches, they can be revised or eliminated in a way that allows for a variety of team methodologies. Even in an environment like SDP, which seeks to assign each student a grade to determine their grasp of the material, structures and incentives which undermine an Agile approach can be dealt with if the course organizers are willing to meet students halfway.

Test-Driven Development – How Applicable is it?

Test-driven development (TDD) is a development process for software that attempts to simplify designs through the writing of tests.

It involves the developer writing tests with an improvement or new functionality in mind and then writing code so this test can be passed. This forces the developer to concentrate on the requirements of the software before writing code, as they need to know what functionality to test. Once this simple code passes the test, it can be refactored to reach the standards acceptable to the development team.

However, how applicable is this developmental process? Can it really be applied to every development situation?

I will be exploring these questions in my post, looking at the benefits of a TDD and problems experienced when dealing with legacy code, small projects and also at how much testing really should be done under a TDD.

What is TDD?

Firstly, in order to be analysed, TDD needs to be correctly specified.

Defined by Kent Beck in 2003, [1] TDD is built upon the concepts of TFD (Test-First development), and can be described simply as:

“TDD = Refactoring + TFD” [2]

A developer who is programming under the TDD approach never writes a new feature or functionality, until there exists a test that will fail because of a lack of implementation.

After the test is passed, the programmer will refactor this simple code to bring it up to the quality of the team standards (this step turns TFD into TDD).

Put simply, it can be summarised in the following cycle: [See diagram below]

  1. Add a test.
  2. Run all tests and see if the new one fails.
  3. Write some code (causing the test to pass).
  4. Run tests again.
  5. Refactor code.

Diagram 1
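
As a minimal sketch of one pass through this cycle, here is what the end result might look like using Python’s built-in unittest module. The function slugify and its behaviour are hypothetical, chosen only to illustrate the steps; they do not come from any of the cited sources.

```python
import re
import unittest


# Step 3 (then step 5): the simplest code that makes the tests pass,
# later tidied up without changing its behaviour.
def slugify(title):
    """Turn a post title into a lowercase, hyphen-separated slug."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# Step 1: these tests are written first. Before slugify() exists they
# fail (step 2), which confirms that they really exercise the new feature.
class SlugifyTest(unittest.TestCase):
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify("Test Driven Development"),
                         "test-driven-development")

    def test_punctuation_is_dropped(self):
        self.assertEqual(slugify("TDD: How Applicable?"),
                         "tdd-how-applicable")
```

Saved in a file such as test_slugify.py (an assumed name), the tests can be run with "python -m unittest" both before and after the implementation is written (steps 2 and 4). Once they pass, the body of slugify can be refactored freely, with the tests acting as a safety net.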

Benefits

But what are the benefits of this practice?

Naturally, TDD requires more code to be written because of the extensive testing carried out.

A model designed by Muller and Padberg compares the development times of two startup projects, one developed under conventional methods and one under TDD, and the results show that the total code implementation time is, in fact, reduced under TDD. [3] (See image below)

Development Time

Due to the extensive unit testing being carried out, there’s also a greater likelihood that defects are caught early in the development lifecycle, saving the time and money it would have cost to fix them had they not been found until later.

Lech Madeyski showed the benefits of the practice empirically by comparing it with a more traditional development approach where testing was left to the end of development.

He found that the quality of the unit tests created under TDD was greater, as they scored better on branch coverage and the mutation score indicator than the tests from the traditional approach. Branch coverage and mutation score indicator can be used to show the thoroughness and helpfulness of tests for detecting faults in the software. [4]
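
To make the second measure more concrete: mutation testing runs the tests against deliberately altered copies of the code (“mutants”), and the score is, roughly, the fraction of mutants that the tests detect. The sketch below uses hand-written mutants and hypothetical function names purely for illustration; real tools generate mutants automatically, and this is not Madeyski’s experimental setup.

```python
def is_adult(age):
    """Original code under test."""
    return age >= 18


# Hand-written "mutants": copies of is_adult with one small change each.
def mutant_greater_than(age):
    return age > 18      # >= replaced by >


def mutant_off_by_one(age):
    return age >= 19     # constant changed


def mutant_negated(age):
    return age < 18      # condition inverted


MUTANTS = [mutant_greater_than, mutant_off_by_one, mutant_negated]


def weak_tests(fn):
    """Passes if fn behaves correctly far away from the boundary."""
    return fn(30) is True and fn(5) is False


def thorough_tests(fn):
    """Additionally checks both sides of the boundary value 18."""
    return weak_tests(fn) and fn(18) is True and fn(17) is False


def mutation_score(tests, mutants):
    """Fraction of mutants that make the test suite fail (are 'killed')."""
    killed = sum(1 for mutant in mutants if not tests(mutant))
    return killed / len(mutants)
```

Both suites pass on the original is_adult, but weak_tests only kills the negated mutant (one of three), while thorough_tests kills all three. The higher score reflects tests that are better at detecting faults.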

Silver bullet?

So we can see the benefits of the conceptually simple TDD in both the speed of development and the quality of tests, but can all software development projects really be developed under TDD?

Code that has already been developed under a different development style, called legacy code, can be brought up to the standard of TDD, in that small tests can be written to demonstrate the functionality of the program. For a large codebase, however, this can prove tricky and raises the question: is it really worth it?

Having a consistent development process within a team is a reason for converting legacy code to TDD, as it would mean developers who were more comfortable developing under TDD wouldn’t have to switch to unfamiliar practices to work with the older code, likely resulting in fewer mistakes being made.

Nevertheless, repetition of test creation would almost certainly occur, as the legacy code would already have tests in it. These would either have to be redesigned or ignored in order to fit into the TDD style.

Shihab et al. present an approach to such an adaptation, coining the term “Test-Driven Maintenance” (TDM). It enables teams with limited resources to test legacy systems efficiently by producing a prioritised list of functions that updates as unit tests are written and passed. [5]
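
The idea of a prioritised list that updates as tests are written can be sketched in a few lines. Note that the ranking rule used here (most frequently changed first), the function names and the data are all assumptions made for illustration; they are not necessarily the heuristics that Shihab et al. use.

```python
def prioritise(change_counts, tested):
    """Return untested functions, most frequently changed first.

    change_counts: dict mapping function name -> number of past changes
                   (an assumed stand-in for whatever history is available).
    tested:        set of function names that already have passing tests.
    """
    untested = [name for name in change_counts if name not in tested]
    # Sort by change count descending, then by name for a stable order.
    return sorted(untested, key=lambda name: (-change_counts[name], name))


# Hypothetical legacy functions and how often each has been modified.
history = {"parse_order": 14, "apply_discount": 9, "format_receipt": 2}
tested = set()

print(prioritise(history, tested))
# ['parse_order', 'apply_discount', 'format_receipt']

# Once a unit test for the top item is written and passes, the list updates.
tested.add("parse_order")
print(prioritise(history, tested))
# ['apply_discount', 'format_receipt']
```

A team with limited time would simply keep writing tests for whatever is at the top of the list.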

As there exists an approach to bring a legacy system under a TDD-like development process (TDM), and because of the benefits of TDD, I think it is worthwhile to convert legacy systems, despite the repetition that might occur.

The positives are even greater when you consider that companies tend to spend over 90% of their total software cost in the maintenance and evolution phases, so making sure these stages are as efficient and well developed as possible will help to save a lot of money. [6]

We need it quick!

Some people argue that the upfront cost, in terms of time, of constructing tests is not worthwhile for a small prototype program rushed to market.

While this might make sense in concept, the consequences of not following TDD are not quite so obvious.

In reality, TDD tends to be very useful in smaller projects, rather than overkill. This is because on this smaller scale it’s easier to be disciplined enough to continue to write tests and follow the practices of TDD. On larger projects, by contrast, developers tend to get complacent due to time constraints and the complexity and number of features that need to be written.

If this small project were developed under TDD and proved to be successful, the extra time sacrificed to create tests would pay dividends in its future development. If, however, the system were developed under some other method, it would probably continue with that method and so we would end up with another legacy system. Converting this system using TDM would take a lot of effort, much more than if TDD had been followed from the start.

So, even for small projects, I think that TDD is a development style that should be practised: even though the upfront cost of the tests may slow down progress at the start, time spent here will prove to be invaluable later on.

More tests?!

Many sceptics of TDD complain that while testing is useful for ensuring successful software, TDD spends far too much time testing very basic functionality, which they believe doesn’t need to be tested. They also argue that while testing might make developers confident that their code works, developers don’t get paid to test; this is the job of the testers. [7]

However, redundant testing is just what non-practisers of TDD think it’s all about.

Here is what Kent Beck, the father of TDD, says:

“I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a level of confidence… If I don’t typically make a kind of mistake… I don’t test for it”

This shows that even in TDD you don’t have to test every single function and that common sense should be applied.

If you’re concerned you might get it wrong, write a test for it.

In larger projects this philosophy will help keep the total number of tests down, which will mean less time spent maintaining them. It also means that developers won’t spend too long writing tests for the software, while still being able to reap the profits of the TDD development style.

Conclusion

Test-driven development is definitely a development process that software development companies need to consider more seriously.

As we’ve seen, it shines in a variety of situations, from legacy code to quick prototypes and even in large startup developments. The positives are shown through the overall shorter implementation times, better quality unit tests and greater chance of catching defects early in development.

The father of TDD, Kent Beck, also dispels some myths about TDD, revealing that tests should only be written when YOU think they are needed, and only until YOU are confident in your code.

For all these reasons I think that TDD is something that development teams need to reconsider.

References

[1] – Kent Beck, “Test Driven Development: By Example”, Addison-Wesley, 2003.

[2] – Scott W. Ambler, “What is TDD?”, 2013, http://www.agiledata.org/essays/tdd.html#WhatIsTDD [Accessed on: 1st March 2014].

[3] – Matthias Muller & Frank Padberg, “About the Return on Investment of Test-Driven Development”, 2012, http://www.ipd.kit.edu/KarHPFn/papers/edser03.pdf [Accessed on: 1st March 2014].

[4] – Lech Madeyski, “Test-Driven Development – An Empirical Evaluation of Agile Practice”, Springer, 2010.

[5] – Shihab et al, “Prioritizing the Creation of Unit Tests in Legacy Software Systems”, 2010, http://sail.cs.queensu.ca/publications/pubs/shihab_spe2011.pdf [Accessed on: 1st March 2014].

[6] – Len Erlikh, “Leveraging legacy system dollars for e-business”, 2000, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=846201 [Accessed on: 2nd March 2014].

[7] – Assaf Stone, “10 Reasons to Avoid Test Driven Development”, December 2011, http://www.softwareandi.com/2011/12/10-reasons-to-avoid-test-driven.html [Accessed on: 2nd March 2014].

Pair Programming, Best Development Practice?

This discussion is a response to s1369981’s post, “Why Pair Programming is the Best Development Practice?”

In their post, the author tries to convince the reader of the benefits of pair programming, highlighting the positives that this agile development style brings, such as increased knowledge sharing and project ownership, stating that:

“Pairing is about collaboration and teamwork, thus it will shine… [in] knowledge sharing and project ownership…”

Despite the author’s arguments, I’m not completely convinced of their stance, mostly due to the exaggerated claim that pair programming is the overall best development practice in all programming scenarios.

I think that while there are certain situations in which it can be beneficial (between two novices, for example), extending this to every programming situation is not sufficiently justified.

The author doesn’t see this because they neglect to address:

  1. The economic drawbacks of the practice.
  2. Developers’ natural opposition to this working style.
  3. The fact that programming might not be so mundane if you enjoy where you’re working.

These counter arguments that I present will show that the author has failed to consider these important points in their article and that they’ve been too general in their argument that pair programming is the best development practice for all situations.

But Work is Boring?

In the post, s1369981 makes certain claims that I don’t particularly agree with, such as:

“… most programming tasks in a software company will focus on boring and repeatable tasks with very few of them being creative and challenging.”

This pessimistic view of what the programming world is like after university suggests that the only hope a programmer has of enjoying their work is to pair up, thereby distracting themselves from their “boring and repeatable tasks”.

This way of improving your enjoyment of your job would only ever be a temporary one, as the novelty of pair work wears off.

Finding a more exciting company that suits your personal tastes in programming would help you to enjoy your work more, without needing the distraction of a partner to make it bearable. Also, simply communicating more with other members of the team, even those working on different projects, would increase team spirit and cooperation and make it feel much less like you’re working on your own.

I’m stuck!

Speaking from personal experience, while on my internship, I found that instead of any pair programming scenarios, the newcomers (or contractors) to the team sought out the help of more experienced senior developers when stuck, rather than pairing up with them while programming.

This practice produced benefits similar to those of a senior developer pairing with a novice, in that the more experienced developer could pass on valuable knowledge and use their expertise without feeling restricted by having to effectively babysit the new employee.

This also left the senior developer with time to apply their invaluable knowledge elsewhere by programming solo, where they would be able to maintain their high productivity. [1]

As mentioned before, pair programming between two novices, or between a novice and someone who is competent, would be helpful because, on their own, they’d undoubtedly have low production levels, but together they can boost their learning, and this allows new recruits to get up to speed quickly. [1]

Economics

Something not mentioned in the author’s article is the economic viability of mass pair programming, as the team would need to have more employees to manage the same number of projects.

Controlled studies found that it wasn’t economically viable: a significant decrease in development time was found only for simple systems, with no significant difference in the correctness of solutions. [2]

In fact, in this large empirical study, Arisholm et al. found that the results did not support the general consensus that:

“… pair programming reduces the time required to solve tasks correctly or increases the proportion of correct solutions.”

Instead, they discovered that, in general, there is an 84% increase in the effort required from the programmers to perform the prescribed tasks correctly, where effort (or cost) is the total programmer hours spent on the task.
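
To see what an 84% increase in effort means in practice, here is a small back-of-the-envelope calculation. The 100-hour task is a made-up figure, and the function name is hypothetical; only the 84% comes from the study quoted above.

```python
def pair_cost(solo_hours, effort_increase=0.84):
    """Return (total programmer-hours, elapsed hours) for a pair.

    solo_hours:      hours one programmer needs for the task alone.
    effort_increase: fractional increase in total effort when pairing
                     (0.84 is the figure quoted above).
    """
    pair_effort = solo_hours * (1 + effort_increase)  # programmer-hours
    elapsed = pair_effort / 2                         # two people at once
    return pair_effort, elapsed


effort, elapsed = pair_cost(100)   # a hypothetical 100-hour solo task
print(round(effort, 1), round(elapsed, 1))
# 184.0 92.0
```

Under these assumptions the pair finishes slightly sooner (92 hours on the clock instead of 100), but the company pays for 184 programmer-hours instead of 100.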

These empirical results give us a more concrete measure of the benefits of pair programming across a variety of levels of programmer, and I believe this evidence to be more reputable than remarks from people who’ve tried out pair programming, as these are open to bias.

The findings back up the reasoning that for a team to keep operating at its current level, managing as many different projects as it does now, it would have to hire more employees to maintain this level of output, even though the benefits of pair programming aren’t so great.

It ain’t all fun

The author’s conclusion takes a simplified view of the situation by suggesting it should be adopted because:

“Pair Programming is the best development practice, because it is fun!”

But as the author suggested earlier in their article, there is a lot of strong opposition to this, with people arguing adamantly against this belief. [3]

So, certain people will not work well in pairs, no matter how many statistics or studies you throw at them, and I believe that if it is going to be used in a team, it should be tried out for a certain period during which productivity can be monitored.

As mentioned and described by s1369981, people should also be educated in how to properly undertake pair programming if they’re going to be working with it, as this can help to eliminate common mistakes and incorrect assumptions made about the practice.

Once the practice has been carried out correctly, the management can get feedback on it, both empirically and from the developers who tried it, so that they can make a reasoned decision on whether it is a viable option for their team.

Here, developer input should be considered closely because regardless of whether it makes your programmers more productive, making them uncomfortable in their work environment will cause some people to quit.

Conclusion

There are some points in s1369981’s article that I agree with, such as the fact that pair programming can increase knowledge sharing and project ownership in a team.

However, the application of pair programming to all forms of development is an overstretch due to the economic drawbacks, some developers being opposed to paired work and the flawed premise that only pair programming can make your job enjoyable.

I do believe that it still has its place, e.g. between two novices in a company or for complex tasks, as it can help to improve correctness of code, but bear in mind that this comes at a price: overall increased effort. [1] [3]

Therefore, any adoption of pair programming should be evaluated on a case-by-case basis to see if it really is the “best development practice”.

References

[1] – Derek Neighbors, “Should Senior Developers Pair Program?”, November 2012, http://derekneighbors.com/2012/11/should-senior-developers-pair-program/ [Accessed on: 26th February 2014]

[2] – Erik Arisholm et al, “Evaluating Pair Programming with Respect to System Complexity and Programmer Expertise”, 2007, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4052584 [Accessed on: 26th February 2014]

[3] – Matt Ervin, “Pair Programming (give it a rest)”, November 2013, http://peniwize.wordpress.com/2013/11/17/pair-programming-give-it-a-rest/ [Accessed on: 28th February 2014]

“Do you have testers?”

The Joel Test: “Do you have testers?” – No, and neither should you (probably)

tl;dr

The times they are a-changin’. In the post-Agile world, not having dedicated testers might just be advantageous for you. Depending on the software your team produces, factors such as time-to-market might be more important than a perfect product. “Build the right it” first, “build it right” later.

Setting the scene

In 2000, the prolific software writer Joel Spolsky proposed the “Joel Test”: twelve simple yes-or-no questions to judge the quality of a software development life-cycle. The test invites the inquirer to ask questions such as “Do you use source control?” or “Do programmers have quiet working conditions?”. According to Spolsky, any good software development process will answer “no” to at most two of the test’s questions.

Fourteen years later, the test is still widely recognized as a useful, low-cost tool to evaluate the efficacy of a software development process. But is it really? A lot has happened in the world of software engineering since the test’s inception. When reading Spolsky’s original proposal nowadays, I can’t help but feel as though some of the questions are overly dogmatic or at least a bit dated.

For instance, question 10 of the test states that

[i]f your team doesn’t have dedicated testers, at least one for every two or three programmers, you are either shipping buggy products, or you’re wasting money by having $100/hour programmers do work that can be done by $30/hour testers. Skimping on testers is such an outrageous false economy that I’m simply blown away that more people don’t recognize it.

Source: “The Joel Test”.

In the remainder of this blog post, I will analyze this assertion and discuss to what extent its premise is still relevant in 2014.

Spolsky’s argument then versus in the trenches now

Let us first have a look at the argument made by Spolsky for why dedicated testers are imperative for a good software development process and then contrast this with my personal experiences in the software world.

Question 10 of the “Joel Test” (quoted above) observes two issues:

  • A team without dedicated testers will release a bad product because of lacking quality assurance (QA) from both technical and functional points of view.
  • A team without dedicated testers wastes money since testers are cheaper than developers.

In later posts, Spolsky refines his point further:

  • Programmers make bad testers – the required skill sets are not related.
  • Having testers shortens the time it takes developers to get feedback on product quality.
  • Testers will catch problems that the original developer didn’t (or wouldn’t).

Sources: “Why Testers?”, “Top Five (Wrong) Reasons You Don’t Have Testers”.

These assertions do not match my personal experience. Time for some anecdata.

Last summer I worked at the Amazon Development Centre Scotland (ADCS). Most people would likely agree that ADCS is a high quality software development environment. As a matter of fact, ADCS would obtain a perfect score on the “Joel Test” if it were not for the lack of dedicated testers on my team… however, this does not mean that ADCS ships bad code, as Spolsky claims must inevitably happen. Far from it. Software quality is simply assured by other (in my opinion superior) means:

  • Strict pair programming when code is being touched.
  • Unit and integration tests written for any new feature.
  • Before a major feature is released, the entire team pitches in and spends half a day to a day focusing on testing the new feature. If any show-stopper bugs are found, the release of the feature is delayed.
  • One person on the team is “on-call” every week, handling bug reports as they come in.

Arguably, this Agile-inspired approach has a number of advantages over having dedicated testers:

  • Everyone in the team has a rough idea of what is going on. There is less of a chance of knowledge silos developing.
  • There is a strong incentive to produce high quality code as your colleagues are going to be directly affected by the mess you cause (as opposed to testers in some other team you might not know).
  • More eyes see the code than with Spolsky’s recommended “1 tester per 2 engineers”, increasing the chance of finding bugs or poor user experience.
  • Developers will never have to wait for the testing team. Automated tests give instant feedback. More involved evaluation such as user acceptance testing can be prioritized appropriately by the developers in order to integrate it seamlessly with the rest of the development process.

In addition to these benefits, I also found that forgoing dedicated testers means that the development team is more likely to develop a strong sense of ownership of their product. The team owns the software development life cycle end to end and is therefore more likely to “do the right thing and do it right”: develop reliable solutions that do what they are supposed to do. Contrast this with just implementing some specification, passing it on to testers (or later business analysts) and never looking back… Farewell, “code, compile and forget”. You shall not be missed.

A brave new world?

So how do we reconcile this difference between what the “Joel Test” claims should happen and what actually does happen on the ground, even at a top-tier, quality-obsessed company like Amazon?

It is important to note that the “Joel Test” was written in 2000 and very much reflects the spirit of the time in its attitude towards testing. A lot has happened in the world of software since then; most notably, development methodologies have shifted away from the Waterfall-like methods prevalent at the start of the century to a widespread adoption of Agile methods.

Software produced in the times of the “Joel Test” would have one, two, maybe a handful of releases a year, indiscriminately distributed via physical media to a company’s entire client base. Nowadays, the near-ubiquitous use of web-based services means that new software versions can often be rolled out instantly or to only a small part of your company’s customers. Being able to perform split tests on your product by releasing new features gradually into the wild means that the impact of bugs or poor design is less severe than with the old model, where a company’s reputation could rise or fall on the newest polycarbonate-plastic pressed version of its developers’ efforts. Thus reduced the importance of the tester.
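
To illustrate how releasing a feature to only a small part of the customer base can work, here is a minimal sketch of a percentage-based rollout. The function name, feature name and user ids are hypothetical, and this is an assumption about one common way of doing it, not a description of how any particular company implements it.

```python
import hashlib


def in_rollout(user_id, feature, percent):
    """Return True if this user should see the feature.

    Hashing the feature name together with the user id gives each user a
    stable bucket from 0 to 99, so the same user always gets the same
    answer and raising `percent` only ever adds users.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


users = [f"user-{n}" for n in range(1000)]
exposed = [u for u in users if in_rollout(u, "new-checkout", 5)]
print(len(exposed))  # roughly 50 of the 1000 hypothetical users
```

If the new feature turns out to be buggy, only the small exposed group is affected, and the percentage can be set back to zero without shipping anything new.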

Furthermore, the modern-day developer has access to a plethora of new tools that reduce the burden of test. While some of these tools might have been available at the time of the “Joel Test”, they would have been nowhere near as ubiquitous as nowadays, when FindBugs is in the top ten most downloaded Eclipse plugins and all major Java IDEs come with JUnit installed by default. Static code analysis finds a plethora of bugs even before the developer hits “compile”. Unit and integration tests take snapshots of the functionality of a system and therefore prevent regressions by default. This eliminates a big chunk of the grunt work that formerly had to be done manually by QA departments. Thus reduced the importance of the tester.

Additionally, the switch to Agile-like methods brought about a mentality change in the software field. Previously, QA teams were often “separate but unequal” parts of the development apparatus. Under Agile and its offspring like Test Driven Development, everyone is a tester: everyone wears all hats. The role of QA can thus grow to encompass more directly technical or business-related aspects such as focusing on user acceptance testing and performance- or security-testing. From a button-mashing unequal to a respected fellow reliability engineer. Thus reduced the importance of the tester.

However, the shift to Agile can’t explain the entire difference between what the “Joel Test” claims and what I observe in the modern software world. Changes in tools and mentality have shifted some of the burden of test from traditional testers to the entire development team… but traditional Agile is still far from the laissez-faire “ship it” attitude that I see pervading the software world. As a matter of fact, Agile evangelist and writer Don Wells claims that

[q]uality assurance (QA) is an essential part of the [Extreme Programming (XP)] process. On some projects QA is done by a separate group, while on others QA will be an integrated into the development team itself. In either case XP requires development to have much closer relationship with QA.

Source: Extreme Programming Rules

I think that this remaining disconnect can be explained by observing that some companies might have moved on from Agile and are adopting elements from even more “lean” philosophies. “Idea bugs” are harder to fix than software bugs. If you build a product that people really want to use, they will be forgiving (prime example: the Twitter service-outage whale). Getting something out of the door and running it by real-world users is thus increasingly becoming the most important aspect of software development: delivering products, not code – focusing on “building the right it” rather than “building it right”. This is something traditional QA can’t really help with. And therewith died testing.

The future of test and concluding remarks

Yet test will surely live on with “I’m not dead yet” tenacity. The software forest does not consist only of Agile web development trees. Traditional testing is sure to find respite in havens such as AAA game development with its ever-present big releases or security- and safety-critical programming with its zero-tolerance for bugs (e.g. medical or aeronautical software). For the remaining denizens of QA-land, however, I fear that the time has come to leave testing drudgery behind and move on to the greener pastures of product management or reliability engineering.

As a closing note, I’d like to leave you with the following video that makes a similar point to the one labored in this blog post, but in a much more entertaining way: “Test is Dead” by Alberto Savoia.