Agile and Critical Systems

Introduction

Whilst this author was researching a first article, one claim came up time and again: Agile is not suitable for critical systems and is rarely, if ever, used in this area. Many experts made the point that the Waterfall methodology, with its emphasis on documentation, upfront planning and analysis, and defined phases, is a much better option. This made the author curious. What is it about Agile that makes it a bad option? Is there anything that can be done to improve its chances of being taken seriously as a methodology for critical systems?

What is a critical system?

Before looking into how Agile could (or could not) be used, it is prudent to define exactly what a ‘critical system’ is. A critical system, also known as a ‘life-critical’ or ‘safety-critical’ system, is a system in which failure is likely to result in loss of life or environmental damage. Failure here includes both catastrophic failure of the system and mere malfunctions. Examples of critical systems include medical appliances, nuclear reactors, air traffic control systems and car airbag systems. Critical systems are employed in a wide range of fields, from medicine, energy and transport, as in the examples given, to spaceflight and recreation.

What is meant by ‘Agile’?

‘Agile’ is generally used as a catch-all term for a particular family of software development methodologies. These methodologies all use the ‘Agile Manifesto’ as a starting point but interpret and implement its philosophy in differing ways. The manifesto reads:

We are uncovering better ways of developing software by doing it and helping others do it.  Through this work we have come to value:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.

Agile methods include Extreme Programming (XP), Scrum and Crystal. Although each implements the Agile Manifesto differently, they all share the same principles of teamwork, quality, communication and adaptation.

Current practice in the development of critical systems

Because of the potential consequences of a failure in a critical system, very high reliability is required. This requirement, together with the need to demonstrate a high level of confidence in the system, has led to regulatory standards becoming common within industries that use critical systems. For example, the development of software within the aviation industry in Europe must adhere to the ED-12C standard from the European Organisation for Civil Aviation Equipment (EUROCAE) (referred to as DO-178C in the US). IEC 62304, from the International Electrotechnical Commission (IEC), is the corresponding standard for the development of medical device software. However, whilst these standards define strict frameworks for the development process, they do not prescribe a particular methodology. The organisations or teams involved in developing critical systems are free to choose their own, so long as the activities and tasks specified by the standards are carried out.

This need to meet standards has led to heavyweight methodologies, such as Waterfall, dominating software development in critical systems. These processes, with their focus on upfront analysis and design and on defined phases, are viewed as better suited to meeting the standards. The standards also generally require that records are kept (to ensure traceability) and, with their emphasis on documentation, heavyweight development processes are again a natural choice.

In addition, the single, large implementation at the end of a heavyweight development process allows for safety certification to occur once on the completed critical system.

Why use Agile?

No existing Agile methodology could safely be used unaltered in the development of critical systems. However, there is no reason why selected Agile practices could not be incorporated into working practices.

Several of Agile’s principles, and their consequences, would be a natural fit in the development of critical systems. Quality is the central factor in critical systems, so Agile’s focus on quality would be an asset.

Agile’s focus on developing and testing iteratively means that problems are identified much earlier than would be the case in a heavyweight methodology. As a result, the risk of defects in the end system, and of their potential consequences, is reduced. It also reduces the risk of human error that comes with a long and complex testing phase at the end of the project.

The Agile practice of continuous integration would also benefit critical systems development. Integration occurs much earlier in the project lifecycle and with smaller, less complex components. This gives the developer immediate feedback and allows rapid corrective action. Continuous integration also reduces the possibility of reaching the end of a project only to discover a fundamental issue with the software developed.

Any downsides?

Some Agile practices would have a detrimental impact on the development of safety-critical systems and would need to be discarded or adapted.

Firstly, documentation is a crucial element of critical systems development, so Agile’s focus on minimal documentation is not a good fit. The certification process demands that each decision and design is ‘traceable’, and this requires extensive documentation. Documentation is also an important tool once the system is in service, as the maintenance of critical systems is almost as important as their initial development.

In addition, the development of critical systems can take many years, so it is likely that staff will move on and new staff will join the development team. Again, documentation is crucial in dealing with this.
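The traceability demand described above can be made concrete with a small check. The following is only a minimal sketch, assuming a hypothetical data model in which each requirement lists the design and test artefacts that trace to it; the class, the function and the identifier scheme (REQ-, DES-, TST-) are invented for illustration and are not taken from any standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """A requirement and the artefacts that trace to it (hypothetical model)."""
    req_id: str
    design_refs: list[str] = field(default_factory=list)
    test_refs: list[str] = field(default_factory=list)


def untraced(requirements: list[Requirement]) -> dict[str, list[str]]:
    """Return, per requirement ID, the kinds of trace link that are missing."""
    gaps: dict[str, list[str]] = {}
    for req in requirements:
        missing = []
        if not req.design_refs:
            missing.append("design")
        if not req.test_refs:
            missing.append("test")
        if missing:
            gaps[req.req_id] = missing
    return gaps


# Illustrative data only: identifiers are made up.
requirements = [
    Requirement("REQ-001", design_refs=["DES-010"], test_refs=["TST-100"]),
    Requirement("REQ-002", design_refs=["DES-011"]),
    Requirement("REQ-003"),
]

print(untraced(requirements))
```

A check of this kind could, in principle, be run as part of every build, so that gaps in the trace records are reported as they appear rather than being found in a documentation review at the end of the project.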

There remains a need to have a large part of the design completed upfront. This is required in order to fix the requirements so that certification and safety analysis can be carried out early in the project. A critical system project that used Agile methods would therefore still require a dedicated upfront design period to produce architectural models and functional requirements.

Whilst iterations would be possible and would add benefits, they would have to differ slightly from traditional Agile iterations. Each iteration would need to produce evidence, for certification purposes, that its output was fundamentally safe. However, as the iterations in a critical systems project would contain little, if any, design or analysis work, adding safety certification to each iteration’s acceptance criteria need not affect how frequently iterations are delivered.
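The idea of adding safety evidence to an iteration’s acceptance criteria could be sketched as a simple gate. This is an illustrative assumption about how such a rule might be expressed, not a description of any certification scheme; the `IterationResult` fields and the evidence identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IterationResult:
    """Outcome of one iteration (hypothetical fields, for illustration only)."""
    name: str
    tests_passed: bool
    safety_evidence: list[str]  # e.g. identifiers of updated safety records


def accepted(result: IterationResult) -> bool:
    """Accept an iteration only if its tests pass AND safety evidence exists."""
    return result.tests_passed and len(result.safety_evidence) > 0


# Illustrative data only.
iterations = [
    IterationResult("iteration-1", True, ["hazard-log-v2"]),
    IterationResult("iteration-2", True, []),
    IterationResult("iteration-3", False, ["hazard-log-v3"]),
]

for it in iterations:
    print(it.name, "accepted" if accepted(it) else "rejected")
```

In this sketch the second iteration is rejected even though its tests pass, because working software alone is not treated as sufficient without the accompanying evidence.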

Refactoring is another Agile practice that would not fit naturally with the development of critical systems. If code in a critical system is refactored, it has the potential to invalidate previous certification or security analysis. This would cause extensive rework and would need to be avoided whenever possible.
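One way a team might keep the cost of refactoring visible is to record a fingerprint of each module at the point it was analysed, and to flag any module that has since changed. The sketch below assumes a deliberately conservative rule (any byte-level change counts, even one that does not alter behaviour); the module names and source strings are invented for illustration.

```python
import hashlib


def fingerprint(source: str) -> str:
    """Return a SHA-256 digest of a module's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def needs_reanalysis(certified: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return the modules that are new or differ from the certified baseline.

    `certified` maps module name to the digest recorded at certification;
    `current` maps module name to the present source text.
    """
    changed = []
    for name, source in current.items():
        if certified.get(name) != fingerprint(source):
            changed.append(name)
    return sorted(changed)


# Illustrative data only: hypothetical modules and contents.
baseline = {
    "airbag.c": "int deploy(void) { return 1; }",
    "sensor.c": "int read_sensor(void) { return 0; }",
}
certified = {name: fingerprint(src) for name, src in baseline.items()}

# A refactoring that renames a function in one module.
current = dict(baseline)
current["airbag.c"] = "int deploy_airbag(void) { return 1; }"

print(needs_reanalysis(certified, current))
```

A list like this does not remove the rework, but it would let a team see the likely certification impact of a proposed refactoring before committing to it.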

Conclusion

Overall, despite the added complexity and demands of critical system development, it appears that Agile methods could be adopted successfully by the industry. If chosen carefully, Agile practices would bring tangible benefits and, despite the concerns, could actually increase the chance of developing a stable and useful critical system. The real challenge is in choosing the practices to adopt and creating an environment within the organisation that ensures the successful integration of these practices.

Bibliography

Sommerville, I. (2007). Software Engineering. Harlow, England: Addison-Wesley.

Ge, X., Paige, R. F., and McDermid, J. A. (2010). An iterative approach for development of safety-critical software and safety arguments. In Proceedings of the 2010 Agile Conference, AGILE ’10, pages 35–43, Washington, DC, USA. IEEE Computer Society.

Sidky, A. and Arthur, J. (2007). Determining the applicability of agile practices to mission and life-critical systems. In Proceedings of the 31st IEEE Software Engineering Workshop, SEW ’07, pages 3–12, Washington, DC, USA. IEEE Computer Society.

Heimdahl, M. P. E. (2007). Safety and software intensive systems: Challenges old and new. In 2007 Future of Software Engineering, FOSE ’07, pages 137–152, Washington, DC, USA. IEEE Computer Society.

Lindvall, M., Basili, V. R., Boehm, B. W., Costa, P., Dangle, K., Shull, F., Tesoriero, R., Williams, L. A., and Zelkowitz, M. V. (2002). Empirical findings in agile methods. In Proceedings of the Second XP Universe and First Agile Universe Conference on Extreme Programming and Agile Methods – XP/Agile Universe 2002, pages 197–207, London, UK. Springer-Verlag.

Cawley, O., Wang, X., and Richardson, I. (2010). Lean/agile software development methodologies in regulated environments – state of the art. In Abrahamsson, P. and Oza, N. V., editors, LESS, volume 65 of Lecture Notes in Business Information Processing, pages 31–36. Springer.

Douglass, B. P., and Ekas, L. (2012). Adopting agile methods for safety-critical systems development. IBM. http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=SA&subtype=WH&htmlfid=RAW14313USEN

Turk, D., France, R., and Rumpe, B. (2002). Limitations of agile software processes. In Proceedings of the Third International Conference on Extreme Programming and Flexible Processes in Software Engineering (XP2002), pages 43–46. Springer-Verlag.