Issues and Challenges in Large-Scale System Development

Introduction

This article reflects on the issues and challenges that large-scale system development faces. Software is hard to engineer on a small scale, and at a larger scale the engineering and management tasks are even more difficult. In the context of software product line evolution, the goal of this work is to look at current management practice through the lens of Systems Thinking. We develop a System Dynamics model to operationalize the notions examined here and run a variety of experiments representative of real situations, from which we draw lessons and recommend policies that engineering leaders may use to manage large-scale software development organizations. Large-scale development is an enormous subject, but two main problems stand out. First, large software projects are almost universally troubled. Second, large-scale systems-development projects of almost every kind now involve large amounts of software. Unfortunately, this implies that almost all kinds of large-scale development projects will be troubled unless we can devise a better way to develop the software parts of these systems. The increasing need for large-scale system-development projects raises many questions and presents a significant challenge to those of us in the development business.

In today’s fast-moving world, successful software companies need to grow continuously in revenue, which translates into growth in headcount, market share, product feature set, and product line-up. These also form the basis on which software companies compete. Companies that remain small typically cater only to a niche in the larger market.

This is success by one measure, as such niches might indeed be quite large, but it is not the kind of development effort we are interested in. Here we are specifically looking at what happens in larger efforts, usually multi-year projects employing hundreds or thousands of engineers. This is the environment the majority of my work experience comes from. Managers know instinctively that there are limits to growth and to how many features one can cram into a single release of the product, that firefighting is best avoided, and that projects easily spiral out of control if the organization promises more than it can reasonably be expected to deliver. Even so, all of these things still happen frequently.

Emergent Properties of Systems

Perhaps the greatest single problem with large-scale system development concerns what are called the emergent properties of these systems. These are those properties of the entire system that are not embodied in any of the system’s parts. Examples are system security, safety, and performance. While individual components of a system can contribute to safety, security, and performance problems, no component by itself can generally be relied on to make a system safe, secure, or high performing.

The reason that emergent properties are a problem for large-scale systems is related to the way in which we develop these systems. As projects get larger, we structure the overall job into subsystems, then structure the subsystems into products, and refine the products even further into components and possibly even into modules or parts. The objective of this refinement process is to arrive at a series of “bite-sized projects” that development teams or individual developers can design and develop.

This refinement process can be effective as long as the interfaces among the system’s parts are well defined and the parts are sufficiently independent that they can be independently developed. Unfortunately, the nature of emergent properties is that they depend on the cooperative behavior of many, if not all, of a system’s parts. This would not be a problem if the system’s overall design could completely and precisely specify the properties required of all of the components. For large-scale systems, however, this is rarely possible.
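As a minimal sketch of why such properties cannot be verified part by part, consider end-to-end latency as one example of a performance property. The component names, latency figures, and limits below are hypothetical and chosen only for illustration. Each component meets the local limit its team was given, yet the system as a whole misses its budget.

```python
# Hypothetical illustration: every part passes its local check,
# but the emergent (end-to-end) property still fails.

# Assumed worst-case latency of each component in milliseconds (made-up values).
COMPONENT_LATENCY_MS = {
    "gateway": 80,
    "auth": 90,
    "business_logic": 95,
    "storage": 85,
}

LOCAL_LIMIT_MS = 100    # assumed limit that each team is asked to meet
SYSTEM_BUDGET_MS = 300  # assumed end-to-end requirement for the whole system


def components_pass(latencies, local_limit):
    """Return True if every component individually meets its local limit."""
    return all(ms <= local_limit for ms in latencies.values())


def system_latency(latencies):
    """Return the end-to-end latency when components are called in sequence."""
    return sum(latencies.values())


def system_passes(latencies, budget):
    """Return True if the whole chain meets the system-level budget."""
    return system_latency(latencies) <= budget


if __name__ == "__main__":
    print("all components pass:", components_pass(COMPONENT_LATENCY_MS, LOCAL_LIMIT_MS))
    print("end-to-end latency (ms):", system_latency(COMPONENT_LATENCY_MS))
    print("system passes:", system_passes(COMPONENT_LATENCY_MS, SYSTEM_BUDGET_MS))
```

No single team's check reveals the failure in this sketch. It only becomes visible when the parts are considered together, which is the difficulty that emergent properties pose for decomposed development.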

While people have always handled big jobs by breaking them into numerous smaller jobs, this can cause problems when the jobs’ parts have inter-dependencies. System performance, for example, has always been a problem, but we have generally been able to overpower it. That is, the raw power of our technology has often provided the desired performance levels even when the system structure contains many inefficiencies and delays.

As the scale of our systems increases, and the emergent properties become increasingly important, we now face two difficult problems. First, the structural complexity of our large organizations makes the development process less efficient. Since large-scale systems are generally developed by large and complex organizations, and since these large organizations generally distribute large projects across multiple organizational units and locations, these large projects tend also to have complex structures. This added complexity both complicates the work and takes added resources and time.

The second problem is that, as the new set of emergent properties becomes more important, we can no longer rely on technology to overpower the design problem. Security, for example, is not something we can solve with a brute-force design. Security problems often result from subtle combinations of conditions that interact to produce an insecure situation. What is worse, these problems are rarely detectable at the module or part levels.

The Tropical Rain Forest

The fundamental problem of scale is illustrated by analogy to the ecological energy balance in a tropical rain forest. In essence, as the forest grows, it develops an increasingly complex structure. As the root system, undergrowth, and canopy grow more complex, it takes an increasing percentage of the ecosystem’s available energy just to sustain the jungle’s complexity. Finally, when this complexity consumes all of the available energy, growth stops.

The implication for both projects and organizations is that, as they grow, their structure gets progressively more complex, and this increasingly complex structure makes it harder and harder for the developers to do productive work. Finally, at some point, the organization gets so big and so complex that the development groups can no longer get their work done in an orderly, timely, and productive way. Since this is a drastic condition, it is important to understand the mechanisms that cause it.
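The mechanism can be made concrete with a toy model. This is only a sketch under a simplifying assumption that is not taken from the analysis above: each person spends a fixed fraction of their time coordinating with every other person, so the overhead grows with the size of the organization. The function names and the 4% figure are hypothetical.

```python
# Toy model (an illustrative assumption, not a measured result):
# each person loses a fixed fraction of their time per colleague
# to coordination, so structural overhead grows with staff size.

def net_capacity(staff, overhead_per_colleague):
    """Productive capacity left after coordination overhead.

    staff: number of people (each contributes 1.0 unit before overhead)
    overhead_per_colleague: fraction of a person's time spent per colleague
    """
    productive_fraction = max(0.0, 1.0 - overhead_per_colleague * (staff - 1))
    return staff * productive_fraction


def best_size(max_staff, overhead_per_colleague):
    """Staff size with the highest net capacity, found by brute-force search."""
    return max(range(1, max_staff + 1),
               key=lambda n: net_capacity(n, overhead_per_colleague))


if __name__ == "__main__":
    overhead = 0.04  # assumed: 4% of each person's time per colleague
    for staff in (1, 5, 10, 13, 20, 26, 40):
        print(staff, round(net_capacity(staff, overhead), 2))
    print("best size:", best_size(100, overhead))
```

With the assumed 4% overhead per colleague, net capacity in this toy model peaks at 13 people and falls to zero at 26, the point at which sustaining the structure consumes all of the available effort. Real organizations do not follow such a simple rule, but the shape of the curve matches the rain forest analogy.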

Organizational Growth

In principle, organizations grow because there is more work to do than the current staff can handle. However, this problem is usually more than just a question of volume. As the scale increases, responsibilities are subdivided and issues that could once be handled informally must be handled by specialized groups. So, in scaling up the organization, we subdivide responsibilities into progressively smaller and less meaningful business elements. Tasks that could once be handled informally by the projects themselves are addressed by specialized staffs. Now, each staff has the sole job of ensuring that each project does this one aspect of its job according to the rules. Furthermore, since each staff’s responsibility is far removed from business concerns, normal business-based or marketing-based arguments are rarely effective. The staffs’ seemingly arbitrary goals and procedures must either be obeyed or overruled.

This growth process generally happens almost accidentally. A problem comes up, such as a missed schedule, and management decides that future similar problems must be prevented. So they establish a special procedure and group to concentrate on that one problem. In my case, this was a cost-estimating and planning function that required a plan from every project. Each new special procedure and group is like scar tissue and each added bit of scar tissue contributes to the inflexibility of the organization and makes it harder for the developers to do their work. Example staffs are pricing, scheduling, configuration management, system testing, quality assurance, security, and many others.

One of the most critical aspects of managing large-scale projects is making sure that decisions are properly made. The executive’s responsibility must be to identify the right people to make the decision, insist that the goals used for making the decision be defined and documented, and require that the criteria for the decision be established. While there are far too many technical decisions in large-scale projects for management to require that they all be made in this way, there are relatively few times when technical decisions are escalated to senior management. Whenever they are, however, these decisions are almost certainly technical issues that have become political.

If the executive does not insist that each of these politically tinged technical decisions is properly made, he or she is likely making a very big and possibly fatal mistake. This, above all, is the time when the executive should insist that the decision be made in the right way. These decision situations always come up when there is no time and when everybody, including the executive, is in a rush to get on with the job, yet this is precisely when proper decision making is most important. When executives insist that rush decisions be made in the proper way, they are demonstrating their ability to be technical executives. Therefore, the first two ground rules for the proper management of large-scale projects are the following. First, insist that all technical decisions be made by the proper technical people. Second, make sure that, in making these decisions, the technical decision makers thoughtfully evaluate the available alternatives against clearly defined criteria.

Conclusion

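In this article I discussed some of the development issues related to large-scale projects. Since this is an enormous subject, I cannot hope to be comprehensive, but I have covered a few of the most important issues: emergent properties, the growth of organizational complexity, and the way technical decisions are made.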