Chose the right Software Architecture if you want your Open Source project to succeed.

While open source is by definition different from closed-source projects in terms of governance, I believe that the right architecture needs to be chosen if an open-source project is to receive uptake by users and long-term contributions from these same users. The architecture needs to be as modular as possible, and the developers must be able to take different approaches to solve the problems that each new feature aims to solve.

Why is software architecture so important to Open-Source collaborative projects?

While in proprietary software, the developers and users are mostly separate sets of people (except developers of development tools maybe), in Open-Source software the main driving force behind the project is that the users can themselves further develop functionality in the software and contribute it so that other users can benefit from the work. Since now users and the developers may be one and the same, the architecture that the software uses is of great interest to the user. This idea is also presented by Eric Raymond in his 6th rule [1]:

6.Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging.

If the software that is being used is very modular in design, it should be easier for a user to add her/his required functionality to the program, without having to understand the rest of the code other than how his new feature will interact with the rest of the software.

Another important aspect in my opinion is how easily the feature can be implemented in a way that is independent from the rest of the software. That is, how free is the new contributor to choose whatever solution is suitable for him to implement the feature, keeping within the design rules set out initially?
For a new user of the software that finds a missing feature, would it be easier for him to add the functionality he requires, or would it be better for him to use another solution, or develop the solution from scratch? The learning curve needs to be kept minimal for the user to actually contribute. It may also be that open source projects will be competing between them when they have similar uses and audiences.
These are the same factors that Baldwin and Clark [2] use in their attempt to use game theory to show that architecture is important in limiting the free riding of open source software, and how architectural decisions help promote the contribution by users. According to the model, having a modular and option value (ability to implement new feature in the easiest way for the developer) is optimal in allowing exchange of valuable work among the developers. And when the resources for contributing to open source software are already limited [3], we really want to minimize the hurdles that such developers encounter to enable them to easily reuse or contribute new code.

Do developers chose this architecture beforehand, or does it come about because of the nature of the collaborative software?

Although the architecture may be crucial, most open source projects that are based around collaborative input by many voluntary developers usually do not start out with the final scale in mind. Moreover, since many open-source projects use agile development methodologies, having a clearly defined architecture from the start is not the norm. However, the argument here is that to facilitate contribution, the modularity of the code should be kept as a high priority to keep the project going more easily. This may come at a cost of increase workload, since refactoring continually will increase workload, and may also make it difficult to keep track of all the modules and how they interact. However, in their research [4], Capra et al. show that the good quality present in open source projects is not just a coincidence. They argue that design quality is both an enabler for the type of governance present in such projects, where everyone can easily collaborate and contribute, as well as being an effect of it. Since the contributors have a sense of ownership of the code, the design quality is given greater importance in the work, possibly at the cost of prolonging the time between releases.

Is this any different from what proprietary software companies try to do?

Within closed-source projects, the design is always kept internal, and the company knows that it has a number of developers that it can commit to the project, whether the developers like it or not. This means that there is less risk of the project being abandoned because it is too hard to contribute to, and the architecture will not influence the number of developers that are contributing to it. However, such companies may take a leaf out of this book, because developers do have feelings and opinions, and making their life easier can only help to improve the quality of the code that they generate for the company. We also know that once a new developer is assigned to a new project, there is a learning curve associated, which can be greatly reduced by having a modular architecture, where the developer does not have to learn how the whole code base works before they can start contributing to the project. This would cut down the cost of personnel moving between projects. It would also make the software malleable, so that new features can be added more easily, and code reuse is also much easier.

This difference in architectural choices is shown in the work of MacCormack et al. [5], where they compare the source code of Linux with that of Mozilla when it was first published in the public domain. The lack of modularity in the Mozilla code is clear, and in fact the Mozilla project underwent a lot of initial work to be modularized, so that it could function better in the open source world.

Interestingly though, all this is arguably contradicted by the empirical study carried out by Paulson et al. [6], where they claim that the results do not support the claim that open-source software is more modular than closed-source. However, the metric they chose to use to verify this hypothesis was correlation between the number of new functions introduced to the number of changes to previous functions. They claim that the number of changes to older functions increased with the number of new functions added, implying that the introduction of new functions may be requiring changes to other functions, indicating that the code is not as modular as claimed. I feel that this is a very simplistic metric, since it simply assumes that in a good modular system functions are only added and no refactoring is required. I believe modularity should be measured by the dependencies between modules, rather than the changes being carried out. Another contributing factor to this result, which the authors point out themselves, is that the projects chosen were mature projects (namely Linux, GCC and Apache) so maintenance may have contributed to the high amount of modifications to functions. Given all this, I still believe that the argument for modularity in open-source projects holds, since we can clearly saw the benefits of such a system, even if they were not currently used as Paulson et al. claim.

So What?

So in conclusion, when we stop to think what architecture the software project that we are about to start should have, whether the project is an open source project or not should be a major factor to consider. Assuming that we do want the project to not be abandoned if the lead developer (us) happens to take a break.

References

[1] E. Raymond, “The cathedral and the bazaar,” Knowledge, Technology & Policy, vol. 12, no. 3, pp. 23–49, 1999.
[2] C. Y. Baldwin and K. B. Clark, “The architecture of participation: Does code architecture mitigate free riding in the open source development model?” Management Science, vol. 52, no. 7, pp. 1116–1127, 2006.
[3] S. Haefliger, G. Von Krogh, and S. Spaeth, “Code reuse in open source software,” Management Science, vol. 54, no. 1, pp. 180–193, 2008.
[4] E. Capra, C. Francalanci, and F. Merlo, “An empirical study on the relation-ship between software design quality, development effort and governance in open source projects,” Software Engineering, IEEE Transactions on, vol. 34, no. 6, pp. 765–782, 2008.
[5] A. MacCormack, J. Rusnak, and C. Y. Baldwin, “Exploring the structure of complex software designs: An empirical study of open source and proprietary code,” Management Science, vol. 52, no. 7, pp. 1015–1030, 2006.
[6] J. W. Paulson, G. Succi, and A. Eberlein, “An empirical study of open-source and closed-source software products,” Software Engineering, IEEE Transactions on, vol. 30, no. 4, pp. 46–256, 2004.