Real-Time Collaborative Programming in the Software Business?

Introduction

Real-time collaborative programming describes a form of concurrent software development in which all developers on a project can edit the source code at the same time, with changes propagated in real time. As this is a promising approach for improving collaboration between developers, it might be worth adopting such a technology in an industrial software project. Although an implementation of this technique also exists for the programming language Smalltalk [1], this blog post analyses the web-based collaborative Java IDE “Collabode” [2] with respect to its usefulness in the software business.

While this form of real-time collaboration in software engineering has a few advantages, such as improved pair programming, the disadvantages outweigh them. The main disadvantage is that a Version Control System (VCS) cannot be used in the normal way, so the benefits of using a VCS are lost. In summary, the approach has too many drawbacks to be used in the software business.


Screenshot of Collabode, Source: [2]

Overview of Collabode

In 2011, Goldman et al. published Collabode, a web-based Java IDE for real-time collaboration in the browser, built on Eclipse. It enables concurrent editing of source code files by an arbitrary number of developers [3]. The system behaves much like Google Docs, which provides real-time concurrent document editing.

Changes to the source code are propagated to the other developers as soon as possible, but with a certain error awareness: a change is propagated immediately once it no longer causes compiler errors. If a change results in errors, it is kept in a local working copy, so the engineers do not break their common build [4].
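
To illustrate the gating idea, the following minimal sketch shows how propagation might be mediated by compiler feedback. This is only an illustration of the behaviour described in [4]; Collabode itself is implemented in Java on top of Eclipse and Etherpad, and all names below are hypothetical:

/*** BEGIN ERROR-MEDIATED PROPAGATION SKETCH (HYPOTHETICAL) ***/

interface ICompilerCheck { bool Compiles(string source); }
interface IBroadcaster   { void Broadcast(string source); }

class ErrorMediatedBuffer
{
    string workingCopy; // the editing developer's local, possibly broken state
    string sharedCopy;  // the last error-free state seen by collaborators

    ICompilerCheck compiler;
    IBroadcaster broadcaster;

    // constructor initialises references

    public void ApplyLocalEdit(string newSource)
    {
        workingCopy = newSource;
        // Propagate only once the change no longer causes compiler errors,
        // so collaborators never see a broken shared build.
        if (compiler.Compiles(workingCopy))
        {
            sharedCopy = workingCopy;
            broadcaster.Broadcast(sharedCopy);
        }
    }
}

/*** END ERROR-MEDIATED PROPAGATION SKETCH ***/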

The underlying multi-user editor is Etherpad [5], [4]. It supports simple versioning and access to previous versions of single files, but the other functions of a modern VCS are not included.

Benefits

According to [3], developers can benefit from using this approach in the following ways.

Firstly, it is easy to ask colleagues for help with small code snippets (“micro outsourcing”). Secondly, it enables pair programming on different machines. Finally, the approach improves teaching possibilities [3].

In addition to these points, such a collaborative system can be particularly helpful when developers are not in the same place: pair programming becomes possible even if colleagues are, for example, in different countries.

Furthermore, such an approach improves the ability to teach new colleagues how to develop the product. As new developers can work on their own computers while collaborating with a senior team member, they arguably become more involved in the learning process. This is also more convenient for the senior developer, who can keep working from their own machine while teaching, and may therefore reduce a possible unwillingness to teach.

Finally, with Collabode a developer does not need to commit manually to a Version Control System, which saves time [4].

Minor Drawbacks

However, there are several disadvantages of using this real-time collaborative programming tool:

Firstly, even though Collabode is based on Eclipse, the browser-based user interface does not expose the full Eclipse IDE [4]. Some useful IDE features therefore cannot be used with this approach.

Secondly, more communication between developers is necessary. Consider the case that developer A is implementing something that uses a class that developer B is refactoring at the same time: A might get inexplicable results. In practice, every developer on the team must therefore know what the others are currently doing, and this communication overhead can reduce efficiency.

Thirdly, the approach has not been tested in large-scale projects, so further research is needed before one can be confident the system works in a large software project.

Main Disadvantage: VCS limitations

The main drawback of this approach is that normal use of a Version Control System is not possible. This section focuses on the most important reasons for using a VCS.

According to Somasundaram [6], the main advantages of using a Version Control System are the ability to keep different versions of the source code and the possibility to roll back to a previous state as a fail-safe.

Concerning the first point, restoring previous versions of single files is possible thanks to Etherpad’s functionality. However, when one change touches multiple files, there is no record that these individual edits belong to one logical unit, as there is when committing manually. It is therefore difficult to identify a consistent state to roll back to, and the lack of commit comments makes such an operation even harder. Thus, neither of the two main advantages of a Version Control System mentioned above can be used in a practical way in Collabode.

Moreover, since there are no commit comments, an important source of information is missing, for example, which bug a change refers to.

Finally, automatically micro-committing everything makes Continuous Integration very difficult. It is much harder to find the commit that breaks a test when there are so many uncommented commits.

Summary

While real-time collaborative programming as implemented in Collabode is certainly suitable for teaching and pair programming, its usefulness in the software industry is very limited, mainly because it circumvents the principles of committing to a Version Control System. A company that wants the advantages of this new way of programming loses the far more important advantages of a normal Version Control System. I therefore do not recommend this approach for a normal business software development process.

However, if a company values the improved collaboration more than the benefits of a VCS, Collabode can be taken into consideration. Partial use of the system for training sessions in industry can also be advantageous.

References

[1] J. Jordan, “Wolf Pack Programming? | Cincom Smalltalk,” 2010. [Online]. Available: http://www.cincomsmalltalk.com/main/2010/10/wolf_pack_programming/. [Accessed 3 March 2014].
[2] “Collabode – Collaborative Coding in the Browser,” 2012. [Online]. Available: http://groups.csail.mit.edu/uid/collabode/. [Accessed 3 March 2014].
[3] M. Goldman, G. Little and R. C. Miller, “Collabode: Collaborative Coding in the Browser,” in Proceedings of the 4th International Workshop on Cooperative and Human Aspects of Software Engineering, New York, 2011.
[4] M. Goldman, G. Little and R. C. Miller, “Real-time collaborative coding in a web IDE,” in Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, New York, 2011.
[5] “Etherpad,” 2014. [Online]. Available: http://etherpad.org/. [Accessed 7 March 2014].
[6] R. Somasundaram, Git, Birmingham: Packt Publishing, 2013.


Continuous Integration: Software Quality vs Resources

This is a response article to “Why don’t we use Continuous Integration?” [1] by user s1367762.

Introduction

In his post [1], the author describes his working experience in a small start-up company, where mostly one developer at a time worked on different small projects. As not all of those projects used Continuous Integration (CI), the author gives a variety of reasons, based on his own experience, why not every software engineering company and project uses it.

This post discusses the various arguments that may prevent a company from implementing Continuous Integration for a software project. In general, it is always a trade-off between software quality and resources.

Commit Frequency

The first point is that it may be difficult to make small incremental commits when a change requires a long time before the whole software works again. Intermediate commits would therefore break the build and the whole Continuous Integration pipeline.

I absolutely agree with that argument. When, for example, a new module is developed, the whole software may be broken at first. Yet not committing to the mainline at least daily contradicts the basic rules of Continuous Integration; see, for example, [2] by Martin Fowler.

However, is that really a problem? From my own business experience, it is very common and easy to implement a new feature in a separate version control branch, often referred to as a feature branch. This branch may have its own Continuous Integration process to ensure quality, but above all it does not break the main build. When the feature is stable, the branch can be merged into the main product and become part of the general CI process. Martin Fowler’s blog post [3] describes this process in more detail.

Mainline Only

The second point mentioned by s1367762 is that there may be code that is not really part of the software project, for example code used only by a few customers for very special use cases. It therefore does not make sense to commit this code to the mainline as suggested by Continuous Integration.

I absolutely understand this point. However, if some code is not really part of the product, there is no need for Continuous Integration for these few special modules. From my point of view, CI can still be implemented while ignoring such modules.

Automated Tests

The author’s third point is that writing automated tests is difficult and expensive. I absolutely agree: automating tests is time-consuming and hard, especially when dealing with GUI components. Furthermore, without good code coverage, Continuous Integration is less effective. However, it is still better than no Continuous Integration at all. Again, this is clearly a trade-off between the time saved by not automating tests and the final software quality.

Appropriate Timing, Direct Costs and Project ROI

In these three points the author states that it is more difficult to introduce CI into an existing project that started without it. He furthermore describes the costs of learning to implement CI and of operating build and test machines as high. Finally, he contends that implementing Continuous Integration is not worth the effort for short-term projects without long-term maintenance.

All these points are completely understandable. To my mind, they all lead to one question for the project manager: How important is the quality of my project? If it is not a major requirement, for example if the software is being used only for a short period of time, Continuous Integration is not worth implementing.

Summary

In summary, s1367762 demonstrates well why Continuous Integration is not always a good idea in software projects. However, the first point regarding commit frequency is easy to work around by using feature branches, without completely losing the idea of Continuous Integration. Furthermore, modules that do not really belong to the project can simply be excluded from the CI process. From my point of view, partly implemented CI is much better than no CI at all.

Finally, everything depends on the management decision whether maintaining quality by investing time and money is desired for a project. The company I worked for never questioned the importance of quality, so Continuous Integration was implemented in sophisticated detail. However, if quality is not a major concern for some projects, as s1367762 describes from his business experience, it is absolutely reasonable not to implement Continuous Integration for them.

References

[1] s1367762, “Why don’t we use Continuous Integration? | SAPM: Course Blog,” [Online]. Available: http://blog.inf.ed.ac.uk/sapm/2014/02/14/why-dont-we-use-continuous-integration/. [Accessed 27 February 2014].
[2] M. Fowler, “Continuous Integration,” 2006. [Online]. Available: http://www.martinfowler.com/articles/continuousIntegration.html. [Accessed 27 February 2014].
[3] M. Fowler, “FeatureBranch,” 2009. [Online]. Available: http://martinfowler.com/bliki/FeatureBranch.html. [Accessed 27 February 2014].


Why don’t we use Continuous Integration?

1. Introduction:

While reading articles or blog posts ([1], [2], [3]) on Continuous Integration, it’s easy to come to the conclusion that this approach ensures code quality, reduces risks and sets programmers’ minds at ease. It is not dependent on any particular software development methodology. If that is so, I thought, everybody should use it for the projects they carry out. However, that is not the case – many companies still haven’t started developing with CI, and I asked myself why. This article aims to answer that question.

2. Continuous Integration itself:

Continuous Integration [1] is a software development practice based on constant integration of the code under development. Developers should check in to the integration server a few times a day. A code submission is not complete until compilation, tests and inspection have all been carried out and have revealed no problems; otherwise the developer should take immediate measures to fix any issues. The most important outcome is a high probability that the software is functional and that the newest version is available to whoever wants it.
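
The check-in gate this implies can be sketched as follows; the types and steps below are hypothetical stand-ins for a real CI server, purely to make the rule concrete:

/*** BEGIN CI CHECK-IN SKETCH (HYPOTHETICAL) ***/

class ChangeSet { /* author, diff, ... */ }

class IntegrationServer
{
    // A submission only counts as done once compilation, tests and
    // inspection have all passed; otherwise the developer fixes it immediately.
    public bool TryIntegrate(ChangeSet change)
    {
        if (!Compile(change))  return false; // broken build: fix now
        if (!RunTests(change)) return false; // failing tests: fix now
        if (!Inspect(change))  return false; // failed inspection: fix now
        Publish(change); // newest working version available to everyone
        return true;
    }

    // Stubs standing in for the real build, test and inspection steps.
    bool Compile(ChangeSet c)  { return true; }
    bool RunTests(ChangeSet c) { return true; }
    bool Inspect(ChangeSet c)  { return true; }
    void Publish(ChangeSet c)  { /* make the build available */ }
}

/*** END CI CHECK-IN SKETCH ***/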

3. Background:

Some of the issues I raise in this article are backed up with examples from my experience at a company I worked for, so a brief introduction is necessary. The firm is a small start-up which manufactures home automation devices. The software developed there consists of long-term projects that are large compared to the number of people working on them. The main software developed there is:

  • low-level programs for field devices (sensors and actuators);
  • applications for controllers running embedded (mainly Linux) operating systems, or for mobile devices – this was the largest piece of software, constantly extended with new modules;
  • an installation configuration system for desktop computers.

When I joined the company, only one experienced programmer was working for it. After some time he left, leaving me as the only permanent developer, with others hired part-time for particular tasks when needed. Don’t take this as the impression that the company works in terms of anti-patterns – I mention only its bad aspects here because the article requires it. I believe many companies, especially small ones, work the same way.

4. Why CI has not been implemented yet:

Despite its many advantages, CI does not make its way into all projects. Most people seem to know its benefits and what it brings to a project. It sounds like a panacea for many software development issues – it enforces good programming practices and forces developers to be consistent. However, many factors, such as the mindsets of both managers and programmers, developers’ habits and abilities, or company targets (e.g. money first, quality later), stop CI from becoming part of their projects.

4.1. Commit frequency:

First of all, I find that a basic assumption of CI – very frequent code check-ins – could be problematic in some circumstances. To achieve it, programmer commits have to be small, which poses the difficulty of dividing work into really tiny chunks [2]. Even though developers tend to modularise and divide their work into manageable pieces, it is often hard to achieve such granularity that each modification is ready within a few hours. Every feature requires a full life cycle of consideration, planning and implementation. Especially if the feature is to be tested before check-in, it is problematic to reduce its development time.

Moreover, incremental committing sometimes just doesn’t work. This can happen, for instance, if the software architecture was not well thought out – an issue that may be even more frequent in Agile development. Say one needs to modify part of the code to adapt it to changed requirements. If multiple modules depend on that change, the tests may fail unless the modification is made in its entirety. Thus a bigger change has to be made, and it can take more time before the program works again as a whole. Then, in order not to break the build, the commits may not be checked into the main repository for several days, or however long is needed.

The above example may be a little exaggerated, but there is more to it – human mentality. Programmers who start working on a piece of the project at the beginning of the day naturally tend to analyse the problem first and, once a solution is chosen, implement it. That means most commits occur at the end of the day, which makes integration even more troublesome – they have to queue for the integration server, and that takes time.

Increasing commits to the high frequency expected by CI requires a significant shift in mindset. If there is no mentor to help with this process (e.g. an experienced manager or a senior developer who has done CI before), the transition may become a barrier that is never crossed.

4.2. Mainline only:

Another aspect which seems counter-intuitive, and which may lead to CI being turned down, is the requirement to keep all code in the mainline. This seems even more unreasonable given that CI assumes a broken build gets the developer’s highest priority. The problem is that it is not uncommon for programmers to work on a feature in their spare time, between other, more important tasks. That feature may not be usable or functional for a long period, so incorporating it would very probably break the build.

In my company there were many projects (mostly extensions to the controller software) that were started because the functionality was needed. However, there was no need for them to be completely finished or polished (e.g. as easily configurable as the rest of the software), because they were meant to be used in only a few instances. Once a proof of concept was working, no additional time was devoted to finishing them and making them a cohesive part of the entire project. The reason was, of course, limited resources. If the need arose they were meant to be continued, but until then they had no appropriate tests and did not act as a cohesive part of the project. Thus they were kept in separate branches, which according to [1] goes against the CI idea.

The main reason why some modules should not be on the mainline is the lack of resources managers devote to them. If a module is not finished or its tests are incomplete, and other duties have higher priority, committing it to the main code is not what CI expects.

4.3. Automate your tests:

Creating automated tests for every part of the project when introducing CI may pose another difficulty. As mentioned, automated tests are a very important part of the CI process, and they should cover the code well to reduce the risk of potential bugs [4]. Even though writing tests is common practice, some of them are very difficult to automate.

In our team we didn’t have people who were specialized in tests. This led to situation that testing of more complex modules like those responsible for configuration generation (writing configuration validation was as very difficult) or some GUI components was very challenging and time consuming. Given limited number of developers and time, some of them were carried out manually or by the group of people who used applications (like beta-testers). In the second case, if some bug was detected, first its harmfulness was estimated and then fix priority set. So the “small” bug might be there left for appropriate time to come, which is again against the CI ideas.

One may argue that the difficulty of writing automated tests is not a point against Continuous Integration: CI may still be used, just with small coverage or no code tested at all. However, in that case it is not as effective (only compilation correctness is proven), and convincing a manager of CI’s usefulness is much more difficult. So when the decision about taking the CI track is to be made, there are fewer advantages to point to.

4.4. Appropriate timing:

One important aspect that should be addressed is the point in the project life cycle at which CI is introduced. It is much easier to have a project running with CI if it was started with CI in mind. Then all the components of the project (tests, scripts, even its architecture) are prepared to be compatible with the idea and the tools used for CI.

It is not easy, though, if the project is already ongoing (as can be the case with long-term projects). It may then require modifications too significant, or additions too time-consuming, for CI to be considered a good solution (though of course this view is short-sighted). In such situations the advantages and costs of switching to CI should be estimated – the costs may be judged too high, so even if the team is willing, managers don’t agree.

4.5. Direct costs:

An obstacle to introducing CI that should not be forgotten is the cost of integration servers and the time required to run these machines [5]. There are many open-source projects, such as CruiseControl [6], that can be used for this purpose, but some training is still needed (especially when no one on the team has done it before). The hardware cost depends on the type of project being tested: if it is small and the tests are not resource-consuming, an old computer may suffice [7]; otherwise a newer machine is needed. For more complicated processes, where the build time exceeds the ten-minute guideline [1], the build should be divided into several stages spread over multiple computers. Again, this may pose significant costs, especially for smaller companies. Larger ones probably won’t see this as a drawback, since the cost is insignificant compared to the salaries they pay their software engineers.

4.6. Project ROI:

A final point, not connected to large projects but worth mentioning while discussing CI, is return on investment [5]. For small programs with a short life cycle, especially those for which long-term maintenance is not assumed, the burden of setting everything up may be too high.

5. Summary:

In my opinion, the issues presented here are the most important ones preventing software development teams from introducing Continuous Integration. A team considering CI should check whether these issues apply to them; if so, they should not be discouraged but should try a more step-by-step approach. Continuous Integration has significant advantages, which are especially obvious for projects that more than one developer works on. It is worth the effort in the long run, since it reduces integration risks, gives a better overview of the project’s state, and encourages developers to work proactively and not to procrastinate. It is a really good way to achieve a high-quality product just by sticking to some general rules.

6. Bibliography:

[1] http://www.martinfowler.com/articles/continuousIntegration.html
[2] http://www.jrothman.com/blog/mpd/2011/12/is-the-cost-of-continuous-integration-worth-the-value-on-your-program-part-1.html (also parts 2 and 3)
[3] Paul Duvall, Stephen M. Matyas, and Andrew Glover. 2007. Continuous Integration: Improving Software Quality and Reducing Risk (The Addison-Wesley Signature Series). Addison-Wesley Professional
[4] http://www.ibm.com/developerworks/java/library/j-ap11297/index.html
[5] http://stackoverflow.com/questions/214695/what-are-some-arguments-against-using-continuous-integration
[6] http://cruisecontrol.sourceforge.net/
[7] http://www.jamesshore.com/Blog/Continuous-Integration-on-a-Dollar-a-Day.html
[8] http://callport.blogspot.co.uk/2009/02/continuous-integration-should-we-always.html

The Anaemic Domain Model is no Anti-Pattern, it’s a SOLID design

Design Patterns, Anti-Patterns and the Anaemic Domain Model

In the context of Object-Oriented software engineering, a “design pattern” describes a frequently recurring and effective solution to a commonly encountered problem. The utility of formalising and sharing design patterns is to provide a set of “battle-tested” designs as idiomatic solutions for classes of problems, and to increase the shared vocabulary among software engineers working on collaboratively developed software. The term was coined in the seminal book by Gamma et al. [5], which named and described a set of common design patterns. The lexicon of design patterns grew beyond the initial set specified in the book as the notion gained in popularity [6], [17].

Following the increasing popularity of design patterns as a concept, the idea of “design anti-patterns” entered popular discourse [7], [8]. As implied by the name, an anti-pattern is the opposite of a pattern; while it too describes a recurring solution to a commonly encountered problem, the solution is typically dysfunctional or ineffective, and has negative impacts on the “health” of the software (in terms of maintainability, extensibility, robustness, etc.). Anti-patterns serve a similar purpose to patterns; the description of the anti-pattern might illustrate a typical implementation of the anti-pattern, explain the context it generally occurs in, and show how the implementation results in problems for the software.

A potential problem with the concept of a design anti-pattern is that it might discourage critical thought about the applicability of the pattern. A design that may be inappropriate in some contexts may be a sensible decision in others; a solution might be discarded after being recognised as an anti-pattern, even though it would be a good fit for the problem at hand.

I contend that such an anti-pattern is the Anaemic Domain Model (ADM), described by Martin Fowler [1] and Eric Evans [2]. The ADM is considered by these authors as a failure to model a solution in an Object-Oriented manner, instead relying on a procedural design to express business logic. This approach is contrasted with the Rich Domain Model (RDM) [1], [20], in which classes representing domain entities encapsulate all business logic and data. While the ADM may certainly be a poor design choice in some systems, it is not obvious that this is the case for all systems. In this blog post I will consider the arguments against the ADM, and contend that in some scenarios the ADM appears to be a reasonable choice of design, in terms of adherence to the SOLID principles of Object-Oriented design, established by Robert Martin [3], [4]. The SOLID principles are guidelines which seek to balance implementation simplicity, scalability, and robustness. Specifically, by contrasting an ADM design with an RDM design for a hypothetical problem, I will attempt to show that the ADM is a better fit for the SOLID principles than the RDM solution. By doing so, I hope to demonstrate a contradiction in the received wisdom with regard to this anti-pattern, and consequently suggest that implementing an ADM is a viable architectural decision.

Why is the Anaemic Domain Model considered by some to be an Anti-Pattern?

Fowler [1] and Evans [2] describe an ADM as consisting of a set of behaviour-free classes containing business data required to model the domain. These classes typically contain little or no validation of the data as conforming to business rules; instead, business logic is implemented by a domain service layer. The domain service layer consists of a set of types and functions which process the domain models as dictated by business rules. The argument against this approach is that the data and methods are divorced, violating a fundamental principle of Object-Oriented design by removing the capability of the domain model to enforce its own invariants. In contrast, while an RDM consists of the same set of types containing necessary business data, the domain logic is also entirely resident on these domain entities, expressed as methods. The RDM then aligns well with the related concepts of encapsulation and information hiding; as Michael L. Scott states in [9], “Encapsulation mechanisms enable the programmer to group data and the subroutines that operate on them together in one place, and to hide irrelevant details from the users of an abstraction”.

In an RDM, the domain service layer is either extremely thin or non-existent [20], and all domain rules are implemented via domain models. The contention is that domain entities in a RDM are then entirely capable of enforcing their invariants, and therefore the system is sound from an Object-Oriented design perspective.

However, the capability of a domain entity to enforce local data constraints is only a single property in a set of desirable qualities in a system; while the ADM sacrifices this ability at the granularity of the individual domain entities, it does so in exchange for greater potential flexibility and maintainability of the overall implementation by allowing the domain logic to be implemented in dedicated classes (and exposed via interfaces). These benefits are particularly significant in statically typed languages such as Java and C# (where class behaviour cannot simply be modified at run-time) for improving the testability of the system by introducing “seams” [10], [11] to remove inappropriate coupling.

A Simple Example

Consider the back end of an e-commerce website in which a customer may purchase items, and offer items for sale to other customers across the globe. Purchasing an item reduces the purchaser’s funds. Consider the implementation of how a customer places a purchase order for an item. The domain rules state that the customer can only place an order if they have enough funds, and the item must be available in the customer’s region. In an RDM, a Customer class would represent the domain entity for the customer; it would encapsulate all the attributes for the customer, and present a method such as PurchaseItem(Item item). Like Customer, Item and Order are domain models representing purchasable items and customer orders for items respectively. The implementation of the Customer (in pseudo-C#) might be something like:

/*** BEGIN RDM CODE ***/

class Customer : DomainEntity // Base class providing CRUD operations
{
    // Private data (Funds, Region, ...) declared here

    public bool IsItemPurchasable(Item item)
    {
        bool shippable = item.ShipsToRegion(this.Region);
        return this.Funds >= item.Cost && shippable;
    }

    public void PurchaseItem(Item item)
    {
        if(IsItemPurchasable(item))
        {
            Order order = new Order(this, item);
            order.Create(); // insert the new order record via the Active Record base
            this.Funds -= item.Cost;
            this.Update(); // persist the customer's reduced funds
        }
    }
}

/*** END RDM CODE ***/

The domain entities here implement the Active Record pattern [17], exposing Create/Read/Update/Delete methods (from a framework/base class) to modify records in the persistence layer (e.g., a database). It can be assumed that the PurchaseItem method is invoked in the context of some externally managed persistence layer transaction (perhaps initiated by the HTTP request handler/controller, which has extracted a Customer and an Item from the request data). The role of the Customer domain entity in this RDM is then to model the business data, implement the business logic operating on that data, construct Order objects for purchases, and interface with the persistence layer via the Active Record methods; the model is Croesus-like in its richness, even in this trivial use case.
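
For concreteness, the hypothetical DomainEntity base class might look something like the sketch below. The example leaves it undefined, so the IDatabase abstraction and all persistence details here are my own illustrative assumptions, not part of the design under discussion.

/*** BEGIN DOMAIN ENTITY SKETCH (HYPOTHETICAL) ***/

// Assumed persistence abstraction standing in for whatever the framework provides.
interface IDatabase
{
    void Insert(object entity);
    void Load(object entity, int id);
    void Save(object entity);
    void Remove(object entity);
}

abstract class DomainEntity // Active Record style base class
{
    protected IDatabase db; // supplied by the hypothetical framework

    public void Create()     { db.Insert(this); }   // C: insert a new record
    public void Read(int id) { db.Load(this, id); } // R: populate from storage
    public void Update()     { db.Save(this); }     // U: persist current state
    public void Delete()     { db.Remove(this); }   // D: remove the record
}

/*** END DOMAIN ENTITY SKETCH ***/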

The following example demonstrates how the same functionality might be expressed using an ADM, in the same hypothetical context:

/*** BEGIN ADM CODE ***/

class Customer { /* Some public properties, including Funds and Region */ }
class Item { /* Some public properties, including Cost */ }

class IsItemPurchasableService : IIsItemPurchasableService
{
    IItemShippingRegionService shipsToRegionService;

    // constructor initialises the reference

    public bool IsItemPurchasable(Customer customer, Item item)
    {
        bool shippable = shipsToRegionService.ShipsToRegion(item, customer.Region);
        return customer.Funds >= item.Cost && shippable;
    }
}

class PurchaseService : IPurchaseService
{
    ICustomerRepository customers;
    IOrderFactory orderFactory;
    IOrderRepository orders;
    IIsItemPurchasableService isItemPurchasableService;

    // constructor initialises references

    public void PurchaseItem(Customer customer, Item item)
    {
        if(isItemPurchasableService.IsItemPurchasable(customer, item))
        {
            Order order = orderFactory.CreateOrder(customer, item);
            orders.Insert(order);
            customer.Funds -= item.Cost;
            customers.Update(customer);
        }
    }
}

/*** END ADM CODE ***/

Contrasting the example with respect to the SOLID Principles

At first glance, the ADM is arguably worse than the RDM; there are more classes involved, and the logic is spread out over two domain services (IPurchaseService and IIsItemPurchasableService) and a set of application services (IOrderFactory, ICustomerRepository and IOrderRepository) rather than resident in the domain model. The domain model classes no longer have behaviour, but instead just model the business data and allow unconstrained mutation (and therefore lose the ability to enforce their invariants!). Given these apparent weaknesses, how can this architecture possibly be better than the altogether more Object-Orientation compliant RDM?

The reason that the Anaemic Domain Model is the superior choice for this use case follows from consideration of the SOLID principles, and their application to both of the architectures [12]. The ‘S’ refers to the Single Responsibility Principle [13], which suggests that a class should do one thing, and do it well, i.e., a class should implement a single abstraction. The ‘O’ refers to the Open/Closed Principle [14], a similar but subtly different notion that a class should be “open for extension, but closed for modification”; this means that, in so far as possible, classes should be written such that their implementation will not have to change, and that the impact of changes is minimised.

Superficially, the Customer class in the RDM appears to represent the single abstraction of a customer in the domain, but in reality this class is responsible for many things. The Customer class models the business data and the business logic as a single abstraction, even though the logic tends to change with higher frequency than the data. The Customer also constructs and initialises Order objects as a purchase is made, and contains the domain logic to determine whether a customer can make an order. By providing CRUD operations through a base class, the Customer domain entity is also bound to the persistence model supported by this base implementation. By enumerating these responsibilities it is clear that, even in this trivial example, the RDM Customer entity exhibits a poor separation of concerns.

The ADM, on the other hand, decomposes responsibilities such that each component presents a single abstraction. The domain data is modelled in “plain-old” language data structures [18], while the domain rules and infrastructural concerns (such as persistence and object construction) are encapsulated in their own services (and presented via abstract interfaces). As a consequence, coupling is reduced.
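
For reference, the abstract interfaces behind the ADM example might be declared roughly as follows. The example above leaves them implicit, so the exact signatures are assumptions on my part; the point is only that each interface presents one narrow abstraction.

/*** BEGIN ADM INTERFACE SKETCH (HYPOTHETICAL) ***/

class Region { /* describes a customer's shipping region */ }

interface IIsItemPurchasableService
{
    bool IsItemPurchasable(Customer customer, Item item);
}

interface IItemShippingRegionService
{
    bool ShipsToRegion(Item item, Region region);
}

interface IPurchaseService
{
    void PurchaseItem(Customer customer, Item item);
}

// Infrastructure abstractions: persistence and object construction.
interface ICustomerRepository { void Update(Customer customer); }
interface IOrderRepository    { void Insert(Order order); void Delete(Order order); }
interface IOrderFactory       { Order CreateOrder(Customer customer, Item item); }

/*** END ADM INTERFACE SKETCH ***/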

Contrasting the flexibility of the RDM and ADM architectures

Consider scenarios in which the RDM Customer class would have to be modified: a new field might be introduced (or the type of an existing field may need to be changed), the Order constructor may require an additional argument, the domain logic for purchasing an item may become more complex, or an alternative underlying persistence mechanism might be required which is unsupported by the hypothetical DomainEntity base class.

Alternatively, consider scenarios in which the ADM types must change. The domain entities which are responsible for modelling the business data will only need to be modified in response to a change in the requirements for the business data. If the domain rules determining whether an item is purchasable become more complex (e.g., an item is specified to only be sold to a customer above a certain “trust rating” threshold), only the implementation of IsItemPurchasableService must change, whereas in the RDM the Customer class would have to change to reflect this complexity. Should the ADM persistence requirements change, different implementations of the repository [17], [19] interfaces can be provided to the PurchaseService by the higher-level application services without requiring any changes, whereas in the RDM a base class change would impact all derived domain entities. Should the Order constructor require another argument, the IOrderFactory [5] implementation may be able to accommodate this change without any impact on the PurchaseService. In the ADM each class has a single responsibility and will only require modification if the specific domain rules (or infrastructural requirements) which concern the class are changed.
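
To illustrate the persistence point, here is a sketch of how repository implementations might be swapped without touching PurchaseService. Both implementations (and the Id property on Customer) are my own illustrative assumptions:

/*** BEGIN REPOSITORY SWAP SKETCH (HYPOTHETICAL) ***/

// Production implementation backed by a SQL database (details elided).
class SqlCustomerRepository : ICustomerRepository
{
    public void Update(Customer customer) { /* UPDATE Customers SET ... WHERE Id = ... */ }
}

// Alternative in-memory implementation, e.g. for tests or prototyping;
// assumes Customer exposes an Id property.
class InMemoryCustomerRepository : ICustomerRepository
{
    Dictionary<int, Customer> store = new Dictionary<int, Customer>();

    public void Update(Customer customer) { store[customer.Id] = customer; }
}

// Because PurchaseService depends only on the ICustomerRepository
// interface, either implementation can be injected without modifying it.

/*** END REPOSITORY SWAP SKETCH ***/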

Now consider that a new business requirement is added to support refunds for purchases with which a customer is unsatisfied. In the RDM, this might be implemented by adding a RefundItem method to the Customer domain entity, on the simplistic argument that domain logic related to the customer belongs as a member function of the Customer domain entity. However, refunds are largely unrelated to purchases, for which the Customer domain entity is already responsible, further mixing the concerns of this type. It can be observed that in an RDM, domain entity classes tend to accumulate loosely related business logic and grow in complexity. In an ADM, the refund mechanism could be implemented by introducing a RefundService class, solely concerned with the domain logic for processing refunds (see the sketch below). This class can depend on the narrow set of abstractions (i.e., the interfaces of other domain and infrastructural services) required to implement its single concern. The new RefundService can be invoked at a high level (in response to some refund request), and this new domain behaviour is implemented without impacting any of the existing functionality.
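
A sketch of such a RefundService follows; the IRefundService interface and the Cost property on Order are illustrative assumptions rather than part of the original example:

/*** BEGIN REFUND SERVICE SKETCH (HYPOTHETICAL) ***/

class RefundService : IRefundService
{
    ICustomerRepository customers;
    IOrderRepository orders;

    // constructor initialises references

    public void RefundOrder(Customer customer, Order order)
    {
        // Refund-specific domain logic lives in its own narrowly focused
        // service; the existing purchase logic is untouched by this addition.
        customer.Funds += order.Cost;
        customers.Update(customer);
        orders.Delete(order);
    }
}

/*** END REFUND SERVICE SKETCH ***/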

In the example, the ADM solves the problem of bundling unrelated concerns into the same module identified in the RDM by taking advantage of the ‘I’ and ‘D’ in SOLID, namely the Interface Segregation Principle [15] and the Dependency Inversion Principle [16]. These principles state that an interface should present a cohesive set of methods, and that these interfaces should be used to compose the application (i.e., the domain service layer in the ADM). The interface segregation principle tends to result in small narrowly focussed interfaces such as our IItemShippingRegionService and IIsItemPurchasableService, as well as abstract repository interfaces; the dependency inversion principle compels us to depend on these interfaces, to decouple the implementation of a service from the details of the implementation of another.

The Anaemic Domain Model better supports Automated Testing

As well as enabling more flexible and malleable application composition, adoption of these principles gives the ADM an indirect benefit over the RDM: simpler automated testing. This is because highly cohesive, loosely coupled components which communicate via abstract interfaces and are composed via dependency injection allow for trivial mocking of dependencies. In the ADM it is therefore simple to construct a scenario in an automated test which might be more complicated to construct in an RDM, so the maintainability of the automated tests is improved; the effect is that automated testing has a lower cost, so developers will be more inclined to create and maintain tests. To illustrate this, consider the example above, such that unit tests are to be written for the IsItemPurchasableService.

The (current) domain rules for an item being purchasable are that the customer has sufficient funds, and is in a region that the item ships to. Consider writing a test that checks that when a customer has sufficient funds but is not in a shipping region for the item, the item is not purchasable. In the RDM this test might be written by constructing a Customer and an Item, configuring the customer to have more funds than the item costs, configuring the customer’s region to be outside the regions the item ships to, and asserting that the return value of customer.IsItemPurchasable(item) is false. However, the IsItemPurchasable method depends on the implementation details of the ShipsToRegion method of the Item domain entity: a change to the domain logic in Item might change the result of the test. This is undesirable, as the test should be exclusively testing the logic of the customer’s IsItemPurchasable method; a separate test should cover the specifics of the item’s ShipsToRegion method. When domain logic is expressed in the domain entity, and the concrete domain entity exposes the interface to that logic, implementations are so tightly coupled that the effects of changes cascade, which in turn makes automated tests brittle.

The ADM, on the other hand, expresses the IsItemPurchasable domain logic on a dedicated service, which depends on an abstract interface (the ShipsToRegion method of IItemShippingRegionService). A stubbed, mock implementation of IItemShippingRegionService can be provided for this test, which simply always returns false in the ShipsToRegion method. By decoupling the implementations of the domain logic, each module is isolated from the others and is insulated from changes in the implementation of other modules. The practical benefits of this are that a logic change will likely only result in the breakage of tests which were explicitly asserting on the behaviour which has changed, which can be used to validate expectations about the code.
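
As a concrete illustration, such a test might look like the following sketch. It assumes the interfaces sketched earlier, a constructor on IsItemPurchasableService that accepts its dependency, and an xUnit-style Assert helper – all assumptions for the purpose of illustration:

/*** BEGIN UNIT TEST SKETCH (HYPOTHETICAL) ***/

// Stub that reports every item as unshippable, regardless of region.
class StubItemShippingRegionService : IItemShippingRegionService
{
    public bool ShipsToRegion(Item item, Region region) { return false; }
}

class IsItemPurchasableServiceTests
{
    public void NotPurchasableWhenOutsideShippingRegionDespiteSufficientFunds()
    {
        Customer customer = new Customer { Funds = 100 }; // more funds than the item costs
        Item item = new Item { Cost = 10 };
        IsItemPurchasableService service =
            new IsItemPurchasableService(new StubItemShippingRegionService());

        // Only the purchasability rule is under test; shipping logic is stubbed out.
        Assert.False(service.IsItemPurchasable(customer, item));
    }
}

/*** END UNIT TEST SKETCH ***/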

Refactoring the RDM to apply SOLID tends to result in an ADM

A proponent of the RDM architecture might claim that the hypothetical example provided is not representative of a true RDM. It might be suggested that a well-implemented Rich Domain Model would not mix persistence concerns with the domain entity, instead using Data Transfer Objects (DTOs) [18], [17] to interface with the persistence layer. The inclusion of directly invoking the Order constructor might be viewed as constructing a straw man to attack; of course no domain model implementation would bind itself directly to the constructor of another object – using a factory is just common sense [5]! However, this appears to be an argument for applying the SOLID principles to the application-level infrastructural services, while disregarding the SOLID principles for domain design. As the hypothetical RDM is refactored to apply the SOLID principles, more granular domain entities could be broken out; the Customer domain entity might be split into CustomerPurchase and CustomerRefund domain models. However, these new domain models may still depend on atomic domain rules which may change independently without otherwise affecting the domain entity, and which might be depended on by multiple domain entities; to avoid duplication and coupling, these domain rules could then be further factored out into their own modules and accessed via an abstract interface. The result is that as the hypothetical RDM is refactored to apply the SOLID principles, the architecture tends towards the ADM!

Conclusion

By exploring the implementation of a straightforward example, we have observed that an Anaemic Domain Model adheres better to the SOLID principles than a Rich Domain Model. The benefits of adherence to the SOLID principles in the context of domain design were considered in terms of loose coupling and high cohesion, and the resulting increased flexibility of the architecture; evidence of this flexibility was that testability improved, since stubbed test implementations of dependencies could be provided trivially. When we considered how the benefits of adherence to the SOLID principles might be gained in the RDM, the refactoring tended to result in an architecture resembling an ADM. If adherence to the SOLID principles is a property of well-engineered Object-Oriented programs, and an ADM adheres better to these principles than an RDM, the ADM cannot be an anti-pattern, and should be considered a viable choice of architecture for domain modelling.

References

[1] Fowler, Martin. Anaemic Domain Model. http://www.martinfowler.com/bliki/AnemicDomainModel.html, 2003.

[2] Evans, Eric. Domain-driven design: tackling complexity in the heart of software. Addison-Wesley Professional, 2004.

[3] Martin, Robert C. The Principles of Object-Oriented Design. http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod, 2005.

[4] Martin, Robert C. Design principles and design patterns. Object Mentor, 2000: 1-34.

[5] Gamma, Erich, et al. Design patterns: elements of reusable object-oriented software. Addison Wesley Publishing Company, 1994.

[6] Pree, Wolfgang. Design patterns for object-oriented software development. Addison-Wesley, 1994.

[7] Rising, Linda. The patterns handbook: techniques, strategies, and applications. Vol. 13. Cambridge University Press, 1998.

[8] Budgen, David. Software design. Pearson Education, 2003.

[9] Scott, Michael L. Programming language pragmatics. Morgan Kaufmann, 2000.

[10] Hevery, Miško. Writing Testable Code. http://googletesting.blogspot.co.uk/2008/08/by-miko-hevery-so-you-decided-to.html, Google Testing Blog, 2008.

[11] Osherove, Roy. The Art of Unit Testing: With Examples in. Net. Manning Publications Co., 2009.

[12] Martin, Robert C. Agile software development: principles, patterns, and practices. Prentice Hall PTR, 2003.

[13] Martin, Robert C. SRP: The Single Responsibility Principle. http://www.objectmentor.com/resources/articles/srp.pdf, Object Mentor, 1996.

[14] Martin, Robert C. The Open-Closed Principle. http://www.objectmentor.com/resources/articles/ocp.pdf, Object Mentor, 1996.

[15] Martin, Robert C. The Interface Segregation Principle. http://www.objectmentor.com/resources/articles/isp.pdf, Object Mentor, 1996.

[16] Martin, Robert C. The Dependency Inversion Principle, http://www.objectmentor.com/resources/articles/dip.pdf, Object Mentor, 1996.

[17] Fowler, Martin. Patterns of enterprise application architecture. Addison-Wesley Longman Publishing Co., Inc., 2002.

[18] Fowler, Martin. Data Transfer Object. http://martinfowler.com/eaaCatalog/dataTransferObject.html, Martin Fowler site, 2002.

[19] Fowler, Martin. Repository. http://martinfowler.com/eaaCatalog/repository.html, Martin Fowler site, 2002.

[20] Fowler, Martin. Domain Model. http://martinfowler.com/eaaCatalog/domainModel.html, Martin Fowler site, 2002.