Introduction
This article is written in response to Functional programming in large-scale project development, in which the author highlights some characteristics that make functional programming suitable and/or beneficial for large-scale software development.
Whilst I do agree with most of the points made, I think they are not sufficient to convince people to change. Change requires effort, and that effort is justified by one thing: awareness that the status quo is flawed.
This means that one has to convince people not only that the new approach is correct, but also that what they are currently doing is wrong.
Specifically, to motivate the transition from OOP to functional programming, the point to be made is:
Mutable state is dangerous: it should be used very carefully, and only when it’s a strict necessity.
State: what is it?
The state of a program (or, better, of a process) refers to the values assumed by its variables at a certain time during the execution of the program itself.
Referential transparency
In functional programming, data structures are immutable by default, hence functions cannot modify the state of the program.
Functions can therefore achieve a property called referential transparency: since the state is immutable, they always produce the same output for a given input.
Of course this is not true for an object method: as it can modify the state of the program, we cannot guarantee that its return value won’t change accordingly.
In fact, it has to change accordingly, as the whole OOP paradigm is based on sending messages to objects as the only means to interact with their state.
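To make the contrast concrete, here is a minimal sketch in Python. The names add and Counter are made up for illustration; the only point is that the function's result depends on its arguments alone, while the method's result also depends on hidden state.

```python
def add(x: int, y: int) -> int:
    """Referentially transparent: the result depends only on the arguments."""
    return x + y


class Counter:
    """Hypothetical object whose method depends on hidden mutable state."""

    def __init__(self) -> None:
        self._count = 0

    def next_value(self) -> int:
        # Every call modifies the internal state, so the very same call
        # returns a different value each time it is made.
        self._count += 1
        return self._count


counter = Counter()
first = counter.next_value()
second = counter.next_value()
```

Any call to add(2, 3) can be replaced by its value without changing the program, whereas first and second differ even though the two calls look identical.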
OOP + Mutable State = Bugs
When referential transparency does not hold, one is required to know two things in order to predict the behaviour of a program: the input of the program, and the state the program is in.
Why is this a problem? Well, the biggest selling point of OOP is that it can manage complexity via encapsulation, specifically by hiding the state of objects, and exposing their behaviour solely via the interface provided by their methods.
However one does need to know about the state of the program to reason about its execution, but the state is:
- Hidden, so one cannot rely on the interface alone, but instead has to inspect the implementation.
- Split across multiple objects, so one has to take into account all the objects that are interacting with each other at any given time.
Of course this can be managed for small projects, but it’s an approach that scales poorly: when there are thousands of objects involved, most of which were written and engineered by other people, reasoning about the code is a nightmare.
That means that predicting the program’s behaviour becomes infeasible, leading to:
- More bugs: if one doesn’t know what the code is doing, it is harder to ensure it’s doing it correctly.
- Very (I mean very) difficult debugging, as the execution depends on the state, which could have been changed by any of the methods of any of the objects involved.
Referential transparency avoids most of these problems and, as an aside, ensures true encapsulation, for the interface provides all the information needed to use a module (or whatever is the preferred means of abstraction in the functional language of choice).
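The two problems listed above (state that is hidden, and split across objects) can be sketched in a few lines of Python. Discount, Cart, Promotion and total_price are hypothetical names invented for this example, and prices are plain integers to keep the arithmetic exact.

```python
class Discount:
    def __init__(self, percent: int) -> None:
        self.percent = percent


class Cart:
    def __init__(self, discount: Discount) -> None:
        self._discount = discount  # hidden state, shared with other objects
        self._prices: list[int] = []

    def add(self, price: int) -> None:
        self._prices.append(price)

    def total(self) -> int:
        return sum(self._prices) * (100 - self._discount.percent) // 100


class Promotion:
    def __init__(self, discount: Discount) -> None:
        self._discount = discount

    def end(self) -> None:
        # Mutates an object that the cart silently depends on.
        self._discount.percent = 0


def total_price(prices: tuple[int, ...], percent: int) -> int:
    """Functional variant: everything the result depends on is in the signature."""
    return sum(prices) * (100 - percent) // 100


discount = Discount(10)
cart = Cart(discount)
cart.add(200)
before = cart.total()
Promotion(discount).end()  # no method of cart is called here...
after = cart.total()       # ...yet the cart now answers differently
```

Nothing in the interface of Cart reveals that total() can change when a Promotion ends; one has to read the implementation of all three classes to find out. With total_price, the dependency on the discount is visible in the signature.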
Parallelism made simple
Another issue with mutable state is consistency: if the correct result of an operation depends on the state, then changing it could lead to failure. When multiple threads are executing concurrently and non-deterministically, ensuring consistency manually through the use of locks becomes a really hard task.
On the other hand, if the state is immutable, then consistency is assured automatically. As clock speeds have stopped increasing and are giving way to multicore architectures, there is no going back: a paradigm that is not naturally oriented towards parallelism and concurrency is doomed to failure.
I am not gonna write further on this topic, as the author already makes a very good case for it in the original article.
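As a rough illustration, here is a sketch in Python in which several threads work on immutable values without any lock. Account, with_interest and apply_interest_in_parallel are names invented for this example. In standard CPython the global interpreter lock limits how much of this actually runs in parallel, so the sketch is about ease of reasoning rather than speed.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Account:
    """Immutable record: its fields cannot be reassigned after creation."""
    owner: str
    balance: int


def with_interest(account: Account, percent: int) -> Account:
    # Returns a new Account instead of modifying the one it was given.
    return replace(account, balance=account.balance + account.balance * percent // 100)


def apply_interest_in_parallel(accounts: tuple[Account, ...], percent: int) -> tuple[Account, ...]:
    # No locks are needed: no thread can change a value another thread reads.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return tuple(pool.map(lambda a: with_interest(a, percent), accounts))


accounts = (Account("alice", 100), Account("bob", 200))
updated = apply_interest_in_parallel(accounts, 10)
```

The original accounts tuple is untouched after the call, so any other thread still holding it sees a consistent value.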
Is mutable state evil?
Of course there are situations in which the use of mutable state is necessary, e.g. IO. Functional programming languages do allow the use of mutable state (via monads, impurity etc.) but make sure that it is used only when it is really necessary, by enforcing immutability by default.
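Python does not enforce any of this, but the same discipline can be followed by hand: keep the logic in pure functions and confine the IO to a thin outer layer. The sketch below only illustrates that idea; count_words and report are hypothetical names.

```python
import io
from typing import Iterable, TextIO


def count_words(lines: Iterable[str]) -> int:
    """Pure: the result depends only on the lines passed in."""
    return sum(len(line.split()) for line in lines)


def report(source: TextIO, sink: TextIO) -> None:
    """Impure shell: the only place where reading and writing happen."""
    total = count_words(source.readlines())
    sink.write(f"{total} words\n")


sink = io.StringIO()
report(io.StringIO("mutable state\nis dangerous\n"), sink)
```

Only report touches the outside world; count_words can be tested and reasoned about without any IO at all.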
Conclusions
One might argue that there’s never been a better time for functional programming than now.
As the author correctly states in the section OOP languages shifting towards functional paradigms, several languages are including features like lambdas, first-class functions, and even list comprehensions in some cases.
However, I am convinced that this trend will only make incremental improvements to the development of large-scale software. Good programming languages are not just a bunch of features, but rather a coherent vision of how to write software correctly, and the most important aspect of functional programming, which is the use of immutable state by default, is still being neglected.
In fact, we have seen that mutable state means lack of referential transparency, which, when combined with the OOP approach to encapsulation, makes the code harder to reason about, and therefore more bug-prone and harder to debug.
So in conclusion:
Program imperatively when needed, and functionally when possible.[1]
References
[1] Michael O. Church, Functional programs rarely rot.
P.S. The pun in the title has been used in many articles about functional programming, even though I haven’t used them as an inspiration. What I did use are the talks by Rich Hickey (creator of Clojure, one of my favourite languages); I suggest you check them out at:
http://www.infoq.com/author/Rich-Hickey
in a sense, so what? at some level you have to have mutation: an sql database would not be useful if it could only return a single answer to the same given query over time.
so the real question that should come out of these kinds of posts is this: where should the mutable state really go? and it is going to have to be big. just because you can make a small local program work ok functionally, and just because there’s distributed map-reduce, doesn’t mean everything can be functional in a modern world of big distributed service oriented systems.
maybe alan kay wanted us all to realize that if we require knowing about the whole system at any level in order to debug it then we must be doing it wrong. there’s a school of thought about ood/oop that says it is about cooperating agents sending messages resulting in emergent and flexible behaviour (since it wasn’t functionally decomposed in the first place). that’s a pretty freaking cool trick if you can do it.
presumably some people do it at the distributed level. what are those tricks? are they hacky bad tricks, like having to rely on splunk and timeouts and retries? what would principled methods be to deal with this?
(cf. erlang or E or waterken or clusterken or whatever else is out there.)