Can you get a (type) hint?

Introduction

The type system has always been a major feature of a programming language, to the extent that it underpins one of the primary ways of classifying languages.
In fact, dynamically typed languages are often perceived as scripting languages, very productive and fun to work in but unsuitable for large-scale development.
Statically typed languages, on the other hand, are seen as the natural tool for a large project, at the cost of verbosity and complexity.
This article will explore a third approach, specifically aimed at getting the best of both worlds: optional type annotations.

Dynamic type systems: productivity

A language is said to be dynamically typed if it performs type checking at runtime.
This means that one does not have to write explicit type signatures, resulting in less code to write and quicker development times.
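For instance (a toy sketch, not from the article), a Python function needs no declared types and works on any value supporting the operations it uses:

```python
# A dynamically typed function: no type signatures are written,
# and the same code works on any type that supports `+`.
def concat_twice(x):
    return x + x
```

Here `concat_twice(2)` doubles a number, while `concat_twice("ab")` concatenates a string with itself, with no signatures or overloads declared.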

However, this is not the main reason why dynamically typed languages are more productive. That misconception comes from comparing Ruby, Python and the like with Java and C++, which require an explicit type signature for every single variable.
Languages whose static type system performs type inference, however, are just as concise as their dynamic counterparts, as one can easily see by comparing, for example, Python with Haskell.

What really favours productivity is that dynamically typed languages require much less up-front design to produce working software.
In fact, the major design issue with statically typed languages is modelling one's problem domain in a way that satisfies the constraints imposed by the type system, no matter how expressive that system is.

This really hinders productivity when coupled with the constant, fast-paced requirements and design changes that every real-world software project undergoes.

Furthermore, some features like reflection, meta-programming and dynamic dispatch are simpler and less cumbersome in a dynamically typed language, as anyone who’s written template meta-programming in C++ probably already knows.

Static type systems: scalability

So, why does everyone insist on using Java and C++ for large-scale software, instead of using dynamically typed languages?
Well, it turns out there is a problem with them: they don’t scale well.

As our system has grown, a lot of the logic in our Ruby system sort of replicates a type system, either in our unit tests or as validations on models. I think it may just be a property of large systems in dynamic languages, that eventually you end up rewriting your own type system, and you sort of do it badly. You’re checking for null values all over the place.

Alex Payne on why Twitter is switching to Scala for server-side code

Choosing the best tool for the job: optional type annotations

All the flame wars on static vs. dynamic typing essentially boil down to this choice: productivity and flexibility vs. stability and scalability.
The problem is that this choice is being made by the language, when instead it should be made by the programmer: she should be able to make the trade-off depending on the specific problem her code is trying to solve.

I believe that dynamic typing should be the default, as the advantages it brings are simply too valuable to give up. However, there are cases in which enforcing the type a function expects at compile time is simpler and sounder, and this additional guarantee need not get in one's way when it is not needed.
This can be fully achieved by using optional type annotations.

The most mature example of a successful application of this approach is Clojure. Clojure is a dialect of Lisp that runs on the JVM, and has been receiving quite a lot of attention lately for its power and expressivity. Being a Lisp, it is dynamically typed, but its core.typed library adds an optional static type system using annotations. For example, this function checks whether one integer is divisible by another:

(ann div-by [AnyInteger AnyInteger -> Boolean])
(defn div-by [x y]
  (== (mod y x) 0))

The interesting part is that the annotation (the top line) can be added later without modifying the actual code, letting the programmer decide which functions should be left generic and which should not.

Another very important point is that this library was originally written by an external developer and added to the language core later: native support for the feature was not necessary. This bodes well for the inclusion of such a feature in other languages like Python or Ruby.
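Python has since gained exactly such a feature: PEP 484 type hints are optional annotations that the interpreter ignores at runtime and that external checkers such as mypy verify statically. A sketch of the divisibility example in annotated Python (names are illustrative):

```python
# Optional type hints in Python: removing the annotations leaves
# behaviour unchanged, since the interpreter does not enforce them.
# Static checkers such as mypy use them for compile-time verification.
def div_by(x: int, y: int) -> bool:
    """Return True when y is divisible by x."""
    return y % x == 0
```

Just as with core.typed, the annotations can be added after the fact, function by function, without touching the logic.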

Conclusion

Many dynamically typed languages are powerful and fun to work in, but they are often set aside when it comes to large-scale software, due to the perceived unsuitability implied by the lack of a static type system.
As we have seen, however, an optional system based on annotations succeeds at bringing the best of both worlds, allowing programmers to do what is always required of them: choosing the best tool for the job.

Related readings

Steve Yegge, Is Weak Typing Strong Enough?
Martin Fowler, Dynamic Typing, 14 March 2005
Bill Venners, Twitter on Scala, 3 April 2009
Gilad Bracha, Pluggable Type Systems, 27 October 2004
Adam Bard, Clojure's core.typed vs Haskell, 29 September 2013


Enemy of the state! Or why OOP is not suited to large-scale software.

Introduction

This article is written in response to Functional programming in large-scale project development, in which the author highlights some characteristics that make functional programming suitable and/or beneficial for large scale software development.

Whilst I do agree with most of the points made, I think they are not sufficient to convince people to change. Change requires effort, and that effort is justified by one thing: awareness that the status quo is flawed.
This means that one not only has to convince people that the new approach is correct, but also that what they are currently doing is wrong.

Specifically, to drive the transition from OOP to functional programming, the point to be made is:

Mutable state is dangerous, and should be used very carefully, only when it is strictly necessary.

State: what is it?

The state of a program (or, better, of a process) refers to the values assumed by its variables at a certain time during the execution of the program itself.

Referential transparency

In functional programming, data structures are immutable by default, hence functions cannot modify the state of the program.
Functions can therefore achieve a property called referential transparency: since the state is immutable, they will always produce the same output for a given input.

Of course this is not true for an object method: as it can modify the state of the program, we cannot guarantee that its return value won’t change accordingly.
In fact, it has to change accordingly, as the whole OOP paradigm is based on sending messages to objects as the only means to interact with their state.
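The contrast can be made concrete with a small sketch (hypothetical names, not from the article):

```python
# A referentially transparent function: its output depends only on its input.
def double(x):
    return 2 * x

# An object whose method's result depends on hidden mutable state:
# calling it twice with "the same input" (no arguments) gives
# different results each time.
class Counter:
    def __init__(self):
        self._count = 0

    def next_value(self):
        self._count += 1
        return self._count
```

`double(3)` is always 6, while consecutive calls to `next_value()` on the same object return 1, then 2: the result changes because the hidden state does.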

OOP + Mutable State = Bugs

When referential transparency does not hold, one needs to know two things in order to predict the behaviour of a program: its input, and the state it is in.
Why is this a problem? Well, the biggest selling point of OOP is that it can manage complexity via encapsulation, specifically by hiding the state of objects, and exposing their behaviour solely via the interface provided by their methods.
However, one does need to know the state of the program to reason about its execution, and that state is:

  • Hidden, so one cannot rely on the interface alone, but instead has to inspect the implementation.
  • Split across multiple objects, so one has to take into account all the objects that are interacting with each other at any given time.

Of course this can be managed in small projects, but it is an approach that scales poorly: when there are thousands of objects involved, most of which were written and engineered by other people, reasoning about the code is a nightmare.

That means that predicting the program's behaviour becomes infeasible, leading to:

  • More bugs: as one doesn't know what the code is doing, it is harder to ensure it is doing it correctly.
  • Very (I mean very) difficult debugging, as the execution depends on the state, which could have been changed by any method of any of the objects involved.

Referential transparency avoids most of these problems and, as an aside, ensures true encapsulation, for the interface provides all the information needed to use a module (or whatever the preferred means of abstraction is in the functional language of choice).

Parallelism made simple

Another issue with mutable state is consistency: if the correct result of an operation depends on the state, then changing the state could lead to failure. When multiple threads execute concurrently and non-deterministically, ensuring consistency manually through the use of locks becomes a really hard task.

On the other hand, if the state is immutable, then consistency is assured automatically. As clock speeds plateau to make way for multicore architectures, there is no going back: a paradigm that is not naturally oriented towards parallelism and concurrency is doomed to failure.

I am not going to write further on this topic, as the author already makes a very good case for it in the original article.

Is mutable state evil?

Of course there are situations in which the use of mutable state is necessary, e.g. IO. Functional programming languages do allow mutable state (via monads, controlled impurity, etc.), but make sure it is used only when really necessary, by enforcing immutability by default.

Conclusions

One might argue that there’s never been a better time for functional programming than now.
As the author correctly states in the section OOP languages shifting towards functional paradigms, several languages are including features like lambdas, first class functions, and even list comprehensions in some cases.

However, I am convinced that this trend will bring only incremental improvements to the development of large-scale software: (good) programming languages are not just a bunch of features, but rather a coherent vision of how to write software correctly, and the most important aspect of functional programming, immutability by default, is still being neglected.

In fact, we have seen that mutable state means a lack of referential transparency, which, when combined with the OOP approach to encapsulation, makes code harder to reason about, and therefore more bug-prone and harder to debug.

So in conclusion:

Program imperatively when needed, and functionally when possible.[1]

References

[1] Michael O. Church, Functional Programs Rarely Rot.

P.S. The pun in the title has been used in many articles about functional programming, even though I did not use them as inspiration. What I did use are Rich Hickey's talks (he is the creator of Clojure, one of my favourite languages); I suggest you check them out at:
http://www.infoq.com/author/Rich-Hickey

From Functions to Functors: programming languages' evolution towards code reuse in the large.

What this post is not.

This blog post does not aim to generate yet another flame war on programming languages, something along the lines of:

“All of your problems are there because you keep using PHP; why don't people just switch to OCaml and stop complaining…”

What this post is actually about.

This post stems from the consideration that often the choice of language opens opportunities (or sometimes poses constraints) on the design choices software engineers and programmers face during development. In particular, I am convinced that some often overlooked features, namely enhanced module systems like the one used in SML and OCaml, can help reuse code on a larger scale than is currently achievable in most languages, with less need to resort to complex design patterns.

Why reuse code?

One might say that one of the main qualities of a good programmer is her ability to maximise the amount of reusable code she writes.

The advantages are obvious:

  • Reusing code means writing less code, which equals less development time.
  • Reusing code avoids tedious tasks, like copy-pasting, or even worse, rewriting from scratch.
  • Most importantly, duplicated code is a nightmare when an error or bug is detected, because making the same correction in multiple places increases the chance of missing something (plus, it's utterly boring).

All the issues deriving from lack of code reuse can somewhat be overlooked in small-scale software (I have done some programming tutoring, and very often heard the phrase: “Why should I use polymorphism? My code works perfectly!!”, usually referring to some ugly switch statement). However, as the scale increases, they can seriously hinder the successful development of a project.
It is therefore better to be aware of the DRY principle [1]:

Don’t repeat yourself!


From pattern to feature: how programming languages evolve.

Let me make this clear straight away: respecting the DRY principle does not depend on the programming language.
In fact, look at this C code: does it look familiar to you, OO programmers?

// Definition of a generic stack
#include <stddef.h>  /* for size_t */

typedef struct Stack {
    void* value;
    size_t size;
    struct Stack* next;  /* the self-reference needs the struct tag */
} Stack;

void new_Stack(Stack*);
void push(Stack*, void*, size_t);
void pop(Stack*, void*, size_t);
void delete_Stack(Stack*);

The truth is that very good programmers can (almost) always design good software, and often what is considered a good solution will be “standardised” as a design pattern; patterns naturally arise from good practices.

Nevertheless, even when a pattern is well known, how it is applied is an entirely different matter, and still requires considerable effort in the design phase. So, while it is always possible to write good code, how easily one can do so is influenced by other factors, and here is where language features weigh in.

As an example, consider the Command pattern [2] implemented in Java, which involves defining an interface and several classes implementing it, in order to encapsulate the information necessary to call a method.
However, if you are using a language with first-class functions, you don't need any design pattern to deal with this problem: you just pass the function to the code that needs to call it.
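As a sketch (illustrative names, assuming a language with closures, here Python):

```python
# With first-class functions, a closure plays the role of a Command
# object: it captures the data needed to perform the action later.
def make_greeter(name):
    def command():
        return f"Hello, {name}!"
    return command

# The "invoker" needs no Command interface: it just calls what it is given.
def invoker(command):
    return command()
```

The closure carries exactly the information a Command object would encapsulate, with no interface or per-command class to declare.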

So, languages tend to absorb patterns by exposing additional features, turning what used to be complex design problems into trivial tasks.

A key concept towards code reuse: code parameterisation.

Looking at any assembly listing, one will probably see plenty of code that looks the same, made of register manipulation and conditional branches. Some recognisable patterns in the use of goto were encoded as a feature in higher-level languages, giving birth to the while loop. Soon enough, patterns started to emerge in the use of while as well, and the for loop was born.

A huge leap forward was the discovery that code could be parameterised. Since the introduction of functions, there has been a constant evolution allowing more and more behaviour to be expressed concisely by some carefully parameterised piece of code, from procedures all the way to OOP.
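This step, duplicated code collapsing into a single parameterised abstraction, can be illustrated with a toy sketch:

```python
# Two near-identical loops, differing only in the operation applied...
def squares(xs):
    return [x * x for x in xs]

def doubles(xs):
    return [2 * x for x in xs]

# ...collapse into one piece of code parameterised by behaviour.
def map_with(f, xs):
    return [f(x) for x in xs]
```

`map_with(lambda x: x * x, xs)` reproduces `squares(xs)`, and any future variant needs no new loop at all.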

This powerful process, however, seems to have stopped, and for many programmers object polymorphism is the top level of generalisation that can be achieved using a language feature. If one wants to generalise further, one needs to resort to carefully chosen design patterns.

Reusing code on a larger scale: SML-like Functors

If one looks at the hierarchy of an OO program, one will probably see methods, encompassed by classes, encompassed by modules (packages, for those who come from a Java background). We have also seen that the key to code reuse is parameterisation, and the higher we apply it in the hierarchy, the larger the scale at which we will be able to abstract and, ultimately, reuse code.

The idea is then simple: why don't we parameterise modules? Well, it turns out we already can.

SML and OCaml are functional languages of the ML family, sporting one of the most powerful module systems ever conceived. This module system is specifically intended for programming in the large, in that it expresses the architecture of the software, where each module provides abstraction through a descriptive interface for a group of types and expressions [3].

In this module system, a functor [4] is a function that maps modules to other modules. That is, it allows one to create a module parameterised by other modules, with two remarkable effects:

  1. The reusability of a parameterised module is greatly enhanced, since its functionality can be applied to similar, not just identical, domains.
  2. High reusability is achieved at the module level using a feature of the language, without the need for complex design patterns, just as with functions and polymorphism at a smaller scale.
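Python has no ML-style functors, but the idea of a mapping from modules to modules can be roughly approximated by a function that takes a “module-like” object (here, a class supplying a comparison) and produces a new one. This is only an illustrative sketch; real SML/OCaml functors are checked by the compiler against module signatures.

```python
# A rough approximation of an ML functor: a function from a "module"
# (a class providing `compare`) to a new "module" built on top of it.
def make_max_module(ordering):
    class MaxModule:
        @staticmethod
        def max(a, b):
            return a if ordering.compare(a, b) >= 0 else b
    return MaxModule

class IntOrdering:
    @staticmethod
    def compare(a, b):
        return a - b

class ReverseIntOrdering:
    @staticmethod
    def compare(a, b):
        return b - a

# The same parameterised "module" applied to similar, not identical, domains.
IntMax = make_max_module(IntOrdering)
ReverseMax = make_max_module(ReverseIntOrdering)
```

The one definition of `MaxModule` is reused for every ordering we supply, which is exactly the kind of module-level parameterisation the functor provides, though here without any static checking.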

I love it but I need my own language!

Notwithstanding its expressive power, this module system has not been widely adopted, partly due to the limited popularity of SML and OCaml. The good news is, you don't need either to benefit from it. Xavier Leroy [5] came up with an SML-like module system, implemented as a module parameterised by a base language and its type-checker, showing how it could be used by many different languages, not necessarily close to ML semantics. Moreover, the fact that this module system is itself a module serves as a further statement of functors' capabilities.

Conclusion

Design patterns lie just outside the scope of what we can solve using programming language features, and require considerable effort in the design phase.
Functors provide a powerful means to specify and glue reusable modules together and, if more widely adopted, could be a stepping stone towards code reuse in the large.

References

[1] Clark, Allan. “Design Patterns.” SAPM Lecture Slides, University of Edinburgh, 2014.

[2] “Command pattern”, Wikipedia.

[3] Fiore, Marcelo. “SML Modules.” Concepts in Programming Languages Lecture Slides, University of Cambridge, 2011.

[4] “Functors: Parameterized Modules.” Data Structures and Functional Programming Lecture Slides, Cornell University.

[5] Leroy, Xavier. “A Modular Module System.” Journal of Functional Programming 10.3 (2000): 269-303.