I/O and purity

There is considerable confusion about how, and whether, (pure) FP languages can read from and write to databases, receive input from the command line, send and receive data over the network and so on; in short, perform I/O. “Opinions” and claims range from aggressive anti-intellectualism to subtle misunderstandings by people who, to my surprise, do not fully grasp the details and are somewhat stubborn about it. In the end, I/O and its relation to purity is subtle, but very simple.

So here is a very incomplete and overlapping list of what people think:

Pure FP languages cannot perform I/O at all.
Pure FP languages can perform I/O, but that is impure, so actually there are no pure FP languages.
I/O in (say) Haskell is impure.

And variations thereof.

Pure FP languages cannot perform I/O at all

This one is the most ridiculous, especially when people make jokes like “Ha-ha, your Haskell program does not do anything except for making the CPU warm!” and go on to conclude in full faith that FP is a toy and an “academic activity”. This is just extremely stupid. Do those people think that all the effort, work and industry sponsorship money put into Haskell since the early ’90s went into making something like that? But hey, it’s a “research language”, so problem solved, no further critical thinking is needed. I can’t stand this shit (in general, it’s not about Haskell) when, in this Internet age, it is so simple to educate yourself by just looking things up, not necessarily in detail. I gladly make fun of such folk and am not ashamed of it. You are not engineers.

Of course Haskell, PureScript, Elm and other FP languages can do I/O just fine. How this works is explained further down; for now, as simple evidence, consider this Web site: it was generated by a Haskell program, by reading and writing files (duh!).

Is Haskell then really pure?

OK, so now that we have established that I/O is possible, doesn’t a Haskell program which performs it automatically become impure? Are the famous IO type and the functions which have IO in their types impure? Many people say “yes” and/or introduce impure/pure sandwiches along the way. To be fair, Mark is not precise about his usage of the word “impure”, so I don’t fully understand his mental model. What I do want to focus on here is that using “impure” in this context is at least confusing, and easily leads to cascading misinterpretation and to internalizing things which are then hard to unlearn. What I mean by that is: even when doing I/O, everything in a Haskell program is pure, and so are all layers of a “sandwich”.

Here comes the crucial subtlety. Programs are not (im)pure, expressions are. Paraphrasing, the notion of purity does not apply to a complete program, only to expressions (e.g. functions) constituting it.

A program which performs I/O can consist of 100% pure expressions only.

And this is exactly the case with Haskell, PureScript and others (yes, except for unsafePerformIO, but it does not change the essence at all). This is why I object to calling parts of the “sandwich” impure: there simply are no impure expressions there! Before proceeding to a simple experiment which illustrates this, consider the following: if you don’t agree, then either you are wrong, or the Haskell Web site and documentation are wrong in prominently calling Haskell purely functional…

IO actions

Consider the following (complete) programs in F# and Haskell:

[<EntryPoint>]
let main args =
    printfn "Hello from F#" // returns unit
    0

and

main :: IO ()
main = do
    print "Hello from Haskell" -- returns IO ()
    return ()

Both programs do the same thing: output text to the terminal. But there is a big difference too: function printfn has the deliberate side effect of printing, while function print is pure. Yes, print is really pure: it does not actually print anything, it computes and returns an I/O action, a value which describes printing. It is also deterministic: when applied to the same argument, it will always return the same (equivalent) IO action. Why does the Haskell program then print the message? The only reason is that the I/O action is bound into main (in the sense of the monadic bind). main is executed by the runtime system, just like in F#, and it is only then that the actual I/O described by the nested IO actions happens. Of course, this is a tiny program; a typical one would consist of many I/O actions interspersed with non-IO computations on different levels of composition: the “sandwich”. But in the end, everything is composed into main, which (see its type) is by itself also an IO action!

A Haskell program is one big description of I/O steps with non-I/O computations in between.

It is easy to show that IO actions are really descriptions, decoupled from “running” or “executing” them. We only need one small change to the Haskell program:

main :: IO ()
main = do
    let p = print "Hello from Haskell" -- p has type IO ()
    return ()

Now this program does not output anything to the terminal. As per above, function print merely returns an action, now named p, which is subsequently ignored. In particular, it is not bound to the main function inside of the do-block.
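To make the “description versus execution” split tangible, here is a minimal sketch of a toy stand-in for IO. It is not how GHC implements IO; the names Action, say, andThen, describe and run are all hypothetical and exist only for this illustration. Building a value of type Action performs nothing, and only the steps handed to run (playing the role of the runtime system) produce output.

```haskell
-- A toy stand-in for IO (hypothetical, not GHC's actual representation):
-- a plain data type that merely describes a sequence of print steps.
data Action
    = Done
    | PrintThen String Action

-- Toy counterpart of print: builds a description, performs nothing.
say :: String -> Action
say msg = PrintThen msg Done

-- Sequencing: glue two descriptions into one bigger description.
andThen :: Action -> Action -> Action
andThen Done next = next
andThen (PrintThen msg rest) next = PrintThen msg (rest `andThen` next)

-- Descriptions are ordinary values, so they can be inspected without running.
describe :: Action -> [String]
describe Done = []
describe (PrintThen msg rest) = msg : describe rest

-- Toy "runtime system": the only place where real I/O happens.
run :: Action -> IO ()
run Done = return ()
run (PrintThen msg rest) = putStrLn msg >> run rest

main :: IO ()
main = do
    let _unused = say "never printed"  -- described, but never handed to run
        program = say "Hello" `andThen` say "from the toy model"
    run program
```

Only the two messages composed into program are printed; _unused is a perfectly valid description that is simply never run, just like p above.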

Finally, let’s show that print is referentially transparent, which is what being pure really means. Referential transparency means that we can replace all occurrences of an expression with the variable it is bound to, and the other way around, without changing the behaviour of the program. Start with the following program which prints two identical messages:

main :: IO ()
main = do
    print "Hello from Haskell"
    print "Hello from Haskell"

It feels totally natural and expected to be able to “remove repetition” by introducing a variable for the common parts and reusing it:

main :: IO ()
main = do
    let p = print "Hello from Haskell"
    p
    p

Indeed, these two programs are equivalent at run time: the same message is still printed twice. This refactoring is only possible because everything in Haskell is referentially transparent and pure. Things are different with F# though:

[<EntryPoint>]
let main args =
    printfn "Hello from F#"
    printfn "Hello from F#"
    0

And after applying the same syntactical transformation

[<EntryPoint>]
let main args =
    let p = printfn "Hello from F#"
    p
    p
    0

program behavior is not the same anymore: the message is only printed once. This is because printfn is not referentially transparent, i.e. impure.
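One way to see the difference from the Haskell side, as a hedged sketch: the F# let evaluates printfn on the spot and names its result, which is closer to Haskell’s <- than to Haskell’s let. In the example below, the helper demo and the IORef counter are assumptions added purely to make the number of runs observable.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical helper: returns how many times the greeting action actually ran.
demo :: IO Int
demo = do
    runs <- newIORef (0 :: Int)
    -- `let` names the action itself; nothing is printed or counted yet.
    let p = putStrLn "Hello from Haskell" >> modifyIORef' runs (+ 1)
    -- `<-` runs the action once and names its result, the unit value ().
    r <- p
    -- Mentioning the result does not run anything again.
    return r
    -- Mentioning the action itself runs it a second time.
    p
    readIORef runs

main :: IO ()
main = demo >>= print
```

Here the message appears twice and the counter ends at 2: p is a reusable description, while r is just the unit value left over after one run, much like p in the F# version.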

Being more precise about sandwiches

Having said all this, I think calling functions which return or combine IO actions impure is simply wrong, and especially misleading for learners or recreational readers. Instead, if you really like the “sandwich” idea, I’d think of it in terms of I/O steps and non-I/O computations, logic if you will. But then, is the “sandwich” notion interesting and deep enough to give it so much attention? Not for me, at least: aren’t most programs like this? Perhaps it seems very interesting exactly when viewed through the prism of the wrong (for FP languages!) distinction between purity and impurity, with the subsequent message about functional core/imperative shell. It is however a useful notion for FP-first languages like F#.
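As a small sketch of a “sandwich” described in those terms (the function shout is a made-up example of “logic”, and reading all of standard input is just one convenient choice of I/O step):

```haskell
import Data.Char (toUpper)

-- Non-I/O computation: plain logic, no IO in its type.
shout :: String -> String
shout = map toUpper

-- I/O steps around the logic. Every expression here is still pure;
-- main is merely a description composed out of smaller descriptions.
main :: IO ()
main = do
    input <- getContents       -- I/O step: read all of standard input
    let output = shout input   -- non-I/O computation
    putStr output              -- I/O step: write the result
```

The only distinction visible in the types is IO versus no IO, that is, I/O steps versus logic. Nothing in this program needs to be called impure.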