Introduction to Cloud Haskell

An approach to distributed programming in Haskell

Alen Ribic / @alenribic

January 2013, Lambda Luminaries

In this talk...

  • Fundamentals of Cloud Haskell
  • Tutorial
  • Cloud Haskell and embedded systems
  • Current state and Future work

What is Cloud Haskell?

Cloud Haskell is a DSL for developing programs for a distributed computing environment in Haskell.
  • No changes to the language (shallow embedding as library)
  • Programming at the same level as with Concurrent Haskell elements (forkIO, MVar)
  • Computational model strongly based on the message-passing model of Erlang
  • We leverage Haskell’s purity, types, and monads

Programming model

  • Explicit concurrency
  • Lightweight processes
  • No state shared between processes
  • Asynchronous message passing

Often called the 'Actor model'

Background and ideas


Initial prototype

  • remote package by Jeff Epstein


Cloud Haskell design

  • Lightweight processes as Process monad (built on GHC's lightweight threads)
  • Can spawn, monitor and terminate processes on any node
  • Inter-process communication by sending messages
  • Can send and receive messages of any type (as in Erlang)
  • Can leverage Haskell's strong type system to provide static guarantees about the content of messages (typed channels)
  • Novel method for serializing function closures that enables higher-order functions to be used in a distributed environment
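
The typed-channel point above can be illustrated with a minimal sketch. It assumes the distributed-process and distributed-process-simplelocalnet packages are installed; the host, the port and the name runDemo are placeholders chosen for this example, not part of the talk.

```haskell
module Main where

import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Distributed.Process
import Control.Distributed.Process.Backend.SimpleLocalnet
  (initializeBackend, newLocalNode)
import Control.Distributed.Process.Node (initRemoteTable, runProcess)
import Control.Monad (replicateM)
import Control.Monad.IO.Class (liftIO)

-- | Hypothetical helper: start one local node, push three Ints through a
-- typed channel and return their sum.
runDemo :: IO Int
runDemo = do
  -- Host and port are arbitrary choices for this sketch.
  backend <- initializeBackend "127.0.0.1" "8083" initRemoteTable
  node    <- newLocalNode backend
  result  <- newEmptyMVar
  runProcess node $ do
    -- The channel carries Int only; sending any other type is a compile error.
    (sendPort, recvPort) <- newChan :: Process (SendPort Int, ReceivePort Int)
    _  <- spawnLocal $ mapM_ (sendChan sendPort) [1, 2, 3]
    xs <- replicateM 3 (receiveChan recvPort)
    liftIO $ putMVar result (sum xs)
  takeMVar result

main :: IO ()
main = runDemo >>= print
```

Unlike send/expect, the receiving side does not need to say which type it is waiting for: the ReceivePort fixes it statically.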

The interface functions of Cloud Haskell

Basic types

newtype Process a = Process {
        unProcess :: ReaderT LocalProcess IO a
} deriving (Functor, Monad, MonadIO, MonadReader LocalProcess,
            Typeable, Applicative)

data ProcessId
data NodeId

class (Binary a, Typeable a) => Serializable a

Basic messaging

send :: Serializable a => ProcessId -> a -> Process ()
expect :: Serializable a => Process a
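
A minimal sketch of send and expect in use, assuming the SimpleLocalnet backend described later in the talk. The names echoOnce and runDemo, the host and the port are placeholders for this example only.

```haskell
module Main where

import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Distributed.Process
import Control.Distributed.Process.Backend.SimpleLocalnet
  (initializeBackend, newLocalNode)
import Control.Distributed.Process.Node (initRemoteTable, runProcess)
import Control.Monad.IO.Class (liftIO)

-- | Wait for one (sender, text) pair and send the text back to the sender.
echoOnce :: Process ()
echoOnce = do
  (from, txt) <- expect :: Process (ProcessId, String)
  send from txt

-- | Hypothetical helper: run one request/reply round trip on a local node.
runDemo :: IO String
runDemo = do
  -- Host and port are arbitrary choices for this sketch.
  backend <- initializeBackend "127.0.0.1" "8081" initRemoteTable
  node    <- newLocalNode backend
  result  <- newEmptyMVar
  runProcess node $ do
    self   <- getSelfPid
    server <- spawnLocal echoOnce
    send server (self, "hello")          -- asynchronous, returns immediately
    reply <- expect :: Process String    -- blocks until a String arrives
    liftIO $ putMVar result reply
  takeMVar result

main :: IO ()
main = runDemo >>= putStrLn
```

The type annotation on expect is what selects the message: expect skips over mailbox entries of other types and waits for the first one of the requested type.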

Advanced messaging

receiveWait :: [Match b] -> Process b
receiveTimeout :: Int -> [Match b] -> Process (Maybe b)
match :: Serializable a => (a -> Process b) -> Match b
matchIf :: Serializable a => (a -> Bool) -> (a -> Process b) -> Match b
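
The following sketch shows how receiveWait, receiveTimeout, match and matchIf might be combined. It again assumes the SimpleLocalnet backend; describeNext, runDemo, the host and the port are hypothetical names and values.

```haskell
module Main where

import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Distributed.Process
import Control.Distributed.Process.Backend.SimpleLocalnet
  (initializeBackend, newLocalNode)
import Control.Distributed.Process.Node (initRemoteTable, runProcess)
import Control.Monad.IO.Class (liftIO)

-- | Take the first mailbox entry that is either a positive Int or a String.
-- Entries that satisfy neither alternative stay in the mailbox.
describeNext :: Process String
describeNext = receiveWait
  [ matchIf (> (0 :: Int)) (\n -> return ("positive Int: " ++ show n))
  , match   (\s -> return ("String: " ++ s))
  ]

-- | Hypothetical helper: send three messages to ourselves and pick them apart.
runDemo :: IO (String, String, Maybe Int)
runDemo = do
  -- Host and port are arbitrary choices for this sketch.
  backend <- initializeBackend "127.0.0.1" "8082" initRemoteTable
  node    <- newLocalNode backend
  result  <- newEmptyMVar
  runProcess node $ do
    self <- getSelfPid
    send self (-1 :: Int)   -- rejected by the matchIf predicate
    send self "hi"
    send self (42 :: Int)
    first    <- describeNext
    second   <- describeNext
    -- Collect the skipped Int, waiting at most 100 ms (timeout in microseconds).
    leftover <- receiveTimeout 100000 [match (\n -> return (n :: Int))]
    liftIO $ putMVar result (first, second, leftover)
  takeMVar result

main :: IO ()
main = runDemo >>= print
```

Selective receive of this kind is what the plain expect cannot do: the handler list chooses by type and by predicate, and unmatched messages remain queued for a later receive.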

Process management

spawn :: NodeId -> Closure (Process ()) -> Process ProcessId
terminate :: Process a
getSelfPid :: Process ProcessId
getSelfNode :: Process NodeId

Process monitoring

link :: ProcessId -> Process ()
monitor :: ProcessId -> Process MonitorRef
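
A small sketch of monitor in use, under the same SimpleLocalnet assumptions as the other examples. The name runDemo, the host and the port are placeholders. The monitored process is held back until the monitor has been requested, so the notification is expected to report a normal exit.

```haskell
module Main where

import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Distributed.Process
import Control.Distributed.Process.Backend.SimpleLocalnet
  (initializeBackend, newLocalNode)
import Control.Distributed.Process.Node (initRemoteTable, runProcess)
import Control.Monad.IO.Class (liftIO)

-- | Hypothetical helper: monitor a worker and report why it terminated.
runDemo :: IO String
runDemo = do
  -- Host and port are arbitrary choices for this sketch.
  backend <- initializeBackend "127.0.0.1" "8084" initRemoteTable
  node    <- newLocalNode backend
  result  <- newEmptyMVar
  runProcess node $ do
    -- The worker waits for a () "go" signal and then exits normally.
    worker <- spawnLocal (expect :: Process ())
    ref    <- monitor worker
    send worker ()
    -- The monitoring process receives a notification as an ordinary message.
    reason <- receiveWait
      [ matchIf (\(ProcessMonitorNotification r _ _) -> r == ref)
                (\(ProcessMonitorNotification _ _ why) -> return why)
      ]
    liftIO $ putMVar result (show reason)
  takeMVar result

main :: IO ()
main = runDemo >>= putStrLn
```

With link instead of monitor, the death of the worker would be delivered as an asynchronous exception in the linking process rather than as a message.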


say :: String -> Process ()


Implementation design decisions - highlights

  • Swappable network transport layer
  • Multiple Cloud Haskell backends to handle

distributed-process internal design

+------------------------------------------------------------+
|                        Application                         |
+------------------------------------------------------------+
             |                               |
             V                               V
+-------------------------+   +------------------------------+
|      Cloud Haskell      |<--|    Cloud Haskell Backend     |
+-------------------------+   +------------------------------+
             |           ______/             |
             V           V                   V
+-------------------------+   +------------------------------+
|   Transport Interface   |<--|   Transport Implementation   |
+-------------------------+   +------------------------------+
                                             |
                                             V
                              +------------------------------+
                              | Haskell/C Transport Library  |
                              +------------------------------+

Network transport layer


  • TCP/IP
  • Unix pipes (in progress)
  • CCI (in progress)
    (Common Communication Interface is an HPC networking library supporting infiniband, etc.)

Network transport layer

Also possible

  • Shared memory
  • SSH (One could write an interesting ops tool!)
  • UDP
  • TCP with SSL/TLS

Cloud Haskell backends

"SimpleLocalnet" backend

  • Simple backend to get started quickly
  • No configuration
  • Uses the TCP transport
  • Node discovery using local UDP multicast

Cloud Haskell backends

"Windows Azure" backend

  • Uses Linux VMs
  • Uses the TCP transport between the VMs
  • Initialise with Azure account and SSL certificates
  • Support for: VM enumeration, copying binaries to VMs, spawning nodes on VMs


Building a distributed app that counts the prime factors of every natural number in [1..n] and sums the counts

Prime factorization

-- Trial division: qs is the list of candidate primes, n the number to factor
factors :: [Integer] -> Integer -> [Integer]
factors qs@(p:ps) n
  | n <= 1 = []
  | m == 0 = p : factors qs d
  | otherwise = factors ps n
  where
    (d,m) = n `divMod` p

primeFactors :: Integer -> [Integer]
primeFactors = factors primes

numPrimeFactors :: Integer -> Integer
numPrimeFactors = fromIntegral . length . primeFactors
Written by Dan Weston
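
The slide does not show the definition of primes. The sketch below fills it in with a simple trial-division generator (an assumption, not necessarily the definition used in the talk) so that the factorization code can be run on its own, without any Cloud Haskell packages.

```haskell
module Main where

-- | Assumed definition: an infinite list of primes by trial division.
primes :: [Integer]
primes = 2 : filter isPrime [3, 5 ..]
  where
    isPrime n = all (\p -> n `mod` p /= 0) (takeWhile (\p -> p * p <= n) primes)

-- | Trial division over a list of candidate primes, as on the slide.
factors :: [Integer] -> Integer -> [Integer]
factors [] _ = []            -- not reached when the candidate list is infinite
factors qs@(p:ps) n
  | n <= 1    = []
  | m == 0    = p : factors qs d   -- p divides n: keep p and try it again
  | otherwise = factors ps n       -- p does not divide n: move to the next prime
  where
    (d, m) = n `divMod` p

primeFactors :: Integer -> [Integer]
primeFactors = factors primes

numPrimeFactors :: Integer -> Integer
numPrimeFactors = fromIntegral . length . primeFactors

main :: IO ()
main = do
  print (primeFactors 360)                      -- prints [2,2,2,3,3,5]
  print (sum (map numPrimeFactors [1 .. 100]))  -- prints 239
```

Factors are counted with multiplicity, which is why the sequential sum for [1..100] agrees with the 239 reported by the distributed run.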


Push work to available nodes and sum up the results

master :: Integer -> [NodeId] -> Process Integer
master n slaves = do
  us <- getSelfPid

  -- Start slave processes
  slaveProcesses <- forM slaves $
    \nid -> spawn nid ($(mkClosure 'slave) us)

  -- Distribute 1 .. n amongst the slave processes
  spawnLocal $ forM_ (zip [1 .. n] (cycle slaveProcesses)) $
    \(m, them) -> send them m

  -- Wait for the result
  sumIntegers (fromIntegral n)

sumIntegers :: Int -> Process Integer
sumIntegers = go 0
  where
    -- Strict accumulator (the bang patterns need the BangPatterns extension)
    go :: Integer -> Int -> Process Integer
    go !acc 0 = return acc
    go !acc n = do
      m <- expect
      go (acc + m) (n - 1)


Compute the number of prime factors and send results to master node

slave :: ProcessId -> Process ()
slave them = forever $ do
  n <- expect
  send them (numPrimeFactors n)

remotable ['slave]

main function

main = do
  args <- getArgs

  case args of
    ["master", host, port, n] -> do
      backend <- initializeBackend host port rtable
      startMaster backend $ \slaves -> do
        result <- master (read n) slaves
        liftIO $ print result
    ["slave", host, port] -> do
      backend <- initializeBackend host port rtable
      startSlave backend

./prime-factors slave localhost 8081
./prime-factors master localhost 8080 100
=> 239

For more examples, check out the distributed-process-demos package.

Cloud Haskell and embedded systems

A brief look at running Raspberry Pi in a Haskell Cloud

Why bother running Cloud Haskell on embedded devices?

Cloud Haskell brings some key contributions that, in one form or another, can play a major role in the next generation of cloud computing. And Raspberry Pi, through its sheer low cost and capability, has the potential to gain the greatest reach of any embedded system.


  • Cloud Haskell running on Raspberry Pi without Template Haskell (a bit more work, lack of splicing $(mkClosure 'f))
  • Tested with GHC 7.4.1 (ARM build, stage-1 compiler)
  • Changes made to distributed-process so that TH can be sidestepped where no stage-2 compiler is available


  • GHC 7.4.2 stage-2 support (build distributed-process with TH support)
  • Find a simpler way to cross-compile with GHC 7.x (the current option is to install QEMU, a generic and open source machine emulator)


Current state of the implementation

  • Covers the full API
  • Made a first release and several minor bug-fix releases
  • Reasonable test suite
  • Reasonable performance

Ready for serious experiments, but not yet for serious use.

Future work

Significant TODOs

  • Larger scale testing
  • Node disconnect and reconnect needs more work and testing
  • More demos
  • Comparative benchmarking needed


  • Shared memory transport
  • SSH transport
  • Ability to use multiple transports
  • Implementation of the ‘static’ language extension
  • Higher level libraries, such as a task layer framework

Cloud Haskell Packages


  • distributed-process: The main CH package
  • distributed-process-simplelocalnet: Simple backend for local networks
  • network-transport: Transport interface
  • network-transport-tcp: TCP instantiation of Network.Transport
  • distributed-process-azure: Azure backend

Source code and documentation


The New Cloud Haskell presentation by Duncan Coutts and Edsko de Vries delivered at the Haskell Implementors Workshop.

