Introduction to Cloud Haskell

An approach to distributed programming in Haskell

Alen Ribic / @alenribic

January 2013, Lambda Luminaries

In this talk...

  • Fundamentals of Cloud Haskell
  • Tutorial
  • Cloud Haskell and embedded systems
  • Current state and Future work

What is Cloud Haskell?

Cloud Haskell is a DSL for developing programs for a distributed computing environment in Haskell.
  • No changes to the language (shallow embedding as library)
  • Programming at the same level as with Concurrent Haskell primitives (forkIO, MVar)
  • Computational model strongly based on the message-passing model of Erlang
  • We leverage Haskell’s purity, types, and monads
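
For comparison, here is a minimal sketch of the single-machine Concurrent Haskell style that the list above refers to. It uses only base and is not Cloud Haskell code; the name squareInThread is made up for this illustration. In Cloud Haskell, spawn and send/expect play roughly the roles that forkIO and MVar play here.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | Fork a lightweight thread that squares a number and hands the result
-- back through an MVar. (Hypothetical helper, for illustration only.)
squareInThread :: Int -> IO Int
squareInThread x = do
  box <- newEmptyMVar
  _ <- forkIO (putMVar box (x * x))  -- the worker thread fills the MVar
  takeMVar box                       -- the caller blocks until it is filled

main :: IO ()
main = squareInThread 7 >>= print    -- prints 49
```

The MVar here is shared memory, so it cannot cross machine boundaries. That is the gap the message-passing model fills.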

Programming model

  • Explicit concurrency
  • Lightweight processes
  • No state shared between processes
  • Asynchronous message passing


Often called the 'Actor model'

Background and ideas

Papers

Initial prototype

  • remote package by Jeff Epstein

Implementation

Cloud Haskell design

  • Lightweight processes as Process monad (built on GHC's lightweight threads)
  • Can spawn, monitor and terminate processes on any node
  • Inter-process communication by sending messages
  • Can send and receive messages of any serializable type (as in Erlang)
  • Can leverage Haskell's strong type system to provide static guarantees about the content of messages (typed channels)
  • Novel method for serializing function closures that enables higher-order functions to be used in a distributed environment
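
The typed-channel point can be sketched as follows. This is an illustrative example, not code from the talk: it assumes the distributed-process package plus the network-transport-inmemory package (chosen here only so the sketch runs on a single machine without any network setup), and the names child, typedChannelDemo and runOnLocalNode are made up.

```haskell
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Distributed.Process
import Control.Distributed.Process.Node (initRemoteTable, newLocalNode, runProcess)
import Control.Monad.IO.Class (liftIO)
import Network.Transport.InMemory (createTransport)  -- assumption: in-memory transport

-- | The child can only send Ints down this port; trying to send a String
-- would be rejected by the type checker.
child :: SendPort Int -> Process ()
child port = sendChan port 42

-- | Create a typed channel, give the send end to a child process and
-- read the value it sends from the receive end.
typedChannelDemo :: Process Int
typedChannelDemo = do
  (sendPort, receivePort) <- newChan
  _ <- spawnLocal (child sendPort)
  receiveChan receivePort

-- | Hypothetical helper: run a Process on a fresh single node and return
-- its result to IO.
runOnLocalNode :: Process a -> IO a
runOnLocalNode p = do
  transport <- createTransport
  node      <- newLocalNode transport initRemoteTable
  result    <- newEmptyMVar
  runProcess node (p >>= liftIO . putMVar result)
  takeMVar result

main :: IO ()
main = runOnLocalNode typedChannelDemo >>= print
```

Compared with untyped send/expect, the receiver does not need to guess the message type: it is fixed by the type of the ReceivePort.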

The interface functions of Cloud Haskell

Basic types


newtype Process a = Process {
        unProcess :: ReaderT LocalProcess IO a
} deriving (Functor, Monad, MonadIO, MonadReader LocalProcess,
            Typeable, Applicative)

data ProcessId
data NodeId

class (Binary a, Typeable a) => Serializable a
                        

Basic messaging


send :: Serializable a => ProcessId -> a -> Process ()
expect :: Serializable a => Process a
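
A minimal sketch of send and expect in use: one process echoes a message back to its sender. It is an illustration rather than code from the talk. It assumes the distributed-process and network-transport-inmemory packages (the in-memory transport is used only to keep the sketch to a single machine), and echoOnce, pingPong and runOnLocalNode are made-up names.

```haskell
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Distributed.Process
import Control.Distributed.Process.Node (initRemoteTable, newLocalNode, runProcess)
import Control.Monad.IO.Class (liftIO)
import Network.Transport.InMemory (createTransport)  -- assumption: in-memory transport

-- | Wait for one (sender, text) pair and send the text back to the sender.
echoOnce :: Process ()
echoOnce = do
  (sender, text) <- expect :: Process (ProcessId, String)
  send sender text

-- | Spawn the echo process on the local node, send it a message that
-- includes our own ProcessId, and wait for the reply.
pingPong :: String -> Process String
pingPong text = do
  self   <- getSelfPid
  echoer <- spawnLocal echoOnce
  send echoer (self, text)
  expect  -- the result type selects which message type to wait for

-- | Hypothetical helper: run a Process on a fresh single node and return
-- its result to IO.
runOnLocalNode :: Process a -> IO a
runOnLocalNode p = do
  transport <- createTransport
  node      <- newLocalNode transport initRemoteTable
  result    <- newEmptyMVar
  runProcess node (p >>= liftIO . putMVar result)
  takeMVar result

main :: IO ()
main = runOnLocalNode (pingPong "ping") >>= putStrLn
```

Note that expect selects messages by type: a message of another type stays in the mailbox until something asks for it.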
                        

Advanced messaging


receiveWait :: [Match b] -> Process b
receiveTimeout :: Int -> [Match b] -> Process (Maybe b)
match :: Serializable a => (a -> Process b) -> Match b
matchIf :: Serializable a => (a -> Bool) -> (a -> Process b) -> Match b
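
The following sketch shows how these functions combine: receiveWait tries a list of matches against each queued message, and receiveTimeout gives up after a number of microseconds. It is an illustration under assumptions, not code from the talk: it relies on the distributed-process and network-transport-inmemory packages, and classify, advancedDemo and runOnLocalNode are made-up names.

```haskell
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Distributed.Process
import Control.Distributed.Process.Node (initRemoteTable, newLocalNode, runProcess)
import Control.Monad (replicateM)
import Control.Monad.IO.Class (liftIO)
import Network.Transport.InMemory (createTransport)  -- assumption: in-memory transport

-- | Handle the next message that fits one of the matches. For a given
-- message the matches are tried in list order.
classify :: Process String
classify = receiveWait
  [ matchIf (> (100 :: Int)) (\n -> return ("big int " ++ show n))
  , match   (\n -> return ("int " ++ show (n :: Int)))
  , match   (\s -> return ("string " ++ s))
  ]

-- | Send three messages to ourselves, classify them, then show that
-- receiveTimeout returns Nothing once the mailbox is empty.
advancedDemo :: Process ([String], Maybe String)
advancedDemo = do
  self <- getSelfPid
  send self (7 :: Int)
  send self "hi"
  send self (1000 :: Int)
  seen     <- replicateM 3 classify
  leftover <- receiveTimeout 1000 [match (\s -> return (s :: String))]
  return (seen, leftover)

-- | Hypothetical helper: run a Process on a fresh single node and return
-- its result to IO.
runOnLocalNode :: Process a -> IO a
runOnLocalNode p = do
  transport <- createTransport
  node      <- newLocalNode transport initRemoteTable
  result    <- newEmptyMVar
  runProcess node (p >>= liftIO . putMVar result)
  takeMVar result

main :: IO ()
main = runOnLocalNode advancedDemo >>= print
```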
                        

Process management


spawn :: NodeId -> Closure (Process ()) -> Process ProcessId
terminate :: Process a
getSelfPid :: Process ProcessId
getSelfNode :: Process NodeId
                        

Process monitoring


link :: ProcessId -> Process ()
monitor :: ProcessId -> Process MonitorRef
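
A sketch of monitor in use: the monitoring process receives a ProcessMonitorNotification message when the monitored process ends. This is an illustration, not code from the talk. It assumes the distributed-process and network-transport-inmemory packages, and worker, watchWorker and runOnLocalNode are made-up names. link is not shown; unlike monitor it delivers an exception to the linked process rather than a message.

```haskell
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Distributed.Process
import Control.Distributed.Process.Node (initRemoteTable, newLocalNode, runProcess)
import Control.Monad.IO.Class (liftIO)
import Network.Transport.InMemory (createTransport)  -- assumption: in-memory transport

-- | A worker that waits for a unit message and then exits normally.
worker :: Process ()
worker = expect

-- | Monitor a worker, tell it to stop, and wait for the notification that
-- belongs to our monitor reference. Returns whether the notification names
-- the worker, together with the reported reason.
watchWorker :: Process (Bool, DiedReason)
watchWorker = do
  pid <- spawnLocal worker
  ref <- monitor pid
  send pid ()
  receiveWait
    [ matchIf (\(ProcessMonitorNotification r _ _) -> r == ref)
              (\(ProcessMonitorNotification _ who why) -> return (who == pid, why))
    ]

-- | Hypothetical helper: run a Process on a fresh single node and return
-- its result to IO.
runOnLocalNode :: Process a -> IO a
runOnLocalNode p = do
  transport <- createTransport
  node      <- newLocalNode transport initRemoteTable
  result    <- newEmptyMVar
  runProcess node (p >>= liftIO . putMVar result)
  takeMVar result

main :: IO ()
main = runOnLocalNode watchWorker >>= print
```

The worker blocks on expect so that the monitor is in place before it exits; a worker that finished immediately could already be gone when monitor is called.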
                        

Logging


say :: String -> Process ()
                        

distributed-process

Implementation design decisions - highlights

  • Swappable network transport layer
  • Multiple Cloud Haskell backends to handle

distributed-process internal design

+------------------------------------------------------------+
|                        Application                         |
+------------------------------------------------------------+
             |                               |
             V                               V
+-------------------------+   +------------------------------+
|      Cloud Haskell      |<--|    Cloud Haskell Backend     |
+-------------------------+   +------------------------------+
             |           ______/             |
             V           V                   V
+-------------------------+   +------------------------------+
|   Transport Interface   |<--|   Transport Implementation   |
+-------------------------+   +------------------------------+
                                             |
                                             V
                              +------------------------------+
                              | Haskell/C Transport Library  |
                              +------------------------------+
                        

Network transport layer

Implementations

  • TCP/IP
  • Unix pipes (in progress)
  • CCI (in progress)
    (Common Communication Interface is an HPC networking library supporting InfiniBand, etc.)

Network transport layer

Also possible

  • Shared memory
  • SSH (One could write an interesting ops tool!)
  • UDP
  • TCP with SSL/TLS

Cloud Haskell backends

"SimpleLocalnet" backend

  • Simple backend to get started quickly
  • No configuration
  • Uses the TCP transport
  • Node discovery using local UDP multicast

Cloud Haskell backends

"Windows Azure" backend

  • Uses Linux VMs
  • Uses the TCP transport between the VMs
  • Initialise with Azure account and SSL certificates
  • Support for: VM enumeration, copying binaries to VMs, spawning nodes on VMs

Tutorial

Building a distributed app that computes the number of prime factors of every natural number in [1..n] and sums the results

Prime factorization


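-- The slide does not show primes. This simple trial-division sieve is one
-- possible definition, added here so that the code is complete.
primes :: [Integer]
primes = sieve [2 ..]
 where
  sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

-- Divide n by the candidates in order, collecting each prime that divides it.
factors :: [Integer] -> Integer -> [Integer]
factors qs@(p:ps) n
  | n <= 1 = []
  | m == 0 = p : factors qs d
  | otherwise = factors ps n
 where
  (d,m) = n `divMod` p

primeFactors :: Integer -> [Integer]
primeFactors = factors primes

numPrimeFactors :: Integer -> Integer
numPrimeFactors = fromIntegral . length . primeFactors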
                            
Written by Dan Weston

Master

Push work to available nodes and sum up the results


master :: Integer -> [NodeId] -> Process Integer
master n slaves = do
  us <- getSelfPid

  -- Start slave processes
  slaveProcesses <- forM slaves $
    \nid -> spawn nid ($(mkClosure 'slave) us)

  -- Distribute 1 .. n amongst the slave processes
  spawnLocal $ forM_ (zip [1 .. n] (cycle slaveProcesses)) $
    \(m, them) -> send them m

  -- Wait for the result
  sumIntegers (fromIntegral n)

sumIntegers :: Int -> Process Integer
sumIntegers = go 0
 where
   go :: Integer -> Int -> Process Integer
   go !acc 0 = return acc
   go !acc n = do
     m <- expect
     go (acc + m) (n - 1)
                        

Slave

Compute the number of prime factors and send the result to the master process


slave :: ProcessId -> Process ()
slave them = forever $ do
  n <- expect
  send them (numPrimeFactors n)

remotable ['slave]
                        

main function


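-- rtable is not shown on the slide. Assuming master and slave live in this
-- module, remotable generates __remoteTable, which extends the initial table.
rtable :: RemoteTable
rtable = __remoteTable initRemoteTable

main :: IO ()
main = do
  args <- getArgs

  case args of
    ["master", host, port, n] -> do
      backend <- initializeBackend host port rtable
      startMaster backend $ \slaves -> do
        result <- master (read n) slaves
        liftIO $ print result
    ["slave", host, port] -> do
      backend <- initializeBackend host port rtable
      startSlave backend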
                        
./prime-factors slave localhost 8081
                        
./prime-factors master localhost 8080 100
                        
=> 239
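
The result 239 can be cross-checked without any distribution. The sketch below recomputes the sum sequentially using only base. The primes definition and the name totalPrimeFactors are assumptions made for this check; the talk does not show them.

```haskell
-- | Simple trial-division sieve (assumed definition, not from the slides).
primes :: [Integer]
primes = sieve [2 ..]
  where
    sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]
    sieve []       = []

-- | Prime factors of n with multiplicity, by dividing by candidates in order.
factors :: [Integer] -> Integer -> [Integer]
factors [] _ = []
factors qs@(p : ps) n
  | n <= 1    = []
  | m == 0    = p : factors qs d
  | otherwise = factors ps n
  where
    (d, m) = n `divMod` p

numPrimeFactors :: Integer -> Integer
numPrimeFactors = fromIntegral . length . factors primes

-- | Sequential equivalent of what master and the slaves compute together.
totalPrimeFactors :: Integer -> Integer
totalPrimeFactors n = sum (map numPrimeFactors [1 .. n])

main :: IO ()
main = print (totalPrimeFactors 100)  -- prints 239
```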
                        

For more examples, check out the distributed-process-demos package.

Cloud Haskell and embedded systems

A brief look at running Raspberry Pi in a Haskell Cloud

Why bother running Cloud Haskell on embedded devices?

Cloud Haskell brings some key contributions that, in one form or another, can play a major role in the next generation of cloud computing. Raspberry Pi, through its sheer low cost and capability, has the potential to gain the greatest reach of any embedded system.

Status

  • Cloud Haskell running on Raspberry Pi without Template Haskell (a bit more work without the $(mkClosure 'f) splice)
  • Tested with GHC 7.4.1 (ARM build, stage-1 compiler)
  • Changes made to distributed-process to sidestep TH where no stage-2 compiler is available

TODO

  • GHC 7.4.2 stage-2 support (build distributed-process with TH support)
  • Find a simpler way to cross-compile with GHC 7.x (current option is to install QEMU, a generic and open source machine emulator)

Links

Current state of the implementation

  • Covers the full API
  • Made a first release and several minor bug-fix releases
  • Reasonable test suite
  • Reasonable performance


Ready for serious experiments, but not yet for serious use.

Future work

Significant TODOs

  • Larger scale testing
  • Node disconnect and reconnect needs more work and testing
  • More demos
  • Comparative benchmarking needed

Wishlist

  • Shared memory transport
  • SSH transport
  • Ability to use multiple transports
  • Implementation of the ‘static’ language extension
  • Higher level libraries, such as a task layer framework

Cloud Haskell Packages

Hackage

  • distributed-process: The main CH package
  • distributed-process-simplelocalnet: Simple backend for local networks
  • network-transport: Transport interface
  • network-transport-tcp: TCP instantiation of Network.Transport
  • distributed-process-azure: Azure backend


Source code and documentation

https://github.com/haskell-distributed/distributed-process

Acknowledgments

The New Cloud Haskell presentation by Duncan Coutts and Edsko de Vries, delivered at the Haskell Implementors' Workshop.

THE END

Alen Ribic / alenribic.com