Understanding Iteratees

Iteratees are an abstraction discovered by Oleg Kiselyov, which provide a performant, predictable, and safe alternative to lazy I/O. Though the data types involved are simple, their relationship to incremental processing is not obvious, and existing documentation ranges in quality from merely dense to outright baffling. This article attempts to clarify the concepts and use underlying iteratees.

Please note that these are my notes, as I attempt to implement iteratee–based libraries. I may have misunderstood minor or major parts of iteratees. If in doubt, the final authority is Oleg -- though understanding his answers requires a saving throw vs. confusion. Please e-mail me any comments or suggestions.

2010–08–19: the code available in this article has been expanded and packaged as the enumerator library.

Iteratees vs. Lazy I/O

Lazy I/O – eg, hGetContents and friends – is known to have several shortcomings. Most notably, IO errors can occur in pure code and Handles may remain open for arbitrary periods of time. Oleg notes[1] that this can lead to unexpected failures, due to resource exhaustion.

Iteratees do not suffer from these problems. Their resource use is bounded and predictable, and the type system provides guarantees that limited resources are released when no longer needed. Notably, iteratees can process arbitrarily large inputs in constant space.

Implementing iteratees

There are at least five generic iteratee libraries, each with differing type signatures and semantics. Oleg's Iteratee.hs, IterateeM.hs, & IterateeMCPS.hs, John Lato's iteratee package, and a post by Per Magnus Therning.

This page documents a sixth implementation, based on IterateeM, with simplified error handling and naming conventions (hopefully) more obvious to the average Haskell programmer.

data Chunk a = Chunk [a] | EOF deriving (Show, Eq) data Step e a m b = Continue (Chunk a -> Iteratee e a m b) | Yield b (Chunk a) | Error e newtype Iteratee e a m b = Iteratee { runIteratee :: m (Step e a m b) }

In general, an iteratee begins in the Continue state. As each chunk is passed to the continuation, the iteratee may return the next step, which is one of:

Continue
The iteratee requires more input before it can produce a result.
Yield
The iteratee has received enough input to generate a result, along with left–over input. If the iteratee will no longer accept input, it should yield EOF. If no input remains, but the iteratee can still accept more, it should yield Chunk [].
Error
The iteratee experienced an error which prevents it from proceeding further. The type of error contained will depend on the enumerator and/or iteratee – common choices are String and SomeException.

Based on these semantics, some simple instances can be created:

instance Monoid (Chunk a) where mempty = Chunk [] mappend (Chunk xs) (Chunk ys) = Chunk $ xs ++ ys mappend _ _ = EOF instance Functor Chunk where fmap _ EOF = EOF fmap f (Chunk xs) = Chunk $ map f xs instance (Show a, Show b, Show e) => Show (Step e a m b) where showsPrec d step = showParen (d > 10) $ case step of (Continue _) -> s "Continue" (Yield b chunk) -> s "Yield " . sp b . s " " . sp chunk (Error err) -> s "Error " . sp err where s = showString sp :: Show a => a -> ShowS sp = showsPrec 11

Slightly more complex is the Monad instance for iteratees. The first iteratee is run, and if it yielded a value, that value is fed into the second iteratee.

instance Monad m => Monad (Iteratee e a m) where return x = Iteratee . return . Yield x $ Chunk [] m >>= f = Iteratee $ runIteratee m >>= \mStep -> case mStep of Continue k -> return $ Continue ((>>= f) . k) Error err -> return $ Error err Yield x (Chunk []) -> runIteratee $ f x Yield x chunk -> runIteratee (f x) >>= \r -> case r of Continue k -> runIteratee $ k chunk Error err -> return $ Error err -- runIteratee (f x) does not consume any input; if it -- returns Yield, then its "extra" input must be -- (Chunk []) and can be ignored. Yield x' _ -> return $ Yield x' chunk instance MonadTrans (Iteratee e a) where lift m = Iteratee $ m >>= runIteratee . return instance MonadIO m => MonadIO (Iteratee e a m) where liftIO = lift . liftIO instance Monad m => Functor (Iteratee e a m) where fmap f i = i >>= return . f

Next, lets define a few simple primitive combinators for building iteratees from pure functions:

returnI :: Monad m => Step e a m b -> Iteratee e a m b returnI = Iteratee . return liftI :: Monad m => (Chunk a -> Step e a m b) -> Iteratee e a m b liftI k = returnI $ Continue (returnI . k) yield :: Monad m => b -> Chunk a -> Iteratee e a m b yield x chunk = returnI $ Yield x chunk continue :: Monad m => (Chunk a -> Iteratee e a m b) -> Iteratee e a m b continue k = returnI $ Continue k throwError :: Monad m => e -> Iteratee e a m b throwError err = returnI $ Error err

These combinators are sufficient to define simple iteratees; for example, a variation of dropWhile:

-- import Prelude hiding (dropWhile) -- import qualified Prelude as Prelude dropWhile :: Monad m => (a -> Bool) -> Iteratee e a m () dropWhile f = liftI step where step (Chunk xs) = case Prelude.dropWhile f xs of [] -> Continue $ returnI . step xs' -> Yield () (Chunk xs') step EOF = Yield () EOF

Or an iteratee for printing received chunks to stdout, useful for debugging:

printChunks :: MonadIO m => Show a => Bool -> Iteratee e a m () printChunks printEmpty = continue step where step (Chunk []) | not printEmpty = continue step step (Chunk xs) = liftIO (print xs) >> continue step step EOF = liftIO (putStrLn "EOF") >> yield () EOF

Finally, to extract the final result from an iteratee, it's sufficient to feed it EOF and check the returned Step. Note that a "well–behaved" iteratee continuation will always return Yield or Error in response to EOF – iteratees which return Continue may loop forever, depending on their monadic behavior.

run :: Monad m => Iteratee e a m b -> m (Either e b) run i = runIteratee i >>= check where check (Continue k) = runIteratee (k EOF) >>= check check (Yield x _) = return $ Right x check (Error e) = return $ Left e

Enumerators

Iteratees consume data from a sequence of input chunks. To generate those chunks, we define enumerators (and enumerator composition operators).

type Enumerator e a m b = Step e a m b -> Iteratee e a m b infixl 1 >>==, ==<< (>>==) :: Monad m => Iteratee e a m b -> (Step e a m b -> Iteratee e a' m b') -> Iteratee e a' m b' m >>== f = Iteratee (runIteratee m >>= runIteratee . f) (==<<):: Monad m => (Step e a m b -> Iteratee e a' m b') -> Iteratee e a m b -> Iteratee e a' m b' f ==<< m = m >>== f

Note that the Enumerator type is semantically equivalent to:

type Enumerator e a m b = Step e a m b -> m (Step e a m b)

Simple enumerators can be defined in terms of existing combinators. The basic format of an enumerator is that when it receives a Continue step, it passes a chunk to the continuation to generate its returned iteratee. Other step types are passed through unchanged.

enumList :: Monad m => [a] -> Enumerator e a m b enumList xs (Continue k) = case xs of [] -> k EOF (x:xs') -> k (Chunk [x]) >>== enumList xs' enumList _ step = returnI step

More complex enumerators require building the result manually. Note that while the recursive step is much larger in this example, the fundamental layout (loop on Continue, pass on others) remains.

enumHandle :: Handle -> Enumerator String ByteString IO b enumHandle h = Iteratee . allocaBytes bufferSize . loop where bufferSize = 4096 loop (Continue k) = do_read k loop step = const $ return step do_read k p = do n <- try $ hGetBuf h p bufferSize case (n :: Either SomeException Int) of Left err -> return $ Error $ show err Right 0 -> return $ Continue k Right n' -> do bytes <- packCStringLen (p, n') step <- runIteratee (k (Chunk [bytes])) loop step p

In some cases, it might make more sense to define this enumerator in terms of bytes rather than byte strings. The required changes are minor – the bytes are stored directly in the Chunk list.

enumHandle :: Handle -> Enumerator String Word8 IO bRight n' -> do bytes <- F.peekArray n' p step <- runIteratee (k (Chunk bytes)) loop step p

Enumeratees

Enumerators generate data, iteratees consume it. When a value needs to generate a stream using another stream as input, it is named an enumeratee.

type Enumeratee e aOut aIn m b = Step e aIn m b -> Iteratee e aOut m (Step e aIn m b)

Most interesting transformations in iteratee-based code are enumeratees. For example, map can be encoded as an enumeratee:

checkDone :: Monad m => ((Chunk a -> Iteratee e a m b) -> Iteratee e a' m (Step e a m b)) -> Enumeratee e a' a m b checkDone _ (Yield x chunk) = return $ Yield x chunk checkDone f (Continue k) = f k checkDone _ (Error err) = throwError err mapI :: Monad m => (ao -> ai) -> Enumeratee e ao ai m b mapI f = checkDone $ continue . step where step k EOF = yield (Continue k) EOF step k (Chunk []) = continue $ step k step k chunk = k (fmap f chunk) >>== mapI f

A more complex example: sequenceI converts an iteratee to an enumeratee, by feeding it input until it returns EOF. This is useful for chaining iteratees together, to support embedded streams.

finished :: Monad m => Iteratee e a m Bool finished = liftI $ \chunk -> case chunk of EOF -> Yield True EOF _ -> Yield False chunk sequenceI :: Monad m => Iteratee e ao m ai -> Enumeratee e ao ai m b sequenceI i = checkDone check where check k = finished >>= \f -> if f then yield (Continue k) EOF else step k step k = i >>= \v -> k (Chunk [v]) >>== sequenceI i

A join combinator is useful for "extracting" an output stream from an enumeratee's result.

joinI :: Monad m => Iteratee e a m (Step e a' m b) -> Iteratee e a m b joinI outer = outer >>= check where check (Continue k) = k EOF >>== check check (Yield x _) = return x check (Error e) = throwError e

  1. Oleg Kiselyov – Lazy vs correct IO

Change Feed