Iteratees are an abstraction discovered by Oleg Kiselyov, which provide a performant, predictable, and safe alternative to lazy I/O. Though the data types involved are simple, their relationship to incremental processing is not obvious, and existing documentation ranges in quality from merely dense to outright baffling. This article attempts to clarify the concepts and use underlying iteratees.
Please note that these are my notes, as I attempt to implement iteratee–based libraries. I may have misunderstood minor or major parts of iteratees. If in doubt, the final authority is Oleg -- though understanding his answers requires a saving throw vs. confusion. Please e-mail me any comments or suggestions.
2010–08–19: the code available in this article has been expanded and packaged as the enumerator library.
Lazy I/O – eg, hGetContents
and friends – is known to have several shortcomings. Most notably, IO errors can occur in pure code and Handle
s may remain open for arbitrary periods of time. Oleg notes
Iteratees do not suffer from these problems. Their resource use is bounded and predictable, and the type system provides guarantees that limited resources are released when no longer needed. Notably, iteratees can process arbitrarily large inputs in constant space.
There are at least five generic iteratee libraries, each with differing type signatures and semantics. Oleg's Iteratee.hs, IterateeM.hs, & IterateeMCPS.hs, John Lato's iteratee package, and a post by Per Magnus Therning.
This page documents a sixth implementation, based on IterateeM, with simplified error handling and naming conventions (hopefully) more obvious to the average Haskell programmer.
data Chunk a = Chunk [a] | EOF deriving (Show, Eq) data Step e a m b = Continue (Chunk a -> Iteratee e a m b) | Yield b (Chunk a) | Error e newtype Iteratee e a m b = Iteratee { runIteratee :: m (Step e a m b) }
In general, an iteratee begins in the Continue
state. As each chunk is passed to the continuation, the iteratee may return the next step, which is one of:
Continue
Yield
EOF
. If no input remains, but the iteratee can still accept more, it should yield Chunk []
.Error
String
and SomeException
.Based on these semantics, some simple instances can be created:
instance Monoid (Chunk a) where mempty = Chunk [] mappend (Chunk xs) (Chunk ys) = Chunk $ xs ++ ys mappend _ _ = EOF instance Functor Chunk where fmap _ EOF = EOF fmap f (Chunk xs) = Chunk $ map f xs instance (Show a, Show b, Show e) => Show (Step e a m b) where showsPrec d step = showParen (d > 10) $ case step of (Continue _) -> s "Continue" (Yield b chunk) -> s "Yield " . sp b . s " " . sp chunk (Error err) -> s "Error " . sp err where s = showString sp :: Show a => a -> ShowS sp = showsPrec 11
Slightly more complex is the Monad
instance for iteratees. The first iteratee is run, and if it yielded a value, that value is fed into the second iteratee.
instance Monad m => Monad (Iteratee e a m) where return x = Iteratee . return . Yield x $ Chunk [] m >>= f = Iteratee $ runIteratee m >>= \mStep -> case mStep of Continue k -> return $ Continue ((>>= f) . k) Error err -> return $ Error err Yield x (Chunk []) -> runIteratee $ f x Yield x chunk -> runIteratee (f x) >>= \r -> case r of Continue k -> runIteratee $ k chunk Error err -> return $ Error err -- runIteratee (f x) does not consume any input; if it -- returns Yield, then its "extra" input must be -- (Chunk []) and can be ignored. Yield x' _ -> return $ Yield x' chunk instance MonadTrans (Iteratee e a) where lift m = Iteratee $ m >>= runIteratee . return instance MonadIO m => MonadIO (Iteratee e a m) where liftIO = lift . liftIO instance Monad m => Functor (Iteratee e a m) where fmap f i = i >>= return . f
Next, lets define a few simple primitive combinators for building iteratees from pure functions:
returnI :: Monad m => Step e a m b -> Iteratee e a m b returnI = Iteratee . return liftI :: Monad m => (Chunk a -> Step e a m b) -> Iteratee e a m b liftI k = returnI $ Continue (returnI . k) yield :: Monad m => b -> Chunk a -> Iteratee e a m b yield x chunk = returnI $ Yield x chunk continue :: Monad m => (Chunk a -> Iteratee e a m b) -> Iteratee e a m b continue k = returnI $ Continue k throwError :: Monad m => e -> Iteratee e a m b throwError err = returnI $ Error err
These combinators are sufficient to define simple iteratees; for example, a variation of dropWhile
:
-- import Prelude hiding (dropWhile) -- import qualified Prelude as Prelude dropWhile :: Monad m => (a -> Bool) -> Iteratee e a m () dropWhile f = liftI step where step (Chunk xs) = case Prelude.dropWhile f xs of [] -> Continue $ returnI . step xs' -> Yield () (Chunk xs') step EOF = Yield () EOF
Or an iteratee for printing received chunks to stdout, useful for debugging:
printChunks :: MonadIO m => Show a => Bool -> Iteratee e a m () printChunks printEmpty = continue step where step (Chunk []) | not printEmpty = continue step step (Chunk xs) = liftIO (print xs) >> continue step step EOF = liftIO (putStrLn "EOF") >> yield () EOF
Finally, to extract the final result from an iteratee, it's sufficient to feed it EOF
and check the returned Step
. Note that a "well–behaved" iteratee continuation will always return Yield
or Error
in response to EOF
– iteratees which return Continue
may loop forever, depending on their monadic behavior.
run :: Monad m => Iteratee e a m b -> m (Either e b) run i = runIteratee i >>= check where check (Continue k) = runIteratee (k EOF) >>= check check (Yield x _) = return $ Right x check (Error e) = return $ Left e
Iteratees consume data from a sequence of input chunks. To generate those chunks, we define enumerators (and enumerator composition operators).
type Enumerator e a m b = Step e a m b -> Iteratee e a m b infixl 1 >>==, ==<< (>>==) :: Monad m => Iteratee e a m b -> (Step e a m b -> Iteratee e a' m b') -> Iteratee e a' m b' m >>== f = Iteratee (runIteratee m >>= runIteratee . f) (==<<):: Monad m => (Step e a m b -> Iteratee e a' m b') -> Iteratee e a m b -> Iteratee e a' m b' f ==<< m = m >>== f
Note that the Enumerator
type is semantically equivalent to:
type Enumerator e a m b = Step e a m b -> m (Step e a m b)
Simple enumerators can be defined in terms of existing combinators. The basic format of an enumerator is that when it receives a Continue
step, it passes a chunk to the continuation to generate its returned iteratee. Other step types are passed through unchanged.
enumList :: Monad m => [a] -> Enumerator e a m b enumList xs (Continue k) = case xs of [] -> k EOF (x:xs') -> k (Chunk [x]) >>== enumList xs' enumList _ step = returnI step
More complex enumerators require building the result manually. Note that while the recursive step is much larger in this example, the fundamental layout (loop on Continue
, pass on others) remains.
enumHandle :: Handle -> Enumerator String ByteString IO b enumHandle h = Iteratee . allocaBytes bufferSize . loop where bufferSize = 4096 loop (Continue k) = do_read k loop step = const $ return step do_read k p = do n <- try $ hGetBuf h p bufferSize case (n :: Either SomeException Int) of Left err -> return $ Error $ show err Right 0 -> return $ Continue k Right n' -> do bytes <- packCStringLen (p, n') step <- runIteratee (k (Chunk [bytes])) loop step p
In some cases, it might make more sense to define this enumerator in terms of bytes rather than byte strings. The required changes are minor – the bytes are stored directly in the Chunk
list.
enumHandle :: Handle -> Enumerator String Word8 IO b … Right n' -> do bytes <- F.peekArray n' p step <- runIteratee (k (Chunk bytes)) loop step p
Enumerators generate data, iteratees consume it. When a value needs to generate a stream using another stream as input, it is named an enumeratee.
type Enumeratee e aOut aIn m b = Step e aIn m b -> Iteratee e aOut m (Step e aIn m b)
Most interesting transformations in iteratee-based code are enumeratees. For example, map
can be encoded as an enumeratee:
checkDone :: Monad m => ((Chunk a -> Iteratee e a m b) -> Iteratee e a' m (Step e a m b)) -> Enumeratee e a' a m b checkDone _ (Yield x chunk) = return $ Yield x chunk checkDone f (Continue k) = f k checkDone _ (Error err) = throwError err mapI :: Monad m => (ao -> ai) -> Enumeratee e ao ai m b mapI f = checkDone $ continue . step where step k EOF = yield (Continue k) EOF step k (Chunk []) = continue $ step k step k chunk = k (fmap f chunk) >>== mapI f
A more complex example: sequenceI
converts an iteratee to an enumeratee, by feeding it input until it returns EOF
. This is useful for chaining iteratees together, to support embedded streams.
finished :: Monad m => Iteratee e a m Bool finished = liftI $ \chunk -> case chunk of EOF -> Yield True EOF _ -> Yield False chunk sequenceI :: Monad m => Iteratee e ao m ai -> Enumeratee e ao ai m b sequenceI i = checkDone check where check k = finished >>= \f -> if f then yield (Continue k) EOF else step k step k = i >>= \v -> k (Chunk [v]) >>== sequenceI i
A join combinator is useful for "extracting" an output stream from an enumeratee's result.
joinI :: Monad m => Iteratee e a m (Step e a' m b) -> Iteratee e a m b joinI outer = outer >>= check where check (Continue k) = k EOF >>== check check (Yield x _) = return x check (Error e) = throwError e
Oleg Kiselyov – Lazy vs correct IO