Iteratees are an abstraction discovered by Oleg Kiselyov, which provide a performant, predictable, and safe alternative to lazy I/O. Though the data types involved are simple, their relationship to incremental processing is not obvious, and existing documentation ranges in quality from merely dense to outright baffling. This article attempts to clarify the concepts and use underlying iteratees.
Please note that these are my notes, as I attempt to implement iteratee–based libraries. I may have misunderstood minor or major parts of iteratees. If in doubt, the final authority is Oleg -- though understanding his answers requires a saving throw vs. confusion. Please e-mail me any comments or suggestions.
2010–08–19: the code available in this article has been expanded and packaged as the enumerator library.
Lazy I/O – eg, hGetContents
and friends – is known to have several shortcomings. Most notably, IO errors can occur in pure code and Handle
s may remain open for arbitrary periods of time. Oleg notes
Iteratees do not suffer from these problems. Their resource use is bounded and predictable, and the type system provides guarantees that limited resources are released when no longer needed. Notably, iteratees can process arbitrarily large inputs in constant space.
There are at least five generic iteratee libraries, each with differing type signatures and semantics. Oleg's Iteratee.hs, IterateeM.hs, & IterateeMCPS.hs, John Lato's iteratee package, and a post by Per Magnus Therning.
This page documents a sixth implementation, based on IterateeM, with simplified error handling and naming conventions (hopefully) more obvious to the average Haskell programmer.
data Chunk a
= Chunk [a]
| EOF
deriving (Show, Eq)
data Step e a m b
= Continue (Chunk a -> Iteratee e a m b)
| Yield b (Chunk a)
| Error e
newtype Iteratee e a m b = Iteratee
{ runIteratee :: m (Step e a m b)
}
In general, an iteratee begins in the Continue
state. As each chunk is passed to the continuation, the iteratee may return the next step, which is one of:
Continue
Yield
EOF
. If no input remains, but the iteratee can still accept more, it should yield Chunk []
.Error
String
and SomeException
.Based on these semantics, some simple instances can be created:
instance Monoid (Chunk a) where
mempty = Chunk []
mappend (Chunk xs) (Chunk ys) = Chunk $ xs ++ ys
mappend _ _ = EOF
instance Functor Chunk where
fmap _ EOF = EOF
fmap f (Chunk xs) = Chunk $ map f xs
instance (Show a, Show b, Show e) => Show (Step e a m b) where
showsPrec d step = showParen (d > 10) $ case step of
(Continue _) -> s "Continue"
(Yield b chunk) -> s "Yield " . sp b . s " " . sp chunk
(Error err) -> s "Error " . sp err
where
s = showString
sp :: Show a => a -> ShowS
sp = showsPrec 11
Slightly more complex is the Monad
instance for iteratees. The first iteratee is run, and if it yielded a value, that value is fed into the second iteratee.
instance Monad m => Monad (Iteratee e a m) where
return x = Iteratee . return . Yield x $ Chunk []
m >>= f = Iteratee $ runIteratee m >>= \mStep -> case mStep of
Continue k -> return $ Continue ((>>= f) . k)
Error err -> return $ Error err
Yield x (Chunk []) -> runIteratee $ f x
Yield x chunk -> runIteratee (f x) >>= \r -> case r of
Continue k -> runIteratee $ k chunk
Error err -> return $ Error err
-- runIteratee (f x) does not consume any input; if it
-- returns Yield, then its "extra" input must be
-- (Chunk []) and can be ignored.
Yield x' _ -> return $ Yield x' chunk
instance MonadTrans (Iteratee e a) where
lift m = Iteratee $ m >>= runIteratee . return
instance MonadIO m => MonadIO (Iteratee e a m) where
liftIO = lift . liftIO
instance Monad m => Functor (Iteratee e a m) where
fmap f i = i >>= return . f
Next, lets define a few simple primitive combinators for building iteratees from pure functions:
returnI :: Monad m => Step e a m b -> Iteratee e a m b
returnI = Iteratee . return
liftI :: Monad m => (Chunk a -> Step e a m b) -> Iteratee e a m b
liftI k = returnI $ Continue (returnI . k)
yield :: Monad m => b -> Chunk a -> Iteratee e a m b
yield x chunk = returnI $ Yield x chunk
continue :: Monad m => (Chunk a -> Iteratee e a m b) -> Iteratee e a m b
continue k = returnI $ Continue k
throwError :: Monad m => e -> Iteratee e a m b
throwError err = returnI $ Error err
These combinators are sufficient to define simple iteratees; for example, a variation of dropWhile
:
-- import Prelude hiding (dropWhile)
-- import qualified Prelude as Prelude
dropWhile :: Monad m => (a -> Bool) -> Iteratee e a m ()
dropWhile f = liftI step where
step (Chunk xs) = case Prelude.dropWhile f xs of
[] -> Continue $ returnI . step
xs' -> Yield () (Chunk xs')
step EOF = Yield () EOF
Or an iteratee for printing received chunks to stdout, useful for debugging:
printChunks :: MonadIO m => Show a => Bool -> Iteratee e a m ()
printChunks printEmpty = continue step where
step (Chunk []) | not printEmpty = continue step
step (Chunk xs) = liftIO (print xs) >> continue step
step EOF = liftIO (putStrLn "EOF") >> yield () EOF
Finally, to extract the final result from an iteratee, it's sufficient to feed it EOF
and check the returned Step
. Note that a "well–behaved" iteratee continuation will always return Yield
or Error
in response to EOF
– iteratees which return Continue
may loop forever, depending on their monadic behavior.
run :: Monad m => Iteratee e a m b -> m (Either e b)
run i = runIteratee i >>= check where
check (Continue k) = runIteratee (k EOF) >>= check
check (Yield x _) = return $ Right x
check (Error e) = return $ Left e
Iteratees consume data from a sequence of input chunks. To generate those chunks, we define enumerators (and enumerator composition operators).
type Enumerator e a m b = Step e a m b -> Iteratee e a m b
infixl 1 >>==, ==<<
(>>==) :: Monad m =>
Iteratee e a m b ->
(Step e a m b -> Iteratee e a' m b') ->
Iteratee e a' m b'
m >>== f = Iteratee (runIteratee m >>= runIteratee . f)
(==<<):: Monad m =>
(Step e a m b -> Iteratee e a' m b') ->
Iteratee e a m b ->
Iteratee e a' m b'
f ==<< m = m >>== f
Note that the Enumerator
type is semantically equivalent to:
type Enumerator e a m b = Step e a m b -> m (Step e a m b)
Simple enumerators can be defined in terms of existing combinators. The basic format of an enumerator is that when it receives a Continue
step, it passes a chunk to the continuation to generate its returned iteratee. Other step types are passed through unchanged.
enumList :: Monad m => [a] -> Enumerator e a m b
enumList xs (Continue k) = case xs of
[] -> k EOF
(x:xs') -> k (Chunk [x]) >>== enumList xs'
enumList _ step = returnI step
More complex enumerators require building the result manually. Note that while the recursive step is much larger in this example, the fundamental layout (loop on Continue
, pass on others) remains.
enumHandle :: Handle -> Enumerator String ByteString IO b
enumHandle h = Iteratee . allocaBytes bufferSize . loop where
bufferSize = 4096
loop (Continue k) = do_read k
loop step = const $ return step
do_read k p = do
n <- try $ hGetBuf h p bufferSize
case (n :: Either SomeException Int) of
Left err -> return $ Error $ show err
Right 0 -> return $ Continue k
Right n' -> do
bytes <- packCStringLen (p, n')
step <- runIteratee (k (Chunk [bytes]))
loop step p
In some cases, it might make more sense to define this enumerator in terms of bytes rather than byte strings. The required changes are minor – the bytes are stored directly in the Chunk
list.
enumHandle :: Handle -> Enumerator String Word8 IO b
…
Right n' -> do
bytes <- F.peekArray n' p
step <- runIteratee (k (Chunk bytes))
loop step p
Enumerators generate data, iteratees consume it. When a value needs to generate a stream using another stream as input, it is named an enumeratee.
type Enumeratee e aOut aIn m b = Step e aIn m b -> Iteratee e aOut m (Step e aIn m b)
Most interesting transformations in iteratee-based code are enumeratees. For example, map
can be encoded as an enumeratee:
checkDone :: Monad m =>
((Chunk a -> Iteratee e a m b) -> Iteratee e a' m (Step e a m b)) ->
Enumeratee e a' a m b
checkDone _ (Yield x chunk) = return $ Yield x chunk
checkDone f (Continue k) = f k
checkDone _ (Error err) = throwError err
mapI :: Monad m => (ao -> ai) -> Enumeratee e ao ai m b
mapI f = checkDone $ continue . step where
step k EOF = yield (Continue k) EOF
step k (Chunk []) = continue $ step k
step k chunk = k (fmap f chunk) >>== mapI f
A more complex example: sequenceI
converts an iteratee to an enumeratee, by feeding it input until it returns EOF
. This is useful for chaining iteratees together, to support embedded streams.
finished :: Monad m => Iteratee e a m Bool
finished = liftI $ \chunk -> case chunk of
EOF -> Yield True EOF
_ -> Yield False chunk
sequenceI :: Monad m => Iteratee e ao m ai -> Enumeratee e ao ai m b
sequenceI i = checkDone check where
check k = finished >>= \f -> if f
then yield (Continue k) EOF
else step k
step k = i >>= \v -> k (Chunk [v]) >>== sequenceI i
A join combinator is useful for "extracting" an output stream from an enumeratee's result.
joinI :: Monad m => Iteratee e a m (Step e a' m b) -> Iteratee e a m b
joinI outer = outer >>= check where
check (Continue k) = k EOF >>== check
check (Yield x _) = return x
check (Error e) = throwError e
Oleg Kiselyov – Lazy vs correct IO