haskell-enumerator: An implementation of Oleg Kiselyov’s left-fold enumerators

Code: https://john-millikin.com/code/haskell-enumerator

History

This library is unmaintained and obsolete, and is of historical interest only.

It was originally a simplified implementation of Oleg Kiselyov’s left-fold enumerators (hence the name). Even with a reworked interface and smaller scope, the conceptual model of iteratees and enumerators proved too complex for widespread adoption. Error handling was particularly hard to reason about.

I was never able to find a use case that enumerators handled better than plain imperative code, and eventually abandoned the whole idea.

The most notable user of enumerators was Michael Snoyman's http-enumerator, which was also abandoned due to complexity.

For those interested in more information, several articles and tutorials are available:

And I made an attempt at it too, in Understanding Iteratees.

Original Project Summary

Say you want to read in a huge file. You're going to calculate its checksum, or count how many newlines it has, or whatever. How do you do that when the file's larger than your machine's memory?

In most languages, the answer is "write a loop". You read the file in small chunks, then run those chunks through whatever processing you want to do. Each loop might do something different with the data, but they've all got the same boilerplate structure.
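For comparison, here is a minimal sketch of that chunked loop in Haskell itself, counting newline bytes. It assumes the bytestring package; the names newlinesIn and countNewlines and the 64 KiB chunk size are made up for illustration and are not part of this library.

```haskell
import qualified Data.ByteString as B
import System.Environment (getArgs)
import System.IO (Handle, IOMode (ReadMode), withBinaryFile)

-- Count the newline bytes (0x0A) in a single chunk.
newlinesIn :: B.ByteString -> Int
newlinesIn = B.count 10

-- Read the handle in fixed-size chunks, keeping only a running total,
-- so memory use does not grow with the size of the file.
countNewlines :: Handle -> IO Int
countNewlines h = loop 0
  where
    loop acc = do
      chunk <- B.hGet h 65536 -- assumed chunk size: 64 KiB
      if B.null chunk
        then return acc -- an empty chunk means end of file
        else loop $! acc + newlinesIn chunk

main :: IO ()
main = do
  args <- getArgs
  case args of
    [] -> putStrLn "Need a file to read from!"
    (path:_) -> withBinaryFile path ReadMode countNewlines >>= print
```

Only newlinesIn is specific to the task here; the rest is the boilerplate that every such loop repeats.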

Haskell programmers noticed that if you squint a bit, files look like really long lists of bytes. Haskell already has tons of functions which work on lists, so all they needed to do to get easy file processing was trick the compiler. The trick these programmers used is called "lazy I/O".

It turns out that lazy I/O has a big downside: it makes thinking about the program's resource requirements very difficult. Servers based on lazy I/O tend to run out of file descriptors, or allocate huge amounts of memory, without any obvious way to fix them.
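A small sketch of the lazy I/O style, assuming the bytestring package's lazy ByteString rather than anything from this library; countNewlinesLazy is a hypothetical name.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import System.Environment (getArgs)

-- An ordinary pure function over what looks like an in-memory value.
countNewlinesLazy :: BL.ByteString -> Int64
countNewlinesLazy = BL.count 10

main :: IO ()
main = do
  args <- getArgs
  case args of
    [] -> putStrLn "Need a file to read from!"
    (path:_) -> do
      -- 'BL.readFile' returns before the file has been read; chunks
      -- are pulled from disk as the pure code demands them.
      contents <- BL.readFile path
      print (countNewlinesLazy contents)
```

The code is pleasantly short, but nothing in it says when the file is actually read or when its handle is closed. Both depend on when, and whether, the pure code consumes the whole value, which is where the resource problems described above come from.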

Another approach to the problem is to go back to the original buffer/loop design, and chop it up. Loops are split into a data source (or enumerator), a data sink (or iteratee), and intermediate data transformers (or enumeratees). These types are composable just like basic list functions, so it's easy to build up complex data processors from re-usable components.
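To make the split concrete, here is a deliberately simplified, pure model of the three roles. It is not this library's API: the types Sink and Source and the functions fromList, mapInput, counter and run are invented for this sketch, and it leaves out the monad, chunked input and error handling that the real types deal with.

```haskell
-- A sink (iteratee) has either finished with a result or wants more
-- input. 'Nothing' tells it that the input has ended.
data Sink a r
  = Done r
  | More (Maybe a -> Sink a r)

-- A source (enumerator) feeds values into a sink and hands back the
-- sink in its new state.
type Source a r = Sink a r -> Sink a r

-- A source backed by an in-memory list. It stops early if the sink
-- finishes before the list runs out.
fromList :: [a] -> Source a r
fromList (x:xs) (More k) = fromList xs (k (Just x))
fromList _ sink = sink

-- A transformer (enumeratee): adapts a sink of 'b' so that it accepts
-- 'a' values instead.
mapInput :: (a -> b) -> Sink b r -> Sink a r
mapInput _ (Done r) = Done r
mapInput f (More k) = More (\input -> mapInput f (k (fmap f input)))

-- A sink that counts its inputs, starting from 'n'.
counter :: Int -> Sink a Int
counter n = More step
  where
    step Nothing = Done n
    step (Just _) = counter $! n + 1

-- Signal end of input and extract the result, if the sink has one.
run :: Sink a r -> Maybe r
run (Done r) = Just r
run (More k) = case k Nothing of
  Done r -> Just r
  More _ -> Nothing -- the sink ignored the end of input

main :: IO ()
main = print (run (fromList "hello" (mapInput fromEnum (counter 0))))
```

Each piece can be swapped independently: the same counter works behind any source or transformer, which is the composability the paragraph above describes.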

Examples

Here's a quick example; we're going to count how many Unicode characters are in a UTF-8 file.

import Data.Enumerator as Enum
import Data.Enumerator.Binary as Binary
import Data.Enumerator.Text as Text
import System.Environment (getArgs)

main :: IO ()
main = do
	args <- getArgs
	let filename = case args of
		[] -> error "Need a file to read from!"
		(x:_) -> x
	
	-- 'enumFile' is an enumerator (data source), which opens a file and
	-- streams its contents. 'decode' is an enumeratee (data transformer),
	-- which converts bytes into Unicode text.
	let enumFileUtf8 = Binary.enumFile filename $= Text.decode utf8
	
	-- 'fold' takes an update function and initial state, then runs for
	-- each character in the stream. Here we start with 0, then increment
	-- by 1 for each character.
	let countChars = Text.fold (\n _ -> n + 1) 0
	
	-- Up until now, we've just defined stuff. 'run_' causes the pipeline
	-- to execute, and returns whatever the final iteratee yielded.
	count <- Enum.run_ (enumFileUtf8 $$ countChars)
	print count