anansi: a NoWeb-inspired literate programming preprocessor

Anansi is a preprocessor for literate programs, in the model of NoWeb or nuweb. Literate programming allows both computer code and documentation to be generated from a single unified source.

Compared to NoWeb, Anansi’s primary benefits are the ability to include separate files, and to automatically generate an entire directory tree from a project without having to enumerate each output.

Code: https://john-millikin.com/code/anansi (GitHub mirror)

History

From https://en.wikipedia.org/wiki/Literate_programming:

Literate programming is a programming paradigm introduced by Donald Knuth in which a program is given as an explanation of the program logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which a compilable source code can be generated.

I learned about literate programming from Literate Haskell, an official syntactic extension for the Haskell language that allows code and commentary to be easily combined in the same file. The language spec provides examples of Haskell code embedded within LaTeX. I was fresh out of university at the time and hadn’t learned how to organize non-trivial codebases yet, so having a clear example derived from the experience of Donald Knuth was attractive.

To learn more, I decided to write a D-Bus[1] library in Literate Haskell. The first commit was a parser for D-Bus type signatures. After spending about a month filling in the library, it became clear that Literate Haskell is unrelated to the Literate Programming of Knuth. It was merely an alternative syntax for block comments, and offered no advantage over plain Haskell source.

Next I decided to get closer to the origins of Literate Programming, and use NoWeb. NoWeb is very similar to the original CWEB in behavior, and among other interesting design choices it receives the entire document input as a single source file. After a few abandoned attempts to generate this file from smaller fragments, version 0.5 of my library was released as two giant NoWeb documents: dbus-core.nw and Tests.nw.

I used NoWeb for roughly 8 months for haskell-dbus, experimenting with various ways of structuring the document and processing the LaTeX markup into a human-readable document. Eventually the pain of working within a single massive source file became overwhelming, and I decided to write my own literate programming tool that could consume a filesystem hierarchy. This was the first version of Anansi.

I continued to use and develop Anansi for a few years. It actually produced pretty good output – see dbus-core_0.9.2.1.pdf for 102 pages of typeset D-Bus client library implementation.

But in the end, I was never able to realize the promised benefits of Literate Programming. The typeset PDF was not easier to read than hypertext documentation generated by Haddock, and both were obviously worse than the very nice docs being created by the Python community with Sphinx.

So in mid-2012 (~3 years after starting the project) I removed all the literate annotations and styles, dropped the fancy build scripts, and released haskell-dbus 0.10 as standard Haskell.

I also stopped working on Anansi at that time. It does work: it'll do exactly what it says on the tin and process your literate source files to separate the content from the commentary. I'm just not convinced any more that this is a useful goal.

Getting started

Anansi has eight commands:

:define or :d
Declares that the following code is part of a macro. Macros may be included in files, or in other macros.
:file path or :f path
Declares that the following code should be written to the given file path. A single Anansi file often generates multiple output files in the target language.
:include path or :i path
Similar to C’s #include or LaTeX’s \input{}, :include behaves as if the contents of the file at the given path were inserted at the current position. This is useful for separating a large project into more manageable segments. Paths are resolved relative to the current file.
:loom loom-name
Sets which loom should be used when weaving the document. This is a string like anansi.latex or anansi-hscolour.html.
:option name=value
Sets an internal option. The only currently supported option is tab-size, which controls how many spaces a single tab is expanded to in LaTeX output.
::
Inserts a literal : into the output.
:#
A comment—any remaining text on this line is ignored.
:
Ends a file or macro code block. Every code block must be terminated.
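
To make the command syntax concrete, here is a minimal Haskell sketch that classifies a single source line according to the list above. It is not Anansi's parser: the Line type, its constructors, and the classify function are names invented for illustration. It also assumes that commands are only recognized at the start of a line, which matches the example below, where the mid-line "::" in "main :: IO ()" passes through unchanged.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | One classified source line. These constructor names are made up for
-- this sketch; they are not Anansi's internal types.
data Line
    = Define String          -- ":define name" or ":d name"
    | File FilePath          -- ":file path" or ":f path"
    | Include FilePath       -- ":include path" or ":i path"
    | Loom String            -- ":loom loom-name"
    | Option String String   -- ":option name=value"
    | Comment                -- ":# ..."
    | EndBlock               -- a lone ":"
    | Content String         -- ordinary text; a leading "::" becomes ":"
    | Unknown String         -- starts with ":" but matches no command
    deriving (Eq, Show)

-- | Classify one line. Assumption: commands only count at column zero.
classify :: String -> Line
classify line = case line of
    ':' : ':' : rest -> Content (':' : rest)
    ':' : '#' : _    -> Comment
    ':' : rest       -> command rest
    _                -> Content line
  where
    trim = dropWhileEnd isSpace . dropWhile isSpace
    command rest =
        -- Split into the command keyword and its (trimmed) argument.
        let (keyword, arg) = trim <$> break isSpace rest
        in case keyword of
            "" | null arg                  -> EndBlock
            k | k `elem` ["d", "define"]   -> Define arg
              | k `elem` ["f", "file"]     -> File arg
              | k `elem` ["i", "include"]  -> Include arg
            "loom"                         -> Loom arg
            "option"                       ->
                let (name, value) = break (== '=') arg
                in Option name (drop 1 value)
            _                              -> Unknown line
```

Loaded into GHCi, classify ":d main" evaluates to Define "main", and classify ":option tab-size=8" evaluates to Option "tab-size" "8".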

To generate Haskell code from an Anansi file, run anansi tangle -o "output path". To generate literate markup, run anansi weave -o "output path".

Examples

:loom anansi-hscolour.html
:option tab-size=8

<!DOCTYPE html>
<html>
<head><title>anansi example</title></head>
<body>
<p>This is an example of using Anansi to generate an HTML document
and Haskell code from the same source.</p>

:d main
main :: IO ()
main = do
	putStrLn "Here we are in our cool main function!"
:

<p>Lets pull in some imports, to demonstrate how literate programming
enables out‐of‐order document construction:</p>

:d imports
import System.Environment (getProgName)
:

:d main
	progName <- getProgName
	putStrLn ("This program was run as " ++ progName)
:

<p>And we'll write the whole thing out to a single file,
<tt>Main.hs</tt>:</p>

:f Main.hs
|imports|
|main|
:
</body>
</html>

which is rendered to:

This is an example of using Anansi to generate an HTML document and Haskell code from the same source.

«main»
main :: IO ()
main = do
        putStrLn "Here we are in our cool main function!"

Lets pull in some imports, to demonstrate how literate programming enables out‐of‐order document construction:

«imports»
import System.Environment (getProgName)
«main»
        progName <- getProgName
        putStrLn ("This program was run as " ++ progName)

And we'll write the whole thing out to a single file, Main.hs:

» Main.hs
«imports»
«main»

and generates a single Haskell file, Main.hs:

import System.Environment (getProgName)

main :: IO ()
main = do
	putStrLn "Here we are in our cool main function!"

	progName <- getProgName
	putStrLn ("This program was run as " ++ progName)
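
The tangled file shows two behaviors: the two ":d main" blocks are concatenated in document order, and each |name| line in the ":f Main.hs" block is replaced by the macro's content. A rough Haskell sketch of that expansion follows. It is an illustration under assumptions rather than Anansi's implementation: the Macros type and the define, macroRef, and expand functions are hypothetical names, references are only recognized when they occupy a whole line, and the blank lines that appear in the real output are not reproduced.

```haskell
import qualified Data.Map.Strict as Map

-- | Macro name to accumulated body lines. A made-up model for this sketch.
type Macros = Map.Map String [String]

-- | Record a ":d name" block. Repeated definitions append to earlier ones.
define :: String -> [String] -> Macros -> Macros
define = Map.insertWith (\new old -> old ++ new)

-- | Recognize a line that is exactly "|name|" with a non-empty name.
macroRef :: String -> Maybe String
macroRef line = case line of
    '|' : rest@(_ : _ : _) | last rest == '|' -> Just (init rest)
    _ -> Nothing

-- | Replace reference lines with macro bodies, recursively.
-- Assumptions: an undefined macro expands to nothing, and macros are not
-- self-referential (a cycle would loop forever here).
expand :: Macros -> [String] -> [String]
expand macros = concatMap expandLine
  where
    expandLine line = case macroRef line of
        Just name -> expand macros (Map.findWithDefault [] name macros)
        Nothing   -> [line]

-- | The three ":d" blocks from the example, in document order.
exampleMacros :: Macros
exampleMacros =
    foldl (\acc (name, body) -> define name body acc) Map.empty
        [ ("main", [ "main :: IO ()"
                   , "main = do"
                   , "\tputStrLn \"Here we are in our cool main function!\"" ])
        , ("imports", [ "import System.Environment (getProgName)" ])
        , ("main", [ "\tprogName <- getProgName"
                   , "\tputStrLn (\"This program was run as \" ++ progName)" ])
        ]

-- | The body of ":f Main.hs" after expansion.
exampleOutput :: [String]
exampleOutput = expand exampleMacros ["|imports|", "|main|"]
```

Here exampleOutput holds six lines: the import first, then the four lines of the first main block followed by the two lines of the second, in the same order as the Main.hs listing above.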

Custom output formats

While the basic looms available will work for some documents, many users will want to write their own looms to generate specialized markup. For example, one might want to use an existing LaTeX style for an academic paper, or Markdown for posting to a forum. Custom looms allow complete customization of the weaving process.

See the API documentation, and perhaps the source code to Anansi’s built-in looms (such as loomHTML).
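
As a rough idea of what a loom does, the sketch below weaves a list of document blocks into Markdown. It deliberately does not use Anansi's loom API, whose types are described in the API documentation. Block, weaveMarkdown, and the caption format are hypothetical names and choices made up for this illustration.

```haskell
import Data.List (intercalate)

-- | Hypothetical document model for this sketch only; Anansi's real
-- document and loom types are different.
data Block
    = Prose String               -- commentary, passed through unchanged
    | MacroDef String [String]   -- a ":d name" block and its code lines
    | FileDef FilePath [String]  -- a ":f path" block and its code lines

-- | Weave blocks into Markdown: prose is copied as-is, code blocks get a
-- bold caption followed by a four-space indented code block.
weaveMarkdown :: [Block] -> String
weaveMarkdown = intercalate "\n" . map render
  where
    render (Prose text)         = text ++ "\n"
    render (MacroDef name body) = codeBlock ("macro " ++ name) body
    render (FileDef path body)  = codeBlock ("file " ++ path) body
    -- A bold caption, a blank line, then the code indented four spaces.
    codeBlock caption body =
        unlines (("**" ++ caption ++ "**") : "" : map ("    " ++) body)
```

A real loom would also need to handle things this sketch ignores, such as escaping markup characters in prose and honoring options like tab-size.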

Pandoc

Pandoc is a general-purpose markup converter. It can take pretty much any kind of text document and convert it into a different format. The anansi-pandoc library provides Anansi looms backed by Pandoc.

Code: https://john-millikin.com/code/anansi-pandoc (GitHub mirror).

HsColour

HsColour is a library for syntax highlighting of Haskell code. At the time it was the best option for generating nice colorful output. In modern codebases, you should use one of the many unified syntax highlighting libraries such as Pygments, src-highlite, HighlightJS, or Chroma.

Code: https://john-millikin.com/code/anansi-hscolour (GitHub mirror).


  1. D-Bus is the inter-process communication protocol used by most Free Software desktop platforms, most notably Linux.