I’ve been thinking lately of something that’s a sort of complex document editing system for regular people to use. Kindof. Something significantly less “powerful” than what you might read into that, and also substantially more exotic than what you might expect. Disregarding details for the moment, suffice to say that it involves digitization: taking page scan images of old books, and making these into readable and structured “text” documents of some sort.
Now “text” sounds like a word that makes sense, when you first encounter it, because we all know what texts are: if you’re reading this, you know they start at the top left of a page and sortof stream off to the right in a trickling line of letters. This text starts with “I’ve” and ends with some x ∈ {“afterwards”, “allowed”, “them”}.² We’ve all got our heads pretty well around the notion of “a text”.
You can capture a lot of crappy novels and essays and poetry and religious screeds with “just text” like that. If they’re fancy novels, some of the text may be italicized, or boldfaced—but that’s OK. We can handle that without breaking stride, because of course we have these little inline modifiers you might think of as state-machine flags that let us render differences in an abstract way… without breaking stride. The words “fancy” and “state-machine flags”, in this paragraph for instance, I magically “made” italic by slipping in a little flag that looks like <i> when I wanted the text to start being italic, and </i> when I wanted it to stop and return to roman typeface. And of course you and I can imagine a lot of those: bold, italic, color changes, size, font, flashing, underline, &c &c.
Our lives are filled with moderately useful texts because we can do this shit on the fly. The little state machine in the head of the page-rendering device (not to mention the reader) gets toggled into “italic” mode, or “red text” mode or whatever, carries on as before with the caveat that all the new characters are expected to be in this slightly altered new state (slanty, reddish, flashing), and then pops back out of that state and into the ground state of “OK, no more slanty? Good. Now where were we?”
This is true of quoted text as well, of course. Quotation marks are “perhaps the most important innovation in the history of typography after the carriage return.”¹
But of course not everything useful and interesting in the world is written using a single piddling line of text that starts at the upper left of the first page and ends at the lower right of the last page. There are complex structures in a lot of useful works, and they present a bit more challenge—often, it seems, an overwhelming challenge to the lay writer. Consider, for example, the footnote: There’s a little footnotey call-out thing in the text, but there’s also a related thing down somewhere else.
What’s up with that? Well, that’s a branch, then, isn’t it? You’re expected for one reason or another to put a mental finger down here where the callout falls—no matter how interesting the sentence might be getting—and hare off to over there to see (or do) whatever it is the author thinks is so goddamned important, and then drag yourself back here when you’re finished with that little piddling line of text and try to remember where you were when you were so rudely interrupted.
Branching. “GOSUB”, as we used to say in the old days before computers sat on your lap and showed you porn and thought in Objects and grep and stuff.
Now branchey things that work like GOSUB can get you a long way towards interesting when it comes to document structure. Indeed, all the way, it turns out. A Very Smart Man thought Very Hard about it some time back, and realized that in fact you can pretty much do whatever you want that way. Along with an armamentarium of state-changing toggles, this fine gentleman and his colleagues have demonstrated that branches and sub-branches and sub-sub-suchlike can allow one to type a single piddling stream of characters that describe complex mathematical equations with matrices and decorative initials and multicolumn layouts and all kinds of stuff, and footnotes and endnotes, and tables, and even a lot of drawings and diagrams. Indeed, it can be shown (has, really, pretty much) that one can do essentially whatever one wants in this way. Turing Said So.
Note however that I didn’t say “you”. I said “one”. “One can” is a phrase We use in Science to toggle the discussion into the Realm of Potentialities, where long-enough levers and places to stand are available in sufficient quantity to move the world, where there’s time and space enough to write down the very last number of all and then one more, and where user interfaces are all powered frictionlessly by The Power of Pure Mathematics™.
And I am here, my friend, to hand you the real, actual key to that power-filled magical modelly kingdom. The gateway to typographic richness awaits when you have grasped it firmly in your hand and stuck it in the hole which I will perhaps later disclose.
What has to happen for this one to actually be you, it turns out, is that you need to get your head around the fact that a document is composed of a set of abstract containers, each with their own little piddling stream of text (or other symbols) in them, and that these are linked in a branching tree structure, and that in creating the picture of the document on the virtual page your tree-like abstraction will be completely traversed in a predictable pattern.
Huh? No, really, that’s it. That’s the key right there. Really. Ask any of those handful of LaTeX people who are reading along and nodding over there: In order to use HTML (with or without CSS), or LaTeX, or even glowing raw atomic XML to create a document with any structure to it, you need to understand that the stream of text you write (like the stream of text I’m writing) is a series of instructions to a single computery machiney thing, and that the relationship among the parts of the document (like the relationship between a footnote and the callout, or an integral sign and the equation being integrated over, or a table and its rows) can be represented as a stateful tree, and as long as we all agree that how that tree is drawn is something predictable and manageable, you can represent just about anything.
So for instance, this HTML document you’re looking at has as its root a “document” (duh), and that is broken into one “head” branch and one “body” branch, and in the body branch there are among other things many “paragraph” branches, and inside some of those there are some “italic” or “bold” branches. And all the leaves of that HTML tree, the letters in the document that the computer is reading to show you these words, are either letters (“draw an ‘a’”), or commands to change its state. Same with the amazing LaTeX, and the ubiquitous and disarmingly simple-seeming XML flavor of the month.
And if at any point it seems as if you should not want to use a tree, well you still can. For example, if you wanted to have a part of a sentence be red, and a part of it be italicized, and those pieces overlapped, you can still make it treelike by real-quick closing the italics inside the red part, and then real-quick re-opening them instantaneously. So you have one branch where it’s red, with a sub-branch of the red part that’s also italic, and then it’s back to normal instantly before branching off to become (just) italic. Otherwise, like, it’d be an error; this way, it’s like sleight of hand, invisible to the eye (and mind) of the reader.
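Here is a minimal sketch of that sleight of hand in Python. It assumes each style arrives as a made-up (start, end, tag) character range; the function name `nest_overlaps` and the `<red>` tag are inventions for illustration, not anybody's real API. It is also the bluntest possible version: it cuts the text at every boundary and wraps each piece separately, so tags get closed and re-opened a bit more often than strictly necessary.

```python
def nest_overlaps(text, spans):
    """Turn possibly-overlapping (start, end, tag) ranges into well-nested markup.

    Each piece between two boundaries is wrapped on its own, so tags are
    closed and re-opened at every boundary instead of ever crossing.
    """
    # Every place where some style starts or stops, plus both ends of the text.
    cuts = sorted({0, len(text)} | {p for s, e, _ in spans for p in (s, e)})
    out = []
    for a, b in zip(cuts, cuts[1:]):
        # Tags whose range covers this whole piece, in the order they were given.
        active = [tag for s, e, tag in spans if s <= a and b <= e]
        opening = "".join(f"<{t}>" for t in active)
        closing = "".join(f"</{t}>" for t in reversed(active))
        out.append(opening + text[a:b] + closing)
    return "".join(out)


# "red both" is red (characters 0..8); "both italic" is italic (4..15).
print(nest_overlaps("red both italic", [(0, 8, "red"), (4, 15, "i")]))
# <red>red </red><red><i>both</i></red><i> italic</i>
```

The reader sees one red run and one italic run that overlap; the machine sees three tidy little branches.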
Voila! The place to stand, the lever, and the world.
When one understands the lovely, mathematically pure tree structure of documents, one is good to go. Really. One can pretty much re-create anything that will appear in pretty much any book, anywhere, by typing a single piddling stream of letters and having computers read that single stream and change states and plop letterforms and lines and stuff on a virtual page. The bits about placement (where are the lines? how much space to leave at the bottom of the page for this footnote? how many are there? how do you get the edges to line up pretty? hyphenation or no?), those are details that a good computer program can manage, with some parametrized direction.
So, again: key. One transforms a complex typographic object into an abstract treelike thing, and following relatively simple syntactic and semantic rules one traverses that abstracted document’s tree in a predictable way, and makes that tree into a single piddling stream of characters.
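As a rough sketch of what "traverse the tree, get a stream" means, here it is in Python. The representation is an assumption made for the example: a branch is a (tag, children) pair, a leaf is a plain string, and `serialize` is a hypothetical helper, not part of any real HTML library.

```python
def serialize(node):
    """Depth-first, left-to-right walk that flattens a document tree to one string."""
    if isinstance(node, str):      # a leaf: plain characters
        return node
    tag, children = node           # a branch: (tag, list of children)
    inner = "".join(serialize(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"


doc = ("body", [
    ("p", ["Some ", ("i", ["fancy"]), " novel."]),
    ("p", ["Another paragraph."]),
])
print(serialize(doc))
# <body><p>Some <i>fancy</i> novel.</p><p>Another paragraph.</p></body>
```

The traversal order is the whole trick: visit each branch, emit an "open" flag, visit its children in order, emit a "close" flag. One tree in, one piddling stream out.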
Which as everybody knows, is what our computers can handle. They’re serial computers, and that’s good enough for me, by gum. It’s how we think, after all: serially. Right?
Go forth, therefore, and become productive creators of documents. Take this newfound power, and do your work. You have the key.
And then come back, and we shall discuss the hole.
[stage direction: state changes to "done" and then right back to "still rambling" right here. Like sleight of hand. Invisible to the reader.]
Back so soon? Ah, curious, is one? Well, OK there, big fella. Enthusiasm for learning is an admirable trait that should be encouraged whenever encountered.
Here is a page from a book I scanned two days ago. It’s a German book about Slavic folklore. I am thinking about this page.
From what I gather, it appears to be about a dialog between a son and his mother (I read only scant German). Those em-dashey things there in the middle section? Those are line-by-line translations of the Slavic (left) into German (right); each dashed line on the left has a directly corresponding line on the right. They don’t quite line up correctly, since German such a wordy long-winded-like language is, but each Slavic line is associated with exactly one German line.
Or vice versa, depending on how you see things.
Simple, ja? There are twenty or so other interlinear songs and poems in this book, and it happens to be the third book I’ve scanned recently that contains some sort of interlinear text. Another one was a book of Persian poetry translated in the 1830s into French page by page, and a still ‘nother one was an amusing Italian-English touristy phrase book, which presented common travelers’ phrases aligned word by word with glosses from the complementary language.
So you’re armed with the key, the Power of Pure Mathematics™, and by this late date [like sleight of hand] you are learnéd and wise in the ways of document construction and abstraction, and are as well filled with ability to traverse trees of all sorts and render documents in a variety of pleasing aesthetic styles.
And you’re armed with the Internets (or you wouldn’t be here) and so you’re able to Google a little and see that there are some lovely, nay beautiful document-processing models available for TeX and other structural languages (even XML) that manage to handle interlinearity. “Handle” in the sense that you can indeed, as expected, type a single piddling stream of text, and convert this metamagically into a lovely abstract branched tree structure that the computery bits will render in a pretty way, just like you’re used to. Just like you think.
Solved! So what’s the problem? “I have the key; show me the hole!”¹
Parallelism. OK, parallelism; happy now? That’s the hole. See, we can run all sorts of programs on our serial computers; does that mean we always should? [Hint: Go ask Uncle Turing. And Uncle von Neumann, while you're there.]
Taking our Slavic/German interlinear text as an example. Ladderlike as it may appear, we should all agree by now that it can productively be represented as a tree: A bunch of paired branches, with Slavic passages linked directly to and associated with the appropriate German. And I could do the same with the Persian/French book, where perhaps the branchpoints would be page-by-page associations, or the English/Italian travel guide, where branchpoints link gloss to word or short phrase.
But you know, these documents have a structure in them that isn’t exactly a tree. As long as we’re comfortable enough to have traveled this far along on just the Power of Pure Mathematics™, an interlinear text is perhaps better seen as a set of parallel edges: a directed acyclic graph (DAG), where the single stream of piddling text sortof splits into two streams, each carrying on at their own pace (since German such a wordy long-winded-like language is), which then converge at the other end when their little independent processes are done. Like, I don’t know… threads or something.
[Cue choral Angelic Reveal music.] Hole.
Right. So armed as we are with Pure Mathematics blah blah, we know that, algorithmically, you can serialize any DAG by any one of a handful of simple well-regarded traversal heuristics. We impose order on sibling branches in some formal way, and that means that the re-connecty edge where they converge can be transformed back to the root where they diverged, and one gets a tree. And we all know one can make a book from a tree whenever one wishes.
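One way to make "impose order on sibling branches" concrete is a topological sort. The sketch below uses Kahn's algorithm, which is my choice for illustration and only one of several standard ways to do it. The node names are placeholders, and `order` (which must list every node) is the arbitrary precedence we impose on siblings.

```python
def serialize_dag(edges, order):
    """Kahn's algorithm: list the nodes so that every edge points forward.

    `edges` maps a node to its list of successors. `order` lists every node
    and doubles as the tie-break whenever several nodes are ready at once.
    """
    rank = {n: i for i, n in enumerate(order)}
    indegree = {n: 0 for n in order}
    for succs in edges.values():
        for s in succs:
            indegree[s] += 1
    ready = sorted((n for n in order if indegree[n] == 0), key=rank.get)
    out = []
    while ready:
        node = ready.pop(0)              # the highest-precedence ready node
        out.append(node)
        for s in edges.get(node, []):
            indegree[s] -= 1
            if indegree[s] == 0:         # all of its predecessors are emitted
                ready.append(s)
        ready.sort(key=rank.get)
    if len(out) != len(order):
        raise ValueError("graph has a cycle")
    return out


# Fork at "intro", two parallel streams, join at "outro".
edges = {
    "intro": ["slavic1", "german1"],
    "slavic1": ["slavic2"], "slavic2": ["outro"],
    "german1": ["german2"], "german2": ["outro"],
}
interleaved = ["intro", "slavic1", "german1", "slavic2", "german2", "outro"]
slavic_first = ["intro", "slavic1", "slavic2", "german1", "german2", "outro"]
print(serialize_dag(edges, interleaved))
print(serialize_dag(edges, slavic_first))
```

Both runs give a legal single stream, and they differ only in the order we imposed. Which is the point: the interleaving lives in our tie-break, not in the document.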
But does one have to make a book from a tree? Can one make a book, or a page, or a document from a DAG itself? Can a book, or a page, or a document be a directed graph that is maybe even cyclic?
Surely not. That would be madness. Well, it would be hard. -Er.
But let’s play for a moment longer in the world where levers abound which can move the world. Consider the interlinear text on my pictured page; it presents the German as if it were an annotation of the Slavic, and imposes a structure on the document as if the German glosses were a series of local footnotes. And that makes sense, because of course it’s a German book, dummkopf, so the Slavic is the Weird stuff and the German’s just, like, explanation. But of course, one can imagine a text in which the same lines were presented, but in a different context in which the Slavic annotated the German. And of course, one can a text imagine in which the long and wordy-German glosses word-by-word associated with the Slavic counterparts were. But that would be hard, in some sense… except that we have an example of it in the English/Italian phrase book I mentioned.
But the English/Italian phrase book I mentioned was described as “amusing”, and I’m sure you inferred correctly that the amusement arises from the ridiculous lengths to which grammar was stretched and contorted into contorted contortions by those effortful word-by-word glosses of one language she was into another. It would have been reasonable, in that context, to expect a (magical computer-empowered 1820s) translator to present both sentence-by-sentence and word-by-word glosses. For, you know, pedagogy: you could learn what words directly correspond to the ones you’re used to, but also how to string them together to not sound like an idiot tourist.
So how would you represent a multiscale interlinear text? That is, one in which there was an association between paragraphs, and also an association between words?
Tree, you think? Is that wise?
So here’s what this is really about: hypergraphs. A gloss is a kind of annotation, that connects a subset of the characters in one “stream” of text to some or all of the characters in the other stream. Unlike a footnote, a translation or an annotation or a hyperlink is not merely an aside: you don’t just put your finger in the area and start off in a parenthesis. These are content associated with a block of text, a substring. We have no confidence that a second translation, associated with the same general block of text, will “line up” correctly.
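To make the hypergraph idea a little less abstract, here is a small Python sketch of a multiscale gloss. Everything in it is assumed for illustration: the `Link` class, the `glosses_for` helper, and the sample phrase, which I made up and which does not come from any of the scanned books. Each link ties a set of tokens in one stream to a set of tokens in the other, at whatever scale you like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One hyperedge: a set of tokens in stream A tied to a set in stream B."""
    scale: str
    a: frozenset
    b: frozenset


english = ["the", "red", "house"]
italian = ["la", "casa", "rossa"]

links = [
    Link("phrase", frozenset({0, 1, 2}), frozenset({0, 1, 2})),
    Link("word", frozenset({0}), frozenset({0})),   # the   <-> la
    Link("word", frozenset({1}), frozenset({2})),   # red   <-> rossa
    Link("word", frozenset({2}), frozenset({1})),   # house <-> casa
]


def glosses_for(index):
    """Every gloss, at every scale, that covers English token `index`."""
    return [
        (link.scale, " ".join(italian[j] for j in sorted(link.b)))
        for link in links
        if index in link.a
    ]


print(glosses_for(1))
# [('phrase', 'la casa rossa'), ('word', 'rossa')]
```

Note that the word-level links cross each other (red goes to the third Italian word, house to the second), while the phrase-level link sits over all of them. Nothing here is a child of anything else; the links just point at stuff.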
If we need to use two footnotes, it’s easy enough to put two footnote markers right next to each other; we expect to be coming back to the same place once the side-trip is finished. But a translation isn’t a footnote; it refers to some stuff. That’s crucial. That’s dangerous.
Or fun, depending on your view of life.
So let’s generalize from translations to all kinds of other annotations. Suppose I annotate a document with other kinds of meta-data: who’s speaking, or whether the words are amusing or sexist or need to be spoken louder, or overlay user comments from five different readers. The portions of the original text marked up—associated—with these annotations should be allowed to overlap, like red and italic can overlap. But unlike those traits, it makes no semantic sense to think of them as non-overlapping branches or pure state transitions of the text stream.
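One way to picture annotations that refuse to be branches is to keep them outside the text altogether, as character ranges that point into it. The Python sketch below assumes exactly that; the `Annotation` record, the sample sentence, the comment text and the two helper functions are all hypothetical.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int   # index of the first character covered
    end: int     # index one past the last character covered
    kind: str
    value: str


text = "the quick brown fox jumps"

annotations = [
    Annotation(4, 15, "comment", "a reader's aside"),   # covers "quick brown"
    Annotation(10, 19, "speaker", "narrator"),          # covers "brown fox"
]


def at(pos):
    """All annotations covering the character at position `pos`."""
    return [a for a in annotations if a.start <= pos < a.end]


def overlaps(a, b):
    """True when two ranges share at least one character."""
    return a.start < b.end and b.start < a.end


print([a.kind for a in at(12)])     # position 12 is inside "brown"
print(overlaps(*annotations))
```

The text stream itself is never touched, so two annotations can overlap however they please. The pain only starts when somebody insists on flattening it all back into one well-nested stream.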
But we’re obliged, because of the single piddling stream design of our computers and word processors and brains (well, maybe not those) to make them trees. And sometimes that hurts more than it helps.
Here is an example: If Hugh highlights the red text in this passage, and associates with it an amusing tangential comment, and at some other time Stephen highlights the yellow section and associates it with a translation into Slavic:
Hold the newsreader’s nose squarely, waiter, or friendly milk will countermand my trousers.
should we think to represent these overlapping substrings with a tree? It can be represented as a tree—otherwise I would not have been able to show it to you at all on this page!
But it was a royal pain in the ass of a tree for me to type into my blogging software by hand, let me tell you. Uncle Turing said you can do it that way, just as you can do most anything, and you should believe him because he was Extraordinarily Smart. But sometimes there’s a world of hurt lurking in the shadowlands between can and should. What should we do to represent annotations? Multiple versions? Corrections? Of course we can do as we always have, since the 1940s. That would be comfortable enough, and it wouldn’t hurt our leetle brains and computer thingies and presumptions of structural niceties and norms.
But… hyperlinks? Here you are, having visited my sorry web page by following a link that is just the sort I’m bitching about. How come hyperlinks get to have whole swathes of text associated with them, and not just little instantaneous dabs like footnotes do? Why are they so special? Bastards.
Ah. But we’re not allowed to overlap them. Minor inconvenience? Simple enough to work around.
This tree thing, this serialization: a very big key. My problem with overlapping parallelizable texts? a teeny tiny little hole. Let’s just hold off on shoving it in there, shall we? Could be ouchy.
Aha. Luckily, we are still chatting amiably about this in the world of mathematical abstraction and make-believe, and not trying to write any practical typography code or do anything having the remotest practical implication for actual document structures (except the Web, but that’s different). We all know simple trees suffice for everything we want to say or do in print, so we’re off the hook, except insofar as this is an interesting theoretical exercise. Or as some sort of weird, foreign-seeming, impossible futuristic thing as utterly removed from the proper representation of communication and knowledge as, say, the π-calculus is from the proper representation of mathematical and algorithmic dynamics we know more reasonably under the guise of friendly old λ-calculus.
And after all, we still all type documents from the beginning, and work our way to the end, and never stop along the way and cycle back… just the same way we tell our computers to cope with them. Occasionally we add digressions and branches, or change states in a reversible and transient way.
As long as we work alone. By ourselves. As individuals. It’s easy, then. We can cope with the single stream.
It makes sense to talk about “a text”, when one person has written it, because we can imagine the time when they stopped and said “Done!” as a little infinitesimal state-change. Like sleight of hand. But when all sorts of other people are collaboratively fucking around with those “simple texts”, and doing things like forking them and merging them and hyperlinking them… well, that’s crazy talk.
But in general we can do anything we prefer to do the way we do it. Sometimes it’s fun to take a walk on the wild side, and wonder what might happen if texts and documents weren’t exactly mappable—easily—to single-thread serial processes. But that stuff is weird, and has little beyond academic interest when it comes to the massively practical undertaking of digitizing texts.
Unless you want to annotate them. Unless you want to share them. Unless you want more than one person to talk about them.
¹ Why did I put that in quotes? I made it up, after all, right there on the spot. Did you “hear” it in your head in a different tone, as if I was declaiming instead of just spouting nonsense? There you go; gotcha. And also, I used the same footnote more than once?! Is that fair? “Is that even allowed?”¹
² In one sense, “them” is the last word of this piece. But this footnote lies, physically, afterwards.

