Revisiting Little Brother: a 1984-spam Vanessa took off his glasses

June 27th, 2014

‘I would have been ten or fifteen pointy little bastards and we held each other and ground my jaws together. “We just drifted apart.” We walked like two people whom he did not seek power for its freedom. If I get a new Middle group splits off from the larynx without involving the higher brain centres at all. Likelier still, nobody knew how we were rubbing our sweaty bodies against each other as long as they parted at the Party is infallible. But since in reality disappeared. Each of the day. Of course. I was going to do so. It was now his diary. And in the future was unimaginable. What certainty had he added half-nostalgically: “‘Oranges and lemons,” say the – Something changed in some connection to this country. But your father is –” She bit into it. Chewed. Swallowed. Gave every impression of having very few words were not continuous. There had been singled out for the counter-attack. Four, five, six – in some way so that his heart a feeling of her t-shirt. Her warm tummy, her soft navel. They inched higher.’

The Python agents’ association called ‘Markmix’ offers us a version of Doctorow’s Little Brother that brings the novel back into the loop, merging Orwell’s 1984 back into Little Brother. 1984 was a clear inspiration for Doctorow; many details in the text reveal the source, e.g. the first nickname of the main character, Marcus Yallow: he calls himself w1n5t0n, pronounced “Winston.”

Brendan Howell wrote a script in which he adapted one of the most used algorithms in our networked society: the Markov chain.

In its classical probability-theory use, it is applied to numbers. As Florian Cramer mentions in his Words Made Flesh, Claude Shannon described the use of Markov chains for statistical text analysis in 1948. In 1984 the idea was written into the software Travesty by Hugh Kenner and Joseph O’Rourke; the source code was published in the computer magazine BYTE. The algorithm “scans the text for the transition probability of letter or word occurrences, resulting in transition probability tables which can be computed even without any semantic or grammatical natural language understanding.” It can be used for analyzing texts (and not only texts), but also for recombining them. That is what the Markmix does.
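A Travesty-style transition table is simple to sketch. The following is a minimal illustration (not the actual BYTE listing): it counts, for each word, how often each successor follows it, and normalizes the counts into probabilities, with no grammatical understanding involved.

```python
from collections import Counter, defaultdict

def transition_table(text):
    """Count how often each word follows each other word, then turn
    the raw counts into transition probabilities."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return {
        word: {nxt: n / sum(followers.values())
               for nxt, n in followers.items()}
        for word, followers in counts.items()
    }

table = transition_table("the cat sat on the mat and the cat slept")
# "the" is followed by "cat" twice and "mat" once in this toy corpus.
```

The same table, read the other way round, is all a recombiner needs: pick a word, roll a die weighted by its row, repeat.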

The result reads like spam. This is no coincidence: the Markov chain is widely used in attempts to pass messages through spam filters, and anti-spam software answers with Bayesian analysis to keep up with spam techniques. In Little Brother, Marcus’ father uses it as well, “all the time at my work. It’s how computers can be used to find all kinds of errors, anomalies and outcomes. You ask the computer to create a profile of an average record in a database and then ask it to find out which records in the database are furthest away from average. It’s part of something called Bayesian analysis and it’s been around for centuries now. Without it, we couldn’t do spam-filtering –”
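The Bayesian side of this arms race can be sketched too. Below is a minimal naive-Bayes-style word scorer, a deliberately simplified assumption of how a spam filter weighs words (the training sentences are invented for the example, not taken from any real filter):

```python
import math
from collections import Counter

def train(spam_docs, ham_docs):
    """Estimate a per-word log-likelihood ratio (spam vs. ham)
    with add-one smoothing so unseen words do not zero out."""
    spam = Counter(w for d in spam_docs for w in d.split())
    ham = Counter(w for d in ham_docs for w in d.split())
    vocab = set(spam) | set(ham)
    s_total, h_total = sum(spam.values()), sum(ham.values())
    return {w: math.log((spam[w] + 1) / (s_total + len(vocab)))
              - math.log((ham[w] + 1) / (h_total + len(vocab)))
            for w in vocab}

def spam_score(weights, message):
    """Sum the log-likelihood ratios of known words; > 0 leans spam."""
    return sum(weights.get(w, 0.0) for w in message.split())

weights = train(["buy viagra now", "cheap viagra deal"],
                ["meeting at noon", "see you at lunch"])
```

Markov-generated filler text attacks exactly this scoring: it pads a message with statistically ordinary words so the sum drifts back toward the ham side.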

Hence, spam and anti-spam writers are continuously producing new ways to analyze and generate text, in a conversational manner. Whether we are disguising a word like ‘Viagra’ or an entire novel that is still under copyright, the method rests on the same principle. The words of a text are scanned and sorted into a lookup table or “dictionary” of short chunks of text: the Markov generator first organizes the words of the source text into a dictionary, gathering into a list all the words that can follow each chunk (here, each pair of consecutive words). It then recomposes sentences by randomly picking a starting pair and choosing a third word that follows that pair. The window is then shifted one word to the right, another lookup takes place, and so on until the document is complete. This yields humanly readable sentences, but does not exclude the kind of errors we recognize when reading spam.
In this case, a pair like “Winston Smith” would be assigned a list of words gathered from the original text, e.g. “ate”, “said”, “walked”, … Whenever the pair appears in the newly generated novel, it is followed by a word that followed it in the original text.
The remix of the two texts happens in such triples: the dictionaries of both novels merge into one, producing a new novel that draws its word lists from both sources and averages their length.
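The pair-keyed generator described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not Brendan Howell’s actual markov.py; the function names are invented for the example.

```python
import random
from collections import defaultdict

def build_dictionary(text):
    """Map each pair of consecutive words to the list of words
    that follow that pair in the source text."""
    words = text.split()
    table = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        table[(a, b)].append(c)
    return table

def generate(table, length=30, seed=None):
    """Pick a random starting pair, then repeatedly choose a follower
    and shift the two-word window one word to the right."""
    rng = random.Random(seed)
    pair = rng.choice(list(table))
    out = list(pair)
    while len(out) < length and pair in table:
        nxt = rng.choice(table[pair])
        out.append(nxt)
        pair = (pair[1], nxt)
    return " ".join(out)

# Merging two source texts amounts to concatenating them before
# building the dictionary, so triples from both novels share one table.
corpus = "it was a bright cold day in april and the clocks were striking thirteen"
text = generate(build_dictionary(corpus), length=8, seed=1)
```

Because each lookup only sees the last two words, the output stays locally grammatical while drifting globally, which is exactly the spam-like texture of the opening fragment.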

The beauty of using a Markov chain in this context is manifold. First of all, this kind of linguistic analysis is yet another method Marcus Yallow could have applied in Little Brother to disguise data, as in spam messages, or to generate noise in the profiling and surveillance techniques he (and all of us) are faced with. Ben Grosser’s ScareMail project uses Markov chains to generate a bulk of text full of words the NSA considers ‘dangerous’ and includes it in your mail messages.
Markov chains are indeed common tools of today’s networked society. The applications are many, in fields as varied as gaming, physics, biology, weather forecasting, … Google’s PageRank algorithm is also based on a Markov chain, whose aim is to calculate for each web page the probability that a reader following links at random will arrive at that page.
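The random-surfer chain behind PageRank fits in a few lines. The sketch below is a textbook-style power iteration over a three-page toy web (the link graph is invented for the example), not Google’s production algorithm:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over the random-surfer Markov chain: with
    probability `damping` follow a random outgoing link, otherwise
    jump to a random page."""
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            for target in outgoing:
                new[target] += damping * rank[page] / len(outgoing)
        rank = new
    return rank

# Toy web: A and C both link to B, so B accumulates the most rank.
ranks = pagerank({"A": ["B"], "B": ["A", "C"], "C": ["B"]})
```

The resulting numbers are exactly the stationary probabilities of the chain: how often the aimless reader ends up on each page.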
Finally, the use of Markov chains for literary texts relates straight back to their origins. Andrey Markov was a Russian mathematician who lived from the mid-19th century into the early 20th. Challenging his colleague Nekrasov’s idea that the law of large numbers applies only to independent events, he set out on a journey that led him to one of the most common contemporary tools, partly because he was the first to apply this mathematics to poetry: he chose Pushkin’s verse novel Eugene Onegin, which Russian schoolchildren recite. The details of how he proceeded are worth reading, as is Hayes’ full article, since the story features ‘an unusual conjunction of mathematics and literature, as well as a bit of politics and even theology’.

Applied to the text of a single author, markov.py would generate a parody of the text, keeping close to the author’s style. In theory, the Markov association could thus generate an infinite number of ‘real’ ghost writers. The first ghost writer of this kind was Mark V Shaney, a synthetic Usenet user of the 1980s whose postings in the net.singles newsgroup were generated by Markov chain techniques from the text of other postings. The user name is a play on the words “Markov chain”. Many readers were fooled into thinking that the quirky, sometimes uncannily topical posts were written by a real person. Quotations from his texts read like poetry: “I spent an interesting evening recently with a grain of salt.”
As this quotation shows, the Markmix’s most important quality of all lies in the linguistic freedom of the software. It possesses the natural power to give voice to objects or transgender characters, and to create new metaphors, as in the last sentences of the opening fragment: Four, five, six – in some way so that his heart a feeling of her t-shirt. Her warm tummy, her soft navel. They inched higher.

You can find more versions of the novel on this blog and on gitorious.

