Revisiting Little Brother: the illusion of being unique

The four Bash agents ‘cat’, ‘tr’, ‘sort’ and ‘uniq’ collaborate easily to give us, in less than a second, a list of the unique words in a text.
Their play allows for different readings.

The most spontaneous one is visual. If one turns the listing on its side, it reads like the heartbeat of a fit human body: never dying, never rising to unusual heights.
In a more classical, linear reading, one discovers some beautiful surrealist pearls reminiscent of Gertrude Stein, such as the following sentence: ‘Those though thought thoughtfully thoughts thousand thousands thousandth thrash thrashed.’

And then, of course, a comparative reading is very tempting.
Setting a unique word list of Orwell’s 1984 next to one of Doctorow’s Little Brother shows how bound to their time words are. Orwell’s use of ‘super’, for instance, is limited to the series ‘superfluous, superior, superlatively, superseded, superstate, superstates’, while Doctorow’s style is so 21st century, using ‘super, superaids, superbug, supercareful, superchairs, superconscious, superhighway, superior, superprivate, superseded, superspy, superstrict, supersure, superuseful, supervised’.
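Series like these can be gathered by asking grep for the words that begin with ‘super’ in each unique word list. A minimal sketch, assuming each list has one word per line (as the command at the end of this post produces); the function name prefix_series is hypothetical, and the prefix is used as a regular expression, which is harmless for plain letters.

```shell
# Print the words of a one-word-per-line list that start with a prefix,
# joined as "word, word, word".
# Usage (hypothetical): prefix_series WORDLIST PREFIX
prefix_series() {
  local wordlist=$1 prefix=$2
  # grep keeps the lines beginning with the prefix, paste -s joins them
  # into one line with commas, and sed puts a space after each comma.
  grep "^${prefix}" "$wordlist" | paste -sd ',' - | sed 's/,/, /g'
}
```

Run once on each novel’s list with the prefix super, it should print the two series side by side for comparison.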

The same uniqueness is found in the use of numbers.
Look at this beautiful pyramid by Cory Doctorow:

whereas the zeros and ones in Orwell’s 1984 are limited to ‘100, 101, 111’. Readers of the novel will shiver at the sight of 101 in this series. The other numbers in 1984 are more closely tied to historical events of the 20th century, like ‘1900, 1914, 1920, 1925, 1930, 1940, 1944, 1945, 1960, 1965, 1968, 1970, 1973, 1983, 1984’.
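The numbers can be fished out of a unique word list in the same way. A minimal sketch, assuming a list with one entry per line; the function names zeros_and_ones and years are hypothetical, and the year pattern simply accepts any four-digit entry starting with 19, so it would also catch a number that is not a year.

```shell
# Entries written only with the digits 0 and 1, such as 100, 101, 111.
zeros_and_ones() {
  grep -E '^[01]+$' "$1"
}

# Four-digit entries from 1900 to 1999, a rough stand-in for
# "years of the 20th century".
years() {
  grep -E '^19[0-9]{2}$' "$1"
}
```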

And finally, this very simple act of four Bash agents allows for an interesting algorithmic misreading by numbers.
Here comes the Judgement of Judgements, inspired by the fear of repetition that all writers are familiar with. Doctorow’s text counts 9,825 unique words in a novel of 109,260 words in total. This means that 1 in 11 words is unique, whereas Orwell’s score is 1 in 10.4 and, oh Canon, Joyce’s Ulysses counts 23,293 unique words out of a total of 182,361, which means 1 in 7.8.
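The score is a single division: total words divided by unique words. A minimal sketch that reproduces the two scores whose counts are given above (Orwell’s counts are not stated, so his 10.4 is not recomputed); the function name uniqueness_score is hypothetical.

```shell
# Print the "1 in N" score for a text.
# Usage (hypothetical): uniqueness_score TOTAL_WORDS UNIQUE_WORDS
uniqueness_score() {
  # Bash arithmetic is integer-only, so awk does the division;
  # %.1f rounds to one decimal. LC_ALL=C forces a decimal point.
  LC_ALL=C awk -v total="$1" -v unique="$2" \
    'BEGIN { printf "1 in %.1f\n", total / unique }'
}

uniqueness_score 109260 9825    # Doctorow: 1 in 11.1
uniqueness_score 182361 23293   # Joyce: 1 in 7.8
```

How the counts themselves were obtained is not stated. One plausible route is wc -w on the novel for the total and wc -l on the unique list, but wc -w splits on any whitespace while the pipeline splits only on spaces, so such figures may differ slightly from the ones quoted.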

Fortunately, one of the most famous sentences in English easily highlights the relativity of being unique. The tragedy of the Bash agent Uniq is that she misses the utter beauty of Gertrude Stein’s ‘Rose is a rose is a rose is a rose.’
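Running the agents over Stein’s sentence shows the tragedy directly. A small sketch: printf stands in for cat because the input is a string rather than a file, and the helper name words_of is hypothetical. The standard -c option of uniq at least keeps a count of how often each word was repeated.

```shell
stein='Rose is a rose is a rose is a rose.'

# The same transformation as the command below, minus cat:
# one word per line, no punctuation, all lowercase, sorted.
words_of() {
  printf '%s\n' "$1" | tr ' ' '\n' | tr -d '[:punct:]' | tr '[:upper:]' '[:lower:]' | sort
}

words_of "$stein" | uniq      # a, is, rose: the repetition is gone
words_of "$stein" | uniq -c   # 3 a, 3 is, 4 rose
```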

cat Cory_Doctorow_-_Little_Brother.txt | tr ' ' '\n' | tr -d '[:punct:]' | tr '[:upper:]' '[:lower:]' | sort | uniq > novel.txt

Cat reads the file and passes its whole text on to the next agent.
Tr is short for translate or transliterate: she literally ‘translates’ characters, replacing each space with a line break, deleting all punctuation and turning uppercase letters into lowercase.
Sort puts the words and numbers in alphabetical order; numbers are compared character by character, not by value, so 1984 comes before 42.
Uniq takes out all doubles. Because she only compares neighbouring lines, she needs Sort to bring the doubles together first.
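The same command can be written as a small shell function, one agent per line, each with a note on its role. A sketch; the function name unique_words is hypothetical.

```shell
# The four agents, one per line.
# Usage (hypothetical): unique_words FILE > novel.txt
unique_words() {
  cat "$1" |                      # cat: read the file, pass its text on
    tr ' ' '\n' |                 # tr: one word per line (space -> newline)
    tr -d '[:punct:]' |           # tr: delete all punctuation
    tr '[:upper:]' '[:lower:]' |  # tr: uppercase -> lowercase
    sort |                        # sort: bring identical words together
    uniq                          # uniq: keep one copy of each run
}
```

Called on Cory_Doctorow_-_Little_Brother.txt, it should produce the same list as the command above. The options to the tr agents are quoted so the shell cannot mistake a class like [:punct:] for a filename pattern. sort -u would merge the last two agents into one, at the cost of the play having only three actors.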

Inspired by:

You can find more versions of the novel on this blog and on Gitorious.

