Thursday, 22 July 2010


Stylometry is the battery of techniques used to compare documents in order to work out who wrote them. Stylometrists do things like counting how often "and" and "the" are used. Similarity across a whole group of such measures and related statistics can indicate, with reasonable confidence, whether two documents were written by the same person.
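To make the counting idea concrete, here is a minimal sketch in Python. The word list, the function names `profile` and `distance`, and the choice of mean absolute difference as the comparison are all assumptions made for illustration; real stylometric work uses many more measures and more careful statistics.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of common function words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it"]

def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Mean absolute difference between two profiles (smaller means more alike)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)
```

Calling `distance(profile(text_a), profile(text_b))` gives a rough score for how alike two pieces are on these few measures. On its own it says nothing certain about authorship.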

Stylometry is usually carried out for academic or forensic purposes: literary researchers want to identify the authorship of interesting anonymous works; the police want to identify the authorship of the ransom note.

But the web is a vast collection of samples of writing. It ought to be possible to run a crawler over it that logs the stylometric measures of every piece. A search engine could then be constructed that allowed anyone to submit some text for analysis. The engine would report back the most similar pages on the web in terms of writing style.
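As a rough sketch of how such an engine might hang together, the Python below keeps a list of pages with their measures and returns the nearest ones to a submitted text. Everything here is an assumption for illustration: the marker words, the `StyleIndex` class, the Euclidean distance, and the linear scan, which would need replacing with a proper nearest-neighbour index at web scale. The crawling step is left out entirely; `add` simply takes text that some crawler is assumed to have fetched.

```python
import math
import re
from collections import Counter

MARKERS = ["the", "and", "of", "to", "a", "in"]  # assumed marker words

def measure(text):
    """Relative frequency of each marker word: a stand-in for real stylometric measures."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in MARKERS]

class StyleIndex:
    """Toy in-memory index mapping page URLs to their measures."""

    def __init__(self):
        self.entries = []  # list of (url, measures) pairs

    def add(self, url, text):
        """Log the measures for one crawled page."""
        self.entries.append((url, measure(text)))

    def search(self, text, k=3):
        """Return the URLs of the k pages closest in style to the submitted text."""
        query = measure(text)
        ranked = sorted(self.entries, key=lambda e: math.dist(query, e[1]))
        return [url for url, _ in ranked[:k]]
```

Usage would be something like `index.add(url, page_text)` for each page the crawler visits, then `index.search(submitted_text)` to get the closest matches.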

Would this be intrusive or useful? Probably both - people could check whether the reference they have just been given was actually written by the referee, or by the job candidate. Ghost writers of celebrity (auto)biographies would be exposed. It would simply be interesting to know whether, say, Ian McEwan's style is closer to Margaret Atwood's than to Martin Amis's. And you could take any piece of writing from anywhere and find out who in the world is most likely to have written it.
