"Science may be described as the art of systematic over-simplification."
-Karl Popper
A miscellany of under-researched ideas.
Some of these may've been thought of before. If they have been, put a link in a comment and I'll be happy to acknowledge prior art.
All original ideas and inventions here and the text itself are open-source and are licensed under the GNU General Public Licence.
Thursday, 22 July 2010
BaconSpeare
Stylometry is the battery of techniques used to compare the authorship of documents. Stylometrists do things like counting how often common words such as "and" and "the" occur. Comparing a whole group of such measures and related statistics can show, with reasonable certainty, whether two documents were written by the same person.
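To make the word-counting idea concrete, here is a minimal sketch, not a description of any particular stylometric tool. It assumes a small, hypothetical list of function words (`FUNCTION_WORDS`), turns each text into a vector of their relative frequencies, and compares two vectors with cosine similarity. Real studies use far more features and more careful statistics, such as Burrows's Delta.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set; real work uses many more words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "but", "not"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two profiles: 1.0 means identical proportions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


a = profile("The cat sat on the mat and the dog sat by the door.")
b = profile("The rain fell on the roof and the wind blew in the trees.")
print(round(cosine(a, b), 3))
```

Because each profile is a proportion rather than a raw count, a short note and a long essay can be compared directly. With so few features, though, two texts on entirely different subjects can still look alike.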
Stylometry is usually carried out for academic or forensic purposes: literary researchers want to identify the authorship of interesting anonymous works; the police want to identify the authorship of the ransom note.
But the web is a vast collection of samples of writing. It ought to be possible to run a crawler over it that logs the stylometric measures of every piece. A search engine could then be constructed that allowed anyone to submit some text for analysis. The engine would report back the most similar pages on the web in terms of writing style.
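The crawler-plus-search-engine idea can be sketched as a toy in-memory index. Everything here is a hypothetical illustration: `StyleIndex`, `add_page` and `most_similar` are invented names, the `example.com` URLs are placeholders, and the feature set is the same tiny function-word profile with cosine similarity. `add_page` stands in for what the crawler would do with each page it fetches, and `most_similar` answers a user's submitted text with the closest pages by style.

```python
import heapq
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common function words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "but", "not"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class StyleIndex:
    """Toy index mapping page URLs to their style profiles."""

    def __init__(self) -> None:
        self._profiles: dict[str, list[float]] = {}

    def add_page(self, url: str, text: str) -> None:
        # The crawler would call this once for every page it fetches.
        self._profiles[url] = profile(text)

    def most_similar(self, text: str, k: int = 3) -> list[tuple[float, str]]:
        # Score every indexed page against the query; keep the k best.
        query = profile(text)
        scored = ((cosine(query, p), url) for url, p in self._profiles.items())
        return heapq.nlargest(k, scored)


index = StyleIndex()
index.add_page("https://example.com/a", "The cat sat on the mat and the dog sat by the door.")
index.add_page("https://example.com/b", "It is not that I do not like it, but it is not to my taste.")
index.add_page("https://example.com/c", "Of all the things in the world, a good book is one of the best.")
results = index.most_similar("It is not that I do not like it, but it is not to my taste.", k=2)
for score, url in results:
    print(f"{score:.3f}  {url}")
```

`most_similar` does a linear scan over every stored profile. That is fine for a toy but would not scale to the whole web, where some form of approximate nearest-neighbour search over the feature vectors would be needed.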
Would this be intrusive or useful? Probably both: people could check whether the reference they have just been given was actually written by the referee or by the job candidate. Ghost writers of celebrity (auto)biographies would be exposed. It would also be interesting simply to know whether, say, Ian McEwan's style is closer to Margaret Atwood's than to Martin Amis's. And you could take any piece of writing from anywhere and find out who in the world is most likely to have written it.