Algorithm to distinguish human and machine text?




Is there any way to distinguish human-written from machine-generated text?

If there were, it would solve web-page spamming. As Andrew Clausen points out, Google's PageRank is ultimately based on the cost of domain names. It might be better if it were based on unique human-written text... it might work quite a lot better.

So, is this doable?

We could look for telltale signs of computer generation, like repeated phrases, or phrasing that doesn't occur in human text. But as soon as we did that, spammers could use that exact same algorithm to produce better text. It would be an arms race, though it is plausible that someone as big as Google could stay ahead of the curve.
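
To make the "repeated phrases" idea concrete, here is a rough sketch in Python. The function name, the choice of three-word phrases, and the tokenizing regex are all my own assumptions for illustration, not anything a real search engine is known to use. It just measures what fraction of the word trigrams in a text belong to a trigram that occurs more than once.

```python
import re
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that belong to an n-gram seen more than once.

    Hypothetical heuristic: a high value suggests heavy phrase repetition.
    """
    # Crude tokenizer (an assumption): lowercase runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of any n-gram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


spammy = "buy cheap pills now buy cheap pills now buy cheap pills now"
plain = "the quick brown fox jumps over the lazy dog"
print(repeated_ngram_fraction(spammy))  # every trigram repeats
print(repeated_ngram_fraction(plain))   # no trigram repeats
```

Of course this shows the arms-race problem directly: a spammer can run the same function on generated text and keep rewriting until the score drops.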

What would be really neat would be if there were a way to detect human text that was hard to reverse. Some aspect of human writing that takes a *lot* of computation, but can be verified fairly easily. It seems somewhat plausible that there is something like this... the grammar of human languages still confounds natural language researchers, but maybe detecting that some text is natural-languageish is a lesser problem.

Any ideas?



