01 January, 2011

Assessing the quality of Google's N-gram viewer data

Here is one way to think about quality control for the data from Google's N-gram viewer: look at the relative popularity of the numbers two, three, four, and so on. I would expect popularity to fall as the numbers increase, but the fall should itself be decelerating, so that the gap between two and three should be bigger than the gap between five and six. That is, indeed, what we find. The data look fairly stable by that measure.
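For anyone who wants to run the same check on their own numbers, here is a rough sketch in Python. It assumes you already have one frequency per number, in order; the function name and the values in `example` are my own placeholders for illustration, not real N-gram figures.

```python
from typing import Sequence


def passes_number_check(freqs: Sequence[float]) -> bool:
    """Return True if freqs fall at every step and each fall is smaller than the last."""
    # Drop in popularity between each number and the next one.
    drops = [a - b for a, b in zip(freqs, freqs[1:])]
    # Popularity should fall as the numbers increase.
    falling = all(d > 0 for d in drops)
    # The fall should decelerate: each drop smaller than the one before it.
    decelerating = all(d1 > d2 for d1, d2 in zip(drops, drops[1:]))
    return falling and decelerating


# Placeholder values only, NOT taken from the N-gram viewer.
example = {"two": 100.0, "three": 60.0, "four": 40.0, "five": 30.0, "six": 25.0}
print(passes_number_check(list(example.values())))
```

The check is deliberately strict: a single year's noise could fail it, so it is probably better applied to frequencies averaged over a span of years.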
