11 January, 2011

Body bits


I'm glad tooth has been overtaken by penis, but brains have recently become wayyyy over-rated.

Wheat in war





It looks to me as though wheat underwent a change during WW2 that cotton and copper did not. Steel did its own thing. Why?

Periodicities in books?


What standard of evidence would we require if we were to claim we had identified a periodicity in these book-based data? We don't see annual cycles any more, of course. What about the frequency of Locke in this philosophical triumvirate?

And if we were to make such a claim, what would it mean? Lawfulness (of a "physical" kind) in the abstract world of texts. We would look for corroboration, where possible, but that is hard for a single proper noun like Locke (though there is a related vocabulary, of course). You could use corpus-counts to identify the subset of terms that best triangulates the proper noun in question, and look for similar periods in their aggregated data. You would have to think geometrically in conceptual space.
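One way to make the "standard of evidence" question concrete is a periodogram with a shuffle test. This is only a sketch, not anything the n-gram viewer offers: the function names are my own, and the series below is synthetic (a sine wave plus noise standing in for yearly relative frequencies), not data for Locke or anyone else.

```python
import cmath
import math
import random


def periodogram(series):
    """Map each candidate period (in samples) to its power, via a plain DFT."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the mean so bin 0 is ignored
    powers = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        powers[n / k] = abs(coeff) ** 2 / n
    return powers


def peak_p_value(series, trials=100, seed=0):
    """Shuffle test: how often does a reshuffled series show a peak this large?"""
    rng = random.Random(seed)
    observed = max(periodogram(series).values())
    shuffled = list(series)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if max(periodogram(shuffled).values()) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Synthetic stand-in: a 20-sample cycle plus noise, NOT real n-gram data.
demo = [
    math.sin(2 * math.pi * t / 20) + 0.3 * random.Random(t).gauss(0, 1)
    for t in range(120)
]
powers = periodogram(demo)
best_period = max(powers, key=powers.get)
p_value = peak_p_value(demo)
```

Shuffling only tests against a null of unstructured noise. Real word-frequency curves drift slowly from year to year, so a serious claim would need detrending first and a null that preserves that drift; treat this as the weakest bar a periodicity should clear.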

The sexualities



Metrosexual doesn't quite make it, but homosexuality seems to be grabbing our attention. My take on this is that because heterosexual is the default assumption, it is a label needed only as a contrast with homosexual.

09 January, 2011

02 January, 2011

Physical vs Mental


Without comment.

Colors and Polarity

Here's a puzzle. If we look at colors, we see, as expected, that black and white are more common than the primaries, and the primaries are more common than any other color terms. But something unexpected has happened roughly since the early sixties: black and white together exhibited a rise in prevalence that is independent of the color terms. Has our thinking become more polarized?

(Note: there is an unfortunate Stroop effect here. The curves for black and white are not in black and white. Red, blue, green and yellow are drawn in their own colors. Brown is a light green. Sorry).
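A rough way to check whether black and white are rising independently of the other color terms is to plot their combined frequency as a ratio to the combined frequency of the rest. The sketch below assumes the per-year frequencies have already been pulled out into lists; the helper names and the toy numbers are invented for illustration and are not n-gram viewer output.

```python
def summed(series_by_word, words):
    """Add up the per-year series of the given words, year by year."""
    columns = zip(*(series_by_word[w] for w in words))
    return [sum(column) for column in columns]


def relative_prevalence(series_by_word, targets, references):
    """Per-year ratio of the targets' total frequency to the references' total."""
    top = summed(series_by_word, targets)
    bottom = summed(series_by_word, references)
    return [t / b for t, b in zip(top, bottom)]


# Toy values only: three "years" of counts for four words.
toy = {
    "black": [10, 10, 20],
    "white": [10, 10, 20],
    "red": [5, 5, 5],
    "blue": [5, 5, 5],
}
ratio = relative_prevalence(toy, ["black", "white"], ["red", "blue"])
```

A flat ratio would mean black and white are simply moving with the color vocabulary as a whole; a ratio that climbs from the sixties onward would support the idea that something else is driving them.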

01 January, 2011

Assessing the quality of Google's N-gram viewer data

Here is one way to think of quality control for the data from Google's N-gram viewer: look at the relative popularity of the number words two, three, four, etc. I suspect we would expect a fall in popularity as the numbers increase, but that fall should itself be decelerating, so that the drop from two to three should be bigger than the drop from five to six. That is, indeed, what we find. The data look fairly stable by that measure.
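The check can be written down in a few lines. This is a minimal sketch with hypothetical function names, and the counts are made-up placeholders chosen to show the expected shape, not figures read off the viewer.

```python
def successive_drops(freqs):
    """Fall in frequency from each number word to the next one up."""
    return [a - b for a, b in zip(freqs, freqs[1:])]


def is_decelerating_decline(freqs):
    """True if frequencies keep falling and each fall is smaller than the last."""
    drops = successive_drops(freqs)
    falling = all(d > 0 for d in drops)
    decelerating = all(x > y for x, y in zip(drops, drops[1:]))
    return falling and decelerating


# Placeholder counts per million words for two, three, four, five, six.
illustrative = [1000, 500, 300, 200, 150]
looks_sane = is_decelerating_decline(illustrative)

# A series where "four" outranks "three" should fail the check.
suspicious = is_decelerating_decline([1000, 500, 600])
```

Running the same test separately on each decade would show whether the pattern holds throughout the corpus or only in the better-sampled recent years.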

God vs the Devil

God vs the devil. Oddly, the data for God reported in the Science article looked very different. It will take a while before anyone has much confidence in these figures. God is actually doing quite well, and has overtaken the devil in popularity.

The N-gram viewer is released.


By now, you will be well aware that Google released the n-gram viewer, looking at 1- to 5-grams (groups of 1 to 5 words) in a corpus containing about 4% of all books published from 1800 to today. There is so much. The Culturomics paper just hit Science. Let's play with it a bit. Here is hell vs heaven.