Jan 31, 2009

Visualizing Google’s bigram data: this visualization was built from Google data that documents how often one word follows another (“cold” followed by “winter”, say). The closer a ray is to one of the two base words (here, “hot” and “cold”), the more frequently the words in that ray follow that base word in Google queries, and the larger the text size, the more occurrences of that word. (So the most common query that starts with “cold” would be “cold winter”.) You can see this better in the PDF versions on the site.
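To make the layout idea concrete, here is a minimal Python sketch of how a word could be assigned to a ray and given a text size. It is only my guess at the general idea, not how the original visualization was built: the function name `layout`, the number of rays, the log scaling of the font size, and all the counts are made up for illustration.

```python
import math

def layout(follow_a, follow_b, n_rays=5, min_pt=8.0, max_pt=36.0):
    """Place each word on a ray between two base words and pick a font size.

    follow_a / follow_b map a word to how often it follows base word A / B.
    Ray 0 lies next to A, ray n_rays - 1 next to B (hypothetical scheme).
    """
    words = set(follow_a) | set(follow_b)
    totals = {w: follow_a.get(w, 0) + follow_b.get(w, 0) for w in words}
    # Drop words that never follow either base word
    totals = {w: t for w, t in totals.items() if t > 0}
    if not totals:
        return {}
    biggest = max(totals.values())
    placed = {}
    for word, total in sorted(totals.items()):
        # Share of this word's occurrences that follow B: 0.0 = only A, 1.0 = only B
        share_b = follow_b.get(word, 0) / total
        ray = min(int(share_b * n_rays), n_rays - 1)
        # Assumed log scale, so very common words do not dwarf everything else
        size = min_pt + (max_pt - min_pt) * math.log1p(total) / math.log1p(biggest)
        placed[word] = (ray, round(size, 1))
    return placed

# Made-up counts, for illustration only
after_hot = {"summer": 900, "water": 500, "weather": 300}
after_cold = {"winter": 1200, "water": 500, "weather": 700}
placement = layout(after_hot, after_cold)
```

With these invented numbers, “summer” lands on the ray next to “hot”, “winter” on the ray next to “cold” in the largest type, and “water”, which follows both equally often, sits in the middle.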

There was a previous visualization of the same data, but I think these rays are clearer, since the words don’t obscure each other. An interesting way to look at word associations, certainly.

About
Daily Meh is written and edited by Simen (contact me). I live in Norway. This blog is about whatever interests me. Here are some of my favorite posts from the archives. You can subscribe via RSS.