Sep 6, 2009

Entropy of English

Examples of simulated English:

Zeroth-order approximation: the symbols are independent and equiprobable.
XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGXYD QPAAMKBZAACIBZLHJQD

First-order approximation: the symbols are independent, but frequency of letters matches English text.
OCRO HLI RGWR NMIELWIS EU LL NBNESBEYA TH EEI ALHENHTTPA OOBTTVA NAH BRL

Second-order approximation: the frequency of pairs of letters matches English text.
ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE

Third-order approximation: the frequency of triplets of letters matches English text.
IN NO IST LAT WHEY CRATICT FROURE BERS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE

Fourth-order approximation: the frequency of quadruplets of letters matches English text.
THE GENERATED JOB PROVIDUAL BETTER TRAND THE DISPLAYED CODE ABOVERY UPONDULTS WELL THE CODERST IN THESTICAL IT DO HOCK BOTHE MERG INSTATES CONS ERATION NEVER ANY OF PUBLE AND TO THEORY EVENTIAL CALLEGAND TO ELAST BENERATED IN WITH PIES AS IS WITH THE

It’s fascinating to see how the output grows increasingly realistic as the order goes up. I wonder how high an order you would need before the result consisted exclusively of actual English words.
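For anyone who wants to play with this, here is a rough Python sketch of how such approximations could be generated with a character-level Markov chain. It is only one way to do it, not necessarily how the samples above were produced. The function names (`normalize`, `build_model`, `generate`, `real_word_fraction`) and the tiny demo corpus are my own inventions for illustration. I assume a 27-symbol alphabet (A to Z plus space), and I treat the corpus as circular so the generator never hits a context with no follower. Order k here means each letter depends on the previous k - 1 letters, matching the numbering used above.

```python
import random
from collections import Counter, defaultdict

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # assumed 27-symbol alphabet


def normalize(text):
    """Uppercase, replace anything that is not an ASCII letter with a space,
    and collapse runs of whitespace into single spaces."""
    cleaned = "".join(
        ch if (ch.isascii() and ch.isalpha()) else " " for ch in text.upper()
    )
    return " ".join(cleaned.split())


def build_model(text, order):
    """Count which character follows each context of length order - 1.

    The text is wrapped around (treated as circular) so that every context
    that can arise during generation has at least one follower.
    """
    context_len = order - 1
    wrapped = text + text[:context_len]
    model = defaultdict(Counter)
    for i in range(context_len, len(wrapped)):
        model[wrapped[i - context_len:i]][wrapped[i]] += 1
    return model


def generate(text, order, length, rng=None):
    """Generate `length` characters of an order-`order` approximation."""
    rng = rng or random.Random()
    if order == 0:
        # Zeroth order: independent, equiprobable symbols; the corpus is unused.
        return "".join(rng.choice(ALPHABET) for _ in range(length))
    context_len = order - 1
    model = build_model(text, order)
    out = list(text[:context_len])  # seed with a context known to the model
    while len(out) < length:
        context = "".join(out[-context_len:]) if context_len else ""
        followers = model[context]
        next_char = rng.choices(
            list(followers), weights=list(followers.values())
        )[0]
        out.append(next_char)
    return "".join(out[:length])


def real_word_fraction(sample, vocabulary):
    """Fraction of space-separated tokens in `sample` found in `vocabulary`."""
    words = sample.split()
    if not words:
        return 0.0
    return sum(word in vocabulary for word in words) / len(words)


if __name__ == "__main__":
    # A tiny stand-in corpus; a real experiment would need far more text.
    corpus = normalize(
        "It is fascinating to see how the text grows more realistic. "
        "The frequency of letters and of pairs of letters matches the text."
    )
    vocabulary = set(corpus.split())
    rng = random.Random(2009)
    for order in range(0, 5):
        sample = generate(corpus, order, 80, rng)
        print(order, round(real_word_fraction(sample, vocabulary), 2), sample)
```

With a corpus this small, the higher orders mostly parrot the source text back, so the word fraction says little. On a large corpus, `real_word_fraction` would be one crude way to watch how the share of real words changes as the order grows, with the caveat that it only recognizes words that appear in the corpus itself.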

About
Daily Meh is written and edited by Simen. I live in Norway. This blog is about whatever interests me.