
The Bible Code
by Michael Drosnin (New York: Simon & Schuster, 1997)
Reviewed by Ronald N. Kahn

"For three thousand years a code in the Bible has remained hidden. Now it has been unlocked by computer, and it may reveal our future." So begins the jacket copy for The Bible Code by Michael Drosnin, a book receiving considerable public attention, though not much in the financial press. And yet we couldn't resist reviewing it because its methodologies are surprisingly similar to the worst data mining excesses of investment research.

This issue's lead article on data mining discusses Norman Bloom, arguably the world's greatest data miner. He tried to prove the existence of God through baseball statistics and the Dow Jones average. Now Mr. Drosnin, armed with the Bible and a computer, has taken up the cause.

The idea that the Bible contains encoded information has been around for quite some time. But in 1994, in Statistical Science, three statisticians reported their analysis of equidistant letter sequences (ELS) in the book of Genesis. An ELS is a fairly simple type of code. For example, a particular ten-letter word may begin with the 3,057th letter and continue with the 3,067th letter, 3,077th letter, ..., and the 3,147th letter.

Of course, words will appear encoded in the Bible just by random chance. So Doron Witztum, Eliyahu Rips, and Yoav Rosenberg devised a statistical test of whether Genesis contains any meaningful information. They assumed that if meaningfully related words appeared encoded "near" each other, that would imply meaningfully encoded information. So while the word "hammer" might appear at random and the word "anvil" might appear at random, these connected words wouldn't appear near each other unless the text contained meaningful encoded information.
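To make the ELS idea described above concrete, here is a minimal Python sketch. The function names `els_positions` and `find_els` and the `max_skip` parameter are illustrative assumptions, not anything taken from the paper or the book. The sketch searches forward skips only and works on plain Latin letters rather than the Hebrew text of Genesis.

```python
def els_positions(start, skip, length):
    """Letter positions covered by an ELS of `length` letters.

    `start` and `skip` are in whatever indexing the caller uses
    (the review's example counts letters from 1).
    """
    return [start + skip * i for i in range(length)]


def find_els(text, word, max_skip=50):
    """Find forward ELS occurrences of `word` in `text`.

    Non-letters are dropped and case is ignored. Returns a list of
    (start, skip) pairs, where `start` is a 0-based index into the
    letters-only text. A skip of 1 is just an ordinary occurrence.
    """
    letters = "".join(c for c in text.lower() if c.isalpha())
    word = word.lower()
    if len(word) < 2:
        raise ValueError("word must have at least two letters")
    hits = []
    for skip in range(1, max_skip + 1):
        span = (len(word) - 1) * skip  # distance from first to last letter
        for start in range(len(letters) - span):
            # Extended slice picks every `skip`-th letter from `start`.
            if letters[start:start + span + 1:skip] == word:
                hits.append((start, skip))
    return hits


# The review's example: ten letters, starting at the 3,057th, skip of 10.
print(els_positions(3057, 10, 10)[-1])  # 3147

# Toy text with "code" hidden at every third letter.
print(find_els("cxxoxxdxxe", "code", max_skip=5))  # [(0, 3)]
```

Because every starting letter and every skip is a fresh chance to match, a long text and a generous `max_skip` will turn up short words by chance alone, which is why the significance question matters.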
On the assumption that related words should appear near each other, they developed a highly convoluted measure of the "closeness" of the encoded appearance of any two given words, chose a list of (according to them) meaningfully related word pairs (names and dates for a list of famous rabbis), and finally analyzed whether those word pairs appeared closer than expected by random chance in Genesis. According to this test, the probability that random data would generate encoded word pairs as close as they observed was only 16 out of 1 million.

Starting from this academic paper, author Michael Drosnin applied his computer to the entire Bible, without regard to any statistical principles. He searched for individual words of interest, then looked for other suggestive words nearby, backwards or forwards, applying liberal interpretive skills. The result in his case is a book full of remarkable coincidences, completely lacking any statistical analysis of significance.

For this review, let's consider the original statistical analysis and the popular book separately. The popular book is simply a fantastic example of data mining run amok. If Drosnin didn't find this coincidence, he would find another. If one interpretation of the word didn't fit, he used another. His quest would have been equally successful applied to War and Peace, Men are from Mars, Women are from Venus, or even an old Sears catalog. Proust's insight (see quotation in "Data Mining" article below) clearly applies here.

The original statistical paper does include an analysis of significance, so my criticism here is more technical. The authors' definition of closeness is so contorted as to defy much intuition, and it may be very sensitive to just a small number of very close observations. Another similar analysis, by Dror Bar-Natan, Alec Gindis, Aryeh Levitan, and Brendan McKay of the Australian National University, found no unusual closeness for the same famous rabbis and their most famous books.
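The general shape of a significance test like the one discussed above can be sketched as a permutation test: score how close the matched word pairs are, then see how often a random re-pairing scores at least as well. The sketch below is only an illustration under loud assumptions. The closeness score (`min_gap`, the smallest distance between any two occurrences) is a deliberately crude stand-in for the paper's far more elaborate measure, and the function names, the `trials` and `seed` parameters, and the toy positions are all hypothetical.

```python
import random


def min_gap(positions_a, positions_b):
    """Smallest distance between any occurrence of one word and any of the other."""
    return min(abs(a - b) for a in positions_a for b in positions_b)


def permutation_p_value(names, dates, trials=9999, seed=0):
    """Estimate how often a random pairing is as close as the true pairing.

    names[i] and dates[i] are lists of text positions where the i-th
    name and its matching date occur (for example, as ELS hits).
    Lower total gap means "closer". This is NOT the paper's statistic.
    """
    rng = random.Random(seed)
    n = len(names)

    def total_gap(order):
        # order[i] says which date is paired with name i.
        return sum(min_gap(names[i], dates[j]) for i, j in enumerate(order))

    observed = total_gap(range(n))
    as_close = 1  # count the observed pairing itself
    order = list(range(n))
    for _ in range(trials):
        rng.shuffle(order)
        if total_gap(order) <= observed:
            as_close += 1
    return as_close / (trials + 1)


# Made-up positions in which each name sits right next to its own date.
names = [[100], [5000], [9000]]
dates = [[101], [5002], [9003]]
print(permutation_p_value(names, dates))  # roughly 1/6: only 6 pairings exist
```

With only three pairs there are just six possible pairings, so even a perfect match cannot look rarer than about one in six. The sketch also shows the sensitivity the review worries about: a score built from minimum gaps can be driven by a handful of very close occurrences.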
And finally, the occasional appearance of encoded word pairs near each other is simply a far cry from finding or proving (let alone decoding) any meaningful information encoded in the Bible.

For investment researchers, The Bible Code is just a wonderful example of the seductive appeal of random patterns found in large data sets. The book, if not also the paper, ignores all four guidelines discussed on page 29 of this issue: intuition, restraint, sensibility, and out-of-sample testing. Researchers, whether investment or biblical, ignore these at their own peril.