Knowledge discovery in an earthquake text database: correlation between significant earthquakes and the time of day

The authors present a case history of a real-world application drawn from a text database. The techniques ultimately led to a discovery that contradicts an accepted paradigm in seismology. Using simple, tailored keyword extraction, they examined a text collection of earthquake data, and a discovery was made when an unusual pattern emerged from the text. Treating the text discovery as a hypothesis, they then tested it against a more comprehensive numerical database, where it was verified using a standard χ² statistic. The hypothesis was that significant earthquakes in the longitude regions that include California occur more often in the morning hours than at any other time of day.
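As a rough illustration of the kind of test described, the sketch below computes a χ² goodness-of-fit statistic for earthquake counts binned by local time of day, against the null hypothesis that events are spread evenly across the bins. The four six-hour bins, the function names, and the counts are all assumptions made for illustration; they are not the authors' binning, code, or data.

```python
# Minimal sketch: chi-square goodness-of-fit test against a uniform
# time-of-day distribution. All names and numbers here are hypothetical.

def hour_to_bin(hour, bins=4):
    """Map a local hour (0-23) to one of `bins` equal-width time-of-day bins."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in the range 0..23")
    return hour * bins // 24


def chi_square_uniform(counts):
    """Return the chi-square statistic for observed counts vs. equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


# Critical value of the chi-square distribution with 3 degrees of freedom
# (4 bins minus 1) at the 0.05 significance level.
CRITICAL_VALUE_DF3_P05 = 7.815

# Made-up counts for illustration only, ordered as:
# [00-06h, 06-12h, 12-18h, 18-24h] local time.
example_counts = [20, 40, 20, 20]

statistic = chi_square_uniform(example_counts)
reject_uniform = statistic > CRITICAL_VALUE_DF3_P05

print(f"chi-square = {statistic:.3f}, reject uniform hypothesis: {reject_uniform}")
```

With real data, the counts would come from converting each event's origin time to local time and tallying events with `hour_to_bin`; a statistic above the critical value suggests the events are not evenly distributed over the day.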
