Data-centric computing with the Netezza architecture

Relational databases have become critically important in business applications and web services, but they have played a relatively minor role in scientific computing, which has traditionally focused on modeling and simulation. However, massively parallel database architectures are beginning to make it possible to search terabytes of data quickly, with hundred-fold or even thousand-fold speedups over server-based architectures. These machines may enable an entirely new class of algorithms for scientific applications, especially when the underlying computation involves searching abstract graphs. Three examples are examined, and results are reported for implementations on a novel, massively parallel database computer, on which they achieved very high performance: (1) computing bibliographic couplings, (2) searching integrated circuit netlists for sub-circuit motifs, and (3) a new approach to word sense disambiguation in natural language processing. These promising results suggest that the computational science community may be able to make good use of these new database machines.
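To make the idea of expressing a graph computation as a database query more concrete, here is a minimal sketch of example (1). It assumes the standard definition of bibliographic coupling: two papers are coupled once for every reference they both cite. The sketch uses Python's built-in sqlite3 module as a stand-in for a massively parallel database. The table `citations(citing, cited)` and the function `bibliographic_coupling` are hypothetical illustrations and are not taken from the authors' Netezza implementation.

```python
import sqlite3


def bibliographic_coupling(citations):
    """Return {(paper_a, paper_b): shared_reference_count} with paper_a < paper_b.

    citations: iterable of (citing_paper, cited_paper) pairs.
    Hypothetical helper; sqlite3 stands in for a parallel database engine.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical edge table: one row per citation link in the citation graph.
    # The primary key plus INSERT OR IGNORE drops duplicate links.
    conn.execute(
        "CREATE TABLE citations (citing TEXT, cited TEXT, PRIMARY KEY (citing, cited))"
    )
    conn.executemany("INSERT OR IGNORE INTO citations VALUES (?, ?)", citations)
    # Self-join on the cited paper: each shared reference contributes one row
    # per pair of citing papers; a.citing < b.citing keeps each unordered pair once.
    rows = conn.execute(
        """
        SELECT a.citing, b.citing, COUNT(*) AS strength
        FROM citations AS a
        JOIN citations AS b
          ON a.cited = b.cited AND a.citing < b.citing
        GROUP BY a.citing, b.citing
        ORDER BY strength DESC, a.citing, b.citing
        """
    ).fetchall()
    conn.close()
    return {(x, y): n for x, y, n in rows}


if __name__ == "__main__":
    edges = [("P1", "R1"), ("P1", "R2"), ("P2", "R1"), ("P2", "R2"), ("P3", "R2")]
    print(bibliographic_coupling(edges))
    # {('P1', 'P2'): 2, ('P1', 'P3'): 1, ('P2', 'P3'): 1}
```

The whole computation is a single self-join followed by an aggregation. This kind of set-oriented query is something a parallel database can, in principle, spread across many processing units without hand-written parallel code. The details of how the Netezza system executes such a query are not covered here.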
