A petabyte-size electronic library using the N-Gram memory engine

A model library containing petabytes of data is proposed by Triada, Ltd., Ann Arbor, Michigan. The library uses the newly patented N-Gram Memory Engine (Neurex) for storage, compression, and retrieval. Neurex splits data into two parts: a hierarchical network of associative memories that stores 'information' from the data, and a permutation operator that preserves sequence. Neurex is expected to offer four advantages in mass storage systems. First, Neurex representations are dense and fully reversible, hence less expensive to store. Second, Neurex becomes exponentially more stable with increasing data flow; thus its contents and the inverting algorithm may be mass-produced for low-cost distribution, and only a small permutation operator would need to be recalled from the library to recover data. Third, Neurex may be enhanced to recall patterns from a partial pattern. Fourth, Neurex nodes are measures of their patterns; researchers might use nodes in statistical models to avoid costly sorting and counting procedures. Neurex subsumes a theory of learning and memory that the author believes extends information theory. Its first axiom is a symmetry principle: learning creates memory, and memory evidences learning. The theory treats an information store that evolves from a null state to stationarity. A Neurex extracts information from data without a priori knowledge; i.e., unlike neural networks, it requires neither feedback nor training. The model consists of an energetically conservative field of uniformly distributed events with variable spatial and temporal scale, and an observer walking randomly through this field. A bank of band-limited transducers (an 'eye'), each transducer tuned to a sub-band, outputs signals upon registering events. These output signals are 'observed' by a second transducer bank (a 'mid-brain') whose band limit is narrower than that of the first bank. The banks are arrayed as n 'levels' or 'time domains' (td).
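The two-part split can be illustrated with a toy sketch. This is not the patented Neurex algorithm; it is a minimal, assumed interpretation of the abstract's description: distinct n-grams form an order-free "information" store (the associative memories), while an index stream records sequence (the permutation operator), and the two together invert losslessly.

```python
# Toy sketch of a Neurex-style split (illustrative only, not the patented
# algorithm): separate a byte string into an order-free store of distinct
# n-grams ("information") and an index stream that preserves sequence
# (the "permutation operator"), then invert the split losslessly.

def split_ngrams(data: bytes, n: int = 2):
    """Return (memory, sequence): distinct n-grams plus the index stream."""
    memory = []      # associative store of distinct patterns
    index = {}       # pattern -> position in memory
    sequence = []    # permutation operator: the order in which patterns occur
    for i in range(0, len(data), n):
        gram = data[i:i + n]
        if gram not in index:
            index[gram] = len(memory)
            memory.append(gram)
        sequence.append(index[gram])
    return memory, sequence

def join_ngrams(memory, sequence) -> bytes:
    """Invert the split: replay the sequence through the memory."""
    return b"".join(memory[k] for k in sequence)

text = b"abababcdcdab"
mem, seq = split_ngrams(text, n=2)
assert join_ngrams(mem, seq) == text   # fully reversible
```

Under this reading, the hierarchy of 'levels' would arise from applying the same split again to the index stream of the level below, so each level's memories register patterns over a narrower band of the original data.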
The banks are the hierarchical network (a cortex), and the transducers are the (associative) memories. A model Neurex was built and studied. Data were 50 MB to 10 GB samples of text, databases, and images: black/white, grey-scale, and high-resolution in several spectral bands. Memory sizes at level td, S(m_td), were plotted against outputs of memories at level td-1. S(m_td) was Boltzmann distributed, and memory frequencies exhibited self-organized criticality (SOC), i.e., a 1/f^β law, after long exposures to data. Since output signals from level n may be encoded with B_output = O(-log_2 f^β) bits, while input data require B_input = O((S(td)/S(td-1))^n) bits, B_output/B_input is always much less than 1; the Neurex thus determines a canonical code for data and is a lossless data compressor. Further tests are underway to confirm these results with more data types and larger samples.
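The claim B_output/B_input << 1 can be checked empirically under one assumed interpretation: if each n-gram with relative frequency p is assigned an ideal code of about -log_2 p bits (the sense of B_output = O(-log_2 f^β)), then the total coded length for redundant data falls well below the raw input length. The data and function names below are hypothetical.

```python
# Illustrative check (assumed reading of the abstract, not the Neurex code):
# sum the ideal codelengths -log2(p) over an n-gram frequency table and
# compare against the raw input size in bits.

import math
from collections import Counter

def entropy_bits(data: bytes, n: int = 2) -> float:
    """Total ideal-code bits for data split into non-overlapping n-grams."""
    grams = [data[i:i + n] for i in range(0, len(data) - n + 1, n)]
    counts = Counter(grams)
    total = len(grams)
    # each occurrence of a gram with probability p costs -log2(p) bits
    return sum(-c * math.log2(c / total) for c in counts.values())

data = b"abab" * 256 + b"cd" * 64     # highly redundant sample
b_in = 8 * len(data)                  # raw input size in bits
b_out = entropy_bits(data, n=2)       # ideal-code output size in bits
assert b_out / b_in < 1               # B_output/B_input << 1 for redundant data
```

For the sample above the ratio is a few percent; for incompressible (uniformly random) data it approaches 1, consistent with the abstract's restriction of the strong ratio claim to data that expose pattern structure.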
