Cluster of re-configurable nodes for scanning large genomic banks

Genomic data are growing exponentially and are scanned daily by thousands of biologists. To reduce scan time, the data can be distributed across a cluster of processing units, each of which scans its own share locally and independently. PC clusters are well suited to this type of parallelism, but we propose replacing the PCs with re-configurable hardware closely coupled to a hard disk. We show that low-cost FPGA nodes interconnected through a standard Ethernet network can compete favorably with high-performance clusters. A prototype of 48 re-configurable processing nodes has been tested on two genomic applications: a content-based similarity search and a pattern search.
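The data-dispatching idea in the abstract can be illustrated in software. The following is a minimal sketch, not the FPGA implementation described in the paper: it assumes a bank held in memory as (identifier, sequence) pairs, uses threads to stand in for the processing nodes, and uses a regular expression to stand in for the pattern search. The names `partition_bank`, `scan_local`, and `scan_cluster` are hypothetical and chosen for illustration only.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def partition_bank(bank, num_nodes):
    """Split (id, sequence) records round-robin into num_nodes local banks."""
    return [bank[i::num_nodes] for i in range(num_nodes)]


def scan_local(local_bank, pattern):
    """Scan one node's local data only.

    Returns (id, position) for every non-overlapping match of the pattern.
    """
    regex = re.compile(pattern)
    return [
        (seq_id, match.start())
        for seq_id, seq in local_bank
        for match in regex.finditer(seq)
    ]


def scan_cluster(bank, pattern, num_nodes=4):
    """Dispatch the bank across nodes, scan each part independently, merge hits."""
    parts = partition_bank(bank, num_nodes)
    # Threads are an assumption standing in for hardware nodes;
    # each worker sees only its own partition.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        per_node_hits = pool.map(scan_local, parts, [pattern] * num_nodes)
    return sorted(hit for hits in per_node_hits for hit in hits)


if __name__ == "__main__":
    toy_bank = [("s1", "ACGTACGT"), ("s2", "TTTT"), ("s3", "GGACGG")]
    print(scan_cluster(toy_bank, "ACG", num_nodes=2))
```

Because each node scans only its own records and no node needs data from another, the merged result does not depend on how many nodes are used; only the scan time does.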
