Empowering scientific discovery by distributed data mining on the grid infrastructure

The grid-based computing paradigm has attracted much attention in recent years. The sharing of distributed computing resources (such as software, hardware, data, and sensors) is an important aspect of grid computing. Computational Grids focus on methods for handling compute-intensive tasks, while Data Grids are geared toward data-intensive computing. Grid-based computing has been put to use in several scientific disciplines, including astronomy, engineering, climate studies, ecology, biology, and health sciences. In astronomy, for example, breakthroughs in telescope, detector, and computer technology allow sky surveys to produce terabytes of images and catalogs, which are typically stored in data grids. Extracting meaningful knowledge from data grids requires the development of sophisticated data mining techniques. This dissertation investigates Distributed Data Mining (DDM) techniques for the grid infrastructure, using data grids in the astronomy domain as a representative example. The gigantic, heterogeneous, geographically distributed repositories of astronomy sky surveys pose challenges to the data miner, since most off-the-shelf data mining systems require the data to be downloaded to a single location before analysis. This requirement imposes serious scalability constraints on the data mining system and fundamentally hinders the scientific discovery process. To enable astronomers to tap the richness of sky survey catalogs, we describe a system for Distributed Exploration of Massive Astronomical Catalogs (DEMAC), which contains distributed algorithms for (1) Principal Component Analysis (PCA), enabling dimension reduction of correlated astrophysical parameters; (2) outlier detection, for identification of "interesting" galaxies; and (3) classification of astronomical sources.

The demand for catalog data from the astronomy community has been increasing rapidly, which has ushered in new mechanisms to support scalable performance. One such mechanism allows users to download and locally manage different parts of the overall repository, resulting in partial images of the data in distributed environments. Collaboration among users with such personalized databases (MyDBs) results in the formation of distributed peer-to-peer (P2P) networks. We investigate strategies for data transfer and mining in peer-to-peer MyDB environments.
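
To make the idea of mining without centralizing the data concrete, the following is a minimal sketch of one common way to compute PCA over distributed data. It is not the DEMAC algorithm, which the abstract does not detail. It assumes that every site holds different rows over the same numeric columns, and that each site can share small summary statistics (row count, column sums, and the sum of outer products) instead of raw rows. The function names `local_summary`, `global_covariance`, and `top_component` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# (row count, per-column sums, sum of outer products) computed at one site
Summary = Tuple[int, List[float], List[List[float]]]


def local_summary(rows: Sequence[Sequence[float]]) -> Summary:
    """Run at each site: summarize local rows without exposing them."""
    d = len(rows[0])
    sums = [0.0] * d
    outer = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            sums[i] += r[i]
            for j in range(d):
                outer[i][j] += r[i] * r[j]
    return len(rows), sums, outer


def global_covariance(summaries: Sequence[Summary]) -> List[List[float]]:
    """Run at a coordinator: combine site summaries into the sample covariance."""
    d = len(summaries[0][1])
    n = sum(s[0] for s in summaries)
    mean = [sum(s[1][i] for s in summaries) / n for i in range(d)]
    return [
        [
            (sum(s[2][i][j] for s in summaries) - n * mean[i] * mean[j]) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def top_component(cov: List[List[float]], iters: int = 200) -> List[float]:
    """Estimate the first principal direction by power iteration.

    Assumption: the fixed start vector is not orthogonal to the leading
    eigenvector; a production version would randomize the start.
    """
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v
```

In this sketch each site transmits on the order of d² numbers for d columns, independent of how many rows it stores, which is the kind of communication saving that motivates distributed approaches over downloading whole catalogs. Catalogs that hold different attributes of the same sky objects at different sites would need a different scheme.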


[97]  Geoff Hulten,et al.  Mining high-speed data streams , 2000, KDD '00.

[98]  Wolfgang Müller,et al.  Classifying Documents by Distributed P2P Clustering , 2003, GI Jahrestagung.

[99]  Ying Liu,et al.  Distributed streaming query planner in Calder system , 2005, HPDC-14. Proceedings. 14th IEEE International Symposium on High Performance Distributed Computing, 2005..

[100]  Ian Foster,et al.  Grid infrastructure to support science portals for large scale instruments. , 1999 .

[101]  David E. Goldberg,et al.  The Nonuniform Walsh-Schema Transform , 1990, FOGA.

[102]  Vasant Honavar,et al.  Decision Tree Induction from Distributed Heterogeneous Autonomous Data Sources , 2003 .

[103]  Eyal Kushilevitz,et al.  Learning decision trees using the Fourier spectrum , 1991, STOC '91.

[104]  L. Kuncheva ‘ Fuzzy ’ vs ‘ Non-fuzzy ’ in Combining Classifiers Designed by Boosting , 2003 .

[105]  Hillol Kargupta,et al.  Distributed Clustering Using Collective Principal Component Analysis , 2001, Knowledge and Information Systems.

[106]  Salvatore J. Stolfo,et al.  Experiments on multistrategy learning by meta-learning , 1993, CIKM '93.

[107]  Zoran Obradovic,et al.  Distributed clustering and local regression for knowledge discovery in multiple spatial databases , 2000, ESANN.

[108]  Hillol Kargupta,et al.  Collective, Hierarchical Clustering from Distributed, Heterogeneous Data , 1999, Large-Scale Parallel Data Mining.

[109]  Domenico Talia,et al.  Adapting a Pure Decentralized Peer-to-peer Protocol for Grid Services Invocation , 2005, Parallel Process. Lett..

[110]  John P. Huchra,et al.  CfA redshift catalogue , 1992 .

[111]  Ian J. Taylor,et al.  Triana: a graphical Web service composition and execution toolkit , 2004, Proceedings. IEEE International Conference on Web Services, 2004..

[112]  Divesh Srivastava,et al.  On computing correlated aggregates over continual data streams , 2001, SIGMOD '01.

[113]  Kirk D. Borne,et al.  A National Virtual Observatory (NVO) Science Case: Properties of Very Luminous IR Galaxies (VLIRGs) , 2003 .

[114]  Salvatore J. Stolfo,et al.  Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection , 1998, KDD.

[115]  Frank Leymann,et al.  Workflow-Based Applications , 1997, IBM Syst. J..

[116]  Paul Avery,et al.  The griphyn project: towards petascale virtual data grids , 2001 .

[117]  Bin Zhang,et al.  Distributed data clustering can be efficient and exact , 2000, SKDD.

[118]  Domenico Talia,et al.  Toward a Synergy Between P2P and Grids , 2003, IEEE Internet Comput..

[119]  Juergen Hofer DIGIDT : Distributed Classifier Construction in the Grid Data Mining Framework GridMiner-Core , 2004 .

[120]  Maurizio Lenzerini,et al.  Data integration: a theoretical perspective , 2002, PODS.

[121]  Ian J. Taylor,et al.  Distributed computing with Triana on the Grid , 2005, Concurr. Pract. Exp..

[122]  Rahul Ramachandran,et al.  ADaM: a data mining toolkit for scientists and engineers , 2005, Comput. Geosci..

[123]  Matthias Klusch,et al.  Distributed Clustering Based on Sampling Local Density Estimates , 2003, IJCAI.

[124]  Ian Foster,et al.  The Grid: A New Infrastructure for 21st Century Science , 2002 .

[125]  Santosh S. Vempala,et al.  An algorithmic theory of learning: Robust concepts and random projection , 1999, Machine Learning.

[126]  John Anderson,et al.  Computational Design and Performance of the Fast Ocean Atmosphere Model, Version One , 2001, International Conference on Computational Science.

[127]  Mario Cannataro,et al.  Distributed data mining on grids: services, tools, and applications , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[128]  Peter Brezany,et al.  Towards service collaboration model in grid-based zero latency data stream warehouse (GZLDSWH) , 2004, IEEE International Conference onServices Computing, 2004. (SCC 2004). Proceedings. 2004.

[129]  James Liebert,et al.  The two micron all sky survey , 1994 .

[130]  Florian Schintke,et al.  Concepts and Technologies for a Worldwide Grid Infrastructure , 2002, Euro-Par.

[131]  Paul Watson,et al.  Databases and the Grid , 2003 .

[132]  Jennifer Widom,et al.  Adaptive filters for continuous queries over distributed data streams , 2003, SIGMOD '03.

[133]  Timothy W. Finin,et al.  Service Composition for Mobile Environments , 2005, Mob. Networks Appl..

[134]  Anthony Rowe,et al.  The Design of Discovery Net: Towards Open Grid Services for Knowledge Discovery , 2003, Int. J. High Perform. Comput. Appl..

[135]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[136]  Hillol Kargupta,et al.  A Fourier spectrum-based approach to represent decision trees for mining data streams in mobile environments , 2004, IEEE Transactions on Knowledge and Data Engineering.

[137]  Yordan Kostov,et al.  Low-cost optical instrumentation for biomedical measurements , 2000 .

[138]  Eleonora Riva Sanseverino,et al.  Distributed, Collaborative Data Analysis from Heterogeneous Sites Using a Scalable Evolutionary Technique , 2001, Applied Intelligence.

[139]  M. Field,et al.  Robust Order Statistics based Ensembles for Distributed Data Mining , 2000 .

[140]  Ian T. Foster,et al.  The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets , 2000, J. Netw. Comput. Appl..

[141]  Beth Plale Using Global Snapshots to Access Data Streams on the Grid , 2004, European Across Grids Conference.

[142]  Paul Avery,et al.  Data Grids: a new computational infrastructure for data-intensive science , 2002, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[143]  W. Allcock,et al.  GridFTP protocol specification , 2002 .

[144]  D. Hawkins,et al.  Exploring Multivariate Data using the Minor Principal Components , 1984 .

[145]  Peter Brezany,et al.  Novel mediator architectures for Grid information systems , 2005, Future Gener. Comput. Syst..

[146]  Mario Cannataro,et al.  A data mining toolset for distributed high- performance platforms , 2002 .

[147]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[148]  Archan Misra,et al.  Middleware architecture for evaluation and selection of 3rd party Web services for service providers , 2005, IEEE International Conference on Web Services (ICWS'05).

[149]  Ludmila I. Kuncheva,et al.  A Theoretical Study on Six Classifier Fusion Strategies , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[150]  Ujjwal Maulik,et al.  Clustering distributed data streams in peer-to-peer environments , 2006, Inf. Sci..

[151]  Hillol Kargupta,et al.  Constructing Simpler Decision Trees from Ensemble Models Using Fourier Analysis , 2002, DMKD.

[152]  A. Prasad Sistla,et al.  DOMINO: databases fOr MovINg Objects tracking , 1999, SIGMOD '99.

[153]  Warrick J. Couch,et al.  A Statistical Comparison of Line Strength Variations in Coma and Cluster Galaxies at z ~ 0·3 , 1998, Publications of the Astronomical Society of Australia.

[154]  Matthias Klusch,et al.  Distributed data mining and agents , 2005, Eng. Appl. Artif. Intell..

[155]  Anthony Rowe,et al.  InfoGrid: providing information integration for knowledge discovery , 2003, Inf. Sci..

[156]  Rakesh Agrawal,et al.  SPRINT: A Scalable Parallel Classifier for Data Mining , 1996, VLDB.

[157]  Inderjit S. Dhillon,et al.  A Data-Clustering Algorithm on Distributed Memory Multiprocessors , 1999, Large-Scale Parallel Data Mining.

[158]  María S. Pérez-Hernández,et al.  Adapting the Weka Data Mining Toolkit to a Grid Based Environment , 2005, AWIC.

[159]  Mike Jackson,et al.  Introduction to OGSA-DAI Services , 2004, SAG.

[160]  Foster Provost,et al.  Distributed Data Mining: Scaling up and beyond , 2000 .

[161]  Ann Q. Gates,et al.  TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING , 2005 .

[162]  Yike Guo,et al.  Discovery Processes: Representation And Re-Use , 2002 .

[163]  Nagiza F. Samatova,et al.  RACHET: An Efficient Cover-Based Merging of Clustering Hierarchies from Distributed Datasets , 2002, Distributed and Parallel Databases.

[164]  Ying Liu,et al.  Calder: enabling grid access to data streams , 2005, HPDC-14. Proceedings. 14th IEEE International Symposium on High Performance Distributed Computing, 2005..

[165]  Peter Brezany,et al.  Mediators in the Architecture of Grid Information Systems , 2003, PPAM.

[166]  Corinna Cortes,et al.  Boosting Decision Trees , 1995, NIPS.

[167]  Antonio Gonzalez Garcia Elliptical galaxies: merger simulations and the fundamental plane , 2003 .

[168]  David E. Goldberg,et al.  Genetic Algorithms and Walsh Functions: Part I, A Gentle Introduction , 1989, Complex Syst..

[169]  Alexander S. Szalay,et al.  The Sloan Digital Sky Survey , 1999, Comput. Sci. Eng..

[170]  Ivan Janciak,et al.  Virtualization of Heterogeneous Data Sources for Grid Information Systems , 2004 .

[171]  Salvatore J. Stolfo,et al.  A Comparative Evaluation of Voting and Meta-learning on Partitioned Data , 1995, ICML.

[172]  Domenico Talia,et al.  P2P computing and interaction with grids , 2005, Future Gener. Comput. Syst..

[173]  Peter Z. Kunszt,et al.  Large Databases in Astronomy , 2001 .

[174]  Joachim Geiler,et al.  Workflow-based Grid applications , 2006, Future Gener. Comput. Syst..

[175]  Ying Xing,et al.  Scalable Distributed Stream Processing , 2003, CIDR.

[176]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[177]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[178]  Jurgen Hofer,et al.  Distributed Decision Tree Induction within the Grid Data Mining Framework GridMiner-Core , 2004 .

[179]  Salvatore J. Stolfo,et al.  On the Accuracy of Meta-learning for Scalable Data Mining , 2004, Journal of Intelligent Information Systems.

[180]  Timothy W. Finin,et al.  Toward Distributed service discovery in pervasive computing environments , 2006, IEEE Transactions on Mobile Computing.

[181]  Chris Carter,et al.  Multiple decision trees , 2013, UAI.

[182]  M. Crawford The Human Genome Project. , 1990, Human biology.

[183]  Ian J. Taylor,et al.  Web services composition for distributed data mining , 2005, 2005 International Conference on Parallel Processing Workshops (ICPPW'05).

[184]  Steven Tuecke,et al.  The Physiology of the Grid An Open Grid Services Architecture for Distributed Systems Integration , 2002 .

[185]  Tom Rodden,et al.  Extending the Grid to Support Remote Medical Monitoring , 2003 .

[186]  Ian Taylor,et al.  Triana as a Graphical Web Services Composition Toolkit , 2003 .

[187]  Norman W. Paton,et al.  The design and implementation of Grid database services in OGSA‐DAI , 2005, Concurr. Pract. Exp..

[188]  Hillol Kargupta,et al.  Distributed Data Mining: Algorithms, Systems, and Applications , 2003 .

[189]  Beth Plale Framework for bringing data streams to the grid , 2004, Sci. Program..

[190]  Erwin Laure,et al.  Next-Generation EU DataGrid Data Management Services , 2003 .

[191]  Joel H. Saltz,et al.  DataCutter: Middleware for Filtering Very Large Scientific Datasets on Archival Storage Systems , 2000, IEEE Symposium on Mass Storage Systems.

[192]  Lei Liu,et al.  MobiMine: monitoring the stock market from a PDA , 2002, SKDD.

[193]  Anupam Joshi,et al.  MobiCom poster: Anamika: distributed service composition architecture for pervasive environments , 2003, MOCO.

[194]  Karsten Schwan,et al.  Dynamic Querying of Streaming Data with the dQUOB System , 2003, IEEE Trans. Parallel Distributed Syst..

[195]  Hillol Kargupta,et al.  Knowledge discovery from heterogeneous data streams using fourier spectrum of decision trees , 2001 .

[196]  Gavin McCance Grid Enabled Relational Database Middleware , 2001 .