Management of unspecified semi-structured data in multi-agent environment

The amount of heterogeneous semi-structured data available on the Web and in other data repositories is growing rapidly, which raises the need for simple and universal ways to access this data. To provide such an interface, we propose to exploit the notion of "unspecified ontologies", which describe each data object as a list of attributes and their respective values. To manage the unspecified data objects efficiently, we use a multi-agent channeled multicast communication platform. The data objects are stored in a distributed manner, such that each attribute is assigned a designated channel. This allows efficient searches that query only the relevant channels in parallel and aggregate the partial results. Moreover, the multi-agent platform facilitates advanced data management by extracting metadata from the data objects. We implemented a prototype system and experimented with a corpus of real-life E-Commerce advertisements. Our results demonstrate the scalability of the proposed approach and the accuracy of the extracted metadata.
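
The storage and search scheme described in the abstract can be illustrated with a minimal sketch. It is not the prototype's implementation: an in-memory dictionary stands in for the channeled multicast platform, a thread pool stands in for the agents listening on each channel, and the partial results are aggregated by set intersection, which is an assumption (the abstract does not say how partial results are combined). The names `ChannelStore`, `insert`, `query_channel`, and `search` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class ChannelStore:
    """Toy stand-in for per-attribute channels (hypothetical, in-memory only)."""

    def __init__(self):
        # channel (attribute name) -> attribute value -> ids of matching objects
        self.channels = defaultdict(lambda: defaultdict(set))

    def insert(self, obj_id, obj):
        """Store an object given as an attribute -> value mapping.

        Each attribute is routed to its own channel, so no global schema
        has to be agreed on in advance.
        """
        for attr, value in obj.items():
            self.channels[attr][value].add(obj_id)

    def query_channel(self, attr, value):
        """Return the partial result held by a single channel."""
        channel = self.channels.get(attr)
        if channel is None:
            return set()
        return set(channel.get(value, set()))

    def search(self, query):
        """Query only the channels named in the query, in parallel.

        Assumption: partial results are aggregated by intersection, so an
        object matches only if it matches every queried attribute.
        """
        if not query:
            return set()
        with ThreadPoolExecutor(max_workers=len(query)) as pool:
            partials = list(
                pool.map(lambda item: self.query_channel(*item), query.items())
            )
        return set.intersection(*partials)


if __name__ == "__main__":
    store = ChannelStore()
    store.insert("ad1", {"type": "car", "color": "red", "year": "2004"})
    store.insert("ad2", {"type": "car", "color": "blue"})
    store.insert("ad3", {"type": "bike", "color": "red"})
    print(sorted(store.search({"type": "car", "color": "red"})))
```

Objects with different attribute sets coexist in the same store, and a query touches only the channels for the attributes it mentions. A real deployment would replace the dictionary lookups with messages sent over the corresponding channels.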
