A first step for building a document warehouse: Unification of XML documents

The Web plays a key role in publishing and exchanging information between organizations. In this context, XML has become a common standard for data representation and exchange. Moreover, XML documents constitute an important source for decision-support analyses, since they help decision makers better understand and control the evolution of their business processes. However, even when several XML documents belong to the same domain, they may be described by different structures. In this paper, we present a method to unify XML document structures in order to build a global, generic view of heterogeneous documents, store them in a document warehouse, and then query them easily. We also describe our software prototype USD (Unification of Structures of XML Documents), which supports the proposed method, and illustrate its functionality through an example.
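The abstract does not detail the unification algorithm used by USD. To illustrate only what "unifying" document structures can mean, here is a minimal sketch that merges the element trees of several XML documents into one union structure that contains every tag path found in any input. It assumes purely syntactic matching on identical tag names and ignores text content. It does no synonym or semantic matching, which a real method for heterogeneous documents would likely need. The function names `structure_of`, `merge`, `unify` and `paths` are hypothetical.

```python
import xml.etree.ElementTree as ET


def merge(a, b):
    """Return the union of two structure dicts (tag -> child structure).

    Every tag path present in either input is kept. Matching is by exact
    tag name only (a simplifying assumption, not the USD method)."""
    result = dict(a)
    for tag, sub in b.items():
        result[tag] = merge(result.get(tag, {}), sub)
    return result


def structure_of(elem):
    """Describe an element's structure as a nested dict, ignoring text."""
    children = {}
    for child in elem:
        # Repeated sibling tags collapse into a single entry: we keep the
        # structure, not the individual instances.
        children[child.tag] = merge(children.get(child.tag, {}), structure_of(child))
    return children


def unify(documents):
    """Build one generic (union) structure from several XML strings."""
    unified = {}
    for text in documents:
        root = ET.fromstring(text)
        unified = merge(unified, {root.tag: structure_of(root)})
    return unified


def paths(structure, prefix=""):
    """Yield every tag path in a structure, e.g. '/article/author/name'."""
    for tag, sub in structure.items():
        path = f"{prefix}/{tag}"
        yield path
        yield from paths(sub, path)


if __name__ == "__main__":
    doc_a = "<article><title>T</title><author><name>A</name></author></article>"
    doc_b = ("<article><title>U</title><author><name>B</name>"
             "<affiliation>X</affiliation></author><keywords/></article>")
    for p in sorted(paths(unify([doc_a, doc_b]))):
        print(p)
```

With the two sample documents, the unified structure contains `/article/author/affiliation` and `/article/keywords` (found only in the second document) alongside the paths the two share. A single generic structure like this is what would let the documents be stored and queried together. A practical method would also need to reconcile elements that are named differently but mean the same thing.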
