Similarity analysis on government regulations

Government regulations are semi-structured text documents that are often voluminous, heavily cross-referenced between provisions, and sometimes ambiguous. Because regulations come from multiple sources, it is difficult both to understand them and to comply with all applicable codes. In this work, we propose a framework for regulation management and similarity analysis. An online repository of legal documents is built with the help of a text mining tool, and users can access regulatory documents either through the natural hierarchy of provisions or through a concept-based taxonomy generated by knowledge engineers. Our similarity analysis core identifies relevant provisions and brings them to the user's attention; it exploits both the hierarchical and the referential structure of regulations to compare provisions more effectively than their text alone allows. Preliminary results show that the system reveals similarities between provisions that are not apparent from comparisons of node content alone.
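The abstract does not specify how content and structure are combined. As an illustration only, the Python sketch below assumes TF-IDF cosine similarity as the content measure and blends it with the best similarity found among each provision's structural neighbors (its parent, children, and cited provisions). The functions `tfidf_vectors`, `cosine`, and `structural_similarity`, the `neighbors` map, and the weight `alpha` are all hypothetical; this is a simple stand-in for the idea, not the framework's actual method.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return a sparse TF-IDF weight dict for each document (a list of tokens)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vecs


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def structural_similarity(texts, neighbors, alpha=0.5):
    """Blend content similarity with the best neighbor-pair similarity.

    texts:     provision id -> list of tokens
    neighbors: provision id -> ids of structurally related provisions
               (hypothetically: parent, children, and cited provisions)
    alpha:     hypothetical weight on the provisions' own content similarity
    """
    ids = list(texts)
    vecs = dict(zip(ids, tfidf_vectors([texts[i] for i in ids])))
    base = {(a, b): cosine(vecs[a], vecs[b]) for a in ids for b in ids}
    refined = {}
    for a in ids:
        for b in ids:
            na, nb = neighbors.get(a, []), neighbors.get(b, [])
            # Strongest content match between any neighbor of a and any neighbor of b.
            nbr = max((base[(x, y)] for x in na for y in nb), default=0.0)
            refined[(a, b)] = alpha * base[(a, b)] + (1 - alpha) * nbr
    return base, refined


# Toy example: p1 and p2 share no terms, but the provisions they cite do.
texts = {
    "p1": ["stair", "riser", "height"],
    "p2": ["ramp", "slope", "limit"],
    "r1": ["handrail", "required"],
    "r2": ["handrail", "graspable"],
}
neighbors = {"p1": ["r1"], "p2": ["r2"]}
base, refined = structural_similarity(texts, neighbors)
print(base[("p1", "p2")])               # 0.0: no shared terms
print(round(refined[("p1", "p2")], 3))  # 0.1: similarity inherited via cited provisions
```

Taking the maximum over neighbor pairs lets one strong shared reference surface a link that pure content comparison misses; averaging instead would be more conservative when provisions have many neighbors.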
