Towards Automatic Structuring and Semantic Indexing of Legal Documents

In recent years there has been a great increase in the number of freely available legal resources. Portals that allow users to search for legislation using keywords are now commonplace. However, in the vast majority of those portals, legal documents are not stored in a structured format with a rich set of metadata, but in a presentation-oriented manifestation. This makes it impossible for end users to query the semantics of the documents, such as date of enactment, date of repeal, or jurisdiction, or to reuse information and establish interconnections with similar repositories. In this paper, we present an approach for extracting a machine-readable semantic representation of legislation from unstructured document formats. Our method exploits common formats of legal documents to identify blocks of structural and semantic information and models them according to a popular legal meta-schema. Our proposed method is highly extensible and achieves high accuracy for a variety of legal and para-legal documents, especially legislation. Our evaluation results reveal that our methodology can be of great assistance for the automatic structuring and semantic indexing of legal resources.
