Annotations on Documents for Information Retrieval

A large number of organizations today generate and share textual descriptions of their products, services, and activities. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction algorithms facilitate the extraction of structured relations, they are often expensive and inaccurate, especially when operating on text that does not contain any instances of the targeted structured information. In this paper, we present a novel alternative approach that facilitates the generation of structured metadata by identifying documents that are likely to contain information of interest, information that will subsequently be useful for querying the database. Our approach relies on the idea that humans are more likely to add the necessary metadata at creation time if prompted by the interface, and that it is much easier for humans (and/or algorithms) to identify the metadata when such information actually exists in the document. As a major contribution of this paper, we present algorithms that identify structured attributes that are likely to appear within a document by jointly utilizing the content of the text and the query workload. Our experimental evaluation shows that our approach generates superior results compared to approaches that rely only on the textual content or only on the query workload to identify attributes of interest.
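To make the idea of jointly using content and query workload concrete, the following is a minimal sketch and not the algorithms of the paper. It assumes, purely for illustration, that each attribute has a hand-picked list of cue terms, that the content signal is the fraction of those terms found in the document, that the workload signal is the attribute's relative frequency in past queries, and that the two are combined by a simple product. The function names, the cue lists, and the sample data are all hypothetical.

```python
from collections import Counter


def content_score(text, cue_terms):
    """Fraction of an attribute's cue terms that occur in the document text."""
    if not cue_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in cue_terms if term in words) / len(cue_terms)


def workload_scores(queries):
    """Relative frequency with which each attribute appears in past queries."""
    counts = Counter(attr for query in queries for attr in query)
    total = sum(counts.values())
    return {attr: n / total for attr, n in counts.items()} if total else {}


def rank_attributes(text, cues, queries, top_k=3):
    """Rank attributes by content score times workload score (assumed combination)."""
    workload = workload_scores(queries)
    scored = {
        attr: content_score(text, terms) * workload.get(attr, 0.0)
        for attr, terms in cues.items()
    }
    # Highest combined score first; ties broken alphabetically for determinism.
    ranked = sorted(scored.items(), key=lambda item: (-item[1], item[0]))
    return [attr for attr, score in ranked[:top_k] if score > 0]


# Hypothetical cue terms, query log, and document.
cues = {
    "location": ["county", "city", "street"],
    "wind_speed": ["wind", "mph"],
    "price": ["price", "cost"],
}
queries = [["location"], ["location", "wind_speed"], ["price"], ["wind_speed"]]
text = "Hurricane warning for Miami-Dade county with wind gusts of 90 mph"

suggested = rank_attributes(text, cues, queries)
print(suggested)
```

In this sketch, an attribute is suggested only when it is both supported by the text and used in queries: `price` is queried but has no support in the sample document, so it is not suggested. A content-only or workload-only baseline would correspond to dropping one of the two factors.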
