A Relational Framework for Information Extraction

Information Extraction commonly refers to the task of populating a relational schema, with predefined underlying semantics, from textual content. This task is pervasive in contemporary computational challenges associated with Big Data. In this article, we provide an overview of our work on document spanners, a relational framework for Information Extraction that is inspired by rule-based systems such as IBM's SystemT.
