Technical Perspective
The challenge of extracting structured information from text, or from sequential data in general, is prevalent across a multitude of data-science domains. This challenge, known as Information Extraction (IE), underlies core components of text analytics, and a plethora of IE paradigms have been developed over the past decades. Rules and rule systems have consistently been key components in such paradigms, yet their roles have varied and evolved over time. Analytics engines such as IBM's SystemT use IE rules to materialize relations inside relational query languages. Machine-learning classifiers and probabilistic graphical models (e.g., Conditional Random Fields) use rules for feature generation. Rules also serve as weak constraints in Markov Logic Networks (and extensions such as DeepDive), and as generators of noisy training data in the state-of-the-art Snorkel system.
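To make two of these roles concrete, here is a minimal Python sketch under stated assumptions: a regular-expression rule whose matches are materialized as a relation of spans, and rules reused as noisy labelers that vote on candidate pairs. All names (`WORKS_AT`, `extract`, `lf_researcher_at`, `lf_from`, `lf_far_apart`, `majority_vote`) are hypothetical and illustrative. The code does not use SystemT's AQL or Snorkel's API, and it combines votes with a plain majority vote in place of Snorkel's learned label model.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # half-open [start, end) character offsets into a document

# --- Role 1: a rule that materializes a relation of spans -------------------
# Hypothetical extraction rule: a regex whose named groups act as span variables.
WORKS_AT = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+),? (?:a researcher at|from) (?P<org>[A-Z][A-Za-z]+)"
)

def extract(rule: re.Pattern, doc: str) -> List[Dict[str, Span]]:
    """Return one row (variable -> span) per match of `rule` in `doc`.

    finditer reports only leftmost, non-overlapping matches, so this is weaker
    than a formal regex spanner, which returns every matching assignment.
    """
    return [{var: m.span(var) for var in rule.groupindex} for m in rule.finditer(doc)]

# --- Role 2: rules as noisy labelers (weak supervision) ---------------------
# Hypothetical labeling functions vote 1 (relation holds), 0 (does not hold),
# or ABSTAIN on a candidate (doc, person span, org span).
ABSTAIN = -1
Candidate = Tuple[str, Span, Span]

def _between(c: Candidate) -> str:
    doc, person, org = c
    return doc[person[1]:org[0]].strip(" ,")  # text separating the two spans

def lf_researcher_at(c: Candidate) -> int:
    return 1 if _between(c) == "a researcher at" else ABSTAIN

def lf_from(c: Candidate) -> int:
    return 1 if _between(c) == "from" else ABSTAIN

def lf_far_apart(c: Candidate) -> int:
    _, person, org = c
    return 0 if org[0] - person[1] > 20 else ABSTAIN

LFS: List[Callable[[Candidate], int]] = [lf_researcher_at, lf_from, lf_far_apart]

def majority_vote(c: Candidate) -> int:
    """Combine LF votes by majority, ignoring abstentions; ties yield ABSTAIN.

    This is a simple baseline, not Snorkel's learned label model.
    """
    votes = [v for v in (lf(c) for lf in LFS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between labels
    return ranked[0][0]

doc = "Alice Smith, a researcher at IBM, met Bob Jones from Stanford."
rows = extract(WORKS_AT, doc)
for row in rows:
    print(doc[slice(*row["person"])], "->", doc[slice(*row["org"])])

# Score every person/org pairing, including ones the extraction rule did not produce.
for p in (r["person"] for r in rows):
    for o in (r["org"] for r in rows):
        print(doc[slice(*p)], doc[slice(*o)], majority_vote((doc, p, o)))
```

On the sample sentence the rule extracts (Alice Smith, IBM) and (Bob Jones, Stanford). The vote then labels those two pairings 1, labels (Alice Smith, Stanford) 0, and abstains (-1) on (Bob Jones, IBM). Allowing each labeling function to abstain lets every rule cover only the cases it has an opinion on, which is what makes such rules usable as cheap, noisy sources of training labels.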