SLATE: A Sequence Labeling Approach for Task Extraction from Free-form Inked Content

We present SLATE, a sequence labeling approach for extracting tasks from free-form content such as digitally handwritten (or "inked") notes on a virtual whiteboard. Our approach yields a single, low-latency model that simultaneously segments the content into sentences and classifies each sentence as task or non-task. SLATE greatly outperforms a two-model baseline (a sentence segmentation model followed by a classification model), achieving a task F1 score of 84.4% and a sentence segmentation (boundary similarity) score of 88.4%, with three times lower latency than the baseline. Furthermore, we provide insights into the challenges of performing NLP in the inking domain. We release both our code and dataset for this novel task.
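To illustrate how a single sequence labeler can cover both segmentation and classification, the following minimal sketch decodes per-token labels into labeled sentences. The tag names (`B-TASK`, `I-TASK`, `B-NONTASK`, `I-NONTASK`) and the function `decode_sentences` are assumptions made for illustration; the abstract does not specify SLATE's actual label scheme or decoding procedure.

```python
from typing import List, Tuple


def decode_sentences(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Turn per-token tags into (sentence, class) pairs.

    Assumed (hypothetical) tag scheme: a "B-" tag opens a new sentence and
    an "I-" tag continues the current one; the suffix after the dash is the
    sentence class (TASK or NONTASK).
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    sentences: List[Tuple[str, str]] = []
    current: List[str] = []
    current_kind = ""

    for token, label in zip(tokens, labels):
        prefix, kind = label.split("-", 1)
        # Start a new sentence on a "B-" tag, or when nothing is open yet.
        if prefix == "B" or not current:
            if current:
                sentences.append((" ".join(current), current_kind))
            current = [token]
            current_kind = kind
        else:
            current.append(token)

    # Flush the last open sentence.
    if current:
        sentences.append((" ".join(current), current_kind))
    return sentences


tokens = "email the report to Sam great meeting today".split()
labels = [
    "B-TASK", "I-TASK", "I-TASK", "I-TASK", "I-TASK",
    "B-NONTASK", "I-NONTASK", "I-NONTASK",
]
result = decode_sentences(tokens, labels)
```

In this sketch, sentence boundaries fall wherever a `B-` tag appears, and the sentence class is read from the opening tag, so one pass over the model's output gives both the segmentation and the task/non-task decision.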
