Summary of the 4th workshop on analytics for noisy unstructured text data (AND)

Noisy unstructured text data is ubiquitous in real-world communication. Natural language, and the creative ways in which humans use it, can create problems for computational techniques. Electronic text from the Internet (emails, message boards, newsgroups, blogs, microblogs, wikis, chatlogs and web pages), contact centers (complaints, emails, call transcriptions, message summaries), and mobile phones (SMS) is often noisy: it contains spelling errors, abbreviations, non-standard words, false starts, repetitions, missing punctuation, missing case information and special characters. Informal communication is not the only source of noisy text; text produced by processing signals intended for human use, such as printed or handwritten documents, spontaneous speech, and camera-captured scene images, is also noisy.