Isolating Natural Problem Environments in Unconstrained Natural Language Processing: Corruption and Skew

This work examines the behaviors of a full range of commonly available natural language processors in a natural, unconstrained, and unguided environment. While typical research is permitted to constrain the language environment and to use in-depth knowledge of the processor to improve accuracy, this work purposefully avoids a clean laboratory in favor of a natural, chaotic, and uncontrollable environment. This shifts the focus toward natural processor behaviors in natural, unknown environments. This work provides a standardized framework for comparing and contrasting the theoretical strengths of each processor in the full range. It then examines empirical behaviors across a full range of environments: typically used baseline sample documents, actual raw natural texts drawn from an intent marketing business, and a series of increasingly corrupted and inconsistent sample documents that further differentiate processor behaviors. In all cases the texts are unconstrained and the processors operate in their most naive, default forms. The results complement and extend prior work. They add that accuracy-centric processors such as artificial neural networks and support vector machines require both a highly constrained environment and in-depth knowledge of the processor to operate. Descriptive-centric processors such as k-nearest neighbors, Rocchio, and naive Bayes require only a highly constrained environment. An explanatory-centric neurocognitive processor such as Adaptive Resonance Theory can operate robustly with neither environmental constraint nor in-depth processing knowledge, but its operation is exposed to basic human temporal neurocognitive behaviors.
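As an illustration only (the abstract does not specify an implementation), the following minimal sketch, assuming scikit-learn and hypothetical sample texts, shows what running descriptive-centric processors in their naive, default forms on unconstrained raw text can look like; scikit-learn's NearestCentroid stands in for the Rocchio classifier.

# Minimal sketch, assuming scikit-learn; the paper does not specify an
# implementation. Hypothetical documents illustrate running descriptive-centric
# processors (naive Bayes, k-nearest neighbors, and NearestCentroid as a
# stand-in for Rocchio) in their naive, default forms on unconstrained text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.pipeline import make_pipeline

# Hypothetical raw, unconstrained training texts and labels.
docs = [
    "buy now limited time offer act fast",
    "quarterly earnings call transcript attached",
    "free prize winner click here immediately",
    "board meeting minutes and agenda for review",
    "exclusive deal just for you today only",
    "annual shareholder report and financial statements",
]
labels = ["marketing", "corporate", "marketing", "corporate", "marketing", "corporate"]

# Every processor is left at library defaults: no tuning, no environmental
# constraint, and no processor-specific knowledge applied.
for clf in (MultinomialNB(), KNeighborsClassifier(), NearestCentroid()):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(docs, labels)
    print(type(clf).__name__, model.predict(["a special offer on new deals"]))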