Bayesian networks for incomplete data analysis in form processing

In this paper, we study Bayesian networks (BNs) for identifying forms from partially filled fields. The approach uses electronic ink-tracing files and requires no prior information about the form structure. Given a form format, the ink-tracing files are used to build the BN: conditional probabilities capture the possible relationships between corresponding fields, starting from individual fields and proceeding up to the complete model. To simplify the BN, we divide each form into three areas (header, body and footer), model each area, and then integrate them. Within this framework, we study three fundamental BN learning algorithms: naive Bayes, Peter & Clark (PC) and maximum weighted spanning tree. We validate the approach on a real-world industrial problem, namely electronic note-taking in form processing. The results are satisfactory and attest to the value of BNs for form analysis with incomplete data.
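As an illustration of the third learning algorithm, the following is a minimal sketch of maximum weighted spanning tree structure learning in the style of Chow and Liu: pairwise mutual information between fields serves as the edge weight, and a Kruskal-style pass keeps the heaviest edges that do not form a cycle. The data encoding is an assumption made here for illustration (each form instance is a tuple of binary indicators, 1 if the field contains ink and 0 otherwise), and the function names are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(rows, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(rows)
    count_i = Counter(r[i] for r in rows)
    count_j = Counter(r[j] for r in rows)
    count_ij = Counter((r[i], r[j]) for r in rows)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def max_weight_spanning_tree(rows):
    """Return undirected tree edges (i, j), i < j, maximizing total mutual information."""
    n_vars = len(rows[0])
    edges = sorted(
        ((mutual_information(rows, i, j), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True,
    )
    parent = list(range(n_vars))  # union-find forest used for cycle detection

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not create a cycle
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Hypothetical data: fields 0 and 1 are always filled together, as are 2 and 3.
rows = [
    (1, 1, 1, 1), (1, 1, 0, 0), (0, 0, 1, 1),
    (0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 0, 0),
]
tree = max_weight_spanning_tree(rows)
print(tree)
```

On this toy data the strongly coupled pairs (0, 1) and (2, 3) are always retained, and one weaker cross edge connects the two groups. Orienting the edges away from a chosen root would then yield a tree-structured BN whose conditional probability tables can be estimated from the same counts.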
