Recursive Tower of Knowledge for Learning to Interpret Scenes

The Tower of Knowledge (ToK) architecture integrates probability theory and logic for making decisions. The scheme models the causal dependencies between the functionalities of objects and their descriptions, and then employs the maximum expected utility principle, which combines probability theory and logic, to select the most appropriate label for each object. Since most existing scene interpretation methods rely heavily on training data, we develop in this paper a recursive version of ToK to avoid such dependency. Recursive ToK learns the prior distributions iteratively: in each iteration, the priors are updated partly from the labelling decisions made in the previous iteration, which are driven by the functionalities of the components, and partly from the prior distributions already learnt in earlier iterations. To validate our method in the domain of 3D outdoor scene interpretation, we compare ToK against a state-of-the-art method, Expandable Bayesian Networks (EBN), for labelling components of buildings. Experimental results show that the labelling accuracy of ToK is superior to that of EBN. These results also reveal that recursive ToK improves the accuracy of ToK for labelling 3D components in the worst case, when no training data are available.
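The recursive prior-learning loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed mixing weight `mix`, the 0/1 utility table, and the toy likelihood values are all assumptions made for illustration. Each component is labelled by maximum expected utility under the current prior, and the prior is then re-estimated partly from the label frequencies of those decisions and partly from the previous prior.

```python
from collections import Counter


def meu_label(prior, likelihood, utility):
    """Pick the label with maximum expected utility for one component.

    prior, likelihood: dicts mapping label -> probability (hypothetical inputs).
    utility[choice][truth]: payoff of choosing `choice` when `truth` is correct.
    """
    labels = list(prior)
    unnorm = {l: prior[l] * likelihood[l] for l in labels}
    z = sum(unnorm.values())
    posterior = {l: unnorm[l] / z for l in labels}

    def expected_utility(choice):
        return sum(posterior[t] * utility[choice][t] for t in labels)

    return max(labels, key=expected_utility)


def recursive_priors(likelihoods, utility, labels, iterations=5, mix=0.5):
    """Learn label priors without training data (assumed update rule).

    Starts from a uniform prior. In each iteration every component is
    relabelled, then the prior becomes a blend of the decision frequencies
    (weight `mix`) and the previous prior (weight 1 - mix).
    """
    prior = {l: 1.0 / len(labels) for l in labels}
    decisions = []
    for _ in range(iterations):
        decisions = [meu_label(prior, lik, utility) for lik in likelihoods]
        counts = Counter(decisions)
        n = len(decisions)
        prior = {l: mix * counts[l] / n + (1 - mix) * prior[l] for l in labels}
    return prior, decisions


# Toy data: three building components, two candidate labels (made-up numbers).
labels = ["window", "door"]
likelihoods = [
    {"window": 0.9, "door": 0.1},
    {"window": 0.8, "door": 0.2},
    {"window": 0.3, "door": 0.7},
]
# 0/1 utility: reward 1 for a correct label, 0 otherwise.
utility = {a: {b: 1.0 if a == b else 0.0 for b in labels} for a in labels}

prior, decisions = recursive_priors(likelihoods, utility, labels)
print(decisions)
print(prior)
```

With a 0/1 utility the maximum expected utility choice reduces to the maximum a posteriori label; a non-symmetric utility table would let logical constraints or error costs shift the decision, which is the role the abstract assigns to the utility principle.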
