Learning spatial relations in object recognition

This paper studies two types of spatial relationships that can be learned from training examples for object recognition. The first represents deformable relationships between object parts with a Gaussian model, while the second describes pairwise relationships between pixel intensity values using Bayesian networks. We perform experiments on a human face dataset and a horse dataset, giving both methods the same amount of training-data annotation, which can be seen as knowledge supplied to the learning algorithms. The results indicate that the Bayesian network method compares favorably to the deformable model, as it can capture stable long-distance relations in the object appearance. We also conclude that both methods are superior to strictly spatial template matching and to strictly non-spatial classifiers.
