Boosting and structure learning in dynamic Bayesian networks for audio-visual speaker detection

Bayesian networks are an attractive modeling tool for human sensing, as they combine an intuitive graphical representation with efficient algorithms for inference and learning. Earlier work has demonstrated that boosted parameter learning could be used to improve the performance of Bayesian network, classifiers for complex multi-modal inference problems such as speaker detection. In speaker detection, the goal is to use video and audio cites to infer when a person is speaking to a user interface. In this paper we introduce a new boosted structure learning algorithm based on AdaBoost. Given labeled data, our algorithm modifies both the network structure and parameters so as to improve classification accuracy. We compare its performance to both standard structure learning and boosted parameter learning on a fixed structure. We present results for speaker detection and for the UCI "chess" dataset.

[1]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[2]  Avi Pfeffer,et al.  Object-Oriented Bayesian Networks , 1997, UAI.

[3]  Takeo Kanade,et al.  Neural Network-Based Face Detection , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Takeo Kanade,et al.  Rotation Invariant Neural Network-Based Face Detection , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[5]  Nir Friedman,et al.  Bayesian Network Classifiers , 1997, Machine Learning.

[6]  Trevor Darrell,et al.  Learning Joint Statistical Models for Audio-Visual Fusion and Segregation , 2000, NIPS.

[7]  David Heckerman,et al.  A Tutorial on Learning with Bayesian Networks , 1998, Learning in Graphical Models.

[8]  Nir Friedman,et al.  Being Bayesian about Network Structure , 2000, UAI.

[9]  James M. Rehg,et al.  Vision for a smart kiosk , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Gregory F. Cooper,et al.  A Bayesian Method for the Induction of Probabilistic Networks from Data , 1992 .

[11]  Vladimir Pavlovic,et al.  Audio-visual speaker detection using dynamic Bayesian networks , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[12]  Vladimir Pavlovic,et al.  Multimodal speaker detection using error feedback dynamic Bayesian networks , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[13]  Kevin P. Murphy,et al.  Learning the Structure of Dynamic Probabilistic Networks , 1998, UAI.

[14]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[15]  Intille,et al.  Representation and Visual Recognition of Complex , Multi-agent Actions using Belief , 1998 .

[16]  James M. Rehg,et al.  Vision-based speaker detection using Bayesian networks , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[17]  Larry S. Davis,et al.  Look who's talking: speaker detection using video and audio correlation , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).

[18]  Alex Pentland,et al.  Coupled hidden Markov models for complex action recognition , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.