In the era of big data, growing computational power has driven steady progress in sensing, audio, and automatic control technologies. Video frames and images, one of the main sources from which people obtain information about the physical world, have attracted considerable attention. Computer vision is currently an active research area with many technical challenges, including recognition, motion, scene reconstruction, and image restoration, among which object recognition is the most important. Yet many existing object recognition systems focus only on recognition accuracy and lack auxiliary functions, so they offer a poor human-machine interaction experience. How to build an automatic object recognition system with better human-machine interaction and broader market prospects is therefore a question many developers are exploring. In this paper, we propose a system that combines object recognition with speech processing. First, we improve on the Tracking-Learning-Detection (TLD) algorithm: given an image data set of one class of objects, we train a classifier suited to that class and use it to decide whether a new object is a target object, thereby recognizing the specified class. Second, the system uses speech recognition as the basis of human-machine interaction, so that the user can add image data to the training set and update the classifier by voice. To keep speech recognition accurate, speech features are matched with Dynamic Time Warping (DTW), a dynamic programming method.
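The DTW matching step can be illustrated with a minimal sketch. It assumes that each utterance has already been converted into a sequence of fixed-length feature vectors and that frames are compared by Euclidean distance. The names `dtw_distance` and `recognize` and the template dictionary are hypothetical and chosen for illustration; they are not taken from the system described in this paper.

```python
from math import inf


def dtw_distance(a, b):
    """Cumulative DTW alignment cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames (an assumed local cost).
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Dynamic programming step: extend the cheapest of the three
            # allowed predecessors (insertion, deletion, match).
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance under DTW."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical one-dimensional "features" standing in for real speech frames.
templates = {
    "add": [[0.0], [1.0], [2.0]],
    "update": [[5.0], [5.0], [5.0]],
}
spoken = [[0.0], [1.0], [1.0], [2.0]]  # a time-stretched version of "add"
print(recognize(spoken, templates))
```

Because DTW lets one frame align with several frames of the other sequence, the stretched utterance still matches its template despite the different length, which is why the technique suits spoken commands uttered at varying speeds.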