Tell Me What

“Tell Me What” is a smartphone-based image recognition system and also an automatic pipeline for generating image recognition systems that recognize an arbitrary set of entities. For any given set of entities, the “Tell Me What” backend automatically fetches related image data from the Internet for each entity and then runs a comprehensive data cleaning process to purify the data. A multi-class classifier and an inverted index are then built from the cleaned data. For a new, unknown image captured by a camera, the user can optionally highlight regions, after which a classification process and a search process are applied to obtain recognition results. Distributed computing techniques ensure that the backend model and index generation can be completed in a few hours.
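
The search half of the pipeline can be illustrated with a minimal sketch of an inverted index. The abstract does not say which image features the system uses, so this example assumes, purely for illustration, that each cleaned image has already been reduced to a list of integer "visual word" ids. The function names `build_inverted_index` and `search`, the toy data, and the overlap-based scoring are all hypothetical and are not taken from the system itself.

```python
from collections import Counter, defaultdict


def build_inverted_index(entity_images):
    """Map each visual-word id to the (entity, image_id) pairs that contain it.

    entity_images: dict of entity name -> list of images, where each image is
    an iterable of integer visual-word ids (an assumed representation).
    """
    index = defaultdict(set)
    for entity, images in entity_images.items():
        for image_id, words in enumerate(images):
            for word in set(words):
                index[word].add((entity, image_id))
    return index


def search(index, query_words, top_k=3):
    """Rank entities by the best word overlap of any single indexed image."""
    overlap = Counter()
    for word in set(query_words):
        # .get avoids creating empty postings for unseen words
        for posting in index.get(word, ()):
            overlap[posting] += 1
    best = {}
    for (entity, _image_id), score in overlap.items():
        best[entity] = max(best.get(entity, 0), score)
    # Highest overlap first; ties broken alphabetically by entity name
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy "cleaned" data: two entities, two images each (made-up word ids)
cleaned = {
    "cat": [[1, 2, 3], [2, 3, 4]],
    "dog": [[5, 6, 7], [3, 6, 8]],
}
index = build_inverted_index(cleaned)
results = search(index, [2, 3, 9])
print(results)
```

Because only the postings for words present in the query are visited, lookup cost depends on the query rather than on the total number of indexed images, which is the usual reason to pair an inverted index with a classifier when the entity set is large.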
