Paper information - Tohme: detecting curb ramps in Google Street View using crowdsourcing, computer vision, and machine learning


Building on recent prior work that combines Google Street View (GSV) and crowdsourcing to remotely collect information on physical-world accessibility, we present the first 'smart' system, Tohme, that combines machine learning, computer vision (CV), and custom crowd interfaces to find curb ramps remotely in GSV scenes. Tohme consists of two workflows, a human labeling pipeline and a CV pipeline with human verification, which are scheduled dynamically based on predicted performance. Using 1,086 GSV scenes (street intersections) from four North American cities and data from 403 crowd workers, we show that Tohme detects curb ramps about as well as a manual labeling approach alone (F-measure: 84% vs. 86% baseline) but at a 13% reduction in time cost. Our work contributes the first CV-based curb ramp detection system, a custom machine-learning-based workflow controller, a validation of GSV as a viable curb ramp data source, and a detailed examination of why curb ramp detection is a hard problem, along with steps forward.
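To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical per-scene score (`predicted_cv_score`) and a hypothetical threshold for routing a scene to one of the two workflows. The names `Scene`, `choose_workflow`, and `f_measure` are illustrative, and the abstract does not say how Tohme actually predicts performance. The F-measure is computed as the standard harmonic mean of precision and recall.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    """A GSV scene (street intersection) awaiting curb ramp labeling."""
    scene_id: str
    # Hypothetical score in [0, 1]: how well the CV pipeline is predicted
    # to perform on this scene. How it is obtained is an assumption here.
    predicted_cv_score: float


def choose_workflow(scene: Scene, threshold: float = 0.5) -> str:
    """Route a scene to one of the two workflows named in the abstract.

    The threshold rule is an illustrative assumption, not the paper's
    controller.
    """
    if scene.predicted_cv_score >= threshold:
        return "cv_with_human_verification"
    return "human_labeling"


def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    scenes = [Scene("intersection-a", 0.9), Scene("intersection-b", 0.2)]
    for s in scenes:
        print(s.scene_id, "->", choose_workflow(s))
    print("F-measure:", f_measure(true_pos=8, false_pos=2, false_neg=2))
```

A controller of this shape trades time for accuracy: scenes predicted to be easy for CV skip full manual labeling, while harder scenes fall back to the human pipeline.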
