Internet visual media processing: a survey with graphics and vision applications

In recent years, the computer graphics and computer vision communities have devoted significant attention to research based on Internet visual media resources. The huge number of images and videos continually uploaded by millions of people has stimulated a variety of visual media creation and editing applications, while also posing serious challenges for retrieval, organization, and utilization. This article surveys recent research on processing large collections of images and video, including work on analysis, manipulation, and synthesis. It discusses the problems involved and suggests possible future directions in this emerging research area.
