Data-driven methods for interactive visual content creation and manipulation

Software tools for creating and manipulating visual content, whether images, video or 3D models, are often difficult to use and require extensive manual interaction at several stages of the process. Combined with long processing and acquisition times, this makes content production costly and a potential barrier for many applications. Although cameras now allow anyone to capture photos and video easily, tools for manipulating such media demand both artistic talent and technical expertise. At the same time, vast corpora of existing visual content, such as Flickr, YouTube or Google 3D Warehouse, are now available and easily accessible. This thesis proposes a data-driven approach to the above-mentioned problems in content generation. To this end, we create statistical models trained on semantic knowledge harvested from existing visual content corpora. Using these models, we then develop tools that are easy to learn and use, even for novice users, yet still produce high-quality content. These tools have intuitive interfaces and give the user precise and flexible control. Specifically, we apply our models to create tools that simplify the tasks of video manipulation, 3D modeling and material assignment to 3D objects.
