Reconstructing Storyline Graphs for Image Recommendation from Web Community Photos

In this paper, we investigate an approach for reconstructing storyline graphs from large-scale collections of Internet images and, optionally, other side information such as friendship graphs. A storyline graph can serve as an effective summary that visualizes the various branching narrative structures of events or activities recurring across the input photo sets of a topic class. To further explore the usefulness of storyline graphs, we leverage them to perform sequential image prediction tasks, from which photo recommendation applications can benefit. We formulate the storyline reconstruction problem as the inference of sparse time-varying directed graphs, and develop an optimization algorithm that addresses a number of key challenges of Web-scale problems, including global optimality, linear complexity, and easy parallelization. With experiments on more than 3.3 million images of 24 classes and user studies via Amazon Mechanical Turk, we show that the proposed algorithm outperforms other candidate methods on both storyline reconstruction and image prediction tasks.
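To make the phrase "inference of sparse time-varying directed graphs" concrete, the following is a minimal sketch of one generic way such a graph could be estimated: a kernel-reweighted, L1-penalized regression solved per node by coordinate descent. It is an illustration under stated assumptions, not the algorithm proposed in the paper. The function names (`soft_threshold`, `weighted_lasso`, `graph_at`), the Gaussian kernel, the bandwidth, the penalty `lam`, and the toy data are all hypothetical.

```python
import math

def soft_threshold(z, g):
    # Shrink z toward zero by g; this is what produces exact zeros (sparsity).
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def weighted_lasso(X, y, w, lam, sweeps=50):
    # Coordinate descent for: 0.5 * sum_n w[n] * (y[n] - X[n].b)^2 + lam * |b|_1
    n, d = len(X), len(X[0])
    b = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho, denom = 0.0, 0.0
            for i in range(n):
                # Prediction from all coordinates except j.
                partial = sum(X[i][k] * b[k] for k in range(d) if k != j)
                rho += w[i] * X[i][j] * (y[i] - partial)
                denom += w[i] * X[i][j] ** 2
            b[j] = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
    return b

def graph_at(t0, times, prev, nxt, bandwidth=1.0, lam=0.05):
    # Assumed Gaussian kernel: transitions observed near t0 get more weight.
    raw = [math.exp(-((t - t0) ** 2) / (2 * bandwidth ** 2)) for t in times]
    total = sum(raw)
    w = [r / total for r in raw]
    d = len(prev[0])
    # Row i holds the incoming edge weights of node i: A[i][j] is edge j -> i.
    return [weighted_lasso(prev, [row[i] for row in nxt], w, lam)
            for i in range(d)]

# Toy data (hypothetical): three image clusters as one-hot vectors.
# Early on, cluster 0 is followed by cluster 1; later, by cluster 2.
times = [0, 1, 9, 10]
prev = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
nxt = [[0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]

early = graph_at(0, times, prev, nxt)
late = graph_at(10, times, prev, nxt)
print("early:", early)
print("late:", late)
```

In this toy setting the estimated graph near time 0 keeps only the edge from cluster 0 to cluster 1, while the graph near time 10 keeps only the edge from cluster 0 to cluster 2. Each node's regression is independent of the others, which is why per-node formulations of this kind are easy to parallelize.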
