Mining and querying multimedia data

The growing popularity of multimedia data, the digital representation of text, images, video and countless other media, in prodigious volume and wide diversity, shows the impact of modern technologies in reshaping the way information is accessed, disseminated, digested and retained. This has in turn fueled a data-driven perspective on research and development that aims to characterize clear patterns, distill informative insights, and deliver a better experience for end users. Innovations from a spectrum of areas of computer science, including databases, distributed systems, machine learning, vision, speech and natural language, have been continually absorbed and integrated to extend the reach and efficacy of contemporary and future multimedia applications and solutions. Under the theme of pattern mining and similarity querying, this manuscript presents several pieces of research on multimedia data that address an array of practical tasks, encompassing automatic annotation, outlier detection, community discovery, multi-modal retrieval and learning to rank, in their respective contexts of satellite image analysis, Internet traffic surveillance, image bioinformatics, and Web search. A repertoire of existing and novel techniques pertaining to graph mining, clustering analysis, tensor decomposition and probabilistic graphical models has been developed or adapted. These techniques satisfactorily met the differing quality and efficiency requirements posed by specific application settings, best exemplified by the 40-fold speed-up in annotating satellite images and the improvement of up to 30% in predicting Web search user clicks, without loss of generality to similar and related scenarios.
