Distance-based positive and unlabeled learning for ranking

Learning to rank -- producing a ranked list of items specific to a query and with respect to a set of supervisory items -- is a problem of general interest. The setting we consider is one in which no analytic description of what constitutes a good ranking is available. Instead, we have a collection of representations and supervisory information consisting of a (target item, set of interesting items) pair. We demonstrate -- analytically, in simulation, and on real-data examples -- that learning to rank by combining representations via an integer linear program is effective when the supervision is as light as "these few items are similar to your item of interest." While this nomination task is of general interest, for specificity we present our methodology from the perspective of vertex nomination in graphs. The methodology described herein is model-agnostic.
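
For intuition, here is a minimal sketch of one way such a combination can be posed as a mixed-integer linear program: given several distance functions from a target item to the candidate items and a handful of known interesting items, learn nonnegative combination weights so that the interesting items are ranked ahead of the unlabeled candidates. This is an illustrative formulation, not necessarily the exact program used in the paper; the choice of the PuLP library (with its bundled CBC solver), the big-M violation constraints, and all variable names are assumptions made for the sketch.

```python
# Illustrative MILP sketch (not the paper's exact formulation): learn nonnegative
# weights over J distance-based representations so that the weighted distance from
# the target ranks the few known "interesting" items ahead of the unlabeled items.
# Assumes PuLP (with its bundled CBC solver) is installed; all data are synthetic.
import numpy as np
import pulp

rng = np.random.default_rng(0)

J = 3          # number of representations / distance functions
n_pos = 4      # known interesting items (positive examples)
n_unl = 30     # unlabeled candidate items

# D[j, i] = distance from the target item to candidate i under representation j.
# Here the first representation is informative (positives are closer); the rest are noise.
D = rng.uniform(0.0, 1.0, size=(J, n_pos + n_unl))
D[0, :n_pos] = rng.uniform(0.0, 0.2, size=n_pos)

pos = range(n_pos)
unl = range(n_pos, n_pos + n_unl)
M = float(D.max()) + 1.0   # big-M constant for the violation indicators

prob = pulp.LpProblem("combine_distances", pulp.LpMinimize)

# Convex combination weights over the J representations.
alpha = [pulp.LpVariable(f"alpha_{j}", lowBound=0.0) for j in range(J)]
prob += pulp.lpSum(alpha) == 1.0

# z[s, u] = 1 if interesting item s fails to rank ahead of unlabeled item u.
z = {(s, u): pulp.LpVariable(f"z_{s}_{u}", cat=pulp.LpBinary)
     for s in pos for u in unl}

def combined(i):
    """Weighted distance from the target to candidate i."""
    return pulp.lpSum(alpha[j] * float(D[j, i]) for j in range(J))

for s in pos:
    for u in unl:
        # Either s is at least as close as u, or the violation indicator switches on.
        prob += combined(s) <= combined(u) + M * z[(s, u)]

# Objective: minimize the number of (interesting, unlabeled) pairs ranked incorrectly.
prob += pulp.lpSum(z.values())

prob.solve(pulp.PULP_CBC_CMD(msg=0))
weights = np.array([a.value() for a in alpha])
print("learned weights:", np.round(weights, 3))

# Rank all candidates by the learned weighted distance (smaller = more interesting).
scores = weights @ D
print("top-ranked candidates:", np.argsort(scores)[:10])
```

In this toy instance the solver should place most of the weight on the informative first representation, after which sorting by the combined distance surfaces the interesting items; the same pattern extends to vertex nomination when each representation is a graph embedding inducing a distance from the target vertex.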
