Deep Submodular Functions

We start with an overview of a class of submodular functions called SCMMs (sums of concave functions composed with non-negative modular functions, plus a final arbitrary modular function). We then define a new class of submodular functions that we call deep submodular functions (DSFs). We show that DSFs are a flexible parametric family of submodular functions that share many of the properties and advantages of deep neural networks (DNNs). DSFs can be motivated by considering a hierarchy of descriptive concepts over ground elements in which one wishes to allow submodular interaction throughout the hierarchy. We show that DSFs constitute a strictly larger class of submodular functions than SCMMs. We also show that, for any integer $k>0$, there are $k$-layer DSFs that cannot be represented by a $k'$-layer DSF for any $k'<k$. This implies that, as with DNNs, there is utility to depth; unlike with DNNs, however, the family of DSFs strictly increases with depth. Despite this, we show (using a backpropagation-like method) that DSFs, even with arbitrarily large $k$, do not comprise all submodular functions. In establishing these results, we also define the notion of an antitone superdifferential of a concave function and show how it relates to submodular functions (in general), DSFs (in particular), negative second-order partial derivatives, continuous submodularity, and concave extensions. To further motivate our analysis, we provide various special-case results from matroid theory, comparing DSFs with forms of matroid rank, in particular the laminar matroid. Lastly, we discuss strategies for learning DSFs, define the classes of deep supermodular functions, deep difference-of-submodular functions, and deep multivariate submodular functions, and discuss where these can be useful in applications.
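
To make the two function classes concrete, the following is a minimal sketch, not taken from the paper, of an SCMM and of one simple two-layer composition of the kind a DSF generalizes, together with a brute-force submodularity check on a tiny ground set. The function names, the weights, and the choice of the square root as the concave function are all assumptions made for illustration.

```python
import itertools
import math

# Hypothetical toy data: a 4-element ground set and two non-negative
# modular "feature" functions, each given by per-element weights.
GROUND = range(4)
FEATURE_WEIGHTS = [[1.0, 2.0, 0.0, 3.0], [0.0, 1.0, 4.0, 1.0]]
FINAL_WEIGHTS = [0.5, -1.0, 0.0, 2.0]   # arbitrary (possibly negative) modular term
MIX_WEIGHTS = [1.0, 2.0]                # non-negative second-layer weights


def modular(weights, subset):
    """Modular function: the sum of per-element weights over the subset."""
    return sum(weights[v] for v in subset)


def scmm(subset, phi=math.sqrt):
    """Sum of concave functions of non-negative modular functions, plus a modular term."""
    concave_part = sum(phi(modular(w, subset)) for w in FEATURE_WEIGHTS)
    return concave_part + modular(FINAL_WEIGHTS, subset)


def two_layer(subset, phi=math.sqrt):
    """Assumed two-layer form: a concave function of a non-negative mix of first-layer units."""
    hidden = [phi(modular(w, subset)) for w in FEATURE_WEIGHTS]
    return phi(sum(c * h for c, h in zip(MIX_WEIGHTS, hidden)))


def is_submodular(f, ground, tol=1e-9):
    """Check f(A) + f(B) >= f(A | B) + f(A & B) for every pair of subsets."""
    subsets = [frozenset(s)
               for r in range(len(ground) + 1)
               for s in itertools.combinations(ground, r)]
    return all(f(a) + f(b) + tol >= f(a | b) + f(a & b)
               for a in subsets for b in subsets)


if __name__ == "__main__":
    print("SCMM submodular:", is_submodular(scmm, GROUND))
    print("two-layer submodular:", is_submodular(two_layer, GROUND))
    # A supermodular function for contrast: |A|^2 fails the inequality.
    print("|A|^2 submodular:", is_submodular(lambda s: len(s) ** 2, GROUND))
```

The exhaustive check enumerates all pairs of subsets, so it is only practical for very small ground sets; it is meant as a sanity check on the definitions rather than as a tool for the results stated above.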
