Exploiting Structure in Combinatorial Problems with Applications in Computational Sustainability

Combinatorial decision and optimization problems are at the core of many tasks with practical importance in areas as diverse as planning and scheduling, supply chain management, hardware and software verification, electronic commerce, and computational biology. Another important source of combinatorial problems is the newly emerging field of computational sustainability, which addresses decision-making aimed at balancing social, economic and environmental needs to guarantee the long-term prosperity of life on our planet. This dissertation studies different forms of problem structure that can be exploited in developing scalable algorithmic techniques capable of addressing large real-world combinatorial problems. There are three major contributions in this work: (1) We study a form of hidden problem structure called a backdoor, a set of key decision variables that captures the combinatorics of the problem, and reveal that many real-world problems encoded as Boolean satisfiability or mixed-integer linear programs contain small backdoors. We study backdoors both theoretically and empirically and characterize important tradeoffs between the computational complexity of finding backdoors and their effectiveness in capturing problem structure succinctly. (2) We contribute several domain-specific mathematical formulations and algorithmic techniques that exploit specific aspects of problem structure arising in budget-constrained conservation planning for wildlife habitat connectivity. Our solution approaches scale to real-world conservation settings and provide important decision-support tools for cost–benefit analysis. (3) We propose a new survey-planning methodology to assist in the construction of accurate predictive models, which are especially relevant in sustainability areas such as species-distribution prediction and climate-change impact studies. In particular, we design a technique that takes advantage of submodularity, a structural property of the function to be optimized, and results in a polynomial-time procedure with approximation guarantees.
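
As a rough illustration of how submodularity can yield a polynomial-time procedure with an approximation guarantee, the sketch below shows the classic greedy algorithm for maximizing a monotone submodular function under a cardinality budget. It is not the dissertation's survey-planning method, which the abstract does not spell out. The coverage objective, the names `coverage` and `greedy_select`, and the toy site data are all assumptions made for illustration. For monotone submodular objectives, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value.

```python
from typing import Dict, Hashable, List, Set


def coverage(chosen: List[Hashable], sites: Dict[Hashable, Set[int]]) -> int:
    """Hypothetical objective: number of distinct items covered by the chosen sites.

    Set coverage is a standard example of a monotone submodular function.
    """
    covered: Set[int] = set()
    for site in chosen:
        covered |= sites[site]
    return len(covered)


def greedy_select(sites: Dict[Hashable, Set[int]], budget: int) -> List[Hashable]:
    """Pick up to `budget` sites, each time adding the one with the largest marginal gain."""
    chosen: List[Hashable] = []
    for _ in range(budget):
        base = coverage(chosen, sites)
        best, best_gain = None, 0
        for site in sites:
            if site in chosen:
                continue
            gain = coverage(chosen + [site], sites) - base
            if gain > best_gain:
                best, best_gain = site, gain
        if best is None:
            # No remaining site adds anything, so stop early.
            break
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Toy data (assumed): each candidate survey site observes a set of item ids.
    toy_sites = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
    picked = greedy_select(toy_sites, budget=2)
    print(picked, coverage(picked, toy_sites))
```

Each round evaluates every remaining candidate once, so the number of objective evaluations is at most the budget times the number of candidates, which keeps the procedure polynomial whenever the objective itself can be evaluated in polynomial time.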
