DeepDive: A Data Management System for Automatic Knowledge Base Construction

[1]  Christopher De Sa,et al.  Incremental Knowledge Base Construction Using DeepDive , 2015, The VLDB Journal.

[2]  Christopher Ré,et al.  Materialization optimizations for feature selection workloads , 2014, SIGMOD Conference.

[3]  Peter Richtárik,et al.  Parallel coordinate descent methods for big data optimization , 2012, Mathematical Programming.

[4]  Ewen Callaway,et al.  Computers read the fossil record , 2015, Nature.

[5]  Christopher Ré,et al.  Caffe con Troll: Shallow Ideas to Speed Up Deep Learning , 2015, DanaC@SIGMOD.

[6]  Stephen J. Wright,et al.  An asynchronous parallel stochastic coordinate descent algorithm , 2013, J. Mach. Learn. Res..

[7]  C. Ré,et al.  A Machine Reading System for Assembling Synthetic Paleontological Databases , 2014, PLoS ONE.

[8]  Amir Sadeghian,et al.  Feature Engineering for Knowledge Base Construction , 2014, IEEE Data Eng. Bull..

[9]  Daisy Zhe Wang,et al.  Knowledge expansion over probabilistic knowledge bases , 2014, SIGMOD Conference.

[10]  Mihai Surdeanu,et al.  The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.

[11]  Wei Zhang,et al.  From Data Fusion to Knowledge Fusion , 2014, Proc. VLDB Endow..

[12]  Rahul Gupta,et al.  Knowledge base completion via search-based question answering , 2014, WWW.

[13]  Christopher Ré,et al.  DimmWitted: A Study of Main-Memory Statistical Analytics , 2014, Proc. VLDB Endow..

[14]  Milos Nikolic,et al.  LINVIEW: incremental view maintenance for complex analytical queries , 2014, SIGMOD Conference.

[15]  João Gama,et al.  A survey on concept drift adaptation , 2014, ACM Comput. Surv..

[16]  Keshav Pingali,et al.  Deterministic Galois: on-demand, portable and parameterless , 2014, ASPLOS.

[17]  John Langford,et al.  A reliable effective terascale linear learning system , 2011, J. Mach. Learn. Res..

[18]  Lise Getoor,et al.  Lifted graphical models: a survey , 2011, Machine Learning.

[19]  Matthew J. Johnson,et al.  Analyzing Hogwild Parallel Gaussian Gibbs Sampling , 2013, NIPS.

[20]  Keshav Pingali,et al.  A lightweight infrastructure for graph analytics , 2013, SOSP.

[21]  Eddie Kohler,et al.  Speedy transactions in multicore in-memory databases , 2013, SOSP.

[22]  Kevin Murphy From big data to big knowledge , 2013, CIKM.

[23]  Tim Kraska,et al.  MLI: An API for Distributed Machine Learning , 2013, 2013 IEEE 13th International Conference on Data Mining.

[24]  Gustavo Alonso,et al.  Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited , 2013, Proc. VLDB Endow..

[25]  Jignesh M. Patel,et al.  Design and Evaluation of Storage Organizations for Read-Optimized Main Memory Databases , 2013, Proc. VLDB Endow..

[26]  Sam Lightstone,et al.  DB2 with BLU Acceleration: So Much More than Just a Column Store , 2013, Proc. VLDB Endow..

[27]  S. Finnegan,et al.  Climate Change and the Past, Present, and Future of Biotic Interactions , 2013, Science.

[28]  Christopher Ré,et al.  Understanding Tables in Context Using Standard NLP Toolkits , 2013, ACL.

[29]  Christopher Ré,et al.  Towards high-throughput Gibbs sampling at scale: a study across storage managers , 2013, SIGMOD '13.

[30]  Ralph Grishman,et al.  Distant Supervision for Relation Extraction with an Incomplete Knowledge Base , 2013, NAACL.

[31]  Michael Stonebraker,et al.  Intel "big data" science and technology center vision and execution plan , 2013, SIGMOD Rec..

[32]  Dan Suciu,et al.  The dichotomy of probabilistic inference for unions of conjunctive queries , 2012, JACM.

[33]  Christos Boutsidis,et al.  Near-Optimal Coresets for Least-Squares Regression , 2012, IEEE Transactions on Information Theory.

[34]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[35]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[36]  Jeffrey Heer,et al.  Enterprise Data Analysis and Visualization: An Interview Study , 2012, IEEE Transactions on Visualization and Computer Graphics.

[37]  Alexander Aiken,et al.  Legion: Expressing locality and independence with logical regions , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[38]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[39]  Kun Li,et al.  The MADlib Analytics Library or MAD Skills, the SQL , 2012, Proc. VLDB Endow..

[40]  Christopher Ré,et al.  Big Data versus the Crowd: Looking for Relationships in All the Right Places , 2012, ACL.

[41]  Christopher Ré,et al.  Elementary: Large-Scale Knowledge-Base Construction via Machine Learning and Statistical Inference , 2012, Int. J. Semantic Web Inf. Syst..

[42]  Alfons Kemper,et al.  Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems , 2012, Proc. VLDB Endow..

[43]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[44]  S. Peters,et al.  Climate change and the selective signature of the Late Ordovician mass extinction , 2012, Proceedings of the National Academy of Sciences.

[45]  Min Wang,et al.  Optimizing Statistical Information Extraction Programs over Evolving Text , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[46]  Joseph M. Hellerstein,et al.  Distributed GraphLab: A Framework for Machine Learning in the Cloud , 2012, Proc. VLDB Endow..

[47]  David Barber,et al.  Bayesian reasoning and machine learning , 2012 .

[48]  Andrew McCallum,et al.  Query-Aware MCMC , 2011, NIPS.

[49]  Peter J. Haas,et al.  The Monte Carlo database system: Stochastic analysis close to the data , 2011, TODS.

[50]  Anna Liu,et al.  Optimizing probabilistic query processing on continuous uncertain data , 2011, Proc. VLDB Endow..

[51]  Stephen J. Wright,et al.  Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.

[52]  Frederick Reiss,et al.  SystemT: A Declarative Information Extraction System , 2011, ACL.

[53]  Luke S. Zettlemoyer,et al.  Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations , 2011, ACL.

[54]  Alexander T. Ihler,et al.  Multicore Gibbs Sampling in Dense, Unstructured Graphs , 2011, AISTATS.

[55]  Daisy Zhe Wang,et al.  Hybrid in-database inference for declarative information extraction , 2011, SIGMOD '11.

[56]  David F. Gleich,et al.  Tall and skinny QR factorizations in MapReduce architectures , 2011, MapReduce '11.

[57]  Joseph K. Bradley,et al.  Parallel Coordinate Descent for L1-Regularized Loss Minimization , 2011, ICML.

[58]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[59]  Zhiyuan Liu,et al.  PLDA+: Parallel latent Dirichlet allocation with data placement and pipeline processing , 2011, TIST.

[60]  Christopher Ré,et al.  Queries and materialized views on probabilistic databases , 2011, J. Comput. Syst. Sci..

[61]  Shirish Tatikonda,et al.  SystemML: Declarative machine learning on MapReduce , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[62]  Web information extraction using Markov logic networks , 2011, WWW.

[63]  Lars Bergstrom,et al.  Measuring NUMA effects with the STREAM benchmark , 2011, ArXiv.

[64]  Srinivasan Parthasarathy,et al.  Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining , 2011, Proc. VLDB Endow..

[65]  Christopher Ré,et al.  Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS , 2011, Proc. VLDB Endow..

[66]  Gerhard Weikum,et al.  Scalable knowledge harvesting with high precision and high recall , 2011, WSDM '11.

[67]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[68]  Alexander J. Smola,et al.  Parallelized Stochastic Gradient Descent , 2010, NIPS.

[69]  Andrew McCallum,et al.  Collective Cross-Document Relation Extraction Without Labelled Data , 2010, EMNLP.

[70]  J. Alroy The Shifting Balance of Diversity Among Major Marine Animal Groups , 2010, Science.

[71]  Alexander J. Smola,et al.  An architecture for parallel topic models , 2010, Proc. VLDB Endow..

[72]  Jennifer Chu-Carroll,et al.  Building Watson: An Overview of the DeepQA Project , 2010, AI Mag..

[73]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[74]  Pedro M. Domingos,et al.  Efficient Belief Propagation for Utility Maximization and Repeated Inference , 2010, AAAI.

[75]  Paul G. Brown,et al.  Overview of SciDB: large scale array storage, processing and analysis , 2010, SIGMOD Conference.

[76]  Peter J. Haas,et al.  Ricardo: integrating R and Hadoop , 2010, SIGMOD Conference.

[77]  Chris Callison-Burch,et al.  Creating Speech and Language Data With Amazon’s Mechanical Turk , 2010, Mturk@HLT-NAACL.

[78]  Hoifung Poon,et al.  Joint Inference for Knowledge Extraction from Biomedical Literature , 2010, NAACL.

[79]  Andrew McCallum,et al.  Scalable probabilistic databases with factor graphs and MCMC , 2010, Proc. VLDB Endow..

[80]  Gunnar Rätsch,et al.  The SHOGUN Machine Learning Toolbox , 2010, J. Mach. Learn. Res..

[81]  Joseph E. Gonzalez,et al.  GraphLab: A New Parallel Framework for Machine Learning , 2010 .

[82]  Rada Chirkova,et al.  Materialized Views , 2012, Found. Trends Databases.

[83]  Andrew McCallum,et al.  FACTORIE: Probabilistic Programming via Imperatively Defined Factor Graphs , 2009, NIPS.

[84]  Andrew Thomas,et al.  The BUGS project: Evolution, critique and future directions , 2009, Statistics in medicine.

[85]  Lise Getoor,et al.  PrDB: managing and exploiting rich correlations in probabilistic databases , 2009, The VLDB Journal.

[86]  Jennifer Widom,et al.  Representing uncertain data: models, properties, and algorithms , 2009, The VLDB Journal.

[87]  Daniel Jurafsky,et al.  Distant supervision for relation extraction without labeled data , 2009, ACL.

[88]  Pradeep Dubey,et al.  Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs , 2009, Proc. VLDB Endow..

[89]  Shirish Tatikonda,et al.  Mining Tree-Structured Data on Multicore Systems , 2009, Proc. VLDB Endow..

[90]  Joseph M. Hellerstein,et al.  MAD Skills: New Analysis Practices for Big Data , 2009, Proc. VLDB Endow..

[91]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[92]  Andrew McCallum,et al.  Joint Inference for Natural Language Processing , 2009, CoNLL.

[93]  Bo Zhang,et al.  StatSnowball: a statistical approach to extracting entity relationships , 2009, WWW '09.

[94]  Gerhard Weikum,et al.  SOFIE: a self-organizing framework for information extraction , 2009, WWW '09.

[95]  Dan Olteanu,et al.  SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[96]  Gerhard Weikum,et al.  The YAGO-NAGA approach to knowledge discovery , 2009, SIGMOD Rec..

[97]  Frederick Reiss,et al.  SystemT: a system for declarative information extraction , 2009, SIGMOD Rec..

[98]  Estevam R. Hruschka,et al.  Toward Never Ending Language Learning , 2009, AAAI Spring Symposium: Learning by Reading and Learning to Read.

[99]  Christopher Ré,et al.  Probabilistic databases , 2011, SIGA.

[100]  Grigorios Tsoumakas,et al.  An adaptive personalized news dissemination system , 2009, Journal of Intelligent Information Systems.

[101]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[102]  Elizabeth L. Wilmer,et al.  Markov Chains and Mixing Times , 2008 .

[103]  Bin Yu,et al.  Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of ℓ1-regularized MLE , 2008, NIPS 2008.

[104]  Max Welling,et al.  Fast collapsed Gibbs sampling for latent Dirichlet allocation , 2008, KDD.

[105]  Daisy Zhe Wang,et al.  BayesStore: managing large, uncertain data repositories with probabilistic graphical models , 2008, Proc. VLDB Endow..

[106]  Frederick Reiss,et al.  Main-memory scan sharing for multi-core CPUs , 2008, Proc. VLDB Endow..

[107]  Umut A. Acar,et al.  Adaptive inference on general graphical models , 2008, UAI.

[108]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[109]  Nathan Srebro,et al.  SVM optimization: inverse dependence on training set size , 2008, ICML '08.

[110]  Karen M. Layou,et al.  Phanerozoic Trends in the Global Diversity of Marine Invertebrates , 2008, Science.

[111]  Peter J. Haas,et al.  MCDB: a Monte Carlo approach to managing uncertain data , 2008, SIGMOD Conference.

[112]  Christopher D. Manning,et al.  Efficient, Feature-based, Conditional Random Field Parsing , 2008, ACL.

[113]  Jun Yang,et al.  Efficient Information Extraction over Evolving Text Data , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[114]  Alexandre d'Aspremont,et al.  Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data , 2008 .

[115]  Oren Etzioni,et al.  Open Information Extraction from the Web , 2007, CACM.

[116]  Michael Garland,et al.  Efficient Sparse Matrix-Vector Multiplication on CUDA , 2008 .

[117]  Umut A. Acar,et al.  Adaptive Bayesian inference , 2007, NIPS 2007.

[118]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[119]  Dan Olteanu,et al.  Query language support for incomplete information in the MayBMS system , 2007, VLDB.

[120]  Jeffrey F. Naughton,et al.  Declarative Information Extraction Using Datalog with Embedded Extraction Predicates , 2007, VLDB.

[121]  Samuel Williams,et al.  Optimization of sparse matrix-vector multiplication on emerging multicore platforms , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[122]  Pedro M. Domingos,et al.  Joint Inference in Information Extraction , 2007, AAAI.

[123]  Oren Etzioni,et al.  TextRunner: Open Information Extraction on the Web , 2007, NAACL.

[124]  Shirish Tatikonda,et al.  Toward terabyte pattern mining: an architecture-conscious solution , 2007, PPoPP.

[125]  Gene H. Golub,et al.  Numerical methods for solving linear least squares problems , 1965, Milestones in Matrix Computation.

[126]  J. Beirlant,et al.  Actuarial statistics with generalized linear mixed models , 2007 .

[127]  Srinivasan Parthasarathy,et al.  Cache-conscious frequent pattern mining on modern and emerging processors , 2007, The VLDB Journal.

[128]  Srinivasan Parthasarathy,et al.  Adaptive Parallel Graph Mining for CMP Architectures , 2006, Sixth International Conference on Data Mining (ICDM'06).

[129]  Kunle Olukotun,et al.  Map-Reduce for Machine Learning on Multicore , 2006, NIPS.

[130]  J. W. Valentine,et al.  Out of the Tropics: Evolutionary Dynamics of the Latitudinal Diversity Gradient , 2006, Science.

[131]  Parag Agrawal,et al.  Trio: a system for data, uncertainty, and lineage , 2006, VLDB.

[132]  Martin J. Wainwright,et al.  Log-determinant relaxation for approximate inference in discrete Markov random fields , 2006, IEEE Transactions on Signal Processing.

[133]  Joakim Nivre,et al.  MaltParser: A Data-Driven Parser-Generator for Dependency Parsing , 2006, LREC.

[134]  Padhraic Smyth,et al.  Scalable Parallel Topic Models , 2006 .

[135]  Christopher D. Manning,et al.  Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling , 2005, ACL.

[136]  Christian P. Robert,et al.  Monte Carlo Statistical Methods (Springer Texts in Statistics) , 2005 .

[137]  Eduardo F. D'Azevedo,et al.  Vectorized Sparse Matrix Multiply for Compressed Row Storage Format , 2005, International Conference on Computational Science.

[138]  K. Brøsen,et al.  Paroxetine, a Cytochrome P450 2D6 Inhibitor, Diminishes the Stereoselective O‐demethylation and Reduces the Hypoalgesic Effect of Tramadol , 2005, Clinical pharmacology and therapeutics.

[139]  Shanan E. Peters,et al.  A revised macroevolutionary history for Ordovician–Early Silurian crinoids , 2005, Paleobiology.

[140]  W. Kiessling Long-term relationships between ecological stability and biodiversity in Phanerozoic reefs , 2005, Nature.

[141]  Andrew McCallum,et al.  Collective Segmentation and Labeling of Distant Entities in Information Extraction , 2004 .

[142]  Konstantin Andreev,et al.  Balanced Graph Partitioning , 2004, SPAA '04.

[143]  Georg Gottlob,et al.  The Lixto data extraction project: back and forth between theory and practice , 2004, PODS.

[144]  Doug Downey,et al.  Web-scale information extraction in KnowItAll: (preliminary results) , 2004, WWW '04.

[145]  Norman Fenton,et al.  Combining Evidence in Risk Analysis using Bayesian Networks , 2004 .

[146]  Nando de Freitas,et al.  An Introduction to MCMC for Machine Learning , 2004, Machine Learning.

[147]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[148]  Ruoming Jin,et al.  Shared Memory Parallelization of Data Mining Algorithms: Techniques, Programming Interface, and Performance , 2002 .

[149]  Russ B. Altman,et al.  PharmGKB: the Pharmacogenetics Knowledge Base , 2002, Nucleic Acids Res..

[150]  David J. DeWitt,et al.  Weaving Relations for Cache Performance , 2001, VLDB.

[151]  Philip M. Novack-Gottshall,et al.  Effects of sampling standardization on estimates of Phanerozoic marine diversification , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[152]  Srinivasan Parthasarathy,et al.  Parallel Data Mining for Association Rules on Shared-memory Systems , 1998 .

[153]  M. Foote Origination and extinction components of taxonomic diversity: general problems , 2000, Paleobiology.

[154]  J M Adrain,et al.  An empirical assessment of taxic paleobiology. , 2000, Science.

[155]  Mark Craven,et al.  Constructing Biological Knowledge Bases by Extracting Information from Text Sources , 1999, ISMB.

[156]  Mohammed J. Zaki,et al.  Parallel classification for data mining on shared-memory multiprocessors , 1999, Proceedings 15th International Conference on Data Engineering.

[157]  J. Alroy Cope's rule and the dynamics of body mass evolution in North American fossil mammals. , 1998, Science.

[158]  Sergey Brin,et al.  Extracting Patterns and Relations from the World Wide Web , 1998, WebDB.

[159]  J. Sepkoski,et al.  Rates of speciation in the fossil record. , 1998, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[160]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[161]  Srinivasan Parthasarathy,et al.  New Algorithms for Fast Discovery of Association Rules , 1997, KDD.

[162]  Arnold I. Miller,et al.  Calibrating the Ordovician Radiation of marine life: implications for Phanerozoic diversity trends , 1996, Paleobiology.

[163]  James Demmel,et al.  ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance , 1995, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[164]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[165]  Simon Kasif,et al.  Logarithmic-Time Updates and Queries in Probabilistic Networks , 1995, UAI.

[166]  Dimitri P. Bertsekas,et al.  Nonlinear Programming , 1997 .

[167]  Chau-Wen Tseng,et al.  Compiler optimizations for improving data locality , 1994, ASPLOS VI.

[168]  Michael Stonebraker,et al.  Efficient organization of large multidimensional arrays , 1994, Proceedings of 1994 IEEE 10th International Conference on Data Engineering.

[169]  Mark Jerrum,et al.  Polynomial-Time Approximation Algorithms for the Ising Model , 1990, SIAM J. Comput..

[170]  Monica S. Lam,et al.  Global optimizations for parallelism and locality on scalable parallel machines , 1993, PLDI '93.

[171]  V. S. Subrahmanian,et al.  Maintaining views incrementally , 1993, SIGMOD Conference.

[172]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[173]  Robert P. Goldman,et al.  From knowledge bases to decision models , 1992, The Knowledge Engineering Review.

[174]  Jeffrey F. Naughton,et al.  A stochastic approach for clustering in object bases , 1991, SIGMOD '91.

[175]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[176]  Christine T. Iwaskiw,et al.  Knowledge Base Compilation , 1989, IJCAI.

[177]  Tomasz Imielinski,et al.  Incomplete Information in Relational Databases , 1984, JACM.

[178]  J. Sepkoski,et al.  A factor analytic description of the Phanerozoic marine fossil record , 1981, Paleobiology.

[179]  Azriel Rosenfeld,et al.  Picture languages: Formal models for picture recognition , 1979 .

[180]  R. Bambach,et al.  Species richness in marine benthic habitats through the Phanerozoic , 1977, Paleobiology.

[181]  David M. Raup,et al.  Species diversity in the Phanerozoic: a tabulation , 1976, Paleobiology.

[182]  Raghu Ramakrishnan,et al.  Database Management Systems , 1976 .

[183]  Laszlo A. Belady,et al.  A Study of Replacement Algorithms for Virtual-Storage Computer , 1966, IBM Syst. J..