Deep Learning: Systems and Responsibility

Deep learning enables numerous applications across diverse areas. Data systems researchers are also increasingly experimenting with deep learning to improve system performance. We present a tutorial on deep learning that highlights the data systems nature of neural networks, as well as opportunities to advance them through data management techniques. We focus on three critical aspects: (1) classic design tradeoffs in neural networks, which we can enrich through a systems and data management perspective, e.g., by reasoning explicitly about storage, data movement, and computation; (2) classic design problems in data systems, which we can reconsider with neural networks as a viable design option, e.g., to replace or assist system components that make complex decisions, such as database query optimizers; and (3) essential considerations for applying neural networks responsibly to critical, human-facing problems, and how these also connect to data management and performance. While these topics may seem diverse, they are strongly interconnected through data management, and their combination offers rich opportunities for future research.
