Data-Centric AI Requires Rethinking Data Notion

The transition towards data-centric AI requires revisiting the notion of data from both mathematical and implementational standpoints in order to obtain unified data-centric machine learning packages. To this end, this work proposes unifying principles offered by the categorical and cochain notions of data, and discusses the importance of these principles in the transition to data-centric AI. In the categorical notion, data is viewed as a mathematical structure that we act upon via morphisms that preserve this structure. In the cochain notion, data can be viewed as a function defined on a discrete domain of interest and acted upon via operators. While these two notions are almost orthogonal, they provide a unifying definition of data, ultimately impacting the way machine learning packages are developed, implemented, and used by practitioners.
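The Python sketch below illustrates both notions on a small example. It is not taken from this work. The toy simplicial complex, the signal `f0`, the map `phi`, and the helpers `boundary_1`, `boundary_2`, `pull_back_vertices` and `pull_back_edges` are all hypothetical. The sketch assumes NumPy, vertices labelled `0..n-1`, and simplices oriented by increasing vertex index. In the cochain view, data is an array holding one value per simplex, and the coboundary operators (the transposed boundary matrices) act on it. In the categorical view, a simplicial map `phi` plays the role of a structure-preserving morphism. Pulling data back along it commutes with the coboundary.

```python
import numpy as np

# Hypothetical toy domain: 4 vertices, 4 edges and one filled triangle.
# Assumption: every simplex is oriented by increasing vertex index.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
triangles = [(0, 1, 2)]


def boundary_1(vertices, edges):
    """B1 (vertices x edges): the boundary of edge (i, j) is j - i.
    Assumes vertices are labelled 0..n-1 so they can be used as row indices."""
    B1 = np.zeros((len(vertices), len(edges)))
    for col, (i, j) in enumerate(edges):
        B1[i, col] = -1.0
        B1[j, col] = 1.0
    return B1


def boundary_2(edges, triangles):
    """B2 (edges x triangles): the boundary of (a, b, c) is (b,c) - (a,c) + (a,b)."""
    index = {e: k for k, e in enumerate(edges)}
    B2 = np.zeros((len(edges), len(triangles)))
    for col, (a, b, c) in enumerate(triangles):
        B2[index[(b, c)], col] = 1.0
        B2[index[(a, c)], col] = -1.0
        B2[index[(a, b)], col] = 1.0
    return B2


B1 = boundary_1(vertices, edges)
B2 = boundary_2(edges, triangles)

# Cochain view: data is a function on simplices, one value per simplex.
f0 = np.array([0.0, 1.0, 3.0, 6.0])  # 0-cochain (a signal on the vertices)
f1 = B1.T @ f0                       # coboundary: f0(j) - f0(i) on edge (i, j)
print(f1)                            # [1. 3. 2. 3.]
print(B2.T @ f1)                     # [0.], since applying the coboundary twice gives zero

# Another operator acting on 1-cochains: the Hodge Laplacian on edges.
L1 = B1.T @ B1 + B2 @ B2.T

# Categorical view (illustration): phi maps the path complex K into the complex
# above, sending K's edges (0, 1), (1, 2) to (0, 2), (2, 3). It preserves the
# simplicial structure and the edge orientation, so it acts as a morphism.
K_vertices = [0, 1, 2]
K_edges = [(0, 1), (1, 2)]
phi = {0: 0, 1: 2, 2: 3}


def pull_back_vertices(g0, phi, K_vertices):
    """phi*: a vertex signal on the target becomes one on K, v -> g0(phi(v))."""
    return np.array([g0[phi[v]] for v in K_vertices])


def pull_back_edges(g1, phi, K_edges, edges):
    """phi*: an edge signal on the target becomes one on K.
    Assumes phi sends each edge of K to an existing, same-oriented edge."""
    index = {e: k for k, e in enumerate(edges)}
    return np.array([g1[index[(phi[i], phi[j])]] for (i, j) in K_edges])


B1_K = boundary_1(K_vertices, K_edges)
lhs = B1_K.T @ pull_back_vertices(f0, phi, K_vertices)  # pull back, then coboundary
rhs = pull_back_edges(f1, phi, K_edges, edges)          # coboundary, then pull back
print(np.allclose(lhs, rhs))                            # True
```

Because `phi` respects the structure of the domain, transforming the data first and then applying the operator gives the same result as the reverse order. This is one way to see how the structure-centric and function-centric views of data can be used together.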
