Array programming with NumPy

Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It has an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves¹ and in the first imaging of a black hole². Here we review how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data. NumPy is the foundation upon which the scientific Python ecosystem is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Owing to its central position in the ecosystem, NumPy increasingly acts as an interoperability layer between such array computation libraries and, together with its application programming interface (API), provides a flexible framework to support the next decade of scientific and industrial analysis.
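As a rough illustration of the three activities named above (accessing, manipulating and operating on array data), the following minimal sketch uses a small made-up array. The variable names and the sample values are assumptions chosen for the example and do not come from the paper.

```python
import numpy as np

# Hypothetical sample data: 3 measurements (rows) of 4 channels (columns).
data = np.arange(12, dtype=float).reshape(3, 4)

# Accessing: slicing selects the second column without an explicit loop.
second_column = data[:, 1]

# Manipulating: a boolean mask picks out the elements greater than 5.
large_values = data[data > 5]

# Operating: the per-column mean (shape (4,)) is broadcast against
# every row of `data` (shape (3, 4)) to centre each column on zero.
column_means = data.mean(axis=0)
centred = data - column_means

print(second_column)
print(large_values)
print(centred)
```

Each step is a single expression over whole arrays, which is the compactness the abstract refers to; the same operations written with nested Python loops would be longer and typically slower.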

[1]  Kenneth E. Iverson,et al.  A programming language , 1962, AIEE-IRE '62 (Spring).

[2]  Kenneth E. Iverson,et al.  Notation as a tool of thought , 1980, APLQ.

[3]  D. Munro Using the Yorick interpreted language , 1995 .

[4]  Konrad Hinsen,et al.  Numerical Python , 1996 .

[5]  Ross Ihaka,et al.  R: A language for data analysis and graphics , 1996 .

[6]  Paul F. Dubois,et al.  Steering object-oriented scientific computations , 1997, Proceedings of TOOLS USA 97. International Conference on Technology of Object Oriented Systems and Languages.

[7]  Takuji Nishimura,et al.  Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator , 1998, TOMC.

[8]  Yannis Manolopoulos,et al.  The BASIS System: A Benchmarking Approach for Spatial Index Structures , 1999, Spatio-Temporal Database Management.

[9]  G. Marsaglia,et al.  The Ziggurat Method for Generating Random Variables , 2000 .

[10]  Richard L. White,et al.  numarray : A New Scientific Array Package for Python , 2003 .

[11]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[12]  Greg Wilson,et al.  Software Carpentry: Getting Scientists to Write Better Code by Making Them More Productive , 2006, Computing in Science & Engineering.

[13]  Leonid Ryzhyk,et al.  The ARM Architecture , 2006 .

[14]  K. Jarrod Millman,et al.  Analysis of Functional Magnetic Resonance Imaging in Python , 2007, Computing in Science & Engineering.

[15]  John D. Hunter,et al.  Matplotlib: A 2D Graphics Environment , 2007, Computing in Science & Engineering.

[16]  Paul F. Dubois,et al.  Guest Editor's Introduction: Python: Batteries Included , 2007, Computing in Science & Engineering.

[17]  Paul F. Dubois  Python: Batteries Included , 2007 .

[18]  Travis E. Oliphant,et al.  Python for Scientific Computing , 2007, Computing in Science & Engineering.

[19]  Brian E. Granger,et al.  IPython: A System for Interactive Scientific Computing , 2007, Computing in Science & Engineering.

[20]  Aric Hagberg,et al.  Exploring Network Structure, Dynamics, and Function using NetworkX , 2008, Proceedings of the Python in Science Conference.

[21]  Gene H. Golub,et al.  Netlib and NA-Net: Building a Scientific Computing Community , 2008, IEEE Annals of the History of Computing.

[22]  David Goldsmith,et al.  Progress Report: NumPy and SciPy Documentation in 2009 , 2009 .

[23]  Pearu Peterson,et al.  F2PY: a tool for connecting Fortran and Python programs , 2009, Int. J. Comput. Sci. Eng..

[24]  Bartek Wilczynski,et al.  Biopython: freely available Python tools for computational molecular biology and bioinformatics , 2009, Bioinform..

[25]  Janice Singer,et al.  How do scientists develop and use scientific software? , 2009, 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering.

[26]  Wes McKinney,et al.  Data Structures for Statistical Computing in Python , 2010, SciPy.

[27]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[28]  K. Jarrod Millman,et al.  Python for Scientists and Engineers , 2011, Comput. Sci. Eng..

[29]  Mark A. Moraes,et al.  Parallel random numbers: As easy as 1, 2, 3 , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[30]  Fernando Pérez,et al.  Python: An Ecosystem for Scientific Computing , 2011, Computing in Science & Engineering.

[31]  Stefan Behnel,et al.  Cython: The Best of Both Worlds , 2011, Computing in Science & Engineering.

[32]  Gaël Varoquaux,et al.  The NumPy Array: A Structure for Efficient Numerical Computation , 2011, Computing in Science & Engineering.

[33]  Zhang Yunquan,et al.  Model-driven Level 3 BLAS Performance Optimization on Loongson 3A Processor , 2012, ICPADS.

[34]  Prasanth H. Nair,et al.  Astropy: A community Python package for astronomy , 2013, 1307.6212.

[35]  Qian Wang,et al.  AUGEM: Automatically generate high performance Dense Linear Algebra kernels on x86 CPUs , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[36]  Felix S. Klock,et al.  The rust language , 2014 .

[37]  Nicholas D. Matsakis,et al.  The rust language , 2014, HILT '14.

[38]  Emmanuelle Gouillart,et al.  scikit-image: image processing in Python , 2014, PeerJ.

[39]  Melissa E. O'Neill PCG : A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation , 2014 .

[40]  Valcir João da Cunha Farias,et al.  Analysis of Functional Magnetic Resonance Images , 2014 .

[41]  Travis E. Oliphant,et al.  Guide to NumPy , 2015 .

[42]  Zheng Zhang,et al.  MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[43]  Simon Liedtke,et al.  SunPy—Python for solar physics , 2015, 1505.02563.

[44]  Siu Kwan Lam,et al.  Numba: a LLVM-based Python JIT compiler , 2015, LLVM '15.

[45]  Mehdi Amini,et al.  Pythran: Enabling Static Optimization of Scientific Python Programs , 2013, SciPy.

[46]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[47]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[48]  Von Welch,et al.  Reproducing GW150914: The First Observation of Gravitational Waves From a Binary Black Hole Merger , 2016, Computing in Science & Engineering.

[49]  Thomas Kluyver,et al.  Jupyter Notebooks - a publishing format for reproducible computational workflows , 2016, ELPUB.

[50]  K. Bouman,et al.  High-Resolution Linear Polarimetric Imaging for the Event Horizon Telescope , 2016, 1605.06156.

[51]  Alan Edelman,et al.  Julia: A Fresh Approach to Numerical Computing , 2014, SIAM Rev..

[52]  Stephan Hoyer,et al.  xarray: N-D labeled arrays and datasets in Python , 2017 .

[53]  Joseph Hamman,et al.  Pangeo: A Big-data Ecosystem for Scalable Earth System Science , 2018 .

[54]  Miguel de Val-Borro,et al.  The Astropy Project: Building an Open-science Project and Status of the v2.0 Core Package , 2018, The Astronomical Journal.

[55]  Adrian M. Price-Whelan,et al.  Binary Companions of Evolved Stars in APOGEE DR14: Search Method and Catalog of ∼5000 Companions , 2018, The Astronomical Journal.

[56]  John K. Parejko,et al.  LSST data management software development practices and tools , 2018, Astronomical Telescopes + Instrumentation.

[57]  K. Jarrod Millman,et al.  Developing Open-Source Scientific Practice , 2018, Implementing Reproducible Research.

[58]  J. Dongarra,et al.  Evolution of Numerical Software for Dense Linear Algebra , 2018 .

[59]  K. Jarrod Millman,et al.  Teaching Computational Reproducibility for Neuroimaging , 2018, Front. Neurosci..

[60]  Daniel Lemire,et al.  Fast Random Integer Generation in an Interval , 2018, ACM Trans. Model. Comput. Simul..

[61]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[62]  L. Blackburn,et al.  ehtim: Imaging, analysis, and simulation software for radio interferometry , 2019 .

[63]  bashtage/randomgen: Release 1.16.2 , 2019 .

[64]  Joel Nothman,et al.  SciPy 1.0-Fundamental Algorithms for Scientific Computing in Python , 2019, ArXiv.