A Common GPU n-Dimensional Array for Python and C

Several mutually incompatible implementations of base array, matrix, and n-dimensional array objects currently exist for GPUs. This incompatibility hinders the sharing of GPU code and leads to duplicated development work. This paper proposes, and presents a first version of, a common GPU n-dimensional array (tensor) named GpuNdArray [1] that works with both CUDA and OpenCL. It will be usable from Python, C, and possibly other programming languages.