statacpp: An interface between Stata and C++, with big data and machine-learning applications

Stata and Mata are very powerful and flexible for data processing and analysis, but some problems can be solved faster or more easily in a lower-level programming language. statacpp is a command that lets users write a C++ program, have Stata insert their data, matrices, or globals into it, compile it to an executable, run it, and return the results to Stata as new variables, matrices, or globals, all from within a do-file. The most important use cases are likely to be big data and MapReduce (where data can be filtered and processed according to parameters from Stata, and the reduced results passed back into Stata) and machine learning (where powerful existing libraries such as TensorFlow can be utilised). Short examples of both will be shown. Future directions for development will also be outlined, in particular calling Stata from C++ (useful for real-time, responsive analysis) and calling CUDA from Stata (useful for massively parallel processing on GPUs).