Transformative Opportunities from Data Science and Big Data Analytics: Applied to Photovoltaics

Understanding the overall lifetime performance of photovoltaic (PV) modules is essential to continue the cost reduction of solar energy, thereby increasing its contribution to the world’s electricity needs and sustainability goals.1,2 In order to reach the 2030 SunShot goal of $0.03/kWh,3,4 the power degradation rate (the annual reduction in power output by a PV module) must be lowered to the 0.2%/year goal, so as to increase the lifetime of PV modules installed in diverse climate zones. Predicting PV module performance over a 20- to 30-year product warranty or lifetime is typically done using traditional reliability approaches such as pass/fail testing and materials qualification, yet this has proven insufficient. Current PV qualification tests have not prevented failures in real-world PV applications.5 Historical data from installed PV systems is the ideal source for understanding the magnitude and causes of module power loss and degradation, and for identifying how to extend the lifetime of PV modules to 50 years. The largest set of historical data available is time series power data (typically one- or five-minute interval data streams) from commercial and research PV power plants. Handling such extensive PV data sets, which can extend for 20 years, required new informatics and analytical approaches to derive scientific insights. A big data analytics approach, utilizing Hadoop6 and a non-relational data warehouse, was developed to handle the large-volume, real-time data streams.7-10 Graduate and undergraduate students across numerous academic departments needed to learn data cleaning and assembly, statistical analysis, coding, data-driven modeling, and statistical and machine learning.
This also requires the use of open and reproducible science methods, with shared code and data augmenting traditional journal publications.11 In this paper, we describe some of the challenges and opportunities associated with acquiring and structuring the data and conducting performance analytics in meaningful ways, while also respecting the privacy concerns of collaborators across the PV value chain. The approach we take, which we refer to as engineering epidemiology, draws upon medical research study designs and protocols for understanding PV modules, components, and materials under accelerated exposures and real-world, in-use conditions. Domain knowledge of materials science, combined with network models of materials, components, and systems, allows multiple phenomena to be captured as a system of equations, in order to understand which particular mechanisms are induced by multivariate stressors and how those relate to meaningful overall performance metrics across dimensions and temporal scales. We believe that as data science and big data analytics grow in the solar field, the cost of PV electricity will continue to decrease through improved module lifetimes and performance and a reduced operational and maintenance burden on commercial PV plant owners. We also articulate some of the key emerging needs, such as greater use of image analysis of modules during manufacturing, large-scale image analysis of installed PV, greater use of current-voltage (I-V) curves, and improved solar forecasting.
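As a minimal illustration of the degradation-rate metric mentioned above (the annual reduction in module power output, in %/year), the following Python sketch fits a straight line to a power time series and reports the slope relative to the fitted initial power. The function names, the synthetic noise-free data, and the assumption of linear power loss are all hypothetical choices made for illustration; this is not the analysis pipeline described in this paper, and real field data would be noisy and seasonal and would likely need cleaning first.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def annual_degradation_rate(years, power):
    """Degradation rate in %/year, relative to the fitted power at year 0.

    Assumes (for illustration only) that power declines linearly in time.
    """
    slope, intercept = fit_line(years, power)
    return -100.0 * slope / intercept


def synthetic_monthly_power(p0_watts, rate_pct_per_year, n_years):
    """Hypothetical noise-free monthly series with linear power loss."""
    times = [month / 12 for month in range(12 * n_years + 1)]
    power = [p0_watts * (1 - rate_pct_per_year / 100 * t) for t in times]
    return times, power


# Illustrative values only: a 300 W module losing 0.5 %/year for 20 years.
years, power = synthetic_monthly_power(300.0, 0.5, 20)
rate = annual_degradation_rate(years, power)
print(f"estimated degradation rate: {rate:.2f} %/year")
print("meets 0.2 %/year goal:", rate <= 0.2)
```

On this synthetic series the fit recovers the 0.5 %/year rate used to generate it, which is above the 0.2 %/year goal cited in the text.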