A Nonrelational Data Warehouse for the Analysis of Field and Laboratory Data From Multiple Heterogeneous Photovoltaic Test Sites

A nonrelational, distributed-computing data warehouse and analytics environment (Energy-CRADLE) was developed for the analysis of field and laboratory data from multiple heterogeneous photovoltaic (PV) test sites. This data informatics and analytics infrastructure was designed to process diverse formats of PV performance data and climatic telemetry time-series data collected from a PV outdoor test network, the Solar Durability and Lifetime Extension global SunFarm network, as well as point-in-time laboratory spectral and image measurements of PV material samples. Because it uses Hadoop/HBase for the distributed data warehouse, Energy-CRADLE does not have a predefined data table schema, which enables ingestion of data in diverse and changing formats. For easy data ingestion and retrieval, Energy-CRADLE utilizes Hadoop streaming to enable Python MapReduce and provides a graphical user interface, py-CRADLE. By developing the Hadoop distributed computing platform and the HBase NoSQL database schema for solar energy, Energy-CRADLE exemplifies an integrated, scalable, secure, and user-friendly data informatics and analytics system for PV researchers. An example of scalable, data-driven analytics enabled by Energy-CRADLE is presented, in which machine learning is used for anomaly detection across 2.2 million real-world current-voltage (I-V) curves of PV modules in three distinct Köppen-Geiger climatic zones.
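
As an illustration of the Python MapReduce pattern that Hadoop streaming enables, the following is a minimal sketch and not the Energy-CRADLE implementation. The comma-separated record layout (site, timestamp, power), the per-site averaging task, and the function names mapper, reducer, and run_local are all assumptions made for this example. In an actual Hadoop streaming job the mapper and reducer would be separate scripts reading standard input and writing standard output, with the framework sorting mapper output by key between the two stages; here that sort is simulated locally with sorted().

```python
from itertools import groupby


def mapper(lines):
    """Emit 'site<TAB>power' for each well-formed record.

    Assumed (hypothetical) input layout: 'site,timestamp,power_w'.
    Malformed or non-numeric records are skipped.
    """
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue
        site, _timestamp, power = parts
        try:
            float(power)
        except ValueError:
            continue
        yield f"{site}\t{power}"


def reducer(sorted_lines):
    """Average the power values for each site.

    Expects key-sorted 'site<TAB>power' lines, which is what the
    shuffle/sort stage of Hadoop streaming hands to a reducer.
    """
    def key_of(kv):
        return kv.split("\t", 1)[0]

    for site, group in groupby(sorted_lines, key=key_of):
        values = [float(kv.split("\t", 1)[1]) for kv in group]
        yield f"{site}\t{sum(values) / len(values):.2f}"


def run_local(lines):
    """Simulate map -> sort -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        "siteA,2015-06-01T12:00,200.0",
        "siteB,2015-06-01T12:00,150.0",
        "siteA,2015-06-01T12:01,210.0",
        "malformed record",
    ]
    for row in run_local(sample):
        print(row)
```

Because the mapper only assumes a delimiter and a field count, new record formats can be handled by adding or swapping mappers without changing a table schema, which is the flexibility the abstract attributes to the schemaless design.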
