The Argument for a “Data Cube” for Large-Scale Psychometric Data

In recent years, work with educational testing data has changed because of the affordances of technology, the availability of large data sets, and advances in data mining and machine learning. Consequently, data analysis has moved from traditional psychometrics to computational psychometrics. Despite these methodological advances and the large data sets collected at each administration, the way testing organizations collect, store, and analyze assessment data is not conducive to the real-time, data-intensive computational methods that can reveal new patterns and information about students. In this paper, we propose a new way to label, collect, and store data from large-scale educational learning and assessment systems (LAS) using the concept of the “data cube.” This paradigm will make it possible to apply machine learning, learning analytics, and other complex analyses. It will also allow the content of tests (items) and instruction (videos, simulations, items with scaffolds) to be stored as data, which opens up new avenues for personalized learning. This data paradigm will allow us to innovate at a scale far beyond the hypothesis-driven, small-scale research that has characterized educational research in the past.
