Discussion of “Industrial statistics and manifold data”

As industrial technology evolves and processes achieve higher levels of flexibility (the ability to produce multiple products) and consistency (low production variability), more data is generated that can be used for different activities, both online and offline, such as process monitoring, diagnosis, prognosis, control, optimization and continuous improvement. Such data inherits characteristics from the underlying process phenomena and is therefore also becoming increasingly complex and varied, available in larger volumes and at faster rates. In their paper, Enrique del Castillo and Xueqi Zhao provide a welcome and timely contribution that expands the methodological background of the statistical/data-centric community on how to handle complex data sets such as those collected by 3D scanners in additive/subtractive manufacturing (AM/SM) systems, one of the technological landmarks of Industry 4.0. These data sets consist of point clouds or processed versions of them, such as triangulated meshes. They are typically large, usually in the Gb range, and of high “complexity,” as the data lies on a low-dimensional manifold embedded in a higher-dimensional Euclidean space. “Data complexity” is taken here to express the degree of deviation from the common data structures that users are accustomed to handling and analyzing: each entity/object of interest consists of a large set of manifold data, instead of a row in a two-way table, as in classical statistical analysis. (This is a natural and intuitive interpretation of “data complexity,” but as we are faced with an increasing variety of data structures, I deem it opportune to devote effort to arriving at a general definition that embraces the multiple perspectives and dimensions relevant to consider; a quick web search reveals that some forays into this topic have already been made, and that operational definitions have been proposed mainly to support supervised learning activities, namely classification tasks.)
In their paper, Del Castillo and Zhao address both Statistical Process Monitoring (SPM) and Design of Experiments (DOE) and regression methodologies for manifold data. In the following paragraphs, I will refer to both but, due to space constraints, I will focus the discussion mainly on the SPM analysis of manifold data.