Data analysis has shifted from descriptive statistical analysis to data mining. Information networks and large-scale data storage have brought data retrieval to new dimensions. Time series are especially easy to accumulate, since digital sensors can fill databases without any human intervention. This is both a boon and a problem: the sheer amount of available data prevents users from understanding it. One has to build high-level representations of the time series in order to extract information from them. Segmentation is often used in process monitoring for similar reasons. In this paper, we describe step by step the difficulties we encountered and the solutions we studied when adapting automated time-series segmentation to a real-world example, the analysis of electric consumption. The data we want to analyze consist of yearly reports of electric power consumption sampled at 10-minute intervals. We study industrial consumers with simple processes (ovens, motors) that are switched either on or off for the duration of the process. We can therefore use this prior knowledge to model the time series as piecewise-constant signals with a changing mean. We then extend the segmentation to a symbolic representation so that the overwhelming number of generated segments can be interpreted.
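To make the piecewise-constant, changing-mean model described above more concrete, here is a minimal sketch in Python. It is not the procedure of the paper: it assumes a generic penalized least-squares segmentation solved by dynamic programming, followed by a simple thresholding step that turns segment means into "on"/"off" symbols. The function names, the penalty value, the threshold, and the toy load curve are all hypothetical and chosen only for illustration.

```python
from itertools import accumulate


def segment_piecewise_constant(y, penalty):
    """Split y into constant-mean segments.

    Minimises (sum of squared errors) + penalty * (number of segments)
    by dynamic programming. Returns a list of (start, end, mean) tuples
    with half-open index ranges [start, end).
    """
    n = len(y)
    # Prefix sums give the squared error of any candidate segment in O(1).
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def sse(i, j):
        # Squared error of fitting a single mean to y[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = [0.0] + [float("inf")] * n  # best[j]: optimal cost of y[:j]
    prev = [0] * (n + 1)               # prev[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + sse(i, j) + penalty
            if cost < best[j]:
                best[j], prev[j] = cost, i

    # Walk back through the stored segment starts to recover the segments.
    segments, j = [], n
    while j > 0:
        i = prev[j]
        segments.append((i, j, (s1[j] - s1[i]) / (j - i)))
        j = i
    return segments[::-1]


def to_symbols(segments, threshold):
    """Map each segment mean to a symbol (assumed two-state process)."""
    return ["on" if mean >= threshold else "off" for _, _, mean in segments]


# Hypothetical load curve: one value per 10-minute tick.
load = [0.1, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2, 0.1, 0.0, 0.1]
segs = segment_piecewise_constant(load, penalty=1.0)
labels = to_symbols(segs, threshold=2.5)
```

The penalty plays the role of a model-selection term: a larger value yields fewer, longer segments. The double loop is quadratic in the series length, so a full year of 10-minute readings would likely call for a pruned or approximate variant rather than this direct form.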