PROFHMM_UNC: Introducing a Priori Knowledge for Completing Missing Values of Multidimensional Time-Series

We present a new method for estimating missing values or correcting unreliable observed values of time-dependent physical fields. The method, named PROFHMM_UNC, is based on Hidden Markov Models and Self-Organizing Maps. PROFHMM_UNC combines knowledge of the physical process under study, provided by an already known dynamic model, with the truncated time series of observations of the phenomenon. Self-Organizing Maps are used to discretize the available data and thereby generate the states of the Hidden Markov Model. We modify the Viterbi algorithm so that it takes into account a priori information on the quality of the observed data when selecting the optimum reconstruction. PROFHMM_UNC was validated in a twin experiment using the outputs of the ocean biogeochemical NEMO-PISCES model.
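To make the idea of a quality-aware Viterbi decoding concrete, here is a minimal sketch. It is not the PROFHMM_UNC formulation, which the abstract does not spell out. It assumes one common way of injecting a priori data quality: each observation carries a hypothetical confidence `conf[t]` in [0, 1] that scales its emission log-probability, so a missing or fully distrusted observation (`conf[t] = 0`) leaves the choice of state to the transition model alone. The function name `weighted_viterbi`, the two-state toy model, and all numerical values are illustrative assumptions.

```python
import math

def weighted_viterbi(log_pi, log_A, log_B, obs, conf):
    """Most likely state path of a discrete HMM.

    Assumption (not from the paper): the emission log-probability at
    time t is multiplied by conf[t] in [0, 1]; missing observations are
    given as None and contribute nothing to the score.
    """
    n_states = len(log_pi)

    def emit(j, t):
        # A missing or fully distrusted observation is ignored.
        if obs[t] is None or conf[t] == 0.0:
            return 0.0
        return conf[t] * log_B[j][obs[t]]

    delta = [log_pi[j] + emit(j, 0) for j in range(n_states)]
    back = []  # back[t-1][j] = best predecessor of state j at time t
    for t in range(1, len(obs)):
        new_delta, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log_A[i][j])
            new_delta.append(delta[best_i] + log_A[best_i][j] + emit(j, t))
            pointers.append(best_i)
        delta = new_delta
        back.append(pointers)

    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda j: delta[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def _log_matrix(rows):
    return [[math.log(p) for p in row] for row in rows]

# Toy two-state model (illustrative values only).
log_pi = [math.log(0.5), math.log(0.5)]
log_A = _log_matrix([[0.7, 0.3], [0.3, 0.7]])   # state transitions
log_B = _log_matrix([[0.9, 0.1], [0.1, 0.9]])   # emissions per state

obs = [0, 0, 1, 0, 0]
path_trusted = weighted_viterbi(log_pi, log_A, log_B, obs, [1, 1, 1, 1, 1])
path_doubted = weighted_viterbi(log_pi, log_A, log_B, obs, [1, 1, 0.1, 1, 1])
path_missing = weighted_viterbi(log_pi, log_A, log_B,
                                [0, 0, None, 0, 0], [1, 1, 0, 1, 1])
```

In this toy case the fully trusted sequence is decoded as `[0, 0, 1, 0, 0]`, whereas lowering the confidence of the third observation, or marking it as missing, yields `[0, 0, 0, 0, 0]`: the doubtful value is overridden by the dynamics encoded in the transition matrix. In a setting like the one described above, the states would be Self-Organizing Map units rather than two hand-picked states.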
