Semi-Supervised Novelty Detection with Adaptive Eigenbases, and Application to Radio Transients

Energy production, distribution, and consumption play a critical role in the sustainability of the planet and its natural resources. Electric power systems are undergoing major changes aimed at making the energy infrastructure “smarter”, more scalable, and more efficient. This new generation of smart energy grids needs novel computational algorithms to support power generation from a wide range of sources, efficient energy distribution, and sustainable consumption. This paper argues that a fundamentally distributed approach with more local flexibility is more sustainable than traditional centralized frameworks for analyzing and processing data. It considers the problem of predicting power generation and consumption trends over a distributed smart grid. Since power generation from solar, wind, geothermal, and other renewable sources is likely to be part of many households in the near future, both power generation and consumption data will be generated over a wide-area network. Moreover, many of the communication links between the household data sources and the central server are likely to run over wireless networks with low bandwidth and high data-plan costs. Analyzing such data (some of it privacy-sensitive) in a centralized manner is not scalable, sometimes not privacy-preserving, and often impractical because of the cost sensitivity of the applications. This paper presents a more sustainable distributed asynchronous algorithm for constructing energy demand prediction models in a smart grid using multivariate linear regression. The paper offers the algorithm, its analysis, and experimental results.
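
To illustrate why multivariate linear regression lends itself to distributed computation, the sketch below has each household reduce its private readings to two small summaries (X^T X and X^T y), which are then summed and solved for the regression weights. This is not the paper's asynchronous algorithm, which the abstract does not specify; it is a simple synchronous stand-in based on the normal equations. The function names (`local_summary`, `merge_summaries`, `solve_normal_equations`) and the toy data are assumptions made only for this example.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def local_summary(X: Sequence[Sequence[float]],
                  y: Sequence[float]) -> Tuple[Matrix, Vector]:
    """Compute X^T X and X^T y from one node's private data (hypothetical helper)."""
    d = len(X[0])
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for row, target in zip(X, y):
        for i in range(d):
            b[i] += row[i] * target
            for j in range(d):
                A[i][j] += row[i] * row[j]
    return A, b


def merge_summaries(summaries: Sequence[Tuple[Matrix, Vector]]) -> Tuple[Matrix, Vector]:
    """Sum per-node summaries; only d*d + d numbers per node cross the network."""
    d = len(summaries[0][1])
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for A_k, b_k in summaries:
        for i in range(d):
            b[i] += b_k[i]
            for j in range(d):
                A[i][j] += A_k[i][j]
    return A, b


def solve_normal_equations(A: Matrix, b: Vector) -> Vector:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        residual = M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = residual / M[i][i]
    return w


if __name__ == "__main__":
    # Toy data: demand = 2 + 3*x1 - 1*x2, split across three households.
    # The first column of each row is a constant 1 for the intercept.
    nodes = [
        ([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]], [2.0, 5.0]),
        ([[1.0, 0.0, 1.0], [1.0, 2.0, 1.0]], [1.0, 7.0]),
        ([[1.0, 1.0, 2.0]], [3.0]),
    ]
    A, b = merge_summaries([local_summary(X, y) for X, y in nodes])
    print([round(w, 6) for w in solve_normal_equations(A, b)])
```

Because the summaries add, the merged solution equals the one a central server would obtain from the pooled raw data, while each node transmits a fixed amount of information regardless of how many readings it holds. An asynchronous or peer-to-peer scheme would replace the single merge step with repeated local exchanges.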


[271]  D. Roy,et al.  Burned area mapping using multi-temporal moderate spatial resolution data—a bi-directional reflectance model-based expectation approach , 2002 .

[272]  Shivam Tripathi,et al.  On the identification of intra-seasonal changes in the Indian summer monsoon , 2009, SensorKDD '09.

[273]  Fabio Gagliardi Cozman,et al.  Unlabeled Data Can Degrade Classification Performance of Generative Classifiers , 2002, FLAIRS.

[274]  K. Doksum,et al.  Models for Variable-Stress Accelerated Life Testing Experiments Based on Wiener Processes and the Inverse Gaussian Distribution , 1992 .

[275]  G. Sugihara,et al.  Generalized Theorems for Nonlinear State Space Reconstruction , 2011, PloS one.

[276]  Vipin Kumar,et al.  Land cover change detection: a case study , 2008, KDD.

[277]  Vipin Kumar,et al.  Gopher: Global observation of Planetary Health and Ecosystem Resources , 2011, 2011 IEEE International Geoscience and Remote Sensing Symposium.

[278]  T. A. Harris,et al.  Rolling Bearing Analysis , 1967 .

[279]  Alexander Hinneburg,et al.  DENCLUE 2.0: Fast Clustering Based on Kernel Density Estimation , 2007, IDA.

[280]  Leonhard Hennig,et al.  Topic-based Multi-Document Summarization with Probabilistic Latent Semantic Analysis , 2009, RANLP.

[281]  Zhi-Hua Zhou,et al.  On Detecting Clustered Anomalies Using SCiForest , 2010, ECML/PKDD.

[282]  W. D. Reynard,et al.  The aviation safety reporting system , 1984 .

[283]  Latifur Khan,et al.  Multi-concept Document Classification Using a Perceptron-Like Algorithm , 2008, 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[284]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[285]  Heiko Hoffmann,et al.  Kernel PCA for novelty detection , 2007, Pattern Recognit..

[286]  R. Katz,et al.  Teleconnections linking worldwide climate anomalies : scientific basis and societal impact , 1991 .

[287]  Nitesh V. Chawla,et al.  An exploration of climate data using complex networks , 2009, SensorKDD '09.

[288]  Andrew P. Witkin,et al.  Scale-Space Filtering , 1983, IJCAI.

[289]  R. Tibshirani,et al.  Pathwise Coordinate Optimization , 2007, 0708.1485.

[290]  Stéphane Lafon,et al.  Diffusion maps , 2006 .

[291]  M. Rajeevan,et al.  Analysis of variability and trends of extreme rainfall events over India using 104 years of gridded daily rainfall data , 2008 .

[292]  H. Zackor,et al.  Prediction of congestion due to road works on freeways , 2001, ITSC 2001. 2001 IEEE Intelligent Transportation Systems. Proceedings.

[293]  L. L. Lai,et al.  An initial study on computational intelligence for smart grid , 2009, 2009 International Conference on Machine Learning and Cybernetics.

[294]  James Ze Wang,et al.  Real-Time Computerized Annotation of Pictures , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[295]  Stephan R. Sain,et al.  Downscaling extremes: A comparison of extreme value distributions in point-source and gridded precipitation data , 2010, 1010.1604.

[296]  Nada Golmie,et al.  NIST Framework and Roadmap for Smart Grid Interoperability Standards, Release 2.0 , 2012 .

[297]  Alexandre d'Aspremont,et al.  Optimal Solutions for Sparse Principal Component Analysis , 2007, J. Mach. Learn. Res..

[298]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[299]  Evan M. Manning,et al.  Massive Dataset Analysis for NASA’s Atmospheric Infrared Sounder , 2012, Technometrics.

[300]  Mark Greaves,et al.  Visualizing text data sets , 1999, Comput. Sci. Eng..

[301]  Y. Lechevallier,et al.  Dynamic clustering of histograms using Wasserstein metric , 2006 .

[302]  Laurent El Ghaoui,et al.  Safe Feature Elimination for the LASSO and Sparse Supervised Learning Problems , 2010, 1009.4219.

[303]  Rong Li,et al.  Residual-life distributions from component degradation signals: A Bayesian approach , 2005 .

[304]  Chi-Chung Lam,et al.  Financial Time Series Forecasting by Neural Network Using Conjugate Gradient Learning Algorithm and Multiple Linear Regression Weight Initialization , 2000 .

[305]  Mehryar Mohri,et al.  On Transductive Regression , 2006, NIPS.

[306]  Michael K. Tippett,et al.  Predictability: Recent insights from information theory , 2007 .

[307]  Lei Tang,et al.  Large scale multi-label classification via metalabeler , 2009, WWW '09.

[308]  Dan Hammer,et al.  Forma: Forest Monitoring for Action - Rapid Identification of Pan-Tropical Deforestation Using Moderate-Resolution Remotely Sensed Data , 2009 .

[309]  Yi Zhang,et al.  Entropy-based subspace clustering for mining numerical data , 1999, KDD '99.

[310]  Eric Hoffman,et al.  Airborne Spacing in the Terminal Area: A Study of Non-Nominal Situations , 2006 .

[311]  Soumya Ghosh,et al.  Automatic Annotation of Planetary Surfaces With Geomorphic Labels , 2010, IEEE Transactions on Geoscience and Remote Sensing.

[312]  Jonathan F. Donges,et al.  Complex networks in climate dynamics. Comparing linear and nonlinear network construction methods , 2009, 0907.4359.

[313]  Burt L. Monroe,et al.  Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict , 2008, Political Analysis.

[314]  Rob J Hyndman,et al.  Phenological change detection while accounting for abrupt and gradual trends in satellite image time series , 2010 .

[315]  Jing Pan,et al.  Prognostic Degradation Models for Computing and Updating Residual Life Distributions in a Time-Varying Environment , 2008, IEEE Transactions on Reliability.

[316]  Michael Elad,et al.  Learning Multiscale Sparse Representations for Image and Video Restoration , 2007, Multiscale Model. Simul..

[317]  W. Winkler Overview of Record Linkage and Current Research Directions , 2006 .

[318]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[319]  John C. Stutz,et al.  Classification of Aeronautics System Health and Safety Documents , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[320]  P. Webster,et al.  Changes in Tropical Cyclone Number, Duration, and Intensity in a Warming Environment , 2005, Science.

[321]  Lars Eklundh,et al.  Mapping insect defoliation in Scots pine with MODIS time-series data , 2009 .

[322]  P. Friederichs,et al.  Statistical Downscaling of Extreme Precipitation Events Using Censored Quantile Regression , 2007 .

[323]  Peter J. Bickel,et al.  The Earth Mover's distance is the Mallows distance: some insights from statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[324]  Michael K. Ng,et al.  HARP: a practical projected clustering algorithm , 2004, IEEE Transactions on Knowledge and Data Engineering.

[325]  Dustin G. Mixon,et al.  Availability of periodically inspected systems with Markovian wear and shocks , 2006, Journal of Applied Probability.

[326]  Zhi-Hua Zhou,et al.  Semi-Supervised Regression with Co-Training , 2005, IJCAI.

[327]  P. Whetton,et al.  Guidelines for Use of Climate Scenarios Developed from Statistical Downscaling Methods , 2004 .

[328]  Rob J Hyndman,et al.  Detecting trend and seasonal changes in satellite image time series , 2010 .

[329]  Vipin Kumar,et al.  Monitoring global forest cover using data mining , 2011, TIST.

[330]  Bo Zhao,et al.  TopCells: Keyword-based search of top-k aggregated documents in text cube , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[331]  Gregory P. Asner,et al.  Selective logging changes forest phenology in the Brazilian Amazon: Evidence from MODIS image time series analysis , 2009 .

[332]  Badrinath Roysam,et al.  Image change detection algorithms: a systematic survey , 2005, IEEE Transactions on Image Processing.

[333]  P. Webster,et al.  The horizontal and vertical structure of east Asian winter monsoon pressure surges , 1999 .

[334]  Ravi S. Nanjundiah,et al.  Monsoon prediction : Why yet another failure? , 2005 .

[335]  Mojib Latif,et al.  Decadal climate variability over the North Pacific and North America: Dynamics and predictability , 1996 .

[336]  N. Meinshausen,et al.  High-dimensional graphs and variable selection with the Lasso , 2006, math/0608017.

[337]  Wendell R. Ricks,et al.  Cognitive models of pilot categorization and prioritization of flight-deck information , 1995 .

[338]  Li Wei,et al.  Assumption-Free Anomaly Detection in Time Series , 2005, SSDBM.

[339]  R. Heim A Review of Twentieth-Century Drought Indices Used in the United States , 2002 .

[340]  Giles M. Foody,et al.  Status of land cover classification accuracy assessment , 2002 .

[341]  Hamid Pirahesh,et al.  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.

[342]  E. Glikman,et al.  Some Pattern Recognition Challenges in Data-Intensive Astronomy , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[343]  Xiaojin Zhu,et al.  Kernel Regression with Order Preferences , 2007, AAAI.

[344]  A. Gaye,et al.  A cyclogenesis index for tropical Atlantic off the African coasts , 2006 .

[345]  Hichem Frigui,et al.  Unsupervised learning of prototypes and attribute weights , 2004, Pattern Recognit..

[346]  Eyke Hüllermeier,et al.  Regret Analysis for Performance Metrics in Multi-Label Classification: The Case of Hamming and Subset Zero-One Loss , 2010, ECML/PKDD.

[347]  T. Joseph W. Lazio,et al.  The dynamic radio sky , 2004 .

[348]  Yurii Nesterov,et al.  Generalized Power Method for Sparse Principal Component Analysis , 2008, J. Mach. Learn. Res..

[349]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching in fixed dimensions , 1998, JACM.

[350]  Yogesh Maan,et al.  Software Data-Processing Pipeline for Transient Detection , 2009 .

[351]  Nadine Aubry,et al.  Spatiotemporal analysis of complex signals: Theory and applications , 1991 .

[352]  Andrew J. Majda,et al.  Quantifying the predictive skill in long-range forecasting. Part II: Model error in coarse-grained Markov models with application to ocean-circulation regimes , 2012 .

[353]  R. Edwards,et al.  The Swinburne intermediate-latitude pulsar survey , 2001, astro-ph/0105126.

[354]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[355]  C. Mallows A Note on Asymptotic Joint Normality , 1972 .

[356]  W. M. Gray,et al.  The Recent Increase in Atlantic Hurricane Activity: Causes and Implications , 2001, Science.

[357]  Saso Dzeroski,et al.  Clustering Trees with Instance Level Constraints , 2007, ECML.

[358]  A. Sinko,et al.  Zero-Inflated Poisson and Zero-Inflated Negative Binomial Models Using the COUNTREG Procedure , 2008 .