Knowledge Discovery in Large Data Sets

In this work we briefly address the problem of unsupervised classification on large datasets, magnitude around 100,000,000 objects. The objects are variable objects, which are around 10% of the 1,000,000,000 astronomical objects that will be collected by GAIA/ESA mission. We tested unsupervised classification algorithms on known datasets such as OGLE and Hipparcos catalogs. Moreover, we are building several templates to represent the main classes of variable objects as well as new classes to build a synthetic dataset of this dimension. In the future we will run the GAIA satellite scanning law on these templates to obtain a testable large dataset.