Clustering Multi-relational TV Data by Diverting Supervised ILP

Traditionally, clustering operates on data described by a fixed number of (usually numerical) features; such a description schema is said to be propositional, or attribute-value. When the data cannot be described in this way, the usual data-mining and clustering algorithms are no longer suitable. In this paper, we consider the problem of discovering similar types of programs in TV streams. The TV data have two important characteristics: 1) they are multi-relational, that is, they involve multiple relationships between features; 2) they require external background knowledge for their interpretation. To process these data, we use Inductive Logic Programming (ILP) [9]. We show how to divert ILP, a supervised technique, to work unsupervised in this context: from artificial learning problems, we induce a notion of similarity between broadcasts, which is then used to perform the clustering. The experiments presented support the soundness of the approach and open up several research avenues.
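
The idea of inducing a similarity from artificial learning problems can be illustrated with a minimal sketch. It is not the authors' system. A tiny rule learner over sets of facts stands in for a real ILP engine, and the toy broadcasts, the fact names, and the functions `learn_rules`, `similarity`, and `cluster` are hypothetical. Two further assumptions are made for the sake of a reproducible example: every split of the data into 3 pseudo-positive examples and 3 pseudo-negative ones is enumerated, and the final clustering is a plain threshold followed by connected components. Two broadcasts are counted as similar each time a rule learned on an artificial problem covers both of them.

```python
from itertools import combinations

# Hypothetical toy data: each broadcast is described by a set of facts.
BROADCASTS = {
    "news_1": {"anchor_shot", "short", "prime_time"},
    "news_2": {"anchor_shot", "short", "prime_time"},
    "news_3": {"anchor_shot", "short"},
    "film_1": {"end_credits", "long", "prime_time"},
    "film_2": {"end_credits", "long", "prime_time"},
    "film_3": {"end_credits", "long"},
}


def learn_rules(positives, negatives, data, max_len=2):
    """Stand-in for an ILP learner (assumption): keep every conjunction of
    at most max_len facts covering >= 2 positives and no negative."""
    facts = sorted(set().union(*(data[p] for p in positives)))
    rules = []
    for size in range(1, max_len + 1):
        for body in combinations(facts, size):
            body = frozenset(body)
            n_pos = sum(1 for p in positives if body <= data[p])
            hits_neg = any(body <= data[n] for n in negatives)
            if n_pos >= 2 and not hits_neg:
                rules.append(body)
    return rules


def similarity(data, n_pos=3):
    """Fraction of artificial problems in which some learned rule
    covers both broadcasts of a pair."""
    names = sorted(data)
    counts = {pair: 0 for pair in combinations(names, 2)}
    n_problems = 0
    # Artificial supervised problems: each subset of n_pos broadcasts is
    # labelled positive in turn, the remaining ones negative.
    for positives in combinations(names, n_pos):
        negatives = [n for n in names if n not in positives]
        rules = learn_rules(positives, negatives, data)
        n_problems += 1
        for a, b in counts:
            if any(r <= data[a] and r <= data[b] for r in rules):
                counts[(a, b)] += 1
    return {pair: c / n_problems for pair, c in counts.items()}


def cluster(sim, names, threshold=0.0):
    """Connected components of the graph linking pairs whose similarity
    exceeds the threshold (a simple stand-in for a clustering step)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for (a, b), s in sim.items():
        if s > threshold:
            parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    sim = similarity(BROADCASTS)
    print(cluster(sim, sorted(BROADCASTS)))
```

On this toy data, the fact `prime_time` is shared by both program types and never yields a rule on its own, because it always covers at least one pseudo-negative example. Only type-specific conjunctions survive, so the news items and the films end up in separate groups.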

[1]  S. Horvath,et al.  Unsupervised Learning With Random Forest Predictors , 2006 .

[2]  D. Steinley Journal of Classification , 2004, Vegetatio.

[3]  Sid-Ahmed Berrani,et al.  Automatic TV Broadcast Structuring , 2010, Int. J. Digit. Multim. Broadcast..

[4]  Luc De Raedt,et al.  Inductive Logic Programming: Theory and Methods , 1994, J. Log. Program..

[5]  Anil K. Jain Data clustering: 50 years beyond K-means , 2008, Pattern Recognit. Lett..

[6]  Marek R. Ogiela,et al.  Multimedia tools and applications , 2005, Multimedia Tools and Applications.

[7]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[8]  James Bailey,et al.  Information theoretic measures for clusterings comparison: is a correction for chance necessary? , 2009, ICML '09.

[9]  Sašo Džeroski Relational Data Mining , 2001, Encyclopedia of Machine Learning and Data Mining.

[10]  Zein Al Abidin Ibrahim,et al.  TV Stream Structuring , 2011 .

[11]  Ecole Doctorale THÈSE DE DOCTORAT , 2011 .

[12]  Tim Morris BSc Multimedia Systems , 2000, Applied Computing.

[13]  Vincent Claveau,et al.  Découverte de connaissances dans les séquences par CRF non-supervisés , 2013 .

[14]  Katsumi Inoue,et al.  ILP turns 20 - Biography and future challenges , 2012, Mach. Learn..

[15]  R. Maitra,et al.  Supplement to “ A k-mean-directions Algorithm for Fast Clustering of Data on the Sphere ” published in the Journal of Computational and Graphical Statistics , 2009 .

[16]  Jean-Philippe Poli,et al.  An automatic television stream structuring system for television archives holders , 2008, Multimedia Systems.

[17]  Luc De Raedt,et al.  ILP turns 20 , 2011, Machine Learning.

[18]  M. Cugmas,et al.  On comparing partitions , 2015 .

[19]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[20]  Patrick Gros,et al.  Detecting repeats for video structuring , 2007, Multimedia Tools and Applications.

[21]  S. Dongen Graph clustering by flow simulation , 2000 .