Extracting user comments from the Web

In this article, we present CommentsMiner, an unsupervised solution for extracting user comments. Our approach combines frequent subtree mining, data extraction, and learning-to-rank techniques. Our experiments show that CommentsMiner solves the comment extraction problem on 84% of a representative, publicly available dataset, far ahead of existing extraction techniques.
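The abstract only names the building blocks. As a rough illustration of the first one, here is a minimal, hypothetical sketch of how frequent subtree mining can surface repeated structures such as comment blocks in a page's DOM. It is not CommentsMiner's implementation. It makes a deliberate simplification: it counts only complete (bottom-up) subtrees with identical tag structure, ignoring text and attributes. General frequent subtree mining also considers induced and embedded subtrees. The names `frequent_subtrees`, `min_support`, and `min_size` are illustrative assumptions.

```python
from collections import Counter
from html.parser import HTMLParser

# HTML elements that never have a closing tag (assumed list, not exhaustive of all dialects).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []

class TreeBuilder(HTMLParser):
    """Builds a simplified, tag-only DOM tree (text and attributes are ignored)."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:  # void elements cannot contain children
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):  # e.g. <br/>
        self.stack[-1].children.append(Node(tag))

    def handle_endtag(self, tag):
        # Close up to the nearest matching open tag; stray closing tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

def canonical(node, counts):
    """Returns an ordered canonical string for the subtree at node, counting every subtree."""
    form = node.tag + "(" + ",".join(canonical(c, counts) for c in node.children) + ")"
    counts[form] += 1
    return form

def frequent_subtrees(html, min_support=3, min_size=2):
    """Complete subtrees occurring at least min_support times with at least min_size nodes."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    counts = Counter()
    canonical(builder.root, counts)
    # Each node contributes exactly one "(" to the canonical form, so this counts nodes.
    return {form: n for form, n in counts.items()
            if n >= min_support and form.count("(") >= min_size}

page = """
<div id="comments">
  <div class="c"><span>alice</span><p>Great post</p></div>
  <div class="c"><span>bob</span><p>I disagree</p></div>
  <div class="c"><span>carol</span><p>Thanks!</p></div>
</div>
"""
print(frequent_subtrees(page))  # → {'div(span(),p())': 3}
```

On this toy page, the repeated `div(span(),p())` pattern is the candidate comment record. In a real pipeline, such candidates would still need to be extracted and ranked, which is where the data extraction and learning-to-rank components mentioned above would come in.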
