Cleaning Your Wrong Google Scholar Entries

Entity categorization, the process of grouping entities into categories for a specific purpose, is an important problem with many applications, such as Google Scholar and Amazon products. Unfortunately, many real-world categories contain mis-categorized entities, such as publications on one’s Google Scholar page that were actually authored by others. We have proposed a general framework for a new research problem: discovering mis-categorized entities. In this demonstration, we present GSCleaner, a Google Chrome extension, as one important application of this problem. Attendees will have the opportunity to experience the following features: (1) mis-categorized entity discovery: an attendee can check for mis-categorized entities on anyone’s Google Scholar page; and (2) on-site cleaning: any attendee can log in and clean their own Google Scholar page using GSCleaner. We describe our novel rule-based framework for discovering mis-categorized entities and propose effective optimization techniques for applying the rules. Empirical results show the effectiveness of GSCleaner in discovering mis-categorized entities.
