Multidimensional Mining of Massive Text Data

Abstract: Unstructured text, as one of the most important data forms, plays a crucial role in data-driven decision making in domains ranging from social networking and information retrieval to scien...
