Mining competitor relationships from online news: A network-based approach

Identifying competitors is important for businesses. We present an approach that uses graph-theoretic measures and machine learning techniques to infer competitor relationships from the structure of an intercompany network derived from company citations (co-occurrences) in online news articles. We also estimate the extent to which our approach complements commercial company profile data sources such as Hoover's and Mergent.
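To make the idea concrete, the following is a minimal sketch of how a co-occurrence network and a few pairwise graph features might be computed. It is not the authors' implementation: the function names (`build_cooccurrence_network`, `pair_features`), the chosen features, and the company names are all hypothetical, and the sketch assumes each article has already been reduced to the list of companies it mentions.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(articles):
    """Link two companies whenever they are cited in the same article.

    `articles` is an iterable of lists of company names (an assumed input
    format). Returns edge weights (co-occurrence counts) and adjacency sets.
    """
    weights = defaultdict(int)
    neighbors = defaultdict(set)
    for companies in articles:
        # Deduplicate within an article, then count every unordered pair once.
        for a, b in combinations(sorted(set(companies)), 2):
            weights[(a, b)] += 1
            neighbors[a].add(b)
            neighbors[b].add(a)
    return weights, neighbors


def pair_features(a, b, weights, neighbors):
    """Illustrative structural features for a candidate company pair."""
    na = neighbors[a] - {b}
    nb = neighbors[b] - {a}
    common = na & nb
    union = na | nb
    return {
        "cooccurrences": weights.get(tuple(sorted((a, b))), 0),
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "degree_a": len(neighbors[a]),
        "degree_b": len(neighbors[b]),
    }


# Hypothetical toy data: each inner list is the set of companies in one article.
articles = [
    ["Acme", "Globex", "Initech"],
    ["Acme", "Globex"],
    ["Globex", "Initech", "Umbrella"],
]
weights, neighbors = build_cooccurrence_network(articles)
features = pair_features("Acme", "Globex", weights, neighbors)
```

Feature dictionaries like `features` could then be passed, one per company pair, to any standard classifier trained on pairs with known competitor labels; which features and which classifier work best is an empirical question that this sketch does not address.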
