Whither Social Networks for Web Search?

Access to diverse perspectives nurtures an informed citizenry. Google and Bing have emerged as the duopoly that largely arbitrates which English-language documents web searchers see. A recent study shows that the top organic search results the two engines produce now overlap substantially. As a result, citizens may no longer be able to gain different perspectives by using different search engines. We present an empirical study indicating that mining Twitter data can yield search results quite distinct from those produced by Google and Bing. Our user study further found these results to be quite informative. The onus is now on the search engines to test whether our findings hold within their own infrastructure for other social networks, and whether enabling diversity is a sufficient business imperative for them.