Natural Language Search Interfaces: Health Data Needs Single-Field Variable Search

Background: Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis and, in turn, to the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these “experts.” Such interfaces hark back to a time when searches needed to be accurate the first time because each search carried a high computational cost. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research.

Objective: Because data science is cross-disciplinary, no assumptions can be made regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This requirement has more in common with the search needs of the “Google generation” than with those of their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive.

Methods: Two user interfaces were evaluated on the same set of tasks, which involved extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is “Google-like,” enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has a standard multioption user interface.

Results: Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated measures analysis showed a main effect of interface, indicating that answers found through the Web search interface were more likely to be correct (F(1,19)=37.3, P<.001), together with a main effect of task (F(3,57)=6.3, P<.001). Further, participants completed the task significantly faster using the Web search interface (F(1,19)=18.0, P<.001). There was also a main effect of task (F(2,38)=4.1, P=.025, Greenhouse-Geisser correction applied). Overall, participants were asked to rate learnability, ease of use, and satisfaction. Paired mean comparisons showed that the Web search interface received significantly higher ratings than the traditional search interface for learnability (P=.002, 95% CI [0.6-2.4]), ease of use (P<.001, 95% CI [1.2-3.2]), and satisfaction (P<.001, 95% CI [1.8-3.5]). The results show superior cross-domain usability of Web search, which is consistent with its general familiarity and with its support for refining queries as the search proceeds, treating serendipity as part of the refinement.

Conclusions: The results provide clear evidence that data science should adopt single-field natural language search interfaces for variable search, supporting in particular: query reformulation; data browsing; faceted search; surrogates; relevance feedback; summarization, analytics, and visual presentation.
