Leveraging Elasticsearch to Improve Data Discoverability in Science Gateways

Data discoverability is a challenge in science gateway architectures. As the volume of data managed and shared through a science gateway grows, it becomes imperative to expose search functionality that lets users quickly navigate to files within their own data sets and identify relevant files in shared or public data sets. Desirable qualities in a file search feature include scalability to arbitrary data sizes, rapid and responsive indexing triggered by user activity, and easy maintenance by development teams without specialist knowledge of search algorithms. We describe a search architecture built around Elasticsearch that meets each of these criteria and that has been successfully implemented at the Texas Advanced Computing Center to enhance data discoverability in several science gateway projects.