NBSearch: Semantic Search and Visual Exploration of Computational Notebooks

Code search is an important and frequent activity for developers using computational notebooks (e.g., Jupyter). The flexibility of notebooks brings challenges for effective code search, where classic search interfaces designed for traditional software code may be limited. In this paper, we propose NBSearch, a novel system that supports semantic code search in notebook collections and interactive visual exploration of search results. NBSearch leverages advanced machine learning models to enable natural language search queries, and intuitive visualizations to present complicated intra- and inter-notebook relationships in the returned results. We developed NBSearch through an iterative participatory design process with two experts from a large software company. We evaluated the models with a series of experiments and the whole system with a controlled user study. The results indicate the feasibility of our analytical pipeline and the effectiveness of NBSearch in supporting code search in large notebook collections.
