Database research at IIT Bombay

1. OVERVIEW

The Indian Institute of Technology (IIT) Bombay has a history of research and development in the area of databases dating back to the early 1980s. D. B. Phatak and N. L. Sarda were among the first faculty members at IIT Bombay to work in the area of database systems. This was a period when the financial sector of India, headquartered primarily in Bombay (now renamed Mumbai), saw a spurt in computerization, and IIT Bombay faculty played a leading role as consultants for database implementations in these companies.

Database research grew substantially from the early 1990s, with the hiring of several faculty including S. Seshadri, S. Sudarshan, and later Krithi Ramamritham, who moved to IIT Bombay from U. Mass. Amherst in the early to mid 1990s. With the hiring of Sunita Sarawagi and Soumen Chakrabarti in the late 1990s, the group broadened significantly: it was no longer just a database group, but a much broader data management group with interests in information retrieval and data mining. More recently, Ganesh Ramakrishnan joined our group, further increasing its strengths in information retrieval and data mining.

The number of PhD students has increased from around 1 or 2 enrolled at a time in the early 1990s to about 12 to 15 at a time in recent years. While this number is much better than before, and is increasing rapidly, it is still small by most standards. However, our master's and bachelor's students have compensated for the shortage of PhD students and have made very significant contributions to our research efforts, with well over three fourths of our papers having such students as coauthors.

Today, the group covers a diverse range of interests, as can be seen from the research projects showcased in this article. In the following sections, we outline the major research projects of the group. We wrap up the article with a summary of other contributions to the community by group members. For more information about the group, please visit: http://www.cse.iitb.ac.in/infolab
