User driven multi-criteria source selection

Abstract: Source selection is the problem of identifying a subset of available data sources that best meets a user’s needs. In this paper we propose a user-driven approach to source selection that seeks to identify the sources that are most fit for purpose. The approach employs a decision support methodology to take account of a user’s context: end users tune their preferences by specifying the relative importance of different criteria, and the approach looks for a trade-off solution aligned with those preferences. The approach is extensible to incorporate diverse criteria, not drawn from a fixed set, and solutions can use a subset of the data from each selected source, rather than requiring that sources be used in their entirety or not at all. The paper describes and motivates the approach, presents a methodology for modelling a user’s context and a collection of optimisation algorithms for exploring the space of solutions, and compares and evaluates the resulting algorithms using multiple real-world data sets. The experiments show that the source selection results are attuned to each user’s preferences, both with respect to overall weighted utility and through faithful representation of a user’s preferences within a result, while scaling to potentially thousands of sources.
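To make the idea of tuning preferences through relative importance more concrete, the following is a minimal sketch, not the paper's method. It assumes that relative importance is captured in a reciprocal pairwise comparison matrix, that weights are derived with the row geometric mean, and that each source already has a score in [0, 1] per criterion. The criterion names, source names, scores, and the functions `weights_from_pairwise` and `weighted_utility` are all hypothetical, and the sketch ranks whole sources rather than selecting subsets of their data.

```python
from math import prod


def weights_from_pairwise(matrix):
    """Derive normalised criterion weights from a reciprocal pairwise
    comparison matrix using the row geometric mean (an assumed technique)."""
    n = len(matrix)
    geo_means = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def weighted_utility(scores, weights):
    """Weighted sum of per-criterion scores for one source."""
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical criteria; pairwise[i][j] states how much more important
# criterion i is than criterion j for this user.
criteria = ["completeness", "accuracy", "freshness"]
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = weights_from_pairwise(pairwise)

# Hypothetical per-criterion scores for two candidate sources.
sources = {
    "source_a": [0.9, 0.6, 0.3],
    "source_b": [0.5, 0.8, 0.9],
}

# Rank sources by weighted utility, highest first.
ranking = sorted(
    sources,
    key=lambda name: weighted_utility(sources[name], weights),
    reverse=True,
)

for name in ranking:
    print(name, round(weighted_utility(sources[name], weights), 3))
```

With these illustrative numbers the user values completeness most, so the more complete source ranks first even though the other source is fresher and more accurate; a different comparison matrix would shift the trade-off.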
