FORA: An OWO based framework for finding outliers in web usage mining

Abstract Handling outliers is one of the primary concerns of today’s data mining techniques. The concept of an outlier, its handling, and its diagnosis are context specific and vary with the field of application. Outliers are inevitable when mining web data because of the unique characteristics exhibited by a typical web user. Because the output of a regression algorithm typically differs from the actual value, knowledge workers and researchers face the challenge of deciding what counts as an outlier in such cases. In this paper, we develop the concept of an outlier with respect to regression analysis of any Web-based dataset. A framework for finding outliers in the output of a regression algorithm is formulated with the help of Ordered Weighted operators. The underlying idea is to find an error rectification value, ϵ, that works in association with the predicted value from the regression model to distinguish an outlier. In addition, ϵ provides a possible range of deviation from the predicted output. A case study on a web dataset shows the usefulness of the proposed approach.
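The abstract does not spell out how ϵ is computed, so the following is only a minimal sketch of the general idea, not the paper's actual procedure. It assumes that ϵ is obtained by applying an ordered weighted averaging (OWA) operator to the absolute residuals of a regression model, and that an observation is treated as an outlier when its actual value falls outside the band [predicted − ϵ, predicted + ϵ]. The function names `owa` and `flag_outliers`, the weight vector, and the sample data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: sort values in descending order,
    then take the weighted sum with position-based weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def flag_outliers(
    actual: Sequence[float],
    predicted: Sequence[float],
    weights: Sequence[float],
) -> Tuple[float, List[bool]]:
    """Assumed rule (not taken from the paper): epsilon is the OWA of the
    absolute residuals, and a point is flagged when its residual exceeds it."""
    residuals = [abs(a - p) for a, p in zip(actual, predicted)]
    eps = owa(residuals, weights)
    flags = [r > eps for r in residuals]
    return eps, flags


# Hypothetical data: actual values versus regression predictions.
actual = [10.0, 12.0, 11.0, 30.0]
predicted = [11.0, 11.0, 12.0, 12.0]
# Hypothetical weights that place less emphasis on the largest residual.
weights = [0.1, 0.2, 0.3, 0.4]

eps, flags = flag_outliers(actual, predicted, weights)
print(round(eps, 3), flags)
```

With these illustrative numbers the residuals are 1, 1, 1, and 18, ϵ comes out near 2.7, and only the last observation is flagged. Choosing weights that down-weight the largest residuals keeps a single extreme error from inflating ϵ; a different weight vector would widen or narrow the accepted range of deviation.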
