Testing Fairness Using the Log-likelihood Ratio
The training dataset was processed to produce an array of features for each record. For each of the abstract, entities, title, and venue fields, the following features were produced: average IDF, average TF-IDF, average saturation term, and average BM25, where the saturation term is the function of term frequency that multiplies the IDF in the BM25 weight. To reduce their high variability, the logarithm was applied to the average IDF, average TF-IDF, and average BM25. In addition, the number of authors per author group was computed. Finally, the relevance assessment, the number of in-citations, the number of out-citations, and the number of query terms were added to each output record. The resulting feature file was then processed to compute the proportion of authors for each relevance assessment and for each author group.
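To make the per-field features concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard BM25 saturation form with the commonly used defaults k1 = 1.2 and b = 0.75, and it assumes each field is summarized as a list of (term frequency, IDF) pairs for the matched query terms, with tf >= 1 and idf > 0 so that the logarithms are defined. The names `saturation` and `field_features` are hypothetical.

```python
import math


def saturation(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 term-frequency saturation: the factor that multiplies the IDF.

    k1 and b are assumed defaults, not values taken from the source.
    """
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)


def field_features(term_stats, doc_len, avg_doc_len):
    """Compute the averaged features for one field (e.g. title).

    term_stats: list of (tf, idf) pairs, one per matched query term.
    Returns None when no query term matches the field.
    """
    n = len(term_stats)
    if n == 0:
        return None
    sats = [saturation(tf, doc_len, avg_doc_len) for tf, _ in term_stats]
    avg_idf = sum(idf for _, idf in term_stats) / n
    avg_tfidf = sum(tf * idf for tf, idf in term_stats) / n
    avg_sat = sum(sats) / n
    avg_bm25 = sum(s * idf for s, (_, idf) in zip(sats, term_stats)) / n
    # The logarithm is applied to the three high-variance averages only;
    # the saturation term is bounded by k1 + 1 and is left on its raw scale.
    return {
        "log_avg_idf": math.log(avg_idf),
        "log_avg_tfidf": math.log(avg_tfidf),
        "avg_saturation": avg_sat,
        "log_avg_bm25": math.log(avg_bm25),
    }


if __name__ == "__main__":
    # Two matched terms in a field of average length.
    print(field_features([(1, 2.0), (1, 4.0)], doc_len=100, avg_doc_len=100))
```

Because the saturation term stays below k1 + 1 however large the term frequency grows, it varies far less than the IDF-weighted quantities, which is consistent with it being the one average left untransformed.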