Compared with the search function currently applied to massive unstructured process data in civil aircraft systems, the improved search algorithm proposed in this paper makes the following optimizations. First, the data is preprocessed before segmentation to improve precision and speed. Second, because keywords carry different weights depending on where they appear, a key coefficient is introduced into the traditional TF-IDF algorithm, and a threshold on keyword weight is used as a filter so that the extracted words are more representative. Third, the improved TF-IDF algorithm is implemented on the big data programming model MapReduce to improve computational efficiency. Finally, by incorporating feedback from individual queries, the improved sorting algorithm provides better query results that meet users’ demands.
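The keyword-extraction steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the key coefficient values, the weighting formula, or the threshold, so the sketch assumes that a term's count is scaled by a per-field coefficient (the hypothetical `KEY_COEFF` table), that the weight is the scaled term frequency multiplied by a standard logarithmic IDF, and that the map and reduce phases run in memory as plain functions rather than on a Hadoop cluster. All function and field names are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical key coefficients per field; the source gives no concrete values.
KEY_COEFF = {"title": 2.0, "body": 1.0}


def map_doc(doc_id, fields):
    """Map step: emit (term, (doc_id, position-weighted TF)) pairs."""
    counts = Counter()
    for field, text in fields.items():
        for term in text.lower().split():
            # Each occurrence counts KEY_COEFF[field] times (assumed scheme).
            counts[term] += KEY_COEFF.get(field, 1.0)
    total = sum(counts.values())
    for term, count in counts.items():
        yield term, (doc_id, count / total)


def reduce_term(postings, n_docs):
    """Reduce step: combine each document's weighted TF with the term's IDF."""
    idf = math.log(n_docs / len(postings))
    return {doc_id: tf * idf for doc_id, tf in postings}


def extract_keywords(docs, threshold):
    """Run map, group by term, reduce, then keep weights at or above threshold."""
    grouped = defaultdict(list)
    for doc_id, fields in docs.items():
        for term, posting in map_doc(doc_id, fields):
            grouped[term].append(posting)

    keywords = defaultdict(dict)
    for term, postings in grouped.items():
        for doc_id, weight in reduce_term(postings, len(docs)).items():
            if weight >= threshold:
                keywords[doc_id][term] = weight
    return dict(keywords)
```

Calling `extract_keywords(docs, threshold)` with `docs` shaped as `{doc_id: {"title": ..., "body": ...}}` returns, per document, only the terms whose weight reaches the threshold. Terms that occur in every document receive an IDF of zero and are therefore filtered out by any positive threshold.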