The last few decades have witnessed a tremendous growth of information across the internet. Vast amounts of this information remain unused across the globe, and mining and extracting the text requires a rigorous methodology. The volume of information is growing exponentially, so detecting useful patterns in the data becomes ever more important. It is very difficult for a user to retrieve the relevant data from a database. Many techniques have been implemented to solve this problem, but they still require enhancement to overcome the retrieval problems that arise with unstructured data (Jayaraj et al., 2014).
Text mining is the process of discovering useful, meaningful, implicit information in large amounts of semi-structured or unstructured textual data. Put simply, text mining combines human linguistic competence with the computational power of the system (Fan et al., 2006). The linguistic capability includes the ability to recognize spelling variations, filter out unpromising data, understand synonyms and meanings, handle slang and abbreviations, and find the literal meaning.
Conventional approaches to text clustering and mining use words as the measure of similarity between documents. These words are presumed to be mutually independent, which may not hold in real applications; it is the concepts, semantics and features that describe the documents. The technique of extracting these features from documents is called feature extraction (Liu et al., 2005). Feature extraction has been practiced successfully with unsupervised algorithms such as PCA (Principal Component Analysis) and SVD (Singular Value Decomposition). Most recent research aimed at speeding up text mining focuses on improving feature extraction, since the time spent extracting word features from texts exceeds the initial training time.
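The word-based similarity measure described above can be illustrated with a minimal sketch. It assumes a simple bag-of-words representation and cosine similarity; the function names and sample documents are hypothetical and are not taken from the cited works.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, used only for illustration.
docs = [
    "text mining extracts patterns from text",
    "mining text reveals useful patterns",
    "resume screening supports recruitment",
]
vectors = [bag_of_words(d) for d in docs]
sim_01 = cosine_similarity(vectors[0], vectors[1])  # documents share words
sim_02 = cosine_similarity(vectors[0], vectors[2])  # documents share no words
```

Because each word is treated as an independent dimension, two documents that express the same concept with different vocabulary would score zero here, which is the limitation that motivates feature extraction.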
This paper presents a fast method for feature extraction that uses a configuration file to identify unpromising texts and eliminate them entirely, reducing dimensionality and space (Dorre et al., 1999). The main advantage of this work is that, after the promising features have been extracted from the text, a feature selection step filters the unwanted, meaningless text out of the textual data. This approach reduces the space and the dimension of the text considerably. The feature extraction phase is subdivided into two levels, namely extraction and selection. Extraction alone reduces the space considerably, and the subsequent selection reduces it further. However, the feature extraction phase involves considerable complexity and has limitations. Extracting information from resumes has been an important area of focus for many researchers. A resume is a concise document in which an individual markets himself or herself to industry. Resumes contain both structured and unstructured data (Kun Yu et al., 2005). Most business records are maintained as documents and are therefore in unstructured format (Jayaraj et al., 2015). The format of a resume is not predetermined; it depends on the author's thinking and writing skills, which makes the information extraction, comparison, and selection a
Abstract
Background: A novel algorithm, named feature extraction, is used to extract the textual data.
Method: The most common method used for feature extraction from documents is TF-IDF (Term Frequency-Inverse Document Frequency). The TF-IDF measure is widely used for weighting terms in information retrieval.
Results: The basic idea of this research is to develop an approach that selects appropriate resumes efficiently and enhances the recruitment process by extracting the unique and special features of each resume, making it simpler for the employer to select the right candidates without much effort or manual work.
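The two levels described above (extraction, then selection) and the TF-IDF weighting can be sketched as follows. This is only an illustration under stated assumptions, not the paper's algorithm: the configuration file is stood in for by a hypothetical in-memory set of terms, the weighting is the common variant tf * log(N / df), and the sample resumes and function names are invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the configuration file: terms treated as unpromising.
CONFIG_STOP_TERMS = {"the", "a", "an", "and", "of", "in", "with", "to", "for"}


def extract_terms(text, stop_terms=CONFIG_STOP_TERMS):
    """Extraction level: tokenize and drop terms listed in the configuration."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_terms]


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    term_lists = [extract_terms(d) for d in documents]
    n_docs = len(term_lists)
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))  # count each term once per document
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def select_features(doc_weights, top_k=3):
    """Selection level: keep the top_k highest-weighted terms of one document."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


# Hypothetical resume snippets, used only for illustration.
resumes = [
    "Experience in Python and text mining with a focus on clustering",
    "Experience in Java and the design of web services",
    "Experience in Python and web services for recruitment",
]
weights = tf_idf(resumes)
top_terms = [select_features(w, top_k=2) for w in weights]
```

A term that occurs in every resume, such as "experience" here, receives a weight of zero, so the selection step favors the terms that distinguish one resume from the others.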
[1] Boris Chidlovskii, et al. Scalable Feature Extraction from Noisy Documents. 2009 10th International Conference on Document Analysis and Recognition, 2009.
[2] S. Palani, et al. A Novel Approach for Feature Extraction and Selection on MRI Images for Brain Tumor Classification. 2012.
[3] V. Mahalakshmi, et al. Information Retrieval Configuration File Text Categorization Algorithm for Improving Business Intelligence. 2015.
[4] James Allan, et al. Matching resumes and jobs based on relevance models. SIGIR, 2007.
[5] Jochen Dörre, et al. Text mining: finding nuggets in mountains of textual data. KDD '99, 1999.
[6] Weiguo Fan, et al. Tapping the power of text mining. CACM, 2006.
[8] Kun Yu, et al. Resume Information Extraction with Cascaded Hybrid Model. ACL, 2005.
[9] Sumit Maheshwari, et al. An Approach to Extract Special Skills to Improve the Performance of Resume Selection. DNIS, 2010.
[10] V. Mahalakshmi, et al. Augmenting Efficiency of Recruitment Process using IRCF text mining Algorithm. 2015.
[11] Jianchu Kang, et al. A comparative study on unsupervised feature selection methods for text clustering. 2005 International Conference on Natural Language Processing and Knowledge Engineering, 2005.