The aim of this paper is to explore and obtain a simple method capable of detecting the most important variables (features) in a large set of variables. To verify the performance of the approach described in the following sections, we used a set of news articles. Text sources are considered high-dimensional data, where each word is treated as a single variable. In our work, a linear predictor model is used to uncover the most influential variables, strongly reducing the dimension of the data set. The input data are classified into two categories; they are arranged as a collection of plain-text documents, pre-processed, and transformed into a numerical matrix containing around 10,000 different variables. We adjust the linear model's parameters based on its prediction results: the variables with the strongest effect on the output survive, while those with negligible effect are removed. In order to automatically collect a summarized set of features, we sacrifice some detail and accuracy of the prediction model, while trying to balance the squared error against the size of the subset obtained.
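As a rough illustration of the pipeline outlined above, the following minimal sketch turns a tiny two-category corpus into a word-count matrix, fits a linear predictor by gradient descent on the squared error, and keeps only the words with the largest weights. It is not the method developed in the paper: the toy documents, the +1/-1 label encoding, the function names (`build_matrix`, `fit_linear`, `select_features`), the learning rate, and the relative threshold are all assumptions chosen for illustration.

```python
from collections import Counter

# Toy corpus and labels are hypothetical; +1 and -1 stand in for the two categories.
DOCS = [
    "goal match team",
    "goal team win",
    "market stock bank",
    "stock bank rate",
]
LABELS = [1.0, 1.0, -1.0, -1.0]


def build_matrix(docs):
    """Return (vocabulary, count matrix) with one column per distinct word."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    column = {word: j for j, word in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for word, count in Counter(doc.lower().split()).items():
            row[column[word]] = float(count)
        rows.append(row)
    return vocab, rows


def fit_linear(X, y, lr=0.1, epochs=200):
    """Fit weights (no intercept) by batch gradient descent on half the mean squared error."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(epochs):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            # Prediction error of the current linear model on this document.
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def select_features(vocab, w, rel_threshold=0.75):
    """Keep words whose |weight| is at least rel_threshold times the largest |weight|.

    The relative threshold is an assumed pruning rule, not one taken from the paper.
    """
    cutoff = rel_threshold * max(abs(wj) for wj in w)
    return [word for word, wj in zip(vocab, w) if abs(wj) >= cutoff]


vocab, X = build_matrix(DOCS)
weights = fit_linear(X, LABELS)
selected = select_features(vocab, weights)
print(selected)
```

On this toy corpus the words that occur in both documents of a category ("goal", "team", "stock", "bank") receive weights of roughly twice the magnitude of the words that occur only once, so they are the ones that survive the pruning step. Raising `rel_threshold` shrinks the retained subset at the cost of a larger squared error, which is the trade-off described above.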