Detecting Fake Points of Interest from Location Data

The pervasiveness of GPS-enabled mobile devices and the widespread use of location-based services have generated massive amounts of geo-tagged data. Data analysis now draws on additional sources, including reviews, news, and images, which raises questions about the reliability of Point-of-Interest (POI) data sources. While previous research attempted to detect fake POI data through various security mechanisms, this work takes a much simpler approach: it relies on supervised learning methods and their ability to find hidden patterns in location-based data. Ground-truth labels are obtained from real-world data, and fake data is generated using an API, yielding a dataset of location records labeled as real or fake. The objective is to predict whether a POI is genuine using a Multi-Layer Perceptron (MLP) classifier. The proposed method is compared with traditional classification methods and with robust, recent deep neural methods, and the results show that it outperforms these baselines.

Keywords: detection, location, point-of-interest, classification, deep learning, multilayer perceptron
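As a rough illustration of the classification setup described above, the following minimal sketch trains a one-hidden-layer MLP to separate "real" from "fake" records. Everything in it is an assumption made for illustration: the two synthetic features, the way fake points are generated (the API used in this work is not reproduced), the network size, and the helper names `make_toy_dataset` and `TinyMLP`. It is not the architecture or data evaluated here.

```python
import math
import random


def make_toy_dataset(n_per_class, rng):
    """Synthetic stand-in data (hypothetical): label 1 = real POI, 0 = fake POI."""
    data = []
    for _ in range(n_per_class):
        # Assumed "real" records: two features clustered around (0, 0).
        data.append(([rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)], 1))
        # Assumed "fake" records: the same features clustered around (2, 2).
        data.append(([rng.gauss(2.0, 0.3), rng.gauss(2.0, 0.3)], 0))
    rng.shuffle(data)
    return data


class TinyMLP:
    """One tanh hidden layer and a sigmoid output, trained with plain SGD."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden activations, then probability that the record is real.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def train_step(self, x, y, lr):
        h, p = self.forward(x)
        dz = p - y  # gradient of the cross-entropy loss w.r.t. the output logit
        for j in range(len(self.w2)):
            # Backpropagate through tanh using the pre-update output weight.
            dh = dz * self.w2[j] * (1.0 - h[j] ** 2)
            self.w2[j] -= lr * dz * h[j]
            for i in range(len(x)):
                self.w1[j][i] -= lr * dh * x[i]
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0


def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)


rng = random.Random(0)
data = make_toy_dataset(100, rng)
train, test = data[:150], data[150:]  # simple hold-out split

model = TinyMLP(n_in=2, n_hidden=4, rng=rng)
for _ in range(50):
    for x, y in train:
        model.train_step(x, y, lr=0.1)

test_accuracy = accuracy(model, test)
print(f"held-out accuracy on toy data: {test_accuracy:.2f}")
```

The toy classes are deliberately well separated, so the accuracy printed here says nothing about performance on real POI data; the sketch only shows the shape of the pipeline (labeled real and fake records, an MLP classifier, and a held-out evaluation).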
