iDiary: from GPS signals to a text-searchable diary

This paper describes a system that takes as input the GPS data streams generated by users' phones and builds a searchable database of locations and activities. The system, called iDiary, turns the large GPS signals collected from smartphones into textual descriptions of the users' trajectories. Its user interface resembles Google Search: users type text queries about their activities (e.g., "Where did I buy books?") and receive textual answers based on their GPS signals. To compute a user's critical locations, iDiary runs novel algorithms for semantic compression (known as coresets) and trajectory clustering of massive GPS signals in parallel. Using an external database, we then map these locations to textual descriptions and activities, so that text mining techniques (e.g., LSA or transportation mode recognition) can be applied to the resulting data. We provide experimental results for both the system and the algorithms and compare them to the existing commercial and academic state of the art. This is the first GPS system that makes activities derived from GPS data searchable by text.
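The abstract names the stages of the pipeline (compress the GPS stream, cluster it into critical locations, then attach text) without detailing them. The Python sketch below illustrates only the first two stages under simple assumptions and should not be read as iDiary's method: in place of the paper's coreset algorithm it uses classic Douglas-Peucker line simplification with a time-synchronized error, then treats long, slow segments as stays and greedily merges nearby stays into locations. Fixes are assumed to be `(t, x, y)` tuples in seconds and planar metres; the function names and the thresholds `eps`, `max_speed`, `min_duration`, and `radius` are hypothetical, and the external-database lookup and text mining are not shown.

```python
import math
from typing import List, Tuple

# Sketch only: classic Douglas-Peucker plus simple thresholds, NOT iDiary's
# coreset algorithm. All names and thresholds here are hypothetical.
# A fix is (t, x, y): seconds, and metres on an assumed local planar projection.
Fix = Tuple[float, float, float]


def sync_error(p: Fix, a: Fix, b: Fix) -> float:
    """Distance from p to where the straight segment a->b places the user
    at p's timestamp (time-synchronized Euclidean distance)."""
    (tp, xp, yp), (ta, xa, ya), (tb, xb, yb) = p, a, b
    w = 0.0 if tb == ta else (tp - ta) / (tb - ta)
    return math.hypot(xp - (xa + w * (xb - xa)), yp - (ya + w * (yb - ya)))


def simplify(fixes: List[Fix], eps: float) -> List[Fix]:
    """Keep a subset of fixes so every dropped fix lies within eps metres of
    the time-interpolated segment between its retained neighbours."""
    if len(fixes) < 3:
        return list(fixes)
    first, last = fixes[0], fixes[-1]
    worst, worst_err = 0, -1.0
    for i in range(1, len(fixes) - 1):
        err = sync_error(fixes[i], first, last)
        if err > worst_err:
            worst, worst_err = i, err
    if worst_err <= eps:
        return [first, last]
    # Split at the worst-approximated fix; recursion is used for brevity.
    left = simplify(fixes[: worst + 1], eps)
    right = simplify(fixes[worst:], eps)
    return left[:-1] + right


def stays(simplified: List[Fix], max_speed: float, min_duration: float):
    """Midpoints of segments lasting >= min_duration seconds whose average
    speed is <= max_speed metres per second."""
    found = []
    for (ta, xa, ya), (tb, xb, yb) in zip(simplified, simplified[1:]):
        dt = tb - ta
        if dt >= min_duration and math.hypot(xb - xa, yb - ya) <= max_speed * dt:
            found.append(((xa + xb) / 2, (ya + yb) / 2))
    return found


def critical_locations(stay_points, radius: float):
    """Greedily merge each stay into the first centre within radius metres
    (updating that centre's running mean); return (centre, visits) pairs."""
    centres: List[Tuple[float, float]] = []
    visits: List[int] = []
    for x, y in stay_points:
        for i, (cx, cy) in enumerate(centres):
            if math.hypot(x - cx, y - cy) <= radius:
                n = visits[i]
                centres[i] = ((cx * n + x) / (n + 1), (cy * n + y) / (n + 1))
                visits[i] = n + 1
                break
        else:
            centres.append((x, y))
            visits.append(1)
    return list(zip(centres, visits))


# Hypothetical trace: 10 min at the origin, 1 km east at 10 m/s, 10 min there,
# back at 10 m/s, then 10 more min at the origin.
trace: List[Fix] = (
    [(t, 0.0, 0.0) for t in range(0, 601, 60)]
    + [(t, 10.0 * (t - 600), 0.0) for t in range(620, 700, 20)]
    + [(t, 1000.0, 0.0) for t in range(700, 1301, 60)]
    + [(t, 1000.0 - 10.0 * (t - 1300), 0.0) for t in range(1320, 1400, 20)]
    + [(t, 0.0, 0.0) for t in range(1400, 2001, 60)]
)
simplified = simplify(trace, eps=5.0)
print(len(trace), "->", len(simplified))  # 41 -> 6
print(critical_locations(stays(simplified, max_speed=1.0, min_duration=300.0), radius=50.0))
# [((0.0, 0.0), 2), ((1000.0, 0.0), 1)]
```

On this hypothetical 41-fix trace, the simplification keeps only the six corner fixes, and the two visits to the origin merge into one location. A real pipeline would first need to project latitude/longitude to metres, and would have to replace the recursive, single-machine simplification with something that scales to massive streams processed in parallel, which is the setting the paper's coreset approach targets.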

Dan Feldman et al., iDiary: From GPS Signals to a Text-Searchable Diary, ACM Trans. Sens. Networks, 2015.
