Proceedings of the 1st International Workshop on Privacy-Preserving IR: When Information Retrieval Meets Privacy and Security (PIR 2014)

Many real-world applications in the healthcare domain would benefit substantially from the sharing and search technologies available for P2P infrastructures, provided these technologies could offer the required confidentiality guarantees. Currently, indexes based on distributed hash tables (DHTs), which are typically used for effective and efficient information sharing and retrieval in P2P networks, do not offer sufficient confidentiality for patient data in healthcare networks and medical document archives. In this paper we discuss the challenges involved in securing patient data stored in a DHT-based index and outline initial solutions to address them.
