Today it is common for web page content to include advertisements. Because advertisers often want to target their message to people with particular demographic attributes, the anonymity of Internet users poses a special problem for them. The purpose of the present research is to find an effective way to infer demographic information (e.g., gender, age, or income) about Internet users for whom such information is not otherwise available. Our goal is to build a high-quality database of demographic profiles covering a large segment of the Internet population without having to survey each individual user. Although Internet users are largely anonymous, they nonetheless generate usage information, which includes, but is not limited to, (a) the search terms they enter and (b) the web pages they access. In this paper, we describe an application of the Latent Semantic Analysis (LSA) [1] information retrieval technique to construct a vector space in which the usage data associated with each Internet user of interest can be represented. We then show how vectors in this LSA space can serve as the input to a three-layer neural network, trained with the scaled conjugate gradient (SCG) method, that produces the demographic inferences.
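The pipeline summarized above, from usage data to an LSA vector space to a three-layer network trained with SCG, can be sketched in a few dozen lines of Python. This is a minimal sketch under assumptions that are ours, not the paper's: each user's search terms and visited pages are reduced to one column of a term-by-user count matrix with log(1 + count) weighting; the attribute to infer is a single binary one; the network has a tanh hidden layer and one logistic output unit trained on cross-entropy; and the optimizer follows the simplified form of Møller's scaled conjugate gradient used in the Netlab toolbox, not necessarily the variant the authors used. The functions `lsa_space`, `fold_in`, `loss_and_grad`, `predict` and `scg`, the rank k = 5, the four hidden units, and the synthetic data are all hypothetical.

```python
import numpy as np

# Hypothetical sketch: names, weighting, sizes and data are illustrative assumptions.


def lsa_space(A, k):
    """Rank-k truncated SVD of a term-by-user matrix A (rows: terms, columns: users).

    Returns U_k, s_k and the user coordinates V_k S_k (one row per user).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k].T * s[:k]


def fold_in(d, U_k):
    """Place a user's weighted term vector d in the space (A^T U_k = V_k S_k)."""
    return d @ U_k


def unpack(w, n_in, n_hid):
    """Split the flat weight vector into layer weights and biases."""
    i = n_in * n_hid
    return (w[:i].reshape(n_in, n_hid), w[i:i + n_hid],
            w[i + n_hid:i + 2 * n_hid], w[-1])


def loss_and_grad(w, X, y, n_hid):
    """Mean cross-entropy of a tanh-hidden, logistic-output network, plus gradient."""
    W1, b1, W2, b2 = unpack(w, X.shape[1], n_hid)
    H = np.tanh(X @ W1 + b1)                          # hidden layer
    z = H @ W2 + b2                                   # output pre-activation
    E = np.mean(np.logaddexp(0.0, z) - y * z)         # numerically stable cross-entropy
    dz = (0.5 * (1.0 + np.tanh(z / 2)) - y) / len(y)  # (sigmoid(z) - y) / N
    dH = np.outer(dz, W2) * (1.0 - H ** 2)            # backprop through tanh
    grad = np.concatenate([(X.T @ dH).ravel(), dH.sum(axis=0), H.T @ dz, [dz.sum()]])
    return E, grad


def predict(w, X, n_hid):
    """Estimated probability that each user (row of X) has the attribute."""
    X = np.atleast_2d(X)
    W1, b1, W2, b2 = unpack(w, X.shape[1], n_hid)
    z = np.tanh(X @ W1 + b1) @ W2 + b2
    return 0.5 * (1.0 + np.tanh(z / 2))


def scg(fg, w, n_iter=200, sigma0=1e-4):
    """Scaled conjugate gradient (simplified Moller/Netlab form); fg(w) -> (E, grad)."""
    E, g = fg(w)
    d, beta, success, n_success = -g, 1.0, True, 0
    for _ in range(n_iter):
        if success:                                   # new direction: refresh curvature
            mu = d @ g
            if mu >= 0:                               # not a descent direction: reset
                d, mu = -g, -(g @ g)
            kappa = d @ d
            if kappa < 1e-16:                         # gradient has vanished
                break
            sigma = sigma0 / np.sqrt(kappa)
            theta = d @ (fg(w + sigma * d)[1] - g) / sigma  # finite-difference d^T H d
        delta = theta + beta * kappa                  # scaled curvature along d
        if delta <= 0:                                # force a positive-definite model
            delta = beta * kappa
            beta = beta - theta / kappa
        alpha = -mu / delta                           # step size along d
        E_new, g_new = fg(w + alpha * d)
        Delta = 2.0 * (E_new - E) / (alpha * mu)      # actual vs. predicted reduction
        success = Delta >= 0
        if success:
            w, E, g_old, g = w + alpha * d, E_new, g, g_new
            n_success += 1
        if Delta < 0.25:
            beta = min(4.0 * beta, 1e100)
        if Delta > 0.75:
            beta = max(0.5 * beta, 1e-15)
        if n_success == len(w):                       # periodic restart
            d, n_success = -g, 0
        elif success:
            d = ((g_old - g) @ g / mu) * d - g        # Polak-Ribiere-style update
    return w


# Synthetic demo: users with y = 1 favour the first half of the terms.
rng = np.random.default_rng(0)
n_terms, n_users, k, n_hid = 40, 200, 5, 4
y = rng.integers(0, 2, n_users).astype(float)
first_half = np.arange(n_terms)[:, None] < n_terms // 2
counts = rng.poisson(np.where(first_half, 1.0 + 2.0 * y, 3.0 - 2.0 * y))
A = np.log1p(counts)                                  # assumed log(1 + count) weighting
U_k, s_k, X = lsa_space(A, k)


def objective(w):
    return loss_and_grad(w, X, y, n_hid)


w0 = 0.1 * rng.standard_normal(k * n_hid + 2 * n_hid + 1)
w = scg(objective, w0, n_iter=200)
accuracy = np.mean((predict(w, X, n_hid) > 0.5) == (y == 1))
p_new = predict(w, fold_in(A[:, 0], U_k), n_hid)      # treat column 0 as a "new" user
print(f"training accuracy {accuracy:.2f}, P(attribute | user 0) = {p_new[0]:.2f}")
```

Users who were not part of the SVD are mapped into the space by `fold_in` without recomputing the decomposition. Because the sketch stores user coordinates as the rows of V_k S_k, folding in is simply dᵀU_k; the common alternative convention stores rows of V_k and folds in with dᵀU_k S_k⁻¹. A multi-valued attribute such as an age bracket would call for a softmax output layer instead of the single logistic unit.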
[1] James E. Pitkow et al. In Search of Reliable Usage Data on the WWW. Comput. Networks, 1997.
[2] Philip S. Yu et al. SpeedTracer: A Web Usage Mining and Analysis Tool. IBM Syst. J., 1998.
[3] Thomas Narten et al. Privacy Extensions for Stateless Address Autoconfiguration in IPv6. RFC, 2001.
[4] Mark S. Ackerman et al. Beyond Concern: Understanding Net Users' Attitudes About Online Privacy. ArXiv, 1999.
[5] Jaideep Srivastava et al. Data Preparation for Mining World Wide Web Browsing Patterns. Knowledge and Information Systems, 1999.