Confucius was a great teacher in ancient China, and his disciples spread his theories and principles effectively throughout the country. Confucius is also the code name of Google's Knowledge Search product, developed at Google's Beijing office. In this talk, I present Knowledge Search's key "disciples": data-management subroutines that generate labels for questions, match existing answers to a question, evaluate the quality of answers, rank users based on their contributions, distill high-quality answers for search engines to index, route questions to domain experts, and more. I describe the scalable algorithms we devised to make these disciples effective on huge datasets, our efforts to make these algorithms run even faster on thousands of machines, and some open research problems.
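The abstract does not say how Knowledge Search ranks users by their contributions. As one illustration of what such a subroutine can look like, here is a minimal sketch that assumes contributions are modeled as an asker → answerer graph and scored with a PageRank-style power iteration, in the spirit of expertise-network ranking. The function name `rank_users`, its parameters, and the graph model are hypothetical, not the product's actual method.

```python
def rank_users(answers, damping=0.85, iterations=50):
    """Score users with a PageRank-style iteration over an asker -> answerer graph.

    Hypothetical sketch, not Knowledge Search's actual algorithm.
    answers: list of (asker, answerer) pairs, one per answer posted.
    An edge asker -> answerer means the answerer helped the asker, so
    score flows from askers to the people who answer them.
    """
    out_links = {}
    users = set()
    for asker, answerer in answers:
        out_links.setdefault(asker, []).append(answerer)
        users.update((asker, answerer))

    n = len(users)
    if n == 0:
        return {}

    # Start from a uniform distribution; total score stays 1 on every iteration.
    score = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Users who never asked have no outgoing edges; spread their score uniformly.
        dangling = sum(score[u] for u in users if u not in out_links)
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        for src, targets in out_links.items():
            # Each asker splits its damped score evenly among the answers it received.
            share = damping * score[src] / len(targets)
            for dst in targets:
                new[dst] += share
        score = new
    return score


if __name__ == "__main__":
    answers = [("alice", "bob"), ("carol", "bob"), ("bob", "dave"),
               ("alice", "dave"), ("erin", "bob")]
    for user, s in sorted(rank_users(answers).items(), key=lambda kv: -kv[1]):
        print(f"{user}: {s:.3f}")
```

In this toy graph, dave ends up above bob even though bob answered more askers, because one of the people dave helped is bob himself; answering a strong answerer's question counts for more than answering a newcomer's. This single-machine version only shows the scoring logic; running it at the scale the talk describes would require distributing each iteration across many machines.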