Querying web resources with metadata in a database
We propose an extended SQL framework for metadata-based web querying in which the metadata resides in a traditional object-relational database. We extend database relations with scoring functions and importance scores, called sideway functions and sideway values, respectively. We add score-management clauses with well-defined semantics to SQL, and propose an algebra to evaluate the extended SQL queries. The metadata model is based on topics (representing entities), relationships among topics (called metalinks), and importance scores (sideway values) of topics and metalinks.
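To make the metadata model concrete, the following is a minimal sketch of topics and metalinks that each carry a sideway value. The class names, field names, the [0, 1] score range, the sample data, and the helper `top_topics` are all assumptions made for illustration; they are not taken from the framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """An entity in the metadata model (hypothetical field layout)."""
    topic_id: str
    name: str
    sideway_value: float  # importance score; range [0, 1] is an assumption


@dataclass(frozen=True)
class Metalink:
    """A relationship between two topics, with its own importance score."""
    kind: str             # hypothetical label, e.g. "RelatedTo"
    source_id: str
    target_id: str
    sideway_value: float


def top_topics(topics, min_score):
    """Keep topics scoring at least min_score, highest score first."""
    selected = [t for t in topics if t.sideway_value >= min_score]
    # Ties are broken by topic_id so the ordering is deterministic.
    return sorted(selected, key=lambda t: (-t.sideway_value, t.topic_id))


# Made-up sample data, for illustration only.
topics = [
    Topic("t1", "query optimization", 0.9),
    Topic("t2", "web querying", 0.6),
    Topic("t3", "indexing", 0.3),
]
links = [Metalink("RelatedTo", "t1", "t2", 0.7)]

ranked = top_topics(topics, 0.5)
print([t.name for t in ranked])
```

Storing the score alongside each tuple, rather than in a separate structure, mirrors the idea of extending relations with importance scores, so that a query can filter and rank on the score like any other attribute.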
We show that our SQL extensions are well-defined, meaning that, given a database and a query Q, the output tuples of Q and their importance scores stay the same under any query processing scheme. The proposed SQL extensions and the algebra are illustrated on two web resources, namely, the DBLP Bibliography and the ACM SIGMOD Anthology.
To process the SQL extensions, we introduce sideway value algebra (SVA) operators that modify and propagate the sideway values of base relations in automated and generic ways. In this thesis, we discuss two SVA operators, namely the SVA topic selection operator and the SVA topic closure operator, present their implementation algorithms, and report experimental evaluations using both real and synthetically generated data.
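The idea of propagating sideway values along metalinks can be illustrated with a toy closure computation. This is not the thesis's definition of the SVA topic closure operator, which is not given here. The function name `toy_topic_closure`, the choice of multiplication to combine scores along a path, the choice of keeping the maximum over alternative paths, and the threshold cut-off are all assumptions.

```python
def toy_topic_closure(start_scores, metalinks, threshold):
    """Propagate scores from start topics along metalinks (illustrative only).

    start_scores: dict mapping topic id -> score in [0, 1]
    metalinks:    list of (source_id, target_id, score) with score in [0, 1]
    threshold:    derived scores below this value are discarded (assumed rule)
    """
    for _, _, weight in metalinks:
        if not 0.0 <= weight <= 1.0:
            raise ValueError("metalink scores must lie in [0, 1]")

    best = dict(start_scores)
    frontier = list(start_scores)
    while frontier:
        src = frontier.pop()
        for link_src, dst, weight in metalinks:
            if link_src != src:
                continue
            # Assumed combination rule: multiply scores along the path.
            candidate = best[src] * weight
            # Assumed merge rule: keep the best score over all paths.
            if candidate >= threshold and candidate > best.get(dst, 0.0):
                best[dst] = candidate
                frontier.append(dst)
    return best


# Made-up example: "c" is reachable directly (0.2) and via "b" (0.5 * 0.5).
example_links = [("a", "b", 0.5), ("b", "c", 0.5), ("a", "c", 0.2)]
closure = toy_topic_closure({"a": 1.0}, example_links, threshold=0.1)
print(closure)
```

Because scores are restricted to [0, 1] and a topic is revisited only when its score strictly improves, the loop terminates even when the metalinks form cycles.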
For the real data, we automatically analyze all articles in the ACM Anthology, construct indices for them, and use the indices to extract meta-relationships (metalinks) between the papers. We also present and evaluate algorithms to efficiently locate similar papers in the ACM Anthology.