Tianwang: Towards a quality and scalable Web search service
Li, XM ; Wang, JY
2002
Keywords: Web information retrieval; crawling; ranking; scalability; Tianwang
Abstract: As the amount of information on the Web and the number of inexperienced new users grow rapidly, people increasingly rely on Web search engines to find useful information on the Internet. As a result, quality of search service and system scalability have become the two primary challenges facing Web information retrieval systems, namely Web search engines. In this paper, we present our solutions to both in Tianwang, a well-known Chinese and English Web search engine in China that covers the entire Web in China and scales with the rapid growth of Chinese Web information. To achieve good scalability, we have designed and implemented a parallel and distributed architecture consisting of crawling, indexing, and searching subsystems, each with multiple processing nodes. Besides parallel processing, the effectiveness of the crawling subsystem is also ensured by a rational URL allocation algorithm, an efficient heuristic crawling strategy, and a method to ensure its reconfigurability. The distributed indexing subsystem can start multiple indexers that work in parallel to reduce the time needed to create the index database. The performance of the searching subsystem is further enhanced by a query caching mechanism. To return high-quality search results, we employ an effective near-replica detection algorithm and an innovative ranking system; the latter makes full use of HTML tags, link popularity-based anchor text, and proximity in search.
Computer Science, Information Systems; CPCI-S(ISTP); 0
Language: English
Content type: Other
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/293913
Collection: 信息科学技术学院 (School of Information Science and Technology)
Recommended citation
GB/T 7714
Li, XM, Wang, JY. Tianwang: Towards a quality and scalable Web search service. 2002-01-01.