An Associated Storage and Retrieval System of Massive Web Data Based on Multi-Attributes (基于多属性的海量Web数据关联存储及检索系统)
LUO Fang (罗芳); LI Chun-hua (李春花); ZHOU Ke (周可); HUANG Yong-feng (黄永峰); LIAO Zheng-shuang (廖正霜)
2016-03-30 ; 2016-03-30
Keywords: category storage; multi-condition selection query; associated mapping; secondary indexing. Classification: TP391.3
Alternative title: An associated storage and retrieval system of massive Web data based on multi-attributes
Abstract: Traditional Web data retrieval generally relies on full-text search, which is flexible. Public opinion analysis, however, often needs page attributes and statistical information, which full-text retrieval does not provide well. To address this gap, an associated storage and retrieval system for massive Web data based on multiple attributes is designed and implemented on the Hadoop platform, providing basic retrieval and statistics services for public opinion analysis. First, Web page data are stored on HDFS by attribute-based classification and clustering, realized with machine learning, which solves the small-file storage problem and improves data access throughput. Second, an associated mapping is established between the original Web page data and the extracted attribute data. Finally, the existing indexing mechanism of HBase is combined with a distributed local indexing mechanism to solve the secondary indexing problem for multi-condition selection queries over dynamic attributes on HBase.
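The multi-condition selection query mentioned in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation and does not use the HBase API: the class `AttributeIndex`, its methods, and the attribute names in the usage note are hypothetical, chosen only to show the general idea of a secondary index that maps each (attribute, value) pair to a set of row keys and answers a conjunctive query by intersecting those sets.

```python
from collections import defaultdict


class AttributeIndex:
    """Hypothetical in-memory stand-in for a secondary index over page attributes."""

    def __init__(self):
        # (attribute name, attribute value) -> set of row keys holding that value
        self._postings = defaultdict(set)
        # row key -> attribute dict, standing in for the attribute table
        self._rows = {}

    def put(self, row_key, attributes):
        """Store a page's attributes and index every (name, value) pair."""
        self.delete(row_key)  # drop stale index entries if the row already exists
        self._rows[row_key] = dict(attributes)
        for name, value in attributes.items():
            self._postings[(name, value)].add(row_key)

    def delete(self, row_key):
        """Remove a row and its index entries; a missing row is ignored."""
        old = self._rows.pop(row_key, None)
        if old is None:
            return
        for name, value in old.items():
            self._postings[(name, value)].discard(row_key)

    def query(self, **conditions):
        """Return the sorted row keys that satisfy every equality condition."""
        if not conditions:
            return sorted(self._rows)
        candidate_sets = [
            self._postings.get((name, value), set())
            for name, value in conditions.items()
        ]
        # Intersect starting from the smallest set to keep intermediate results small.
        candidate_sets.sort(key=len)
        result = set(candidate_sets[0])
        for keys in candidate_sets[1:]:
            result &= keys
        return sorted(result)
```

For example, after calling `put("page-001", {"site": "news.example", "topic": "economy"})`, the call `query(site="news.example", topic="economy")` returns the row keys of pages matching both conditions. In a distributed setting the postings would be partitioned alongside the data they index, which is one common reading of "local index"; the sketch leaves that out.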
Language: Chinese
Content type: Journal article
Source URL: [http://ir.lib.tsinghua.edu.cn/ir/item.do?handle=123456789/146667]
Collection: Tsinghua University
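Recommended citation
GB/T 7714
LUO Fang, LI Chun-hua, ZHOU Ke, et al. An associated storage and retrieval system of massive Web data based on multi-attributes[J], 2016.
APA LUO Fang, LI Chun-hua, ZHOU Ke, HUANG Yong-feng, & LIAO Zheng-shuang. (2016). An associated storage and retrieval system of massive Web data based on multi-attributes.
MLA LUO Fang, et al. "An associated storage and retrieval system of massive Web data based on multi-attributes." (2016).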