LIBRA: Lightweight Data Skew Mitigation in MapReduce
Authors: Chen, Qi; Yao, Jinyu; Xiao, Zhen
Journal: IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
Year: 2015
Keywords: MapReduce; data skew; sampling; partitioning; joins
DOI: 10.1109/TPDS.2014.2350972
Abstract: MapReduce is an effective tool for parallel data processing. One significant issue in practical MapReduce applications is data skew: the imbalance in the amount of data assigned to each task. Skew causes some tasks to take much longer to finish than others and can significantly degrade performance. This paper presents LIBRA, a lightweight strategy to address the data skew problem among the reducers of MapReduce applications. Unlike previous work, LIBRA does not require any pre-run sampling of the input data, nor does it prevent the map and reduce stages from overlapping. It uses an innovative sampling method that achieves a highly accurate approximation of the intermediate data distribution by sampling only a small fraction of the intermediate data during normal map processing. It allows the reduce tasks to start copying as soon as the chosen sample map tasks (a small fraction of the map tasks, namely those issued first) complete. It supports splitting large keys when application semantics permit, and it preserves the total order of the output data. It takes the heterogeneity of the computing resources into account when balancing the load among the reduce tasks. LIBRA is applicable to a wide range of applications and is transparent to users. We implement LIBRA in Hadoop, and our experiments show that LIBRA has negligible overhead and can speed up the execution of some popular applications by …
Funding: National High Technology Research and Development Program ("863" Program) of China [2013AA013203]; National Natural Science Foundation of China [61170056]
Indexed in: SCI(E); EI
Document type: Article
Author emails: chenqi@net.pku.edu.cn; yjy@net.pku.edu.cn; xiaozhen@net.pku.edu.cn
Volume: 26; Issue: 9; Pages: 2520-2533
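The abstract names three ingredients: estimating the intermediate key distribution from a small sample of map output, splitting large keys while keeping the output totally ordered, and sizing each reducer's share to its (possibly heterogeneous) capacity. The Python sketch that follows shows how these ideas can fit together under simplifying assumptions of our own: the sample is a plain list of keys, each reducer's relative capacity is given as a positive number, and keys are assigned by one greedy pass over the sorted keys. The function names (`estimate_distribution`, `plan_partitions`, `reducer_loads`) and the greedy strategy are hypothetical illustrations, not the algorithm published in the paper and not part of Hadoop's API.

```python
# Hypothetical sketch of skew-aware range partitioning; not LIBRA's published algorithm.
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical plan format: for each reducer, the (key, record_count) pieces it receives.
Plan = List[List[Tuple[str, float]]]


def estimate_distribution(sample_keys: List[str]) -> Dict[str, int]:
    """Count key frequencies in intermediate records emitted by the sample map tasks."""
    return dict(Counter(sample_keys))


def plan_partitions(freq: Dict[str, int], capacities: List[float],
                    allow_split: bool) -> Plan:
    """Greedily assign keys, in sorted order, so each reducer's load tracks its capacity.

    Capacities are assumed positive. Because keys are visited in sorted order,
    reducer i only receives keys <= those of reducer i+1, so concatenating the
    reducer outputs yields totally ordered data.
    """
    total = sum(freq.values())
    cap_sum = sum(capacities)
    # Faster (higher-capacity) reducers get a proportionally larger target load.
    targets = [total * c / cap_sum for c in capacities]
    plan: Plan = [[] for _ in capacities]
    r, load = 0, 0.0
    last = len(capacities) - 1
    for key in sorted(freq):
        remaining = float(freq[key])
        while remaining > 0:
            room = targets[r] - load
            if r == last or remaining <= room:
                # The key fits, or this is the last reducer and it absorbs the rest.
                plan[r].append((key, remaining))
                load += remaining
                remaining = 0.0
            elif allow_split and room > 0:
                # Split a large key: fill this reducer to its target, carry the rest over.
                plan[r].append((key, room))
                remaining -= room
                r, load = r + 1, 0.0
            elif load == 0:
                # Unsplittable key larger than the target: it gets this reducer alone.
                plan[r].append((key, remaining))
                remaining = 0.0
                r, load = r + 1, 0.0
            else:
                # Unsplittable key does not fit: move on to the next reducer.
                r, load = r + 1, 0.0
    return plan


def reducer_loads(plan: Plan) -> List[float]:
    """Total number of records each reducer would receive under the plan."""
    return [sum(n for _, n in part) for part in plan]
```

With `freq = {'a': 2, 'b': 6, 'c': 2}` and two equal-capacity reducers, `allow_split=True` yields loads of 5 and 5 by sharing key `b` between both reducers, while `allow_split=False` must keep `b` whole and yields loads of 2 and 8. Splitting is only safe when, as the abstract notes, the application's semantics permit it.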
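Language: English
Content type: Journal article
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/416630
Collection: School of Electronics Engineering and Computer Science, Peking University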
Recommended citation formats
GB/T 7714
Chen, Qi, Yao, Jinyu, Xiao, Zhen. LIBRA: Lightweight Data Skew Mitigation in MapReduce[J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2015.
APA: Chen, Qi, Yao, Jinyu, & Xiao, Zhen. (2015). LIBRA: Lightweight Data Skew Mitigation in MapReduce. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS.
MLA: Chen, Qi, et al. "LIBRA: Lightweight Data Skew Mitigation in MapReduce." IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2015).