A Method to Identify and Correct Problematic Software Activity Data: Exploiting Capacity Constraints and Data Redundancies
Zheng, Qimu ; Mockus, Audris ; Zhou, Minghui
2015
Keywords: data quality; mining software repositories; capacity constraint; data redundancy
Abstract: Mining software repositories to understand and improve software development is a common approach in research and practice. The operational data obtained from these repositories often do not faithfully represent the intended aspects of software development and may therefore jeopardize the conclusions derived from them. We propose an approach to identify problematic values based on the constraints of software development and to correct such values using data redundancies. We investigate the approach using issue and commit data of the Mozilla project. In particular, we identified problematic data in four types of events and found that the fraction of problematic values exceeds 10% and is rising rapidly. We found the corrected values to be 50% closer to the most accurate estimate of task completion time. Finally, we found that the models of time until fix changed substantially when the data were corrected, with the corrected data providing a 20% better fit. We discuss how the approach may be generalized to other types of operational data to increase the fidelity of software measurement in practice and in research.
Indexed in: EI; CPCI-S (ISTP)
Author emails: zheng.qm@163.com; audris@utk.edu; zhmh@pku.edu.cn
Pages: 637-648
Language: English
Source: 10th Joint Meeting of the European Software Engineering Conference (ESEC) / ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE)
DOI: 10.1145/2786805.2786866
Content type: Other
Source URL: http://ir.pku.edu.cn/handle/20.500.11897/436704
Collection: 信息科学技术学院 (School of Electronics Engineering and Computer Science)
Recommended citation
GB/T 7714
Zheng, Qimu; Mockus, Audris; Zhou, Minghui. A Method to Identify and Correct Problematic Software Activity Data: Exploiting Capacity Constraints and Data Redundancies. 2015-01-01.