A Method to Identify and Correct Problematic Software Activity Data: Exploiting Capacity Constraints and Data Redundancies | |
Zheng, Qimu; Mockus, Audris; Zhou, Minghui | |
2015 | |
Keywords | data quality; mining software repositories; capacity constraint; data redundancy |
Abstract (English) | Mining software repositories to understand and improve software development is a common approach in research and practice. The operational data obtained from these repositories often do not faithfully represent the intended aspects of software development and, therefore, may jeopardize the conclusions derived from it. We propose an approach to identify problematic values based on the constraints of software development and to correct such values using data redundancies. We investigate the approach using issue and commit data of the Mozilla project. In particular, we identified problematic data in four types of events and found the fraction of problematic values to exceed 10% and rapidly rising. We found the corrected values to be 50% closer to the most accurate estimate of task completion time. Finally, we found that the models of time until fix changed substantially when data were corrected, with the corrected data providing a 20% better fit. We discuss how the approach may be generalized to other types of operational data to increase fidelity of software measurement in practice and in research. |
Indexed in | EI; CPCI-S (ISTP) |
Author emails | zheng.qm@163.com; audris@utk.edu; zhmh@pku.edu.cn |
Pages | 637-648 |
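The abstract states the idea only at a high level: flag values that violate a capacity constraint of software development, then correct them from a redundant data source. The sketch below illustrates that general idea under explicit assumptions and is not the authors' implementation. Everything in it is hypothetical: the `Event` type (one resolution event per issue), the capacity rule "one actor performing more than `max_events` actions within `window`", the thresholds, and the choice of the latest linked commit time as the redundant value used for correction.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set


@dataclass
class Event:
    """Hypothetical resolution event: who resolved which issue, and when."""
    issue_id: str
    actor: str
    time: datetime


def flag_capacity_violations(
    events: List[Event], max_events: int, window: timedelta
) -> Set[str]:
    """Flag issues whose event falls in a burst where one actor performed
    more than `max_events` events within a span of at most `window`
    (an assumed, illustrative capacity constraint)."""
    by_actor: Dict[str, List[Event]] = {}
    for e in events:
        by_actor.setdefault(e.actor, []).append(e)

    flagged: Set[str] = set()
    for actor_events in by_actor.values():
        actor_events.sort(key=lambda e: e.time)
        start = 0
        for end in range(len(actor_events)):
            # Advance the left edge until the span is at most `window`.
            while actor_events[end].time - actor_events[start].time > window:
                start += 1
            # Too many actions for one person in this span: flag all of them.
            if end - start + 1 > max_events:
                flagged.update(ev.issue_id for ev in actor_events[start:end + 1])
    return flagged


def correct_times(
    events: List[Event],
    flagged: Set[str],
    commit_times: Dict[str, List[datetime]],
) -> Dict[str, Optional[datetime]]:
    """Replace flagged timestamps with a redundant source: the latest commit
    linked to the issue (an assumed proxy). Flagged issues with no linked
    commit get None ("unknown") instead of keeping the suspect value."""
    corrected: Dict[str, Optional[datetime]] = {}
    for e in events:
        if e.issue_id in flagged:
            commits = commit_times.get(e.issue_id)
            corrected[e.issue_id] = max(commits) if commits else None
        else:
            corrected[e.issue_id] = e.time
    return corrected


# Usage with illustrative data (not from the paper):
# alice "resolves" three issues within 10 seconds.
t0 = datetime(2015, 1, 1, 12, 0)
events = [
    Event("b1", "alice", t0),
    Event("b2", "alice", t0 + timedelta(seconds=5)),
    Event("b3", "alice", t0 + timedelta(seconds=10)),
    Event("b4", "bob", t0),
]
flagged = flag_capacity_violations(events, max_events=2, window=timedelta(minutes=1))
print(sorted(flagged))  # ['b1', 'b2', 'b3']

commits = {"b1": [t0 - timedelta(days=2), t0 - timedelta(days=1)]}
corrected = correct_times(events, flagged, commits)
print(corrected["b1"], corrected["b2"], corrected["b4"])
# 2014-12-31 12:00:00 None 2015-01-01 12:00:00
```

The sliding window keeps the check at O(n log n) per actor (dominated by the sort). Returning `None` for flagged issues without a redundant source is a deliberate choice in this sketch: it marks the value as unknown rather than silently keeping a timestamp already judged implausible.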
Language | English |
Source | 10th Joint Meeting of the European Software Engineering Conference (ESEC) / ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE) |
DOI | 10.1145/2786805.2786866 |
Content type | Other |
Source URL | [http://ir.pku.edu.cn/handle/20.500.11897/436704] |
Collection | School of Electronics Engineering and Computer Science (信息科学技术学院) |
Recommended citation (GB/T 7714) | Zheng, Qimu, Mockus, Audris, Zhou, Minghui. A Method to Identify and Correct Problematic Software Activity Data: Exploiting Capacity Constraints and Data Redundancies. 2015-01-01. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.