ISCAS OpenIR
Handling Missing Data in Software Effort Prediction with Naive Bayes and EM Algorithm
Zhang Wen; Yang Ye; Wang Qing
2011
Conference name: 7th International Conference on Predictive Models in Software Engineering, PROMISE 2011, Co-located with ESEM 2011
Proceedings title: ACM International Conference Proceeding Series
Pages: -
Conference date: September
Conference location: Banff, AB, Canada
Indexed by: EI
ISBN: 9781450307093
Affiliation: (1) Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Abstract: Background: Missing data, which commonly appears in software effort datasets, is becoming an important problem in software effort prediction. Aims: In this paper, we adapt naïve Bayes and EM (Expectation Maximization) for software effort prediction and develop two embedded strategies, missing data toleration and missing data imputation, to handle the missing data in software effort datasets. Method: The missing data toleration strategy ignores missing values in software effort datasets, while the missing data imputation strategy uses observed values to impute missing values. Results: Experiments on the ISBSG and CSBSG datasets demonstrate that: 1) both proposed strategies outperform BPNN with classic imputation techniques such as MI and MINI, and the imputation strategy outperforms the toleration strategy in most cases, producing the highest accuracy of 75.15%; 2) using unlabeled projects to train the prediction model significantly improves the effort prediction performance of naïve Bayes and EM with both strategies, especially when the ratio of the size of the training data to the size of the unlabeled data is at a relatively optimal level; 3) each class of software effort data corresponds exactly to one Gaussian component for both the ISBSG and CSBSG datasets. Conclusion: Although initial experiments on the ISBSG dataset demonstrate some promising aspects of the proposed strategies, we cannot conclude that they generalize to all other software effort datasets. Copyright © 2011 ACM.
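The difference between the two strategies named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes Gaussian class-conditional features, marks missing values with None, uses plain class-mean imputation in place of EM, and runs on made-up toy data. The function names (fit, predict_tolerate, impute) are hypothetical.

```python
import math

def fit(X, y):
    """Estimate class priors and per-feature (mean, variance) from observed values only."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        stats = []
        for j in range(len(X[0])):
            observed = [r[j] for r in rows if r[j] is not None]
            if not observed:
                stats.append(None)  # feature never observed for this class
                continue
            mean = sum(observed) / len(observed)
            # Small constant keeps the variance strictly positive (an assumption).
            var = sum((v - mean) ** 2 for v in observed) / len(observed) + 1e-6
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model

def log_gauss(v, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)

def predict_tolerate(model, x):
    """Toleration: a missing feature contributes nothing to the class score."""
    best, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for v, st in zip(x, stats):
            if v is not None and st is not None:
                score += log_gauss(v, *st)
        if score > best_score:
            best, best_score = label, score
    return best

def impute(model, x, label):
    """Imputation: fill each missing feature with the observed mean of the given class."""
    stats = model[label][1]
    return [st[0] if v is None and st is not None else v for v, st in zip(x, stats)]

# Toy projects with two hypothetical features; None marks a missing value.
X = [[1.0, 10.0], [1.2, None], [None, 11.0],
     [5.0, 50.0], [5.5, None], [None, 52.0]]
y = ["low", "low", "low", "high", "high", "high"]

model = fit(X, y)
tolerated_a = predict_tolerate(model, [None, 49.0])
tolerated_b = predict_tolerate(model, [1.1, None])
filled = impute(model, [None, 11.0], "low")
```

Toleration leaves the data untouched and scores each project on whatever it has. Imputation produces complete rows that any downstream model can consume, at the cost of trusting the filled-in values.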
Keywords: Algorithms; Data Handling; Embedded Software; Experiments; Forecasting; Mathematical Models; Models; Predictive Control Systems; Software Engineering
Language: English
Content type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/16211
Collection: Institute of Software, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Zhang Wen, Yang Ye, Wang Qing. Handling missing data in software effort prediction with naive Bayes and EM algorithm[C], 2011: -.