Institutional Repository of the Institute of Software, Chinese Academy of Sciences
Title:
Handling Missing Data in Software Effort Prediction with Naive Bayes and EM Algorithm
Author: Zhang Wen ; Yang Ye ; Wang Qing
Source: ACM International Conference Proceeding Series
Conference Name: 7th International Conference on Predictive Models in Software Engineering, PROMISE 2011, Co-located with ESEM 2011
Conference Date: September
Issued Date: 2011
Conference Place: Banff, AB, Canada
Keyword: Algorithms ; Data handling ; Embedded software ; Experiments ; Forecasting ; Mathematical models ; Models ; Predictive control systems ; Software engineering
Indexed Type: EI
ISBN: 9781450307093
Department: (1) Laboratory for Internet Software Technologies, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Abstract: Background: Missing data, which commonly appears in software effort datasets, is becoming an important problem in software effort prediction. Aims: In this paper, we adapt naïve Bayes and EM (Expectation Maximization) for software effort prediction and develop two embedded strategies, missing data toleration and missing data imputation, to handle the missing data in software effort datasets. Method: The missing data toleration strategy ignores missing values in software effort datasets, while the missing data imputation strategy uses observed values to impute missing values. Results: Experiments on the ISBSG and CSBSG datasets demonstrate that: 1) both proposed strategies outperform BPNN with classic imputation techniques such as MI and MINI. Meanwhile, the imputation strategy outperforms the toleration strategy in most cases and produced the highest accuracy, 75.15%; 2) the unlabeled projects used in training the prediction model significantly improve the effort prediction performance of naïve Bayes and EM with both strategies, especially when the ratio of the size of the training data to the size of the unlabeled data is at a relatively optimal level; 3) each class of software effort data corresponds exactly to one Gaussian component for both the ISBSG and CSBSG datasets. Conclusion: Although initial experiments on the ISBSG dataset demonstrate some promising aspects of the proposed strategies, we cannot conclude that they generalize to all other software effort datasets. Copyright © 2011 ACM.
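Illustration: The two strategies named in the abstract can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes a Gaussian naive Bayes classifier over numeric attributes, uses `None` as a hypothetical marker for a missing value, and replaces the paper's imputation procedure with a simple class-conditional mean. The EM step over unlabeled projects is not shown. All function names and the toy data are assumptions made for illustration.

```python
import math

MISSING = None  # assumed marker for a missing attribute value


def fit_tolerant_nb(rows, labels):
    """Estimate class priors and per-feature Gaussian parameters,
    skipping missing entries instead of discarding whole projects."""
    model = {}
    n_features = len(rows[0])
    for c in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == c]
        params = []
        for j in range(n_features):
            observed = [r[j] for r in members if r[j] is not MISSING]
            if not observed:
                params.append(None)  # feature never observed in this class
                continue
            mean = sum(observed) / len(observed)
            var = sum((v - mean) ** 2 for v in observed) / len(observed)
            params.append((mean, max(var, 1e-6)))  # floor avoids zero variance
        model[c] = (len(members) / len(rows), params)
    return model


def log_posteriors(model, row):
    """Unnormalized log posterior per class for one project."""
    scores = {}
    for c, (prior, params) in model.items():
        score = math.log(prior)
        for value, p in zip(row, params):
            if value is MISSING or p is None:
                continue  # toleration: a missing feature contributes nothing
            mean, var = p
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        scores[c] = score
    return scores


def predict(model, row):
    """Return the class with the highest log posterior."""
    scores = log_posteriors(model, row)
    return max(scores, key=scores.get)


def impute(model, row, label):
    """Imputation (simplified): fill missing entries with the mean
    estimated from the observed values of the given class."""
    params = model[label][1]
    return [p[0] if v is MISSING and p is not None else v
            for v, p in zip(row, params)]


# Toy effort data (hypothetical): two numeric attributes, two effort classes.
rows = [[10.0, 2.0], [12.0, None], [None, 3.0],
        [50.0, 9.0], [55.0, None], [None, 8.0]]
labels = ["low", "low", "low", "high", "high", "high"]
model = fit_tolerant_nb(rows, labels)
```

With this toy model, `predict(model, [11.0, None])` classifies a project using only its observed attribute, and `impute(model, [None, 8.4], "high")` fills the gap from the observed values of the `high` class before any further processing.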
Language: English
Content Type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/16211
Appears in Collections: Institute of Software Library, Conference Papers

Files in This Item:

There are no files associated with this item.


Recommended Citation:
Zhang Wen, Yang Ye, Wang Qing. Handling missing data in software effort prediction with naive Bayes and EM algorithm[C]. In: 7th International Conference on Predictive Models in Software Engineering, PROMISE 2011, Co-located with ESEM 2011. Banff, AB, Canada. September.
