ISCAS OpenIR
dacoop: accelerating data-iterative applications on map/reduce cluster
Liang Yi; Li Guangrui; Wang Lei; Hu Yanpeng
2011
Conference name: 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2011
Proceedings title: Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings
Pages: 207-214
Conference dates: October 20-22, 2011
Conference location: Gwangju, Republic of Korea
Indexed by: EI
ISBN: 9780769545646
Affiliations: (1) Department of Computer Science, Beijing University of Technology, Beijing, China; (2) Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; (3) Hwellzen Software Center, Shanghai, China
Abstract: Map/reduce is a popular parallel processing framework for massive-scale data-intensive computing. A data-iterative application is composed of a series of map/reduce jobs and needs to repeatedly process some data files across these jobs. Existing implementations of the map/reduce framework focus on performing data processing in a single pass with one map/reduce job and do not directly support data-iterative applications, particularly in terms of the explicit specification of the data that is repeatedly processed across jobs. In this paper, we propose an extended version of the Hadoop map/reduce framework called Dacoop. Dacoop extends the map/reduce programming interface to specify the repeatedly processed data, introduces a shared-memory-based data cache mechanism to cache the data from its first access, and adopts caching-aware task scheduling so that the cached data can be shared among the map/reduce jobs of a data-iterative application. We evaluate Dacoop on two typical data-iterative applications, k-means clustering and domain rule reasoning in the semantic web, with real and synthetic datasets. Experimental results show that data-iterative applications achieve better performance on Dacoop than on Hadoop. The turnaround time of a data-iterative application is reduced by up to 15.1%. © 2011 IEEE.
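The caching-aware scheduling idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Dacoop's implementation or API: the class `CacheAwareScheduler`, the function `run_iterations`, the `cacheable` flag, and the least-loaded tie-breaking rule are all hypothetical names and choices introduced here for illustration. The sketch caches a file on the node that first reads it and then prefers that node for later tasks on the same file.

```python
from collections import defaultdict


class CacheAwareScheduler:
    """Toy model (hypothetical, not Dacoop's API): prefer nodes that
    already hold a task's input file in their cache."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.cache = defaultdict(set)            # node -> cached file names
        self.load = {n: 0 for n in self.nodes}   # node -> tasks assigned so far

    def assign(self, input_file, cacheable):
        """Pick a node for a map task reading input_file.

        Returns (node, hit), where hit is True if the file was already cached.
        """
        holders = [n for n in self.nodes if input_file in self.cache[n]]
        hit = bool(holders)
        # On a hit, choose among cache holders; otherwise among all nodes.
        candidates = holders if hit else self.nodes
        # Assumed tie-breaking rule: the least-loaded candidate wins.
        node = min(candidates, key=lambda n: self.load[n])
        self.load[node] += 1
        if cacheable and not hit:
            # First access: cache the file on the node that read it.
            self.cache[node].add(input_file)
        return node, hit


def run_iterations(scheduler, files, iterations):
    """Simulate an iterative application: one job per iteration,
    one map task per file. Returns per-iteration placements and hit count."""
    hits = 0
    placements = []
    for _ in range(iterations):
        placement = {}
        for f in files:
            node, hit = scheduler.assign(f, cacheable=True)
            placement[f] = node
            hits += hit
        placements.append(placement)
    return placements, hits
```

In this toy model, every access after the first iteration is a cache hit, and each file stays on the node that first read it. Real schedulers would also have to weigh data locality on disk, cache capacity, and eviction, none of which is modeled here.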
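Keywords: Cache Memory; Cluster Computing; Multitasking; Scheduling Algorithms; Turnaround Time
Language: English
Content type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/16322
Collection: Institute of Software, Chinese Academy of Sciences
Recommended citation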
GB/T 7714
Liang Yi,Li Guangrui,Wang Lei,et al. dacoop: accelerating data-iterative applications on map/reduce cluster[C],2011:207-214.