Institutional Repository of the Institute of Software, Chinese Academy of Sciences (ISCAS OpenIR)
Subject: Computer Science (provided by Thomson Reuters)
Title:
An Approach for Storing and Accessing Small Files on Hadoop (一种Hadoop小文件存储和读取的方法)
Alternative Title: An approach for storing and accessing small files on Hadoop
Author: 张春明 ; 芮建武 ; 何婷婷
Keyword: HDFS ; small files ; HIFM ; hierarchical index ; index preloading ; data prefetching
Source: Computer Applications and Software (计算机应用与软件)
Issued Date: 2012
Volume: 29, Issue: 11, Pages: 95-100
Indexed Type: CNKI ; WANFANG ; CSCD
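Department: National Engineering Research Center for Fundamental Software, Institute of Software, Chinese Academy of Sciences; Graduate University of the Chinese Academy of Sciences
Sponsorship: Major Science and Technology Project of Press and Publication (0610-1041BJNF2328/23) | National Key Technology R&D Program (2011BAH14B02) | Knowledge Innovation Program of the Chinese Academy of Sciences (KGCX2-YW-174)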
Abstract: Thanks to its high fault tolerance, scalability, and low-cost storage, HDFS (Hadoop Distributed File System) is widely used in current cloud-computing applications. However, HDFS was designed primarily for streaming access to very large files; with massive numbers of small files, its storage and read performance suffers, largely because of NameNode memory overhead. This paper proposes HIFM (Hierarchy Index File Merging), an approach based on merging small files. HIFM takes into account both the correlations between small files and the directory structure of the data when merging small files into large ones, and it generates a hierarchical index. Index files are managed through a combination of centralized and distributed storage, and index files are preloaded. In addition, HIFM uses a data prefetching mechanism to improve the efficiency of sequential access to small files. Experimental results show that HIFM effectively improves the efficiency of storing and reading small files and significantly reduces the memory overhead of the NameNode and DataNodes, making it suitable for applications that store massive numbers of small files organized in a directory structure.
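The merging, indexing, and prefetching ideas summarized in the abstract can be illustrated with a small in-memory sketch. This is not the authors' HIFM implementation, and it does not touch HDFS. It assumes, purely for illustration, that files are grouped by their parent directory, that the index has two levels (directory, then file name to offset and length), and that prefetching means caching the next few files in the same merged blob. The names `merge_small_files`, `MergedStore`, `prefetch`, and `blob_reads` are hypothetical.

```python
from collections import defaultdict
import posixpath


def merge_small_files(files):
    """Merge small files per directory and build a two-level index.

    files: dict mapping a POSIX-style path to its bytes content.
    Returns (blobs, index): blobs[directory] is the merged payload and
    index[directory][name] is the (offset, length) of that file in the blob.
    """
    blobs = defaultdict(bytearray)
    index = defaultdict(dict)
    # Sorting keeps the files of one directory in a stable order.
    for path in sorted(files):
        directory, name = posixpath.split(path)
        data = files[path]
        index[directory][name] = (len(blobs[directory]), len(data))
        blobs[directory] += data
    return {d: bytes(b) for d, b in blobs.items()}, dict(index)


class MergedStore:
    """Reads small files back out of merged blobs, with simple prefetching."""

    def __init__(self, blobs, index, prefetch=2):
        self.blobs = blobs
        self.index = index        # kept in memory, standing in for index preloading
        self.prefetch = prefetch  # assumption: number of following files to cache
        self.cache = {}           # unbounded in this sketch
        self.blob_reads = 0       # counts accesses that had to go to a merged blob

    def read(self, path):
        if path in self.cache:
            return self.cache[path]
        directory, name = posixpath.split(path)
        entries = self.index[directory]
        names = list(entries)     # insertion order matches layout in the blob
        start = names.index(name)
        blob = self.blobs[directory]
        self.blob_reads += 1
        # Cache the next few files so sequential reads avoid another blob access.
        for neighbour in names[start + 1:start + 1 + self.prefetch]:
            off, length = entries[neighbour]
            self.cache[posixpath.join(directory, neighbour)] = blob[off:off + length]
        off, length = entries[name]
        return blob[off:off + length]
```

With this layout, the number of stored objects depends on the number of directories rather than the number of small files, and reading the files of one directory in order needs only occasional blob accesses. A real deployment would also have to handle block-size limits, index persistence, and cache eviction, none of which are modeled here.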
Language: Chinese
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/15297
Appears in Collections: Institute of Software Library, Journal Articles (软件所图书馆_期刊论文)



Recommended Citation:
张春明, 芮建武, 何婷婷. An Approach for Storing and Accessing Small Files on Hadoop (一种Hadoop小文件存储和读取的方法) [J]. Computer Applications and Software (计算机应用与软件), 2012, 29(11): 95-100.