Institutional Repository of the Institute of Software, Chinese Academy of Sciences (ISCAS)
Title:
A Duplicate Data Detection Algorithm Based on Extremum Defined Chunking (基于极值点分块的重复数据检测算法)
Alternative Title: A Duplicate Data Detection Algorithm Based on Extremum Defined Chunking
Author: 谢垂益 ; 卿斯汉
Keyword: duplicate data detection ; content defined chunking ; extremum defined chunking ; fingerprint
Source: 信息网络安全 (Netinfo Security)
Issued Date: 2013
Issue: 8, Pages:10-12
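Department: School of Mathematics and Information Science, Shaoguan University, Shaoguan, Guangdong 512005 ; Institute of Software, Chinese Academy of Sciences, Beijing 100190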
Abstract: Duplicate data detection can greatly reduce the storage volume of data centers, save network bandwidth, and lower construction and maintenance costs. To overcome the tendency of the Content Defined Chunking (CDC) method to produce overly long chunks, this paper proposes a duplicate data detection algorithm based on Extremum Defined Chunking (EDC). The EDC algorithm first computes the fingerprints of all sliding windows whose right boundary lies between the lower and upper chunk size limits. It then finds the last fingerprint extremum and takes the end position of the corresponding sliding window as the chunk boundary. Finally, it computes the hash value of the chunk and uses it to decide whether the chunk is a duplicate. Experimental results show that the duplicate data detection rate and the disk utilization rate of the EDC algorithm are 1.48 times and 1.12 times those of the CDC algorithm, respectively, which is a significant improvement.
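The chunking procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: `zlib.crc32` stands in for the window fingerprint, the "last extremum" is read as the last position at which the maximum fingerprint occurs, SHA-1 is used as the chunk hash, and the names `edc_chunks` and `deduplicate` as well as the default sizes are hypothetical.

```python
import hashlib
import zlib


def edc_chunks(data: bytes, min_size: int = 32, max_size: int = 128, window: int = 16):
    """Yield consecutive chunks of `data`, each cut at the last maximal fingerprint.

    Assumption: the extremum is the maximum, and crc32 is the fingerprint.
    """
    if not 0 < window <= min_size <= max_size:
        raise ValueError("need 0 < window <= min_size <= max_size")
    start, n = 0, len(data)
    while start < n:
        # Candidate cut points: window end positions within the size limits.
        lo, hi = start + min_size, min(start + max_size, n)
        if lo >= n:
            # At most min_size bytes remain: emit them as the final chunk.
            yield data[start:]
            return
        best_end, best_fp = lo, -1
        for end in range(lo, hi + 1):
            fp = zlib.crc32(data[end - window:end])  # stand-in fingerprint
            if fp >= best_fp:  # ">=" keeps the LAST position of the maximum
                best_end, best_fp = end, fp
        yield data[start:best_end]
        start = best_end


def deduplicate(data: bytes, **kwargs):
    """Return (store, recipe): unique chunks by digest, and the digest order."""
    store = {}   # digest -> chunk; each distinct chunk is kept once
    recipe = []  # ordered digests needed to rebuild `data`
    for chunk in edc_chunks(data, **kwargs):
        digest = hashlib.sha1(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a known digest marks a duplicate chunk
        recipe.append(digest)
    return store, recipe
```

Because every cut point is chosen inside the range from `min_size` to `max_size`, no chunk in this sketch can exceed `max_size`, which reflects the overly long chunk problem the abstract attributes to CDC. Calling `deduplicate(payload)` on a byte string returns the unique chunks together with the digest sequence from which the original data can be reassembled.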
Language: Chinese
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/17006
Appears in Collections: ISCAS Library_Journal Articles

Files in This Item:

There are no files associated with this item.


Recommended Citation:
谢垂益,卿斯汉. 基于极值点分块的重复数据检测算法[J]. 信息网络安全,2013-01-01(8):10-12.
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
