Institutional Repository of the Institute of Software, Chinese Academy of Sciences
Title: Approximate pairwise clustering for large data sets via sampling plus extension
Author: Wang Liang; Leckie Christopher; Kotagiri Ramamohanarao; Bezdek James
Source: Pattern Recognition
Issued Date: 2011
Volume: 44, Issue: 2, Pages: 222-235
Indexed Type: EI
Department: (1) National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; (2) Department of Computer Science and Software Engineering, University of Melbourne, Parkville, VIC 3010, Australia
Abstract: Pairwise clustering methods have shown great promise for many real-world applications. However, the computational demands of these methods make them impractical for use with large data sets. The contribution of this paper is a simple but efficient method, called eSPEC, that makes clustering feasible for problems involving large data sets. Our solution adopts a "sampling, clustering plus extension" strategy. The methodology starts by selecting a small number of representative samples from the relational pairwise data using a selective sampling scheme; then the chosen samples are grouped using a pairwise clustering algorithm combined with local scaling; and finally, the label assignments of the remaining instances in the data are extended as a classification problem in a low-dimensional space, which is explicitly learned from the labeled samples using a cluster-preserving graph embedding technique. Extensive experimental results on several synthetic and real-world data sets demonstrate both the feasibility of our method for approximately clustering large data sets and its acceleration of clustering for loadable data sets. © 2010 Elsevier Ltd.
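The three-stage "sampling, clustering plus extension" strategy described in the abstract can be illustrated with a minimal plain-Python sketch. This is not eSPEC itself, and several substitutions are assumed. Farthest-point (maximin) sampling stands in for the paper's selective sampling scheme. Connected components of a thresholded local-scaling affinity graph stand in for the pairwise clustering algorithm. Nearest-sample assignment in the original feature space stands in for the learned cluster-preserving embedding plus classifier. The sketch also works on feature vectors rather than on relational pairwise data. The local-scaling affinity uses the self-tuning form A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), where sigma_i is the distance from point i to its k-th nearest neighbour. All function names and the parameters `n_samples`, `k` and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maximin_sample(data: List[Point], n_samples: int) -> List[int]:
    """ASSUMED stand-in for selective sampling: farthest-point (maximin)
    sampling. Returns indices into `data`."""
    chosen = [0]
    # distance from every point to its closest already-chosen sample
    nearest = [dist(p, data[0]) for p in data]
    while len(chosen) < min(n_samples, len(data)):
        nxt = max(range(len(data)), key=lambda i: nearest[i])
        if nearest[nxt] == 0.0:  # every remaining point duplicates a sample
            break
        chosen.append(nxt)
        nearest = [min(nearest[i], dist(data[i], data[nxt])) for i in range(len(data))]
    return chosen


def local_scaling_affinity(pts: List[Point], k: int = 2) -> List[List[float]]:
    """A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), with sigma_i the distance
    from point i to its k-th nearest neighbour."""
    n = len(pts)
    d = [[dist(p, q) for q in pts] for p in pts]
    # each sorted row starts with the zero self-distance, so index k is the k-th neighbour
    sigma = [sorted(row)[min(k, n - 1)] or 1e-12 for row in d]
    return [[0.0 if i == j else math.exp(-d[i][j] ** 2 / (sigma[i] * sigma[j]))
             for j in range(n)] for i in range(n)]


def cluster_samples(pts: List[Point], k: int = 2, threshold: float = 0.1) -> List[int]:
    """ASSUMED stand-in for pairwise clustering: connected components of the
    graph whose edges have local-scaling affinity above `threshold`."""
    a = local_scaling_affinity(pts, k)
    labels = [-1] * len(pts)
    current = 0
    for start in range(len(pts)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j, w in enumerate(a[i]):
                if w > threshold and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def extend_labels(data: List[Point], sample_idx: List[int], sample_labels: List[int]) -> List[int]:
    """ASSUMED stand-in for the extension step: each point takes the label of
    its nearest sample (the paper classifies in a learned low-dimensional space)."""
    return [sample_labels[min(range(len(sample_idx)),
                              key=lambda s: dist(p, data[sample_idx[s]]))]
            for p in data]


def sample_cluster_extend(data: List[Point], n_samples: int = 6,
                          k: int = 2, threshold: float = 0.1) -> List[int]:
    """Hypothetical driver: sample, cluster the samples, extend to all points."""
    idx = maximin_sample(data, n_samples)
    labels = cluster_samples([data[i] for i in idx], k, threshold)
    return extend_labels(data, idx, labels)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(sample_cluster_extend(pts, n_samples=6))  # -> [0, 0, 0, 0, 1, 1, 1, 1]
```

The design mirrors the cost argument behind the strategy. Only the `n_samples` × `n_samples` affinity matrix is ever built. The sampling and extension steps need only O(N · n_samples) distance evaluations. The quality of this sketch depends heavily on `threshold` and on the samples covering every group.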
Language: English
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/16176
Appears in Collections: ISCAS Library_Journal Articles

Files in This Item:

There are no files associated with this item.


Recommended Citation:
Wang Liang, Leckie Christopher, Kotagiri Ramamohanarao, et al. Approximate pairwise clustering for large data sets via sampling plus extension[J]. Pattern Recognition, 2011-01-01, 44(2): 222-235.

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

Copyright © 2007-2020 Institute of Software, Chinese Academy of Sciences
Powered by CSpace