ISCAS OpenIR
Approximate pairwise clustering for large data sets via sampling plus extension
Wang Liang; Leckie Christopher; Kotagiri Ramamohanarao; Bezdek James
2011
Journal: Pattern Recognition
ISSN: 0031-3203
Volume: 44, Issue: 2, Pages: 222-235
Abstract: Pairwise clustering methods have shown great promise for many real-world applications. However, the computational demands of these methods make them impractical for use with large data sets. The contribution of this paper is a simple but efficient method, called eSPEC, that makes clustering feasible for problems involving large data sets. Our solution adopts a "sampling, clustering plus extension" strategy. The methodology starts by selecting a small number of representative samples from the relational pairwise data using a selective sampling scheme; then the chosen samples are grouped using a pairwise clustering algorithm combined with local scaling; and finally, the label assignments of the remaining instances in the data are extended as a classification problem in a low-dimensional space, which is explicitly learned from the labeled samples using a cluster-preserving graph embedding technique. Extensive experimental results on several synthetic and real-world data sets demonstrate that the method both makes approximate clustering of large data sets feasible and accelerates clustering of loadable data sets. © 2010 Elsevier Ltd.
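The three stages named in the abstract (sample, cluster the samples, extend labels to everything else) can be illustrated with a small sketch. This is not eSPEC: the record does not give the algorithm's details, so each stage below is a plainly simpler stand-in. Farthest-first (maximin) selection replaces the selective sampling scheme, k-medoids on the sample distance matrix replaces pairwise clustering with local scaling, and nearest-labelled-sample assignment replaces classification in a learned graph embedding. All function names and parameters are hypothetical.

```python
import math
import random


def maximin_sample(points, n_samples, dist, seed=0):
    """Stage 1 stand-in: pick sample indices by farthest-first traversal."""
    rng = random.Random(seed)
    chosen = [rng.randrange(len(points))]
    # nearest[i] = distance from point i to its closest chosen sample so far
    nearest = [dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < n_samples:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen


def k_medoids(D, k, n_iter=20):
    """Stage 2 stand-in: cluster items using only their pairwise distances D."""
    n = len(D)
    medoids = [0]
    while len(medoids) < k:  # deterministic farthest-first initialisation
        medoids.append(max(range(n), key=lambda i: min(D[i][m] for m in medoids)))
    labels = [0] * n
    for _ in range(n_iter):
        # assign every item to its nearest medoid
        labels = [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster emptied out
                new_medoids.append(medoids[c])
                continue
            # new medoid = member with the smallest total distance to the others
            new_medoids.append(min(members, key=lambda i: sum(D[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


def extend_labels(points, sample_idx, sample_labels, dist):
    """Stage 3 stand-in: each point takes the label of its nearest sample."""
    labels = []
    for p in points:
        j = min(range(len(sample_idx)), key=lambda s: dist(p, points[sample_idx[s]]))
        labels.append(sample_labels[j])
    return labels


def sample_cluster_extend(points, k, n_samples, dist=math.dist, seed=0):
    idx = maximin_sample(points, n_samples, dist, seed)
    # only an n_samples x n_samples matrix is built, never the full n x n one
    D = [[dist(points[a], points[b]) for b in idx] for a in idx]
    sample_labels = k_medoids(D, k)
    return extend_labels(points, idx, sample_labels, dist)


# Toy data (invented for illustration): two well-separated 2-D blobs.
rng = random.Random(1)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
blob_b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(200)]
labels = sample_cluster_extend(blob_a + blob_b, k=2, n_samples=20)
```

The point of the structure is cost: with n points and m samples, the sketch evaluates on the order of n·m distances plus an m×m matrix, rather than the full n×n pairwise matrix that makes pairwise clustering impractical on large data sets.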
Indexed by: EI
Affiliations: (1) National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; (2) Department of Computer Science and Software Engineering, University of Melbourne, Parkville, VIC 3010, Australia
Language: English
Content type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/16176
Collection: Institute of Software, Chinese Academy of Sciences
Recommended citation
GB/T 7714: Wang Liang, Leckie Christopher, Kotagiri Ramamohanarao, et al. Approximate pairwise clustering for large data sets via sampling plus extension[J]. Pattern Recognition, 2011, 44(2): 222-235.
APA: Wang Liang, Leckie Christopher, Kotagiri Ramamohanarao, & Bezdek James. (2011). Approximate pairwise clustering for large data sets via sampling plus extension. Pattern Recognition, 44(2), 222-235.
MLA: Wang Liang, et al. "Approximate pairwise clustering for large data sets via sampling plus extension." Pattern Recognition 44.2 (2011): 222-235.
 

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.