Approximate pairwise clustering for large data sets via sampling plus extension

ISCAS OpenIR

Wang Liang; Leckie Christopher; Kotagiri Ramamohanarao; Bezdek James
2011
Journal	Pattern Recognition
ISSN	0031-3203
Volume	44, Issue 2, Pages 222-235
Abstract	Pairwise clustering methods have shown great promise for many real-world applications. However, their computational demands make them impractical for use with large data sets. The contribution of this paper is a simple but efficient method, called eSPEC, that makes clustering feasible for problems involving large data sets. Our solution adopts a "sampling, clustering plus extension" strategy. The methodology first selects a small number of representative samples from the relational pairwise data using a selective sampling scheme; the chosen samples are then grouped using a pairwise clustering algorithm combined with local scaling; finally, labels are assigned to the remaining instances by treating the extension as a classification problem in a low-dimensional space, which is explicitly learned from the labeled samples using a cluster-preserving graph embedding technique. Extensive experimental results on several synthetic and real-world data sets demonstrate that our method both makes approximate clustering of large data sets feasible and accelerates clustering of loadable data sets. © 2010 Elsevier Ltd.
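The abstract describes a three-stage "sampling, clustering plus extension" pipeline on relational (pairwise distance) data. The Python sketch that follows only illustrates that control flow, and several pieces are assumptions rather than the paper's method: farthest-point (maximin) sampling stands in for the selective sampling scheme; the pairwise clustering step is a two-cluster normalized spectral split, computed by power iteration on a locally scaled affinity exp(-d_ij^2 / (σ_i σ_j)) where σ_i is the distance from object i to its k-th nearest sample; and the remaining objects simply inherit the label of their nearest sample instead of being classified in a learned cluster-preserving embedding. All function names, the parameters `m` and `k`, and the demo data are hypothetical.

```python
import math
import random


def maximin_sample(dist, m):
    """Farthest-point (maximin) sampling on a distance matrix.

    Hypothetical stand-in for the paper's selective sampling scheme.
    """
    chosen = [0]
    nearest = list(dist[0])  # distance from each object to its closest chosen sample
    while len(chosen) < m:
        nxt = max(range(len(dist)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(nearest[i], dist[nxt][i]) for i in range(len(dist))]
    return chosen


def local_scaling_affinity(d, k):
    """Affinity exp(-d_ij^2 / (sigma_i * sigma_j)), sigma_i = distance to k-th neighbour."""
    n = len(d)
    # sorted(row)[0] is the object itself (distance 0), so index k is the k-th neighbour
    sigma = [max(sorted(row)[k], 1e-12) for row in d]
    return [[0.0 if i == j else math.exp(-d[i][j] ** 2 / (sigma[i] * sigma[j]))
             for j in range(n)] for i in range(n)]


def spectral_bipartition(a, iters=500, seed=0):
    """Two-way split from the second eigenvector of M = D^-1/2 A D^-1/2 (power iteration)."""
    n = len(a)
    deg = [sum(row) for row in a]
    s = [1.0 / math.sqrt(g) for g in deg]
    # The leading eigenvector of M is proportional to sqrt(deg); it is projected out.
    top = [math.sqrt(g) for g in deg]
    norm = math.sqrt(sum(t * t for t in top))
    top = [t / norm for t in top]
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        proj = sum(xi * ti for xi, ti in zip(x, top))
        x = [xi - proj * ti for xi, ti in zip(x, top)]
        # Multiply by (M + I): the shift makes every eigenvalue non-negative,
        # so the iteration converges to the second-largest eigenvector of M.
        y = [s[i] * sum(a[i][j] * s[j] * x[j] for j in range(n)) + x[i] for i in range(n)]
        length = math.sqrt(sum(v * v for v in y))
        x = [v / length for v in y]
    # The sign of D^-1/2 x assigns each sample to one of two clusters.
    return [1 if s[i] * x[i] > 0 else 0 for i in range(n)]


def extend_labels(dist, sample, sample_labels):
    """Give every object its nearest sample's label (stand-in for the learned embedding)."""
    return [sample_labels[min(range(len(sample)), key=lambda t: dist[i][sample[t]])]
            for i in range(len(dist))]


def sample_cluster_extend(dist, m, k=3):
    """Sampling, clustering of the sample, then extension to all objects."""
    sample = maximin_sample(dist, m)
    sub = [[dist[i][j] for j in sample] for i in sample]
    sample_labels = spectral_bipartition(local_scaling_affinity(sub, k))
    return extend_labels(dist, sample, sample_labels)


# Demo: two well-separated 4x4 grids of points, given only as pairwise distances.
points = ([(0.5 * x, 0.5 * y) for x in range(4) for y in range(4)]
          + [(10 + 0.5 * x, 10 + 0.5 * y) for x in range(4) for y in range(4)])
dist = [[math.dist(p, q) for q in points] for p in points]
labels = sample_cluster_extend(dist, m=10, k=3)
print(labels)
```

In this sketch only the m sampled objects enter the affinity and eigenvector computation, so that step works on an m × m matrix rather than the full n × n one; the extension step then reads only the n × m distances between all objects and the samples.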
Indexed in	EI
Affiliations	(1) National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; (2) Department of Computer Science and Software Engineering, University of Melbourne, Parkville, VIC 3010, Australia
Language	English
Content type	Journal article
URI	http://ir.iscas.ac.cn/handle/311060/16176
Collection	Institute of Software, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Wang Liang, Leckie Christopher, Kotagiri Ramamohanarao, et al. Approximate pairwise clustering for large data sets via sampling plus extension[J]. Pattern Recognition, 2011, 44(2): 222-235.
APA	Wang Liang, Leckie Christopher, Kotagiri Ramamohanarao, & Bezdek James. (2011). Approximate pairwise clustering for large data sets via sampling plus extension. Pattern Recognition, 44(2), 222-235.
MLA	Wang Liang, et al. "Approximate pairwise clustering for large data sets via sampling plus extension." Pattern Recognition 44.2 (2011): 222-235.