Institutional Repository
| Approximate pairwise clustering for large data sets via sampling plus extension | |
| Wang Liang; Leckie Christopher; Kotagiri Ramamohanarao; Bezdek James | |
| 2011 | |
| Source | Pattern Recognition |
| ISSN | 0031-3203 |
| Volume | 44; Issue: 2; Pages: 222-235 |
| English Abstract | Pairwise clustering methods have shown great promise for many real-world applications. However, the computational demands of these methods make them impractical for use with large data sets. The contribution of this paper is a simple but efficient method, called eSPEC, that makes clustering feasible for problems involving large data sets. Our solution adopts a "sampling, clustering plus extension" strategy. The methodology starts by selecting a small number of representative samples from the relational pairwise data using a selective sampling scheme; then the chosen samples are grouped using a pairwise clustering algorithm combined with local scaling; and finally, the label assignments of the remaining instances in the data are extended as a classification problem in a low-dimensional space, which is explicitly learned from the labeled samples using a cluster-preserving graph embedding technique. Extensive experimental results on several synthetic and real-world data sets demonstrate both the feasibility of approximately clustering large data sets and acceleration of clustering in loadable data sets of our method. © 2010 Elsevier Ltd. |
| Indexed Type | EI |
| Department | (1) National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; (2) Department of Computer Science and Software Engineering, University of Melbourne, Parkville, VIC 3010, Australia |
| Language | English |
| Content Type | Journal article |
| URI | http://ir.iscas.ac.cn/handle/311060/16176 |
| Collection | Institute of Software, Chinese Academy of Sciences |
| Recommended Citation GB/T 7714 | Wang Liang, Leckie Christopher, Kotagiri Ramamohanarao, et al. Approximate pairwise clustering for large data sets via sampling plus extension[J]. Pattern Recognition, 2011, 44(2): 222-235. |
| APA | Wang Liang, Leckie Christopher, Kotagiri Ramamohanarao, & Bezdek James. (2011). Approximate pairwise clustering for large data sets via sampling plus extension. Pattern Recognition, 44(2), 222-235. |
| MLA | Wang Liang, et al. "Approximate pairwise clustering for large data sets via sampling plus extension". Pattern Recognition 44.2 (2011): 222-235. |
| Files in This Item: | There are no files associated with this item. |
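
The "sampling, clustering plus extension" strategy described in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the eSPEC method itself: farthest-first sampling stands in for the paper's selective sampling scheme, a self-tuning spectral clustering stands in for "pairwise clustering combined with local scaling", and a nearest-sample rule in the input space replaces the learned cluster-preserving graph embedding. It also assumes vector data, whereas the abstract refers to relational pairwise data. All function names and parameters are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between rows of a and rows of b."""
    d = (a * a).sum(1)[:, None] + (b * b).sum(1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)

def farthest_first(x, m, seed=0):
    """Assumed stand-in sampler: greedily pick m mutually distant points."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(x)))]
    dmin = pairwise_sq_dists(x, x[idx])[:, 0]
    for _ in range(m - 1):
        idx.append(int(np.argmax(dmin)))
        dmin = np.minimum(dmin, pairwise_sq_dists(x, x[idx[-1:]])[:, 0])
    return np.array(idx)

def kmeans(z, k, iters=50):
    """Plain Lloyd iterations with a farthest-first initialisation."""
    centers = z[farthest_first(z, k)]
    for _ in range(iters):
        labels = np.argmin(pairwise_sq_dists(z, centers), axis=1)
        new = np.array([z[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_local_scaling(xs, k, knn=7):
    """Spectral clustering of the samples with a locally scaled affinity."""
    d2 = pairwise_sq_dists(xs, xs)
    # Local scale of each sample: distance to its knn-th nearest neighbour.
    sigma = np.sqrt(np.sort(d2, axis=1)[:, min(knn, len(xs) - 1)])
    a = np.exp(-d2 / (np.outer(sigma, sigma) + 1e-12))
    np.fill_diagonal(a, 0.0)
    dinv = 1.0 / np.sqrt(a.sum(1) + 1e-12)
    norm_aff = dinv[:, None] * a * dinv[None, :]
    _, vecs = np.linalg.eigh(norm_aff)      # eigenvalues in ascending order
    z = vecs[:, -k:]                        # k leading eigenvectors
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    return kmeans(z, k)

def extend_labels(x, xs, sample_labels):
    """Simplified extension: copy the label of the nearest sample."""
    return sample_labels[np.argmin(pairwise_sq_dists(x, xs), axis=1)]

def sample_cluster_extend(x, k, m):
    idx = farthest_first(x, m)                        # 1. sampling
    sample_labels = spectral_local_scaling(x[idx], k) # 2. clustering
    return extend_labels(x, x[idx], sample_labels)    # 3. extension

# Toy data: two well-separated Gaussian blobs of 200 points each.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.5, (200, 2)),
               rng.normal(10.0, 0.5, (200, 2))])
labels = sample_cluster_extend(x, k=2, m=40)
```

In this sketch only the 40 sampled points go through the eigendecomposition, so the expensive step scales with the sample size rather than the full data set; the remaining points each cost one nearest-sample lookup.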
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.