ISCAS OpenIR
Parallelization and Performance Optimization on Face Detection Algorithm with OpenCL: A Case Study
Wang Weiyan; Zhang Yunquan; Yan Shengen; Zhang Ying; Jia Haipeng
2012
Source: Tsinghua Science and Technology
ISSN: 1007-0214
Volume: 17, Issue: 3, Pages: 287-295
English Abstract: Face detection is inherently a real-time application. Although the Viola-Jones algorithm handles it elegantly, today's increasingly large, high-quality images and videos pose a new challenge to meeting real-time requirements. Parallelizing the Viola-Jones algorithm with OpenCL is a good way to achieve high performance across both AMD and NVIDIA GPU platforms without introducing new algorithms. This paper identifies the bottleneck of the application and discusses how to optimize face detection step by step, starting from a very naive implementation. Techniques such as hiding CPU execution time, subtle use of local memory as a high-speed scratchpad and manual cache, and variable granularity were used to improve performance. These techniques yield a 4-13 times speedup, varying with image size. Furthermore, these ideas may shed some light on how to parallelize applications efficiently with OpenCL. Taking face detection as an example, this paper also summarizes some general advice on optimizing OpenCL programs, to help other applications perform better on GPUs. © 2012 Tsinghua University Press.
Indexed Type: EI
Keyword: Algorithms; Optimization
Department: (1) Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; (2) State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; (3) Graduate University, Chinese Academy of Sciences, Beijing 100190, China; (4) Ocean University of China, Qingdao 2661, China
Language: English
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/15016
Collection: Institute of Software, Chinese Academy of Sciences
Recommended Citation
GB/T 7714
Wang Weiyan, Zhang Yunquan, Yan Shengen, et al. Parallelization and performance optimization on face detection algorithm with OpenCL: a case study[J]. Tsinghua Science and Technology, 2012, 17(3): 287-295.
APA: Wang Weiyan, Zhang Yunquan, Yan Shengen, Zhang Ying, & Jia Haipeng. (2012). Parallelization and performance optimization on face detection algorithm with OpenCL: a case study. Tsinghua Science and Technology, 17(3), 287-295.
MLA: Wang Weiyan, et al. "Parallelization and performance optimization on face detection algorithm with OpenCL: a case study." Tsinghua Science and Technology 17.3 (2012): 287-295.
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.