ISCAS OpenIR
Accelerating Viola-Jones Face Detection Algorithm on GPUs
Jia Haipeng; Zhang Yunquan; Wang Weiyan; Xu Jianliang
2012
Conference Name: IEEE 14th International Conference on High Performance Computing and Communications (HPCC) / IEEE 9th International Conference on Embedded Software and Systems (ICESS)
Source: Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 - 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012
Pages: 396-403
Conference Date: Jun 25-27, 2012
Conference Place: Liverpool, England
Indexed Type: ISTP; EI
ISBN: 978-0-7695-4749-7
Department: Jia Haipeng; Zhang Yunquan; Wang Weiyan: Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing, People's Republic of China.
English Abstract: The Viola-Jones face detection algorithm represents a class of parallel algorithms in which both memory accesses and work distribution are irregular, which makes it hard to obtain high performance on GPUs. Furthermore, conventional GPU programming wisdom mostly explains how to optimize data-parallel workloads with regular inputs and outputs, while there is little material on how to write efficient task-parallel programs with irregular workloads. In this paper, we present an OpenCL implementation of the Viola-Jones face detection algorithm that achieves high performance on both NVIDIA and AMD GPUs through five main techniques: warp-size work granularity, persistent threads, an Uberkernel, local queues, and global queues. We demonstrate the performance of our implementation by comparing it with a well-optimized CPU version from the OpenCV library. Experimental results show speedups of 5.193 to 35.08 times (16.91 on average) on the AMD GPU and 5.85 to 32.641 times (17.535 on average) on the NVIDIA GPU.
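The abstract names persistent threads and local/global queues but does not describe them. The Python sketch below is only a rough CPU analogue of that general pattern. It is not the authors' OpenCL implementation, which this record does not include.

How the sketch works:

- A fixed pool of workers stays alive and repeatedly claims batches of detection-window indices from a shared counter. The counter stands in for a global queue.
- Each worker copies its claimed batch into a private list, which stands in for a local queue.
- The worker then runs an early-exit rejection cascade on each window in that list.

All names are hypothetical illustrations: `persistent_detect`, `evaluate_cascade`, and the toy classifier `toy_stage_passes`.

```python
import threading
from typing import Callable, List, Tuple


def evaluate_cascade(window: int, num_stages: int,
                     stage_passes: Callable[[int, int], bool]) -> Tuple[bool, int]:
    """Run a rejection cascade on one window, stopping at the first failing stage.

    Returns (accepted, stages_evaluated). The early exit is what makes the
    per-window cost irregular: some windows stop after one stage, others run all.
    """
    for stage in range(num_stages):
        if not stage_passes(window, stage):
            return False, stage + 1
    return True, num_stages


def persistent_detect(num_windows: int, num_workers: int, batch_size: int,
                      num_stages: int,
                      stage_passes: Callable[[int, int], bool]) -> List[Tuple[bool, int]]:
    """Process windows 0..num_windows-1 with a fixed pool of persistent workers."""
    results: List[Tuple[bool, int]] = [(False, 0)] * num_windows
    next_window = 0          # head of the shared "global queue" of window ids
    lock = threading.Lock()  # protects next_window

    def claim_batch() -> range:
        # Atomically take up to batch_size window ids from the global queue.
        nonlocal next_window
        with lock:
            start = next_window
            end = min(start + batch_size, num_windows)
            next_window = end
        return range(start, end)

    def worker() -> None:
        # A persistent worker loops until the global queue is empty,
        # instead of being launched once per window.
        while True:
            local_queue = list(claim_batch())  # private batch ("local queue")
            if not local_queue:
                return
            for window in local_queue:
                # Each index is written by exactly one worker.
                results[window] = evaluate_cascade(window, num_stages, stage_passes)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def toy_stage_passes(window: int, stage: int) -> bool:
    # Hypothetical deterministic stand-in for a Haar-feature stage classifier.
    return (window * 7 + stage * 3) % 5 != 0


if __name__ == "__main__":
    detections = persistent_detect(num_windows=10, num_workers=4, batch_size=3,
                                   num_stages=3, stage_passes=toy_stage_passes)
    print(sum(accepted for accepted, _ in detections), "windows accepted")
```

Design choice: a lock-protected counter is the simplest way to hand out work dynamically. Workers that finish cheap windows early just claim more, which is the load-balancing idea behind persistent threads. On a GPU, such a counter would typically be an atomic increment on global memory rather than a lock.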
Keyword: Viola-Jones; Imbalanced Computation; Persistent Threads; Local Queues; Global Queues
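Sponsorship: IEEE; IEEE Computer Society; University of Bradford; IEEE Technical Committee on Scalable Computing (TCSC)
Language: English
Content Type: Conference Paper
URI: http://ir.iscas.ac.cn/handle/311060/15807
Collection: Institute of Software, Chinese Academy of Sciences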
Recommended Citation
GB/T 7714
Jia Haipeng, Zhang Yunquan, Wang Weiyan, et al. Accelerating Viola-Jones face detection algorithm on GPUs[C]. 2012: 396-403.
 

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.