ISCAS OpenIR
Accelerating Viola-Jones Face Detection Algorithm on GPUs
Jia Haipeng; Zhang Yunquan; Wang Weiyan; Xu Jianliang
2012
Conference name: IEEE 14th International Conference on High Performance Computing and Communications (HPCC) / IEEE 9th International Conference on Embedded Software and Systems (ICESS)
Proceedings title: Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 - 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012
Pages: 396-403
Conference date: June 25-27, 2012
Conference location: Liverpool, England
Indexed by: ISTP; EI
ISBN: 978-0-7695-4749-7
Affiliation: Jia Haipeng; Zhang Yunquan; Wang Weiyan: Chinese Acad Sci, Inst Software, Lab Parallel Software & Computat Sci, Beijing, Peoples R China.
Abstract: The Viola-Jones face detection algorithm represents a class of parallel algorithms in which both memory accesses and work distribution are irregular, which makes high performance hard to obtain on GPUs. Furthermore, conventional GPU programming wisdom mostly addresses how to optimize data-parallel workloads with regular inputs and outputs, while there is little reference material on how to efficiently write task-parallel programs with irregular workloads. In this paper, we present an OpenCL implementation of the Viola-Jones face detection algorithm that achieves high performance on both NVIDIA and AMD GPUs through five main techniques: warp-size work granularity, persistent threads, Uberkernel, local queues, and global queues. We also demonstrate the high performance of our implementation by comparing it with a well-optimized CPU version from the OpenCV library. Experimental results show that the speedup ranges from 5.193 to 35.08 times (16.91 on average) on AMD GPUs and from 5.85 to 32.641 times (17.535 on average) on NVIDIA GPUs.
Keywords: Viola-Jones; Imbalanced Computation; Persistent Threads; Local Queues; Global Queues
Sponsors: IEEE, IEEE Comp Soc, Univ Bradford, IEEE Tech Comm Scalable Comp (TCSC)
Language: English
Content type: Conference paper
URI: http://ir.iscas.ac.cn/handle/311060/15807
Collection: Institute of Software, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Jia Haipeng, Zhang Yunquan, Wang Weiyan, et al. Accelerating Viola-Jones Face Detection Algorithm on GPUs[C], 2012: 396-403.