Institutional Repository of the Institute of Software, Chinese Academy of Sciences (ISCAS OpenIR)
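Title:
基于视频的人机交互及其关键技术研究 (Video-Based Human-Computer Interaction and Its Key Technologies)
Author: 王西颖 (Wang Xiying)
Defense date: 2007-06-08
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software
Degree: Doctoral
Keywords: video-based human-computer interaction; gesture interaction; interaction framework model; video segmentation; video tracking; gesture understanding; virtual reality
Alternative title: Research on key issues of vision-based interaction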
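Abstract: As human-computer interaction technology develops, new interaction techniques keep emerging, making human-computer interaction more natural, efficient, and intelligent. Vision-Based Interaction (VBI), also called Camera-Based Interaction (CBI), is one of the new interaction techniques to appear in recent years and has attracted wide attention. With vision-based interaction, people perform interactive actions according to their own behavioral habits; a camera senses their movements and behavior, the computer analyzes and interprets the video data, and the interaction task is then completed automatically, so that users can even disregard the presence of the computer and camera throughout. VBI is receiving growing attention from researchers in fields such as virtual reality and ubiquitous computing, and is expected to become one of the mainstream interaction modes.

Visual hand-gesture interaction is the main form of VBI. Gestures carry rich shape and motion information, and their direct-manipulation character makes them especially suitable for 3D interaction tasks. However, the human hand is a complex deformable object, and gestures are inherently complex and ambiguous and vary in time and space, which makes analyzing and understanding them a highly challenging research topic.

VBI draws on many disciplines, including computer vision, digital image and video processing, pattern recognition, human-computer interaction, and behavioral psychology. This thesis studies two aspects of VBI: its interaction framework and its key enabling technologies. The framework study summarizes the basic properties of vision-based interaction from an overall perspective and provides guidance for developing interactive systems and choosing specific technical approaches. For visual gesture interaction, the thesis studies the key enabling technologies, namely gesture segmentation, continuous video tracking, and dynamic gesture modeling and understanding, and proposes effective algorithms or solutions.

The main innovative contributions of this thesis fall into the following five areas:

1. The STEF interaction framework for VBI. The thesis first systematically analyzes the basic characteristics of vision-based interaction, its input-output structure, and its main computing environments (ubiquitous computing, virtual reality, and computer-supported cooperative computing environments), and on that basis proposes a framework for vision-based interaction, the STEF (Scene, Task, Event and Feedback) model. STEF is a task-oriented cyclic model driven by visual events and equipped with a feedback mechanism. It can provide overall guidance for research on Vision-Based User Interfaces (VBUI) and for application design.

2. A fuzzy-set-based gesture image segmentation method. A new gesture segmentation method based on fuzzy set theory is proposed. It defines three different gesture fuzzy sets, together with fuzzy morphological operations on them, and achieves accurate extraction of the hand region across consecutive video frames. Image pyramids are further used for multi-resolution analysis of the gesture image, which separates the fingers from the palm.

3. A video gesture tracking algorithm for real-time interaction. For real-time interaction tasks based on visual gestures, a fast, continuous tracking method for deformable hand gestures is proposed. It combines features of model-based and appearance-based methods, so that tracking is grounded in an understanding of the target object, the hand gesture. By recognizing static gestures and matching their models against image features, it achieves automatic tracking initialization and automatic recovery after tracking failure. During tracking, the templates are updated dynamically to follow the changing outline of the articulated hand. Decomposing the complex high-dimensional feature vector into several 2D tracking templates greatly reduces the computational cost of tracking. The method also combines K-Means clustering with the Particle Filter algorithm, which resolves mutual interference among multiple fingers. Tracking detection enables automatic recovery after the target is lost, which keeps the interaction continuous. Experiments show that the method tracks the hand region and fingertip positions in real time as their appearance changes, and that it is an effective tracking method for deformable gestures.

4. A recognition algorithm for complex gestures based on the HMM-FNN model. A novel HMM-FNN model is proposed that combines the temporal modeling capability of the Hidden Markov Model (HMM) with the fuzzy-logic representation and inference capability of the Fuzzy Neural Network (FNN). The likelihood that each HMM assigns to the observation sequence serves as the fuzzy membership degree of the corresponding sub-class, and fuzzy inference in the FNN yields the final output. For complex dynamic gestures, a modeling and recognition method based on the HMM-FNN model is proposed. It exploits two properties of dynamic gestures, the decomposability of their motion features and the fuzziness of their semantic description, and decomposes a gesture into three components: hand-shape change, motion in the 2D plane, and motion along the Z axis. Tracking the positions of the hand and fingertips yields three feature sequences that form the input to the HMM-FNN model. Compared with an ordinary HMM, the method decomposes the complex problem using the properties of the gesture itself and avoids describing the gesture with high-dimensional features, which lowers computational complexity and improves system performance. The method also takes the fuzzy nature of gestures into account, modeling fuzzy rules and performing fuzzy inference in the form of an FNN, which makes the system more robust than simple deterministic inference. The HMM-FNN model further draws on human prior knowledge to optimize the construction of the fuzzy rules and the network connection structure, improving training and recognition efficiency.

5. A hierarchical method for modeling and understanding interactive gestures. For interactive gestures in virtual reality environments, a hierarchical method is proposed for modeling and classifying them. Based on the motion features and interaction characteristics of interactive gestures, the thesis first gives a new hierarchical taxonomy of interactive gesture types, and then describes and represents different gesture types with different models, which avoids the inefficiency of using a single model. Recognition and understanding build on this hierarchical modeling and proceed from coarse to fine: a sliding window extracts global statistical features of the gesture in real time to give a coarse classification, and each class is then analyzed with a method suited to its characteristics to complete the interaction task. In addition, the interaction environment and context information are used to assist gesture classification, which improves recognition efficiency.

Research on vision-based interaction has high application value and good prospects, but current work is still at an early stage and many problems remain to be solved. The author nevertheless believes that progress in computer science and the further integration of knowledge across disciplines and fields will greatly advance this line of research, and that vision-based interaction will eventually enter everyday life and, together with other interaction modes, help build a more natural and harmonious human-computer interaction environment.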
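The coarse stage of the hierarchical method in the fifth contribution (a sliding window over the tracked hand position, global statistics per window, then a rough class label) might look roughly like the Python sketch below. The abstract does not give the actual features, classes, or thresholds, so everything here is an assumption made for illustration: the two features (path length and net displacement), the class names `static`, `directional` and `trajectory`, and the values of `still_thresh`, `straight_ratio` and `window`.

```python
from collections import deque
from math import hypot


def window_features(points):
    """Global statistics of one window of (x, y) hand positions.

    Returns (path_length, net_displacement); both features are assumed
    for this sketch and are not taken from the thesis.
    """
    steps = [hypot(x2 - x1, y2 - y1)
             for (x1, y1), (x2, y2) in zip(points, points[1:])]
    path_length = sum(steps)
    (x0, y0), (xn, yn) = points[0], points[-1]
    net_displacement = hypot(xn - x0, yn - y0)
    return path_length, net_displacement


def coarse_class(points, still_thresh=5.0, straight_ratio=0.8):
    """Coarse label for one window (hypothetical classes and thresholds)."""
    path_length, net_displacement = window_features(points)
    if path_length < still_thresh:
        return "static"        # hand barely moves: hand-shape analysis next
    if net_displacement / path_length >= straight_ratio:
        return "directional"   # nearly straight motion
    return "trajectory"        # curved path: needs a finer temporal model


def classify_stream(positions, window=8):
    """Slide a fixed-size window over the stream and label each full window."""
    buf = deque(maxlen=window)  # the oldest sample drops out automatically
    labels = []
    for p in positions:
        buf.append(p)
        if len(buf) == window:
            labels.append(coarse_class(list(buf)))
    return labels
```

Each coarse label would then select a class-specific analysis method for the fine stage, which is where context information could also be applied to narrow the candidate classes.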
English abstract: With the development of Human-Computer Interaction techniques, many new interaction methods have appeared, and building highly intelligent and convenient interactive systems has never been so promising. Vision holds great promise for building advanced Human-Computer Interaction systems, and Vision-Based Interaction (VBI) has attracted interest from many researchers. After cameras capture a person's actions, the computer analyzes and interprets the visual data automatically. Because the computer can sense the presence of people and understand their intentions, people can interact with it in a more natural way, much as they interact with one another, even without being aware of the camera and computer. In Virtual Reality and ubiquitous computing research, VBI is becoming a popular way for people to interact with computers.

Visual gesture is the most important and effective approach in VBI, because the human hand can take a great variety of shapes and can perform complicated actions easily. However, since the hand is an articulated non-rigid object and hand gestures are themselves very complex, analyzing and understanding gestures is a very challenging task.

Research on VBI involves many scientific fields, such as computer vision, pattern recognition, and image processing. This thesis focuses on the interactive framework of VBI and several crucial enabling technologies: spatial segmentation of visual gestures, continuous tracking of deformable hand gestures, and modeling and understanding of dynamic gestures.

The main innovative achievements of this thesis comprise the following five aspects:

1. STEF interactive framework for VBI. First, we analyze the basic characteristics of VBI and its input-output structure, and point out that the most suitable computing environments for VBI are Virtual Reality, Ubiquitous Computing, and Computer Supported Cooperative Work. Second, we present a new interactive framework for VBI, the STEF (Scene, Task, Event and Feedback) model. STEF is a recurrent model with feedback, driven by visual events. It gives general guidance for the design and development of Vision-Based User Interfaces (VBUI) and vision-based applications.

2. Fuzzy-set based gesture segmentation method. We present a novel fuzzy-set based gesture segmentation method. It defines three fuzzy sets for gesture images, together with morphological operations on them for image processing. After the hand is segmented, we apply an image pyramid method to separate the fingers from the palm. Experiments show that the proposed method segments the hand and fingers successfully.

3. A novel method for continuous tracking of deformable hand gestures. For real-time interactive tasks based on visual hand gestures, we present a novel method that tracks a deformable hand continuously in real time. The approach combines characteristics of appearance-based and model-based methods. It initializes automatically by recognizing a static posture and matching the posture model with image features. The high-dimensional 3D feature vector of the hand is decomposed into several 2D models to reduce computational complexity, and these 2D models are updated to fit the changing hand shape during continuous tracking. To track multiple fingers, the method integrates K-Means clustering with a Particle Filter to resolve interference among fingers. In addition, a tracking detection module detects tracking failure and triggers auto-initialization, which keeps the interaction between human and computer continuous. Experiments show that the proposed approach tracks a deformable hand successfully in real time.

4. HMM-FNN based recognition method for dynamic gestures. A novel model, HMM-FNN, is presented in this thesis for modeling and recognizing complex dynamic gestures. It combines the temporal modeling capability of the Hidden Markov Model (HMM) with the strengths of the Fuzzy Neural Network (FNN) in fuzzy rule modeling and fuzzy inference. In the HMM-FNN model, a dynamic gesture is first decomposed into three independent parts: posture change, 2D motion trajectory, and movement along the Z-axis, each of which is modeled by an HMM. The likelihood of the input sequence under each HMM is taken as a fuzzy membership in the FNN. Fuzzy inference then yields the final classification of the dynamic gesture. In this way, a high-dimensional motion feature is transformed into several low-dimensional sub-features, which reduces model complexity. In addition, human experience can be used to build and optimize the model structure. Experiments show that the presented method is effective for modeling and recognizing complex dynamic gestures and is superior to the traditional HMM method.

5. Hierarchical modeling and understanding method for interactive gestures. For interactive gestures in Virtual Reality environments, we propose a hierarchical method for modeling and understanding them. First, we propose a hierarchical gesture taxonomy based on motion features and interaction characteristics. Then, different gestures are modeled and represented by different models, which avoids the low efficiency of traditional methods that model all gestures with a single model, such as an HMM. Recognition based on this hierarchical modeling proceeds from coarse to fine: we first compute global motion features with a sliding window to obtain a coarse classification, and then apply a different analysis method to each kind of gesture. In addition, the interactive context is used to help classify gestures, which noticeably improves the recognition rate.

Vision-Based Interaction has good prospects because of its high value in practical applications. Current research on VBI is still in its infancy, however, and many open problems remain. The author nevertheless believes that, as computer science and other scientific fields develop, research on VBI will advance quickly, and that VBI will play an increasingly important role in building a natural and efficient Human-Computer Interaction environment.
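The central idea of the fourth contribution, using the likelihood that each HMM assigns to an observation sequence as a fuzzy membership and then combining the three gesture components by fuzzy inference, can be illustrated with the small Python sketch below. It is a sketch under stated assumptions and not the thesis's model: the trained FNN is replaced by fixed min-based fuzzy rules, the HMMs are tiny hand-written discrete models over two symbols, and the names `HMMS`, `RULES`, `rise`, `fall`, `select` and `release` are all hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | HMM) for a discrete HMM, computed with the forward algorithm."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)


def memberships(obs, hmms):
    """Normalise per-label HMM likelihoods into fuzzy membership degrees."""
    scores = {label: forward_likelihood(obs, *params)
              for label, params in hmms.items()}
    total = sum(scores.values())
    if total == 0:
        return {label: 0.0 for label in scores}
    return {label: s / total for label, s in scores.items()}


def infer(component_memberships, rules):
    """Fire fixed fuzzy rules (fuzzy AND = min) and pick the strongest one.

    rules maps a gesture name to one required label per component.
    This stands in for the FNN, which would learn the rule weights.
    """
    strength = {
        gesture: min(m[label]
                     for m, label in zip(component_memberships, labels))
        for gesture, labels in rules.items()
    }
    return max(strength, key=strength.get), strength


# Hypothetical two-state left-to-right HMMs: (start, transition, emission).
LEFT_RIGHT = [[0.7, 0.3], [0.0, 1.0]]
HMMS = {
    "rise": ([1.0, 0.0], LEFT_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),  # 0s then 1s
    "fall": ([1.0, 0.0], LEFT_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),  # 1s then 0s
}
# One required label per component: (hand shape, 2D motion, Z-axis motion).
RULES = {
    "select": ("rise", "rise", "fall"),
    "release": ("fall", "fall", "rise"),
}
```

A caller would compute one membership dictionary per component sequence, for example `memberships(shape_obs, HMMS)`, and pass the three dictionaries to `infer` in the order used by `RULES`. Because each component is scored by its own low-dimensional model, no single HMM has to handle the joint high-dimensional feature.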
Language: Chinese
Content type: Dissertation
URI: http://ir.iscas.ac.cn/handle/311060/7536
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name / File Size | Content Type | Version | Access | License
10001_200418015029038王西颖_paper.pdf (1696 KB) | -- | -- | Restricted access | --

Recommended Citation:
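王西颖 (Wang Xiying). 基于视频的人机交互及其关键技术研究 [Research on key issues of vision-based interaction] [D]. Institute of Software. Institute of Software, Chinese Academy of Sciences. 2007-06-08.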