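Author: Dong Jing (董静)
Defense date: 2007-06-07
Degree-granting institution: Institute of Software, Chinese Academy of Sciences
Place of conferral: Institute of Software
Degree: Doctorate
Keywords: semantic role labeling; semantic role annotation standard; Chinese entity relation extraction
Alternative title: The Study of Semantic Role Labeling Based on the CRF Model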
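Abstract: Semantic analysis is one of the key technologies of natural language understanding. Despite decades of development, there is still no mature method for automatically acquiring the semantic information of a text. Automatic semantic role labeling is a preliminary exploration of this key technology. This thesis first introduces the definition of semantic role labeling and its corpus resources. Then, building on a fairly in-depth analysis and summary of existing semantic role labeling methods, it proposes different feature selection methods for labeling semantic roles in English and Chinese. For Chinese entity relation extraction, it proposes a method that distinguishes embedding relations from non-embedding relations and improves extraction precision by introducing features from semantic role labeling. Specifically, the main contributions of this thesis are as follows. First, we summarize and analyze the current state of semantic role labeling research in China and abroad from three aspects: labeling methods, labeling steps, and feature selection. We also introduce Chinese semantic role labeling and the applications of semantic role labeling, including automatic question answering, information extraction, and machine translation. On this basis, we identify several issues that deserve attention in future semantic role labeling research. Second, in English semantic role labeling based on the tree conditional random field model, the constraints between parent and child nodes of the syntax tree are relatively weak. To address this shortcoming, we propose "flattening" the syntax tree and introducing, in a linear-chain conditional random field model, Markov dependencies among the role labels at the "horizontal level" of the syntax tree. In addition, after comparing the influence of different features on the semantic role labeling task, we propose some new features to improve the performance of the English semantic role labeling system. Third, for Chinese semantic role labeling, we define a 12-role Chinese semantic role annotation standard and select some of the news and application-type documents in 863TreeBank as the annotation corpus. Again based on the linear-chain conditional random field model, but taking the characteristics of Chinese into account, we propose additional linguistic features and obtain positive results. Fourth, in the Chinese entity relation extraction task, we propose dividing entity relation extraction into two subtasks: embedding relation extraction and non-embedding relation extraction. Given the differences between the two kinds of relations, we adopt a different syntactic feature set suited to each. For non-embedding relations in particular, we introduce semantic information, borrowing features from semantic role labeling, to improve the performance of the Chinese entity relation extraction system.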
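The second contribution in the abstract (flattening the syntax tree, then labeling the resulting sequence with a linear-chain CRF) can be illustrated with a minimal sketch. The record does not describe the thesis's actual flattening procedure, so the `flatten` function below is an assumption: it collects the siblings of the nodes on the root-to-predicate path, a common way to linearize constituents in SRL. The tree, the label set, and every score are hypothetical, hand-set values rather than learned weights, and only Viterbi decoding is shown, not CRF training.

```python
# Sketch only: flatten a toy parse tree around a predicate, then decode role
# labels with Viterbi over hand-set linear-chain scores (assumed, not learned).

def flatten(tree, path):
    """Return constituent labels to the left of, at, and to the right of the
    predicate. `tree` is (label, children); `path` lists child indices from
    the root down to the predicate node."""
    left, right = [], []
    node = tree
    for idx in path:
        _, children = node
        left.extend(child[0] for child in children[:idx])
        right = [child[0] for child in children[idx + 1:]] + right
        node = children[idx]
    return left + [node[0]] + right


def viterbi(seq, labels, emit, trans, default=-5.0):
    """Highest-scoring label sequence under additive emission and transition
    scores (log-potentials of a linear-chain model)."""
    best = [{y: emit.get((seq[0], y), default) for y in labels}]
    back = []
    for x in seq[1:]:
        scores, ptrs = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans.get((p, y), 0.0))
            scores[y] = (best[-1][prev] + trans.get((prev, y), 0.0)
                         + emit.get((x, y), default))
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    y = max(labels, key=lambda label: best[-1][label])
    decoded = [y]
    for ptrs in reversed(back):
        y = ptrs[y]
        decoded.append(y)
    return decoded[::-1]


# Hypothetical tree: (S (NP) (VP (VBD) (NP) (PP))), with VBD as the predicate.
tree = ("S", [("NP", []), ("VP", [("VBD", []), ("NP", []), ("PP", [])])])
sequence = flatten(tree, [1, 0])

labels = ["A0", "A1", "AM", "V"]
# Hand-set scores: emissions alone cannot tell the two NPs apart.
emit = {("NP", "A0"): 1.0, ("NP", "A1"): 1.0, ("VBD", "V"): 3.0, ("PP", "AM"): 2.0}
# Transitions between horizontally adjacent labels resolve the ambiguity.
trans = {("A0", "V"): 1.0, ("V", "A1"): 1.0}

roles = viterbi(sequence, labels, emit, trans)
print(list(zip(sequence, roles)))
```

In this toy setting the two NP constituents receive identical emission scores, so the choice between them comes entirely from the transition scores between neighbouring labels, which is the kind of horizontal dependency the abstract refers to.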
English abstract: Automatic semantic analysis is one of the key technologies in Natural Language Processing. However, after several decades of study, there is still no way to obtain the detailed semantic relations between the words of a sentence. Semantic Role Labeling (SRL) now plays an important role as an emerging exploration of semantic understanding. This thesis starts with an introduction to the concept and corpora of Semantic Role Labeling and then makes an in-depth analysis of existing SRL methods. Several methods are proposed for the SRL of English and Chinese. Additionally, in the task of Chinese Entity Relation Extraction, we divide entity relations into two categories: embedding relations and non-embedding relations. Semantic role information, especially the features used in SRL, is used to support the identification of relations. The main contributions of this thesis are as follows. First, surveying research at home and abroad, we classify the commonly used Semantic Role Labeling methods along three aspects (labeling algorithms, labeling steps, and feature selection) and introduce them in detail. We also introduce research on Chinese Semantic Role Labeling, as well as its applications in other areas of Natural Language Processing, including automatic question answering, information extraction, and machine translation. On this basis, we propose several research directions in SRL that may attract attention in the near future. Second, observing the weak dependence between parent-child pairs in tree-based Conditional Random Fields, we present an approach to Semantic Role Labeling based on the linear-chain Conditional Random Fields model, incorporating tree-to-string alignment and linear dependence among role labels at the horizontal level. Furthermore, we analyze the effects of various features on the labeling task and suggest some new features. Compared with the tree-based CRF model, our approach not only reduces time and space costs but also increases the accuracy of the system. Third, we define a 12-role annotation standard to create a Chinese semantic role corpus. A subset of the news and application documents in 863TreeBank is manually annotated with semantic roles. Then, based on the linear-chain Conditional Random Fields model, we suggest several lexicalized features for Chinese SRL and achieve good results. Finally, in Chinese Entity Relation Extraction, we present a novel method that divides entity relations into two categories: embedding relations and non-embedding relations. After some simple experiments, we find that some syntactic features have clearly different effects on the identification of the two kinds of relations. Two different sets of syntactic features are therefore proposed for extracting the two categories. In particular, in the sub-task of non-embedding relation extraction, semantic role information and features are used to improve the performance of the system.
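The split into embedding and non-embedding relations described in the final contribution can be sketched as a routing step ahead of feature extraction. The record does not define "embedding" precisely. The sketch below assumes it means that one entity mention's span lies entirely inside the other's. The `Mention` class, the example sentence, and the feature-group names are all hypothetical placeholders, not the thesis's actual feature sets.

```python
# Sketch only: route an entity pair to a feature group depending on whether
# one mention is nested inside the other (an assumed reading of "embedding").
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    start: int  # inclusive character offset in the sentence
    end: int    # exclusive character offset


def is_embedding(a: Mention, b: Mention) -> bool:
    """True if either mention's span fully contains the other's."""
    return ((a.start <= b.start and b.end <= a.end)
            or (b.start <= a.start and a.end <= b.end))


def relation_category(a: Mention, b: Mention) -> str:
    return "embedding" if is_embedding(a, b) else "non-embedding"


# Placeholder feature groups; only the non-embedding case adds semantic role
# features, mirroring the abstract's description at a very coarse level.
FEATURE_GROUPS = {
    "embedding": ["syntactic_set_A"],
    "non-embedding": ["syntactic_set_B", "semantic_role"],
}

# Invented example sentence: "Zhang San works at Peking University."
sentence = "张三在北京大学工作"
person = Mention(sentence[0:2], 0, 2)       # 张三
university = Mention(sentence[3:7], 3, 7)   # 北京大学
city = Mention(sentence[3:5], 3, 5)         # 北京, nested inside 北京大学

for a, b in [(city, university), (person, university)]:
    category = relation_category(a, b)
    print(a.text, b.text, category, FEATURE_GROUPS[category])
```

Under these assumptions, the pair 北京 / 北京大学 is routed to the embedding group and 张三 / 北京大学 to the non-embedding group, so each subtask can use its own feature set.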
Language: Chinese
Content type: Dissertation
Appears in Collections: Institute of Software, Chinese Academy of Sciences

Files in This Item:
File Name / File Size | Content Type | Version | Access | License
10001_200428015029013董静_paper.doc (941KB) | -- | -- | Restricted access | --
Full text: available on request.

Recommended Citation:
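Dong Jing (董静). 基于CRF模型的语义角色标注研究 [The Study of Semantic Role Labeling Based on the CRF Model] [D]. Institute of Software. Institute of Software, Chinese Academy of Sciences. 2007-06-07.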
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.


