中国科学院软件研究所机构知识库
Advanced  
ISCAS OpenIR  > 中科院软件所  > 中科院软件所
题名:
人机交互文法获取研究
作者: 张瑞岭
答辩日期: 1999
专业: 计算机软件
授予单位: 中国科学院软件研究所
授予地点: 中国科学院软件研究所
学位: 博士
关键词: 形式规约 ; 上下文无关文法 ; 归纳学习 ; 文法推断 ; 语言分析 ; 复用 ; 逐步求精
摘要: 形式规约是对软件系统所需要解决问题的完备的、精确的刻画。它通常以健全的数学理论为基础,用数学符号描述软件的功能和性质,因此形式规约具有语言清晰、从而有可能实现由机器自动验证其性质的优点。另一方面,形式规约具有健全数学基础这一特性也带来形式规约的一个优点,即难于获取,也就是说如果用户未熟练掌握相应的数学知识,则很难写出很好的形式规约。为解决这一问题,我们尝试将机器学习用于规约获取,即由机器辅助用户获取形式规约。在这一思想指导下,同时根据我们定义的形式规约的表达方法,本文研究人机交互上下文无关文法获取,并实现一个实验性工具SAQ/CL,以帮助人从对语言的片断的、不精确的认识出发获取语言的文法定义。论文首先综述文法推断研究的历史和现状。作为归纳学习的一个重要的研究领域,文法推断研究如何从语言的有限句子出发,通过归纳推理获取语言的文法定义。论文介绍了文法推断的主要理论模型,即Gold的极限识认模型、Angluin的交互式精确认识模型和Valiant的近似学习模型;接着列举上下文无关文法类及其非平凡子类的推断方法,以及目前研究得较多的随机文法、隐马尔可夫模型的推断方法,同时还介绍遗传算法和神经网络用于文法推断的研究成果,并简要分析了各种推断方法的优缺点;最后尝试展望文法推断研究的发展趋势。在综述了文法推断研究的历史和现状之后,论文提出一种人机交互概念获取模型,并以此模型为框架设计了一个基于复用的逐步求精式概念获取算法。该模型的要点包括:概念以上下文无关文法有示;一次获取过程是多个概念联立获取;推断的原始信息主要包括联立获取的未知概念及其有限实例样本集合和可以复用的已知概念;人机交互过程中的主要问题类型包括:成员问题、聚类问题和等价问题。获取算法以复用和逐步求精为主要获取策略;以精简产生式集合计算、循环结构查找和同类子字抽取为主要求精操作。论文给出了获取算法的详细描述,并分析了获取算法的有关性质。在实现方面,论文提出了一种特殊的上下文无关文法(Quasi-CFG, 简称QCFG)。在形式规约获取系统SAQ中,无论简单概念还是复杂概念,其词法和句法定义要求描述在一个完整的文法中。对于复杂概念,如程序设计语言或自然语言,如果用标准的上下文无关文法描述其包含词法结构信息的完整文法定义,则需要将空格和回车等没有实质意义的分隔符合包含到文法定义中去,使得文法定义非常繁琐且不直观。QCFG正是为解决这一问题而提出的,它将标准的上下文无关文法中的终极符集合和非终极符集合进行细化,从而能够把复杂概念的词法和句法描述集成在一个文法中,进而可以把词法分析和句法分析结合到一个完整的语法分析过程中。论文同时给出了相应于QCFG的语法分析算法和语法树生成算法。实现方面的技术还包括提出从给定上下文无关法随机生成句子的两种方法,自顶向下和自底向上。本文首次将文法推断用于规约获取研究。其问题的特殊性在于:(1)要求获取的文法结构自然;(2)从正例出发通过人机交互学习规模较大的语言。论文针对这些特殊性,提出以复用和逐步求精为主要策略的文法推断模型和方法,并在机器上实现概念获取和检验子系统SAQ/CL,通过一系列实验,检验了本文的研究成果。
英文摘要: A Formal Specification(FS) is a complete and accurate description of tasks which is a software system should perform. Generally FS employs sound mathematical theories as its foundation, so FS has clear semantics and it is probable to verify its accuracy automatically by machine. On the other hand, FS's mathematical foundation gives birth to its disadvantage of difficulty in acquisition. As an attempt to solve this problem, we apply techniques in Machine Learning to FS acquisition. Motivated by this intention this thesis deals with how to acquire context free grammars from examples by human-machine interaction and implements an experimental system SAQ/CL to help users acquire grammatical definitions of languages from their fragmental or inaccurate knowledge concerning the languages. Firstly, the thesis gives a survey of research of grammatical inference(GI). As an important area in inductive learning, GI deals with how to obtain the grammatical description of a formal language from given finite data drawn from the language by inductive inference. The thesis first introduces three major learning models of GI: Gold's language identification in the limit, Angluin's query learning model, and Valiant's probably approximately correct learning. Then the thesis enumerates methods for GI with an emphasis on results concerning the inference of context free grammar class and its subclasses, hidden Markov models, and stochastic context free grammar class. In addition, Genetic algorithms and neural network learning in GI are discussed. The thesis also discusses advantages as well as disadvantages of each method, and gives the future directions of GI research. Secondly, an interactive model of concept acquisition and a stepwise refining method for acquiring concept from positive examples are presented. The main points of the model include: (1)concepts are defined by context free grammars; (2) a learning process may involve one or more unknown concept(s) to be learned; (3) the source information for inference include positive examples of unknown concepts and/or known concepts to be reused; (4) the queries during human-machine interaction are of three types: membership, cluster and equivalence. The inferring method employs reuse and stepwise refinement as its basic strategies. The operations of refinement consists of computation of reduced productions, extraction of repeating patterns, and substring cluster. The formal description of the method as well as the discussion of its accuracy are presented. In implementation, the thesis proposes a special type of context free grammars (Quasi-CFG, QCFG in short) which is obtained by dividing the set of nonterminals and the set of terminals of a conventional context-free grammar into two respectively, and it can be used to describe the overall structural information (which consists of lexical structure and syntax) of complicated concepts, such as programming languages and natural languages. As a result, lexical analysis and syntax analysis can be integrated into one parsing process. The thesis presents the algorithm for Quasi-CFG's parsing as well as the corresponding algorithm for generation of derivation tree. As another important technique is implementation, the thesis also presents two algorithms for randomly generating sentences from a given context free grammar in top-down and bottom-up ways respectively. The thesis first applies GI to specification acquisition where the particularities of GI problem lie in requirements that acquired grammars have natural structures and that the learning method should be able to tackle nontrivial languages. To meet these particularities, the thesis presents a model as well as an algorithm for acquiring grammars by interaction. The thesis work also includes the implementation of Concept Acquisition and Verification System(SAQ/CL). A series of experiments have been conducted on SAQ/CL and the experimental results, demonstrate that the system can successfully infer nontrivial CF grammars with acceptable time complexity and precision.
语种: 中文
内容类型: 学位论文
URI标识: http://ir.iscas.ac.cn/handle/311060/5878
Appears in Collections:中科院软件所

Files in This Item:
File Name/ File Size Content Type Version Access License
LW002845.pdf(2303KB)----限制开放-- 联系获取全文

Recommended Citation:
张瑞岭. 人机交互文法获取研究[D]. 中国科学院软件研究所. 中国科学院软件研究所. 1999-01-01.
Service
Recommend this item
Sava as my favorate item
Show this item's statistics
Export Endnote File
Google Scholar
Similar articles in Google Scholar
[张瑞岭]'s Articles
CSDL cross search
Similar articles in CSDL Cross Search
[张瑞岭]‘s Articles
Related Copyright Policies
Null
Social Bookmarking
Add to CiteULike Add to Connotea Add to Del.icio.us Add to Digg Add to Reddit
所有评论 (0)
暂无评论
 
评注功能仅针对注册用户开放,请您登录
您对该条目有什么异议,请填写以下表单,管理员会尽快联系您。
内 容:
Email:  *
单位:
验证码:   刷新
您在IR的使用过程中有什么好的想法或者建议可以反馈给我们。
标 题:
 *
内 容:
Email:  *
验证码:   刷新

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 

Valid XHTML 1.0!
Copyright © 2007-2017  中国科学院软件研究所 - Feedback
Powered by CSpace