Institutional Repository of the Institute of Software, Chinese Academy of Sciences
Subject: Computer Software ; Computer Software::Software Theory
Title:
Research and Application of a Desktop Search Method Based on a Latent Semantic Graph (基于隐语义图谱的桌面搜索方法研究及应用)
Author: 皇甫杨
Issued Date: 2015-05-26
Supervisor: 王青
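Major: Computer Software and Theory
Degree Grantor: Graduate School of the Chinese Academy of Sciences (中国科学院研究生院)
Place of Degree Grantor: Beijing
Degree Level: Master's
Keyword: latent semantic modeling ; graph model ; information retrieval ; desktop search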
Abstract:

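Desktop search (also called personal information retrieval) is the search process defined over a local personal information space; it aims to help individual users effectively find the local resources (that is, files) they need. In recent years, as the informatization of society has advanced and the big data era has arrived, the data that individual users generate and store on their local computers has grown explosively, and the storage and management of personal data has entered the terabyte era. Personal computer users increasingly need to search large volumes of local data quickly and accurately, which has made desktop search a focus of attention and research in both industry and academia. Industry already offers desktop search solutions well known to users, such as Google Desktop Search and Windows Desktop Search. These traditional solutions implement keyword-based retrieval and do not consider the latent semantic relations among local resources. Users must therefore remember and type the search keywords precisely, and the results of such searches are incomplete. In information retrieval, introducing rich, meaningful associations and signals can effectively improve the quality of search results. In a local environment, resources appear at first glance to be independent and unrelated. However, the way resources on a personal computer are created, browsed, and stored varies from person to person and is closely tied to the user's usage habits, personal experience, and memory. These habits, experience, and memory implicitly give rise to latent semantic associations among resources. Mining and exploiting these latent associations opens up many possibilities for desktop search research. Through observation we found a common pattern in how users work on personal computers: "the user operates on certain resources to complete a task related to a specific topic, and organizes those resources into particular directories according to some relation among them." This finding suggests that topic information, the user's historical behavior, and directory structure are all very helpful for locating local resources.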

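This thesis proposes a desktop search method based on a unified multi-dimensional latent semantic graph, LSG (Latent Semantic Graph). The method mines and quantifies pairwise associations between resources from three sources: the content of local resources, the user's historical behavior data, and the directory structure in which resources are stored. It integrates the three relations into a unified latent semantic graph, the LSG, which systematically represents the associations among local resources. On top of the LSG, the thesis implements a personalized ranking algorithm and a recommendation algorithm, both based on the latent semantic relations between resources, to improve on traditional keyword-based search and to recommend additional, indirectly related results that improve the user's search experience. When a query arrives, the method first uses the vector space model to extract a set of relevant results from the index; the LSG-based ranking algorithm then reorders the result set, while the LSG-based recommendation algorithm recommends the 5 most related local resources for each result in the set. To study the effectiveness of the LSG-based method, the thesis designs and implements an LSG-based desktop search prototype system and compares it experimentally with mainstream desktop search engines and with systems implementing relatively advanced methods from academia. The results show that the proposed method performs well.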
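
The query-time flow described above (vector space retrieval, LSG-based reordering, then up to 5 recommendations per result) can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual formulas, so everything here is an assumption made for illustration: the raw term-frequency cosine similarity, the boosting rule with its `alpha` weight, the function names, and the representation of the LSG as a nested dictionary of edge weights.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs, lsg, alpha=0.5, k=5):
    """Retrieve, rerank, and recommend (hypothetical scoring rules).

    docs: {resource id: text}; lsg: {resource id: {neighbour id: weight}}.
    """
    q = tf_vector(query)
    # Step 1: vector space model retrieval; keep resources with nonzero similarity.
    hits = {}
    for doc_id, text in docs.items():
        score = cosine(q, tf_vector(text))
        if score > 0:
            hits[doc_id] = score

    # Step 2: rerank. Assumed rule: a hit gains support from other hits
    # in proportion to the LSG edge weight that links them.
    def boosted(doc_id):
        edges = lsg.get(doc_id, {})
        support = sum(edges.get(other, 0.0) * s
                      for other, s in hits.items() if other != doc_id)
        return hits[doc_id] + alpha * support

    ranked = sorted(hits, key=boosted, reverse=True)

    # Step 3: for each result, recommend its k strongest LSG neighbours.
    recommendations = {}
    for doc_id in ranked:
        neighbours = sorted(lsg.get(doc_id, {}).items(),
                            key=lambda kv: kv[1], reverse=True)
        recommendations[doc_id] = [n for n, _ in neighbours[:k]]
    return ranked, recommendations
```

Calling `search("graph", docs, lsg)` returns the reordered result list together with a mapping from each result to its recommended resources. Because recommendations come from graph neighbours, they can include resources that do not contain the query keywords at all.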
English Abstract:

Desktop search refers to the process of searching within one’s personal information space, and aims to help users find local resources effectively. With the development of informatization, the personal data generated and stored on PCs is growing rapidly, and the management of personal data has entered the terabyte era. How to pinpoint local resources in this ocean of local data quickly and accurately has become a hot research topic in both industry and academia. Major Internet service providers have released prominent desktop search applications, such as Google Desktop and Windows Desktop Search. These traditional solutions perform keyword-based search without considering any kind of implicit semantic relation, which leads to incomplete search results. In information retrieval, introducing rich, meaningful association signals can help improve search. Intuitively, personal resources seem independent of each other. In fact, most local items have been explicitly viewed, created, or saved by the user. As such, these items are personal to the individual and are intertwined with personal experience and memories, which indicates that implicit associations among local resources exist extensively. These associations can be used to improve traditional keyword-based search. We observe that users usually operate a PC in a common pattern: they operate on some resources to finish a specific task related to a certain topic, and organize these resources in certain directories. This observation suggests that topic, user behavior, and directory structure are all useful information for locating resources.

In this thesis, we propose a personal information retrieval approach based on a unified multi-dimensional latent semantic graph. The approach exploits these three kinds of information to improve traditional desktop search. We denote the three kinds of implicit information as the {Task, Topic, Location} relations, respectively. The heart of our approach is the Latent Semantic Graph (LSG), which quantifies the three relations with association scores. Based on the LSG, we develop a personalized ranking scheme to improve traditional keyword-based desktop search and design a creative semantic recommendation algorithm to expand the query results. We implement a prototype system based on the LSG and conduct user experiments. The experiments show that the proposed approach is effective and outperforms traditional keyword-based desktop search.
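
As a rough illustration of how three pairwise relations might be merged into one weighted graph, the following Python sketch builds an LSG-like structure. It is not the thesis's method: the abstract does not define the scores, so the co-occurrence measure for the Task relation, the shared-directory measure for the Location relation, the precomputed `topic` similarities, the equal default weights, and all function names are hypothetical. It also assumes that Task corresponds to user behavior, Topic to content, and Location to directory structure.

```python
from itertools import combinations
from pathlib import PurePosixPath


def location_score(path_a, path_b):
    """Assumed Location relation: share of common leading directories."""
    dirs_a = PurePosixPath(path_a).parent.parts
    dirs_b = PurePosixPath(path_b).parent.parts
    common = 0
    for x, y in zip(dirs_a, dirs_b):
        if x != y:
            break
        common += 1
    longest = max(len(dirs_a), len(dirs_b))
    return common / longest if longest else 1.0


def task_scores(sessions):
    """Assumed Task relation: fraction of sessions in which a pair co-occurs.

    sessions is a list of lists of file paths used together.
    Keys are (a, b) tuples with a < b.
    """
    counts = {}
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    total = len(sessions)
    return {pair: c / total for pair, c in counts.items()}


def build_lsg(paths, sessions, topic, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three relation scores into one symmetric weighted graph.

    topic maps (a, b) tuples with a < b to a precomputed content similarity.
    """
    w_task, w_topic, w_location = weights
    task = task_scores(sessions)
    lsg = {p: {} for p in paths}
    for a, b in combinations(sorted(paths), 2):
        score = (w_task * task.get((a, b), 0.0)
                 + w_topic * topic.get((a, b), 0.0)
                 + w_location * location_score(a, b))
        if score > 0:
            lsg[a][b] = lsg[b][a] = score
    return lsg
```

A weighted sum is only one possible way to unify the relations; it keeps each relation's contribution easy to inspect and tune through `weights`.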

Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/17111
Appears in Collections: Internet Software Technology Laboratory _ Theses

Files in This Item:
File Name/ File Size Content Type Version Access License
学位论文_皇甫杨_v5_反馈修改版_2.pdf (2957KB) -- -- Restricted access (contact for full text)

Recommended Citation:
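皇甫杨. Research and Application of a Desktop Search Method Based on a Latent Semantic Graph (基于隐语义图谱的桌面搜索方法研究及应用) [D]. Beijing. Graduate School of the Chinese Academy of Sciences. 2015-05-26.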

Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
