Institutional Repository of the Institute of Software, Chinese Academy of Sciences
Title:
Research and Implementation of a NoSQL-Based Vertical Search Engine for the Insurance Industry (基于NoSQL的保险行业垂直搜索引擎的研究与实现)
Author: 覃元元
Issued Date: 2012-04
Supervisor: 左春
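Major: Computer Technology
Degree Grantor: Graduate University of Chinese Academy of Sciences
Place of Degree Grantor: Beijing
Degree Level: Master's
Keywords: NoSQL; document database; information collection; information retrieval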
Abstract:
    With the spread and growth of the Internet, e-commerce has become increasingly common, and in the insurance industry more and more insurance products are sold as goods on e-commerce platforms. However, most of the insurance products on these e-commerce sites come from the few insurers that have partnered with the site, so the sites cannot give consumers truly rich and comprehensive product information. A vertical search engine, by contrast, can crawl product data from the product sites of the individual insurance companies, process the data, and make it searchable, so that consumers can search the products of all insurance companies for those that match their needs.
    This thesis first analyzes and studies NoSQL-related technologies, then describes in detail the characteristics and operations of the document-oriented database MongoDB, and finally, based on the data-storage characteristics of the system, adopts MongoDB as the system's data store. It then analyzes the key technologies and modules involved in a vertical search engine for the insurance industry: how to use Heritrix and HtmlParser to implement a topic-focused web crawler that collects insurance product data, a Lucene-based model for building indexes and retrieving from them, and the design of the associated Web system. The thesis also presents the design goals and architecture of the vertical search engine system and examines the storage of insurance product data and the related data operations. The system applies these results and is implemented with Heritrix, HtmlParser, Lucene, IKAnalyzer, Struts2, jQuery, and related components.
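The store, index, and retrieve flow described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: plain dictionaries stand in for MongoDB documents, a whitespace tokenizer stands in for a Chinese analyzer, and a hand-built inverted index stands in for Lucene. The product records, field names, and the helpers `tokenize`, `build_index`, and `search` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical product records, standing in for pages that a crawler has
# already fetched and parsed. Records need not share the same fields, which
# is the property that makes a document store a natural fit.
PRODUCTS = [
    {"_id": 1, "company": "Company A", "name": "Family Travel Accident Cover",
     "coverage_days": 30},
    {"_id": 2, "company": "Company B", "name": "Critical Illness Plan",
     "waiting_period_days": 90, "riders": ["hospital cash"]},
    {"_id": 3, "company": "Company A", "name": "Student Accident Cover"},
]


def tokenize(text):
    """Lowercase whitespace tokenizer (assumption: a real system would use a proper analyzer)."""
    return text.lower().split()


def build_index(docs, field="name"):
    """Build an inverted index: token -> set of ids of documents whose field contains it."""
    index = defaultdict(set)
    for doc in docs:
        for token in tokenize(doc.get(field, "")):
            index[token].add(doc["_id"])
    return index


def search(index, store, query):
    """Return the documents that contain every query token (boolean AND), ordered by id."""
    tokens = tokenize(query)
    if not tokens:
        return []
    ids = set.intersection(*(index.get(t, set()) for t in tokens))
    return [store[i] for i in sorted(ids)]


# The "store" plays the role of the document database, keyed by document id.
store = {doc["_id"]: doc for doc in PRODUCTS}
index = build_index(PRODUCTS)

results = search(index, store, "accident cover")
print([doc["name"] for doc in results])
# prints ['Family Travel Accident Cover', 'Student Accident Cover']
```

Keeping the full record in the store and only ids in the index mirrors the division of labor the abstract describes: the database holds heterogeneous product data, while the index answers which records match a query.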
Language: Chinese
Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/14527
Appears in Collections: Technology Center of Software Engineering > Theses

Files in This Item:
File Name: 基于NoSQL的保险行业垂直搜索引擎的研究与实现.pdf
File Size: 1003KB
Access: Restricted (contact the repository to obtain the full text)

Recommended Citation:
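覃元元. Research and Implementation of a NoSQL-Based Vertical Search Engine for the Insurance Industry [D]. Beijing. Graduate University of Chinese Academy of Sciences. 2012-04-01.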
Items in IR are protected by copyright, with all rights reserved, unless otherwise indicated.

 

 
