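Research and Implementation of a NoSQL-Based Vertical Search Engine for the Insurance Industry (基于NoSQL的保险行业垂直搜索引擎的研究与实现)
Qin Yuanyuan (覃元元)
Major: Computer Technology
Supervisor: Zuo Chun (左春)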
2012-04
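Degree Grantor: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Degree Level: Master's
Place of Degree Grantor: Beijing
Keywords: NoSQL; document database; information collection; information retrieval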
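Abstract
    With the spread and development of the Internet, e-commerce is becoming increasingly common. In the insurance industry, insurance products are increasingly sold as commodities on e-commerce platforms. However, most insurance products on these sites come from the few insurance companies that partner with each site, so the sites cannot give consumers truly rich and comprehensive product information. A vertical search engine, by contrast, can crawl product data from the product websites of individual insurance companies, process the data, and make it searchable. Consumers can then search across the products of all insurance companies for those that match their needs.
    This thesis first analyzes NoSQL-related technologies. It then describes in detail the characteristics and operations of the document-oriented database MongoDB. Based on the characteristics of this system's data storage, it adopts MongoDB as the system's storage solution. Next, it analyzes the key technologies and modules of a vertical search engine system for the insurance industry. These include using Heritrix and HtmlParser to implement a topic-focused web crawler that collects insurance product data, and a Lucene-based model for building indexes and performing retrieval. The thesis also studies and designs the associated Web system. It further presents the design goals and architecture of the vertical search engine system, and studies the storage of insurance product data and the implementation of the related data operations. Building on these research results, the system is implemented using tools and components including Heritrix, HtmlParser, Lucene, IKAnalyzer, Struts2, and jQuery.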
English Abstract
    With the popularity of the Internet, e-commerce is becoming more common in everyday life, and insurance products are increasingly traded on e-commerce platforms. However, the insurance products on these sites come from only a few insurance companies, so the sites cannot give consumers rich and comprehensive product information. A vertical search engine can crawl insurance company sites to obtain insurance product data and make those products searchable, so consumers can search all companies' products according to their needs.
    First, this thesis analyzes NoSQL technology and describes in detail the characteristics and operations of the document-oriented database MongoDB. It then selects MongoDB as the system's data storage solution to suit the characteristics of the system's data. Second, it analyzes the technologies involved in a vertical search engine for the insurance industry. These include using Heritrix and HtmlParser to implement a web crawler that collects insurance product data, and a Lucene-based indexing and retrieval model. It also covers the research and design of the associated Web system. Finally, the thesis introduces the design goals and architecture of the vertical search engine system, along with the techniques for data storage and data manipulation with MongoDB. Building on this research, a vertical search system for the insurance industry has been implemented using Heritrix, HtmlParser, Lucene, IKAnalyzer, Struts2, jQuery, and other tool components.
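The indexing and retrieval step described above can be illustrated with a minimal inverted index over crawled product records. This is a conceptual sketch only, not Lucene's API and not the thesis's code. The class name `ProductIndexSketch`, the sample product texts, and the simple split-on-non-letters tokenizer are all hypothetical. Lucene builds the same kind of term-to-document mapping but adds analyzers, relevance scoring, and on-disk index segments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch of an inverted index over product descriptions.
 * Not Lucene: no scoring, no persistence, no real analyzer.
 */
public class ProductIndexSketch {

    // term -> sorted set of product ids whose text contains that term (the posting list)
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    /** Lowercases and splits on any run of non-letter, non-digit characters (stand-in analyzer). */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Indexes one product: every term in its text points back to its id. */
    public void add(int productId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(productId);
        }
    }

    /** AND query: ids of products containing every query term, in ascending order. */
    public List<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);   // first term seeds the candidate set
            } else {
                result.retainAll(ids);         // later terms intersect it
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        ProductIndexSketch index = new ProductIndexSketch();
        index.add(1, "Term life insurance, 20-year coverage");
        index.add(2, "Medical insurance for children");
        index.add(3, "Accident insurance for travel");
        System.out.println(index.search("insurance"));         // [1, 2, 3]
        System.out.println(index.search("medical insurance")); // [2]
        System.out.println(index.search("car insurance"));     // []
    }
}
```

The naive tokenizer treats an unbroken run of Chinese characters as a single term. This is why a Chinese-language system needs a word-segmenting analyzer such as IKAnalyzer in front of the index.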
Language: Chinese
Content Type: Thesis
URI: http://ir.iscas.ac.cn/handle/311060/14527
Collection: Software Engineering Technology Research and Development Center (软件工程技术研究开发中心)
Recommended Citation
GB/T 7714
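Qin Yuanyuan. Research and Implementation of a NoSQL-Based Vertical Search Engine for the Insurance Industry[D]. Beijing: Graduate University of Chinese Academy of Sciences, 2012.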
Files in This Item:
基于NoSQL的保险行业垂直搜索引擎的研 (1003KB), Open Access
Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.