Institutional Repository of the Institute of Software, Chinese Academy of Sciences
Title:
资讯类新闻套图系统
Alternative Title: Web Information Extraction and Knowledge Presentation System
Author: 江浩亮 ; 左春
Keyword: web information extraction ; dynamic data set ; high scalability ; personalized recommendation ; image set (imgset)
Source: 计算机系统应用 (Computer Systems & Applications)
Issued Date: 2014
Issue: 10, Pages:57-62
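Department: Software Engineering Technology R&D Center, Institute of Software, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100190 | Software Engineering Technology R&D Center, Institute of Software, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100190; Sinosoft Co., Ltd. (中科软科技股份有限公司), Beijing 100190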
Abstract: Because pictures convey events vividly and spread easily, this paper studies how to automatically extract data from large numbers of data-intensive news web pages and organize it into image sets (套图) for presentation to users. Dynamic pages are extracted and parsed on the basis of page templates, and the results are converted into the corresponding image-set data structure. News image sets from different websites are deduplicated using cosine similarity, and the resulting data sets are scored and ranked according to corresponding criteria. To cope with the large volume of news data and the large number of users, the system is built on the Hadoop distributed platform to achieve high scalability. The paper describes the system design and implementation in detail and reports the results of running the system on the Baidu news image channel.
Language: Chinese
Content Type: Journal article
URI: http://ir.iscas.ac.cn/handle/311060/16964
Appears in Collections: Institute of Software Library_Journal Articles

Files in This Item:

There are no files associated with this item.


Recommended Citation:
江浩亮,左春. 资讯类新闻套图系统[J]. 计算机系统应用,2014-01-01(10):57-62.
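
The abstract mentions deduplicating news image sets from different websites by cosine similarity, but this record does not say which features are compared or what cutoff is used. The following is only a minimal sketch under stated assumptions: each image set is represented by a short text (for example its title), tokens are split on whitespace (Chinese text would need a word segmenter first), and the names `term_vector`, `deduplicate`, the dictionary keys, and the `0.8` threshold are all hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector from whitespace-separated tokens (assumed tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(image_sets, threshold=0.8):
    """Keep the first image set of each near-duplicate group.

    image_sets: list of dicts with hypothetical keys "source" and "text".
    threshold: assumed similarity cutoff, not a value from the paper.
    """
    kept = []
    kept_vectors = []
    for item in image_sets:
        vec = term_vector(item["text"])
        # An item is kept only if it is not too similar to anything already kept.
        if all(cosine_similarity(vec, other) < threshold for other in kept_vectors):
            kept.append(item)
            kept_vectors.append(vec)
    return kept


if __name__ == "__main__":
    sample = [
        {"source": "site-a", "text": "flood rescue teams reach the village"},
        {"source": "site-b", "text": "flood rescue teams reach village"},
        {"source": "site-c", "text": "new stadium opens downtown"},
    ]
    for item in deduplicate(sample):
        print(item["source"], "-", item["text"])
```

This pairwise comparison is quadratic in the number of image sets. A system running on Hadoop, as the abstract describes, would presumably partition or index the data before comparing, but the record gives no details of how that is done.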