Abstract:
Automatic classification of blog entries is generally treated as a semi-supervised machine learning task, in which entries are automatically assigned to one of a set of pre-defined classes based on features extracted from their textual content. This paper classifies unstructured blog entries in several steps: pre-processing (tokenization, stop-word elimination and stemming), statistical feature-set extraction, feature-set enhancement using semantic resources, and finally modeling with two alternative machine learning models, the naïve Bayesian model and the artificial neural network (ANN) model. Empirical evaluations indicate that this multi-step approach achieves good overall classification accuracy on unstructured blog text datasets with both models. However, the naïve Bayesian model clearly outperforms the ANN-based model when only a small feature set is available, which is usually the case when a blog topic is recent and little training data exists.
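The pre-processing steps named in the abstract (tokenization, stop-word elimination, stemming) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list is a small hypothetical sample, and `stem` is a toy suffix-stripping rule set standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical sample stop-word list; the paper's actual list is not given.
STOP_WORDS = frozenset({
    "a", "an", "the", "and", "or", "but", "is", "are", "was", "were",
    "of", "to", "in", "on", "for", "with", "this", "that", "it", "as",
})

# Toy suffix rules checked in order (an assumption, not the Porter algorithm).
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Tokenize, remove stop words, then stem each remaining token."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]
```

For example, `preprocess("The bloggers are posting new entries")` returns `["blogger", "post", "new", "entri"]`. Crude stems such as `entri` are acceptable because only consistency across documents matters for the features that follow.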
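A minimal sketch of the naïve Bayesian model follows. It assumes entries have already been reduced to stemmed token lists and uses a multinomial event model with Laplace smoothing, since the abstract does not specify the variant. The hypothetical `max_features` parameter keeps only the most document-frequent terms. This is a simple stand-in for the paper's unspecified statistical feature extraction, and it makes small and large feature sets easy to compare. Semantic feature enhancement and the ANN alternative are not shown.

```python
import math
from collections import Counter, defaultdict
from typing import Optional


class MultinomialNaiveBayes:
    """Multinomial naive Bayes over token lists, with Laplace smoothing."""

    def __init__(self, max_features: Optional[int] = None, alpha: float = 1.0):
        self.max_features = max_features  # hypothetical feature-set size cut-off
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNaiveBayes":
        # Document frequency of each term; keep the top terms as the feature set.
        df = Counter(term for doc in docs for term in set(doc))
        ranked = sorted(df, key=lambda t: (-df[t], t))
        if self.max_features is not None:
            ranked = ranked[: self.max_features]
        self.vocab = set(ranked)

        # Class priors estimated from label frequencies, stored as logs.
        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}

        # Per-class term counts, restricted to the selected vocabulary.
        term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            term_counts[label].update(t for t in doc if t in self.vocab)

        # Smoothed log P(term | class) for every vocabulary term.
        v = len(self.vocab)
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(term_counts[c].values())
            self.log_likelihood[c] = {
                t: math.log((term_counts[c][t] + self.alpha) / (total + self.alpha * v))
                for t in self.vocab
            }
        return self

    def predict(self, doc: list[str]) -> str:
        # Log prior plus log likelihoods of in-vocabulary terms; highest score wins.
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][t] for t in doc if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)
```

For example, fitting on two sports entries (`["match", "goal", "team"]`, `["goal", "score", "team"]`) and two politics entries (`["vote", "elect", "parti"]`, `["parti", "vote", "campaign"]`) and then calling `predict(["team", "goal"])` returns `"sport"`. Scores are summed in log space so that long entries do not underflow to zero probability.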
Article information
Title: Automatic Classification of Unstructured Blog Text
Journal: 智能学习系统与应用(英文) (Journal of Intelligent Learning Systems and Applications)
Discipline: Engineering
Keywords: Automatic Blog Text Classification; Feature Extraction; Machine Learning Models; Semi-Supervised Learning
Year (issue): 2013 (2)
Pages: 108-114 (7 pages)
Classification number: TP39
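Journal information
智能学习系统与应用(英文) (Journal of Intelligent Learning Systems and Applications)
Frequency: Quarterly
ISSN: 2150-8402
Address: Optics Valley Headquarters Space, No. 38 Tangxunhu North Road, Jiangxia District, Wuhan
Articles published: 166
Total downloads: 0
Total citations: 0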