Abstract:
Document Frequency (DF) is reported to be a simple yet quite effective measure for feature selection in text classification, which is a key step in processing big textual data collections. It is calculated from the number of documents in a collection that contain a feature, which can be a word, a phrase, an n-gram, or a specially derived attribute. It is an unsupervised, class-independent metric. Features with the same DF value may be distributed quite differently over categories, and thus differ in discriminative power. For example, in a binary classification problem, if feature A appears in only one category while feature B, which has the same DF value as feature A, is evenly distributed across both categories, then feature A is obviously more effective than feature B for classification. To overcome this weakness of the original document frequency feature selection metric, we propose a class-based document frequency strategy that further refines the original DF. Extensive experiments on three text classification datasets demonstrate the effectiveness of the proposed measures.
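The feature A versus feature B example in the abstract can be made concrete with a small sketch. The abstract does not give the formula of the proposed class-based measure, so the code below only counts document frequency per category; the function names, the toy corpus, and the labels "pos" and "neg" are assumptions made for illustration, not the authors' method.

```python
from collections import Counter, defaultdict


def document_frequency(docs):
    """Count, for each feature, how many documents contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set(): count a feature once per document
    return df


def class_document_frequency(docs, labels):
    """Count document frequency separately within each category."""
    per_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        per_class[label].update(set(tokens))
    return per_class


# Toy corpus (hypothetical): four documents in two categories.
docs = [
    ["A", "B", "x"],
    ["A", "y"],
    ["B", "z"],
    ["w"],
]
labels = ["pos", "pos", "neg", "neg"]

df = document_frequency(docs)
cdf = class_document_frequency(docs, labels)

# A and B share the same overall DF but differ across categories.
for feature in ("A", "B"):
    print(feature, df[feature], cdf["pos"][feature], cdf["neg"][feature])
```

Here features A and B both occur in two documents, but A occurs only in "pos" documents while B is split evenly between the two categories, which is exactly the situation plain DF cannot distinguish.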
Document information
Title: Using Class Based Document Frequency to Select Features in Text Classification
Source journal: 国际计算机前沿大会会议论文集
Discipline: Social sciences
Keywords: Document; Frequency; Different; Distribution
Year, volume (issue): 2015, (B12)
Page range: 50-52
Number of pages: 3
Classification number: C5
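Journal information
Journal: 国际计算机前沿大会会议论文集
Publication frequency: semiannual
Address: No. 801 Changlin, Xisanqi, Haidian District, Beijing
Articles published: 616
Total downloads: 6
Total citations: 0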