Abstract:
A common problem when processing data sets with a large number of covariates relative to a small sample size (fat data sets) is estimating the parameters associated with each covariate. When the number of covariates far exceeds the number of samples, parameter estimation becomes very difficult. Researchers in many fields, such as text categorization, face the burden of finding and estimating important covariates without overfitting the model. In this study, we developed a Sparse Probit Bayesian Model (SPBM) based on Gibbs sampling, which uses double exponential priors to induce shrinkage and reduce the number of covariates in the model. The method was evaluated on ten domains, such as mathematics, whose corpora were downloaded from Wikipedia. From the downloaded corpora, we created the TF-IDF matrix corresponding to all domains and divided the whole data set randomly into training and testing groups of size 300. To make the model more robust, we performed 50 re-samplings of the training and test groups. The model was implemented in R; the Gibbs sampler ran for 60 k iterations, and the first 20 k were discarded as burn-in. We performed classification on the training and test groups by calculating P(y_i = 1), and following [1] [2], a threshold of 0.5 was used as the decision rule. Our model's performance was compared to Support Vector Machines (SVM) using average sensitivity and specificity across 50 runs. The SPBM achieved high classification accuracy and outperformed SVM in almost all domains analyzed.
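The decision rule and the evaluation metrics described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and it is not the authors' code. It assumes that P(y_i = 1) is obtained by applying the probit link (the standard normal CDF) to the linear predictor under a single coefficient vector, for example a posterior mean taken from the retained Gibbs draws. The names `probit_prob`, `classify`, and `sensitivity_specificity`, the toy coefficients, and the toy feature rows are all hypothetical, and assigning a probability of exactly 0.5 to class 0 is an arbitrary choice.

```python
import math


def probit_prob(x, beta):
    """P(y = 1 | x) under a probit link: standard normal CDF of the dot product x . beta."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def classify(X, beta, threshold=0.5):
    """Label a row 1 when its predicted probability exceeds the threshold, else 0."""
    return [1 if probit_prob(x, beta) > threshold else 0 for x in X]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) computed from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical coefficients; the zero entry mimics a covariate shrunk out of the model.
beta = [1.5, -2.0, 0.0]

# Hypothetical TF-IDF-like feature rows and their true labels.
X = [
    [0.8, 0.1, 0.3],
    [0.1, 0.9, 0.5],
    [0.6, 0.2, 0.0],
    [0.2, 0.3, 0.9],
]
y_true = [1, 0, 1, 1]

y_pred = classify(X, beta)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(y_pred, sens, spec)
```

In a setup like the one the abstract describes, these two metrics would be computed for each of the 50 random training/test splits and then averaged.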
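Document information
Title: A Fully Bayesian Sparse Probit Model for Text Categorization
Source journal: 统计学期刊(英文) (Journal of Statistics, English); Discipline: Medicine
Keywords: Bayesian LASSO, Shrinkage, Parameter Estimation, Generalized Linear Models, Machine Learning
Year, volume (issue): 2014, (8)
Page range: 611-619
Number of pages: 9; Classification number: R73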
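Journal information
Journal: 统计学期刊(英文) (Journal of Statistics, English)
Publication frequency: semimonthly
ISSN: 2161-718X
Address: Optics Valley Headquarters Space, 38 Tangxunhu North Road, Jiangxia District, Wuhan
Articles published: 584
Total downloads: 0
Total citations: 0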