Abstract:
This paper examines the automatic recognition and extraction of tables from a large collection of heterogeneous documents. The documents are first pre-processed and converted to HTML, after which an algorithm recognises the table portions of the documents. A Hidden Markov Model (HMM) is then applied to the HTML code to extract the tables. The model was trained and tested on 526 self-generated tables (321 for training and 205 for testing). The Viterbi algorithm was implemented for the testing stage. The system was evaluated in terms of accuracy, precision, recall and F-measure. The overall results show 88.8% accuracy, 96.8% precision, 91.7% recall and 88.8% F-measure, indicating that the method performs well on the table extraction problem.
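As a rough illustration of the decoding step the abstract mentions, the following is a minimal Viterbi sketch over a sequence of HTML token symbols. The two states (`OTHER`, `TABLE`), the symbol alphabet, and every probability are assumptions chosen for illustration; the abstract does not give the paper's state set, features, or trained parameters.

```python
import math

# Hypothetical two-state model: each HTML token is either inside a table or not.
STATES = ("OTHER", "TABLE")

# All probabilities below are illustrative assumptions, not values from the paper.
START = {"OTHER": 0.9, "TABLE": 0.1}
TRANS = {
    "OTHER": {"OTHER": 0.8, "TABLE": 0.2},
    "TABLE": {"OTHER": 0.2, "TABLE": 0.8},
}
EMIT = {
    "OTHER": {"p": 0.6, "text": 0.3, "table": 0.04, "tr": 0.03, "td": 0.03},
    "TABLE": {"p": 0.02, "text": 0.18, "table": 0.2, "tr": 0.25, "td": 0.35},
}


def viterbi(observations):
    """Return the most likely state sequence for a list of token symbols."""
    if not observations:
        return []
    # Log-probabilities avoid numeric underflow on long token sequences.
    first = observations[0]
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][first]) for s in STATES}]
    backpointers = []
    for symbol in observations[1:]:
        prev = scores[-1]
        current, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for reaching state s at this step.
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            current[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][symbol])
            pointers[s] = best
        scores.append(current)
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Calling `viterbi(["p", "text", "table", "tr", "td", "td", "p"])` returns one label per token; contiguous runs labelled `TABLE` would mark candidate table regions. In a real system the parameters would be estimated from labelled training tables rather than set by hand.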
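Document information
Title: Automatic Table Recognition and Extraction from Heterogeneous Documents
Source journal: 电脑和通信(英文) (Journal of Computer and Communications); Discipline: Engineering
Keywords: Hidden Markov Model; Table Recognition and Extraction; Hypertext Markup Language; Heterogeneous Documents
Year, volume (issue): 2015, (12)
Page range: 100-110
Page count: 11; Classification code: TP39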
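Journal information
Journal: 电脑和通信(英文) (Journal of Computer and Communications)
Frequency: monthly
ISSN: 2327-5219
Address: Optics Valley Headquarters Space, No. 38 Tangxunhu North Road, Jiangxia District, Wuhan
Articles published: 783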