Abstract:
Named Entity Recognition (NER) is one of the most useful Information Extraction (IE) solutions for harnessing Web information. Hand-coded rule-based methods are still the best performers. Both these and statistical methods exploit Natural Language Processing (NLP) features and characteristics (e.g., capitalization) to extract Named Entities (NEs) such as person and company names. However, these systems fail to perform efficiently on non-speech entities made up of multiple sub-entities of higher cardinality (e.g., Linux commands, citations). Promising Machine Learning (ML) methods would require large numbers of training examples, which are impossible to produce manually. We call these entities Named High Cardinality Entities (NHCEs). We propose a sequence-validation-based approach for the extraction and validation of NHCEs. In this approach, the sub-entities of NHCE candidates are characterized statistically and structurally during a top-down annotation process and, using an ML model, guided toward transformation into either value types (v-types) or user-defined types (u-types). Treated as sequences of sub-entities, NHCE candidates with transformed sub-entities are then validated (and subsequently labeled) by a series of validation operators. We present a case study that demonstrates the approach and shows how it helps bridge the gap between IE and Intelligent Systems (IS) through the use of transformed sub-entities in supervised learning.
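The abstract describes NHCE candidates as sequences of sub-entities that are first typed and then checked by a series of validation operators. Below is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation: the type names (`COMMAND`, `OPTION`, `PATH`, `INT`, `STRING`), the regex-and-lexicon `transform` function (a hypothetical stand-in for the paper's ML model), and the three operators are all invented here for a Linux-command example.

```python
import re
from typing import Callable, List, Optional

# Hypothetical v-types: value types recognized from a token's surface form.
V_TYPE_PATTERNS = {
    "INT": re.compile(r"^\d+$"),
    "PATH": re.compile(r"^(/|\./|~/)\S*$"),
    "OPTION": re.compile(r"^--?[A-Za-z][\w-]*$"),
}

# Hypothetical u-types: user-defined types backed by a small lexicon.
U_TYPE_LEXICON = {
    "COMMAND": {"ls", "grep", "tar", "cp"},
}


def transform(sub_entity: str) -> str:
    """Map one sub-entity to a u-type, a v-type, or the fallback STRING.

    Hypothetical stand-in for the paper's ML model: plain lexicon and regex lookups.
    """
    for u_type, words in U_TYPE_LEXICON.items():
        if sub_entity in words:
            return u_type
    for v_type, pattern in V_TYPE_PATTERNS.items():
        if pattern.match(sub_entity):
            return v_type
    return "STRING"


def type_sequence(candidate: str) -> List[str]:
    """Treat a candidate as a whitespace-separated sequence of sub-entities."""
    return [transform(token) for token in candidate.split()]


# A validation operator inspects the type sequence and accepts or rejects it.
Operator = Callable[[List[str]], bool]


def starts_with(t: str) -> Operator:
    # Accept only non-empty sequences whose first sub-entity has type t.
    return lambda types: bool(types) and types[0] == t


def count_at_most(t: str, n: int) -> Operator:
    # Accept sequences containing type t no more than n times.
    return lambda types: types.count(t) <= n


def max_length(n: int) -> Operator:
    # Accept sequences of at most n sub-entities.
    return lambda types: len(types) <= n


def validate(candidate: str, operators: List[Operator], label: str) -> Optional[str]:
    """Return the label if every operator accepts the candidate, else None."""
    types = type_sequence(candidate)
    return label if all(op(types) for op in operators) else None


# Hypothetical operator series for Linux-command NHCEs.
LINUX_COMMAND_OPS = [starts_with("COMMAND"), count_at_most("COMMAND", 1), max_length(8)]

if __name__ == "__main__":
    for text in ["ls -l /tmp", "grep -n main ./src", "the -l option"]:
        print(text, type_sequence(text), validate(text, LINUX_COMMAND_OPS, "LINUX_COMMAND"))
```

In this sketch the first two candidates are labeled `LINUX_COMMAND`, while `the -l option` is rejected because its type sequence does not start with `COMMAND`. Validating the type sequence rather than the raw tokens lets one small set of operators cover arguments the lexicon never lists (such as `main`), which is one plausible reading of why the approach transforms sub-entities before validation.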
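Bibliographic information
Title: Sequence Validation Based Extraction of Named High Cardinality Entities
Journal: 智能科学国际期刊(英文) (International Journal of Intelligence Science, English edition)
Discipline: Medicine
Keywords: Entity Recognition; Supervised Learning; Sequence Validation; Intelligent Systems; Text Mining
Year, volume (issue): 2012, (4)
Pages: 190-202
Page count: 13
CLC classification: R73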
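Journal impact
智能科学国际期刊(英文) (International Journal of Intelligence Science)
Frequency: Quarterly
ISSN: 2163-0283
Address: Optics Valley Headquarters Space, 38 Tangxunhu North Road, Jiangxia District, Wuhan
Articles published: 102
Total downloads: 0
Total citations: 0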