Abstract:
This paper proposes a non-segmented document clustering method that uses a self-organizing map (SOM) and the frequent max substring technique to improve the efficiency of information retrieval. SOM has been widely used for document clustering and has been successful in many applications. However, when it is applied to non-segmented documents, the challenge is to identify interesting patterns efficiently. The proposed method has two main phases: a preprocessing phase and a clustering phase. In the preprocessing phase, the frequent max substring technique is first applied to the non-segmented texts to discover the patterns of interest, called Frequent Max substrings, which are long and frequent substrings rather than individual words. These discovered patterns are then used as indexing terms. The indexing terms, together with their numbers of occurrences, form a document vector. In the clustering phase, SOM generates the document cluster map from the feature vectors of Frequent Max substrings. To demonstrate the proposed technique, the paper presents experimental studies and comparison results on clustering Thai text documents, which consist of non-segmented texts. The results show that the proposed technique can be used for Thai texts. The document cluster map generated by the method can be used to find relevant documents more efficiently.
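The two phases described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how Frequent Max substrings are extracted or how the SOM is configured, so everything below is an assumption made for illustration. A substring is treated here as "frequent max" when it occurs at least min_freq times across the collection and is not contained in a longer substring that is also frequent. The extraction is brute force with a length cap, and the SOM is a minimal pure-Python version with a linearly decaying learning rate and neighbourhood radius. All function names, parameters, and the toy Latin-alphabet strings (standing in for non-segmented Thai text) are hypothetical.

```python
from collections import Counter
import math
import random


def substring_counts(text, max_len):
    """Count every substring of length 1..max_len (overlapping occurrences)."""
    counts = Counter()
    for i in range(len(text)):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            counts[text[i:j]] += 1
    return counts


def frequent_max_substrings(docs, min_freq=2, min_len=2, max_len=8):
    """Assumed definition: frequent substrings not contained in a longer frequent one."""
    total = Counter()
    for doc in docs:
        total.update(substring_counts(doc, max_len))
    frequent = {s for s, c in total.items() if c >= min_freq and len(s) >= min_len}
    # A substring is non-maximal if dropping one end character of a longer
    # frequent substring yields it.
    covered = {t[1:] for t in frequent} | {t[:-1] for t in frequent}
    return sorted(frequent - covered)


def count_occurrences(text, term):
    """Number of (possibly overlapping) occurrences of term in text."""
    return sum(text.startswith(term, i) for i in range(len(text)))


def document_vectors(docs, terms):
    """One vector per document: occurrence count of each indexing term."""
    return [[count_occurrences(doc, t) for t in terms] for doc in docs]


def best_matching_unit(weights, vector):
    """Grid coordinate of the node whose weight vector is closest to vector."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], vector)))


def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, seed=0):
    """Minimal SOM: Gaussian neighbourhood, linearly decaying rate and radius."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        sigma = max(sigma0 * (1 - frac), 0.5)
        for v in vectors:
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return weights


# Toy stand-ins for non-segmented texts (no spaces between "words").
docs = ["abcabc", "xyzxyz"]
terms = frequent_max_substrings(docs)    # ['abc', 'xyz']
vectors = document_vectors(docs, terms)  # [[2, 0], [0, 2]]
som = train_som(vectors)
for doc, vec in zip(docs, vectors):
    print(doc, "-> map node", best_matching_unit(som, vec))
```

In this sketch each document is assigned to the grid node whose weight vector is nearest to its term-count vector, which is one simple way to read a document cluster map. A practical system would likely replace the brute-force counting with a suffix-based index and normalise the vectors before training.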
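Document information
Title: A SOM-Based Document Clustering Using Frequent Max Substrings for Non-Segmented Texts
Source journal: 智能学习系统与应用(英文) (Intelligent Learning Systems and Applications, English edition)
Discipline: Medicine
Keywords: Frequent Max Substring; Self-Organizing Map; Document Clustering
Year, volume (issue): 2010, (3)
Page range: 117-125
Number of pages: 9
Classification number: R73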
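Journal impact
Journal: 智能学习系统与应用(英文) (Intelligent Learning Systems and Applications, English edition)
Frequency: Quarterly
ISSN: 2150-8402
Address: Optics Valley Headquarters Space, 38 Tangxunhu North Road, Jiangxia District, Wuhan
Articles published: 166
Total downloads: 0
Total citations: 0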