Abstract:
Sampling is a fundamental method for generating data subsets. As many data analysis methods are developed based on probability distributions, maintaining distributions when sampling can help to ensure good data analysis performance. However, sampling a minimum subset while maintaining probability distributions is still an open problem. In this paper, we decompose a joint probability distribution into a product of conditional probabilities based on Bayesian networks and use the chi-square test to formulate a sampling problem that requires the sampled subset to pass the distribution test, so that the distribution is maintained. Furthermore, a heuristic sampling algorithm is proposed to generate the required subset by designing two scoring functions: one based on the chi-square test and the other based on likelihood functions. Experiments on four types of datasets with a size of 60000 show that when the significance level, α, is set to 0.05, the algorithm can exclude 99.9%, 99.0%, 93.1% and 96.7% of the samples for the Bayesian networks ASIA, ALARM, HEPAR2, and ANDES, respectively. When subsets of the same size are sampled, the subset generated by our algorithm passes all the distribution tests and the average distribution difference is approximately 0.03; by contrast, the subsets generated by random sampling pass only 83.8% of the tests, and the average distribution difference is approximately 0.24.
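The distribution test mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it checks only the marginal distribution of a single categorical variable, whereas the abstract applies the test to the conditional probabilities obtained from a Bayesian network. The function names `chi_square_statistic` and `passes_distribution_test` and the toy data are hypothetical; the critical values are standard chi-square table entries for α = 0.05.

```python
from collections import Counter

# Standard chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_statistic(subset, population):
    """Pearson chi-square statistic of subset counts against population proportions."""
    pop_counts = Counter(population)
    sub_counts = Counter(subset)
    n_pop, n_sub = len(population), len(subset)
    stat = 0.0
    for value, count in pop_counts.items():
        # Expected count in the subset if it followed the population distribution.
        expected = n_sub * count / n_pop
        observed = sub_counts.get(value, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


def passes_distribution_test(subset, population):
    """True if the subset is not rejected at alpha = 0.05 (hypothetical helper)."""
    df = len(set(population)) - 1  # number of categories minus one
    return chi_square_statistic(subset, population) <= CRITICAL_005[df]


# Toy data (assumed for illustration): one binary variable.
population = ["yes"] * 300 + ["no"] * 700
balanced = ["yes"] * 30 + ["no"] * 70   # same proportions as the population
skewed = ["yes"] * 60 + ["no"] * 40     # proportions clearly shifted

print(passes_distribution_test(balanced, population))
print(passes_distribution_test(skewed, population))
```

A subset that mirrors the population proportions yields a statistic of zero and passes, while a strongly skewed subset exceeds the critical value and is rejected. A sampler in the spirit of the abstract would search for the smallest subset that still passes such tests for every conditional distribution in the network.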
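Document information
Title: A Heuristic Sampling Method for Maintaining the Probability Distribution
Source journal: Journal of Computer Science and Technology (计算机科学技术学报(英文版))
Year, volume (issue): 2021, (4)
Journal section: Regular Paper
Page range: 896-909
Page count: 14
Language: English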
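Journal information
Journal: Journal of Computer Science and Technology (计算机科学技术学报(英文版))
Frequency: bimonthly
ISSN: 1000-9000
CN: 11-2296/TP
Format: 16开
Address: Editorial Office of 《计算机科学技术学报(英)》, 6 Kexueyuan South Road, Zhongguancun, Beijing
Founded: 1986
Language: eng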