Abstract:
Purpose: In the open science era, project-generated scientific data are typically shared by depositing them in an open and accessible database, while scientific publications are preserved in digital library archives. It is challenging to identify data usage mentioned in the literature and to associate it with its source. Here, we investigated the data usage of a government-funded cancer genomics project, The Cancer Genome Atlas (TCGA), via a full-text literature analysis.
Design/methodology/approach: We focused on identifying articles that use the TCGA dataset and on constructing linkages between those articles and the specific TCGA dataset. First, we collected 5,372 TCGA-related articles from PubMed Central (PMC). Second, we constructed a benchmark set of 25 full-text articles that truly used TCGA data in their studies, and we summarized the key features of the benchmark set. Third, we applied these key features to the remaining full-text articles collected from PMC.
Findings: The number of publications that use TCGA data has increased significantly since 2011, although the TCGA project was launched in 2005. Additionally, we found that the critical areas of focus in the studies that use TCGA data were glioblastoma multiforme, lung cancer, and breast cancer; meanwhile, data from the RNA-sequencing (RNA-seq) platform were the most preferred.
Research limitations: The current workflow for identifying articles that truly used TCGA data is labor-intensive. An automatic method is expected to improve the performance.
Practical implications: This study will help cancer genomics researchers determine the latest advancements in cancer molecular therapy, and it will promote data sharing and data-intensive scientific discovery.
Originality/value: Few studies have investigated the usage of data generated by government-funded projects/programs since their launch.
In this preliminary study, we extracted articles that use TCGA data from PMC, and we created a link between the full-text articles and the sourc
Article information
Title: Identifying Scientific Project-generated Data Citation from Full-text Articles: An Investigation of TCGA Data Citation
Journal: Journal of Data and Information Science (数据与情报科学学报:英文版)
Discipline: Social sciences
Keywords: scientific data; full-text literature; open access; PubMed Central; data citation
Year, volume (issue): 2016, (2)
Pages: 32-44
Page count: 13
Classification number: G35
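Journal information
Journal of Data and Information Science (数据与情报科学学报:英文版)
Frequency: Quarterly
ISSN: 2096-157X
CN: 10-1394/G2
Address: 33 Beisihuan Xilu, Zhongguancun, Beijing
Postal distribution code: 82-563
Articles published: 445
Total downloads: 1
Total citations: 0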