Abstract:
Web crawlers have been misused for several malicious purposes, such as downloading server data without permission from the website administrator. Moreover, armoured crawlers keep evolving against new anti-crawler mechanisms in the arms race between crawler developers and crawler defenders. In this paper, based on the observation that normal users and malicious crawlers have different short-term and long-term download behaviours, we develop a new anti-crawler mechanism called PathMarker to detect and constrain persistent distributed crawlers. By adding a marker to each Uniform Resource Locator (URL), we can trace the page that led to the access of this URL and the identity of the user who accesses it. With this supporting information, we can not only perform more accurate heuristic detection using path-related features, but also develop a Support Vector Machine (SVM) based detection model that distinguishes malicious crawlers from normal users by inspecting their different patterns of URL visiting paths and URL visiting timings. In addition to detecting crawlers at the earliest stage, PathMarker can dramatically suppress the scraping efficiency of crawlers before they are detected. We deploy our approach on an online forum website, and the evaluation results show that PathMarker can quickly capture all 6 open-source and in-house crawlers, plus two external crawlers (i.e., Googlebots and Yahoo Slurp).
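The URL-marker idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the marker is encoded, so everything below is an assumption made for illustration: the query parameter names `pm` and `sig`, the JSON payload, the demo key, and the use of an HMAC signature to bind the marker to the URL path. It is not the authors' implementation.

```python
import base64
import hashlib
import hmac
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical server-side secret; a real deployment would manage this securely.
SECRET_KEY = b"demo-secret"


def _sign(path, token, key):
    # Bind the marker to the URL path so it cannot be moved to another URL.
    return hmac.new(key, f"{path}|{token}".encode(), hashlib.sha256).hexdigest()


def add_marker(url, parent_url, user_id, key=SECRET_KEY):
    """Return `url` with a marker recording the linking page and the user."""
    parts = urlsplit(url)
    payload = json.dumps({"parent": parent_url, "user": user_id}, sort_keys=True)
    token = base64.urlsafe_b64encode(payload.encode()).decode()
    query = parse_qsl(parts.query) + [("pm", token), ("sig", _sign(parts.path, token, key))]
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_marker(marked_url, key=SECRET_KEY):
    """Return the marker payload, or None if it is missing or was tampered with."""
    parts = urlsplit(marked_url)
    params = dict(parse_qsl(parts.query))
    token, sig = params.get("pm"), params.get("sig")
    if token is None or sig is None:
        return None
    if not hmac.compare_digest(sig, _sign(parts.path, token, key)):
        return None
    return json.loads(base64.urlsafe_b64decode(token.encode()))


if __name__ == "__main__":
    marked = add_marker(
        "https://forum.example/thread/42",
        parent_url="https://forum.example/board/7",
        user_id="alice",
    )
    print(marked)
    print(read_marker(marked))
```

With a marker like this on every link, the server can log for each request which page the link came from and which account it was issued to. That is the kind of path information the heuristic and SVM-based detection described in the abstract would draw on. Note that this sketch only signs the marker and leaves it readable; a scheme that also hides the marker from the client would need encryption.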
Article information
Title: PathMarker: protecting web contents against inside crawlers
Source journal: Cyberspace Security Science and Technology (English Edition) (网络空间安全科学与技术(英文版))
Keywords: Anti-Crawler mechanism; Stealthy distributed inside crawler; Confidential Website content protection
Year, volume (issue): 2018, (3)
Page range: 1-17
Page count: 17
Language: Chinese
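Journal information
Journal: Cyberspace Security Science and Technology (English Edition) (网络空间安全科学与技术(英文版))
Frequency: quarterly
ISSN: 2096-4862
CN: 10-1537/T
Language: English
Articles published: 54
Total downloads: 0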