Defect recognition based on image analysis is one of the most important means of detecting key failure points and damage. However, the recognition rate is low because few images can be collected on site, which limits the training set and leaves the recognition engine insufficiently trained. Meanwhile, the internet offers a large number of related images that can serve as an important data source for training image analysis engines. An internet spider operating under certain rules can freely collect information from the internet. The spider described in this paper automatically collects related images from the internet and can also search for similar images using a local seed image. A parallel version of the spider is also provided, which can significantly improve performance.
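
As a rough illustration of rule-based image collection with a parallel download step, the following minimal Python sketch extracts image links from a page and fetches them concurrently. It is not the implementation described in this paper: the names `ImageLinkParser`, `extract_image_urls`, `fetch_bytes`, and `download_all` are hypothetical, the file-extension filter merely stands in for the unspecified collection rules, and a thread pool is only one assumed way to parallelize. Similar-image search from a seed image is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ImageLinkParser(HTMLParser):
    """Collects absolute URLs from the src attribute of <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url, allowed_extensions=(".jpg", ".jpeg", ".png")):
    """Return image URLs that pass a simple rule (assumed: a file-extension filter)."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return [u for u in parser.image_urls if u.lower().endswith(allowed_extensions)]


def fetch_bytes(url, timeout=10):
    """Download one resource and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch=fetch_bytes, max_workers=8):
    """Fetch many URLs concurrently; a failed download maps to None."""

    def safe_fetch(url):
        try:
            return url, fetch(url)
        except Exception:
            # One broken link should not abort the whole crawl.
            return url, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(safe_fetch, urls))
```

Because downloading is dominated by network waiting, threads are a reasonable first choice in this sketch; the `fetch` parameter is injectable so the download step can be exercised without network access.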