Aerial image sequence mosaicking is a challenging research problem in computer vision. To obtain large-scale orthophoto maps with object detection information, we propose a vision-based image mosaicking algorithm that requires no extra location data. Based on the object detection results, we define a complexity factor that describes the importance of each input image and use it to dynamically optimize the feature extraction process. Feature point extraction and matching are mainly guided by the speeded-up robust features (SURF) and grid-based motion statistics (GMS) algorithms, respectively. A robust reference frame selection method is proposed to eliminate the transformation distortion by searching for the center area based on overlaps. In addition, the sparse Levenberg-Marquardt (LM) algorithm and a method for removing heavily occluded frames are applied to reduce accumulated errors and further improve the mosaicking performance. The proposed algorithm is implemented with multithreading and graphics processing unit (GPU) acceleration and evaluated on several aerial image datasets. Extensive experimental results demonstrate that our algorithm outperforms most existing aerial image mosaicking methods in visual quality while maintaining a high computation speed.
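The idea of selecting a reference frame from the center of the overlap structure can be illustrated with a minimal sketch. It is not the method proposed here: it assumes that overlaps are already known as a graph (a hypothetical `overlaps` dictionary mapping each frame index to the frames it overlaps) and picks the frame with the smallest worst-case number of chained transformations to any other frame. The function names `hop_distances` and `select_reference_frame` are illustrative only.

```python
from collections import deque


def hop_distances(overlaps, start):
    """Breadth-first search: number of chained pairwise transforms to reach each frame."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        frame = queue.popleft()
        for neighbour in overlaps[frame]:
            if neighbour not in dist:
                dist[neighbour] = dist[frame] + 1
                queue.append(neighbour)
    return dist


def select_reference_frame(overlaps):
    """Return a graph center: the frame whose farthest frame is nearest (assumed criterion)."""
    def cost(frame):
        dist = hop_distances(overlaps, frame)
        # Frames that cannot be reached at all are penalised heavily.
        unreachable = len(overlaps) - len(dist)
        worst_case = max(dist.values()) + unreachable * len(overlaps)
        # Ties on the worst case are broken by the total chain length.
        return (worst_case, sum(dist.values()))
    return min(overlaps, key=cost)


# Hypothetical overlap graph for five frames captured along a line: 0-1-2-3-4.
overlaps = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
reference = select_reference_frame(overlaps)
```

In this toy sequence the middle frame is selected, so no frame is more than two transformations away from the reference. Choosing an end frame instead would chain up to four transformations, which is the kind of accumulated distortion a centered reference is meant to limit.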