Feature extraction and fusion are important in big data analysis, but the dimensionality of the raw data is often too high to learn a good representation. To learn better features, a method that combines KL divergence with feature extraction is proposed. First, initial features are extracted from the raw data by matrix decomposition. These features are then further optimized using KL divergence, which is introduced into the loss function so that the objective is to minimize the KL distance. Experiments are conducted on four datasets: COIL-20, COIL-100, CBCI 3000, and USPclassifyAL. The results show that the proposed method outperforms the other four compared methods in accuracy while using the fewest features.
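The abstract does not specify the exact decomposition or loss, but one common instantiation of "matrix decomposition with a KL-divergence loss" is non-negative matrix factorization (NMF) minimizing the generalized KL divergence with the Lee-Seung multiplicative updates. The sketch below is illustrative only; the function name `kl_nmf` and all parameter choices are assumptions, not the authors' implementation.

```python
import numpy as np

def kl_nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factorize V ~= W @ H (V, W, H non-negative) by minimizing the
    generalized KL divergence
        D(V || WH) = sum(V * log(V / WH) - V + WH)
    via the classic Lee-Seung multiplicative update rules.

    Each column of V is one data sample; the columns of H serve as the
    low-dimensional extracted features."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps   # basis matrix
    H = rng.random((rank, m)) + eps   # feature/code matrix
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Update H: H_aj *= (sum_i W_ia V_ij/(WH)_ij) / (sum_i W_ia)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
        # Update W: W_ia *= (sum_j (V/WH)_ij H_aj) / (sum_j H_aj)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)
    return W, H

# Hypothetical usage: extract a 20-dimensional feature per sample.
V = np.abs(np.random.default_rng(1).random((256, 100)))  # 100 samples, 256 dims
W, H = kl_nmf(V, rank=20)
features = H.T  # one 20-d feature vector per sample
```

The multiplicative form keeps W and H non-negative at every step and monotonically decreases the KL objective, which is why it is a standard baseline for KL-driven feature extraction of this kind.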