The human vision system has a very strong capability for perceiving its surroundings. Visual perception starts in the retina, which receives and preprocesses the visual input in the form of light, and ends with high-level processing in the visual cortex of the brain (Fig. 1a). As a result, we can understand what the visual inputs represent while consuming little energy. It has long been a dream to build a powerful and energy-efficient intelligent vision system with capabilities similar to those of the human brain. Computer vision, conceived as an analogue of human vision, aims to view, process and understand images in the same way as human beings, and has become a major technological advance in building intelligent machines [1]. However, the mainstream technology for computer vision is based on algorithms running on a von Neumann architecture computer, and cannot emulate the hierarchical organization and biological functions of the human vision system. In particular, traditional computer vision combined with conventional charge-coupled device (CCD) and complementary metal-oxide-semiconductor (CMOS) image sensors suffers from high latency and power consumption, because a large volume of redundant visual information is sensed and must then be processed.