Consider the geo-localization task of finding the pose of a camera in a large 3D scene from a single image. Most existing CNN-based methods take textured images as input. We experimentally explore whether texture and correlation between nearby images are necessary in a CNN-based solution to the geo-localization task. To do so, we consider lean images: textureless projections of a simple 3D model of a city. They contain only information related to the geometry of the viewed scene (edges, faces, and relative depth). The main contributions of this paper are: (i) to demonstrate the ability of CNNs to recover camera pose using lean images; and (ii) to provide insight into the role of geometry in the CNN learning process.