This paper addresses the problem of viewpoint estimation of an object in a given image. It presents five key insights and a CNN that is based on them. The network's major properties are as follows. (i) The architecture jointly solves detection, classi cation, and viewpoint estimation. (ii) New types of data are added and trained on. (iii) A novel loss function, which takes into account both the geometry of the problem and the new types of data, is propose. Our network allows a substantial boost in performance: from 36.1% gained by SOTA algorithms to 45.9%.