Two fundamental issues in research on the image of the city concern the city's external and internal representations. In the context of this article, the external representation refers to the city itself, external to human minds, whereas the internal representation concerns how the city is represented within human minds. This article deals with the first issue; that is, what traits of a city make it imageable. I develop an argument that the image of the city arises from the underlying scaling of city artifacts or locations. This scaling refers to the fact that, in an imageable city (a city that can easily be imaged in human minds), small city artifacts are far more common than large ones; or, alternatively, low-density locations are far more common than high-density locations. In a rank-size plot, the sizes of city artifacts exhibit a heavy-tailed distribution consisting of the head, which is composed of a minority of unique artifacts (vital and very important), and the tail, which is composed of the redundant majority of other artifacts (trivial and less important). Ultimately, the most unique and vital artifacts at the very top of the head, that is, the largest ones, which Lynch called city elements, make up the image of the city. I argue that the ever-increasing amount of geographic information on cities, in particular that obtained from social media such as Flickr and Twitter, can put research on the image of the city, or cognitive mapping in general, on a quantitative footing. The scaling property might be formulated as a law of geography.
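
The division of a rank-size distribution into a head and a tail can be illustrated with a minimal sketch. It rests on two assumptions that are not stated in the text above: the split point is taken to be the arithmetic mean of the sizes, and the sizes themselves are hypothetical values generated from a simple rank-size rule rather than measurements of any real city. The function name `head_tail_split` is likewise illustrative.

```python
def head_tail_split(sizes):
    """Split sizes into a head (above the mean) and a tail (at or below it).

    Assumption: the arithmetic mean is used as the dividing value.
    """
    if not sizes:
        raise ValueError("sizes must be non-empty")
    mean = sum(sizes) / len(sizes)
    ranked = sorted(sizes, reverse=True)  # rank-size order, largest first
    head = [s for s in ranked if s > mean]
    tail = [s for s in ranked if s <= mean]
    return head, tail


# Hypothetical artifact sizes following a rank-size rule: size = 1000 / rank.
sizes = [1000 / rank for rank in range(1, 101)]

head, tail = head_tail_split(sizes)
head_share = len(head) / len(sizes)

print(f"head: {len(head)} artifacts ({head_share:.0%} of all)")
print(f"tail: {len(tail)} artifacts")
```

Under these assumptions the head holds a small minority of the artifacts and the tail holds the rest, which is the imbalance the scaling argument relies on. Applying the same split again to the head alone would isolate the very largest artifacts.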