Map generalization is the process of producing maps at different levels of detail while retaining the essential properties of the underlying geographic space. In this paper, we explore how map generalization can be guided by the underlying scaling of geographic space, that is, the fact that in a geographic space small things are far more common than large ones. In the corresponding rank-size distribution, this scaling property is characterized by a heavy-tailed distribution such as a power law, lognormal, or exponential function. In essence, any heavy-tailed distribution consists of a head (a low percentage of vital or large things) and a tail (a high percentage of trivial or small things). Importantly, the low and high percentages constitute an imbalanced contrast, e.g., 20 versus 80. We suggest that map generalization should retain the objects in the head and eliminate or aggregate those in the tail. We applied this selection rule to three generalization experiments and found that the scaling of geographic space indeed underlies map generalization. We further relate this universal rule to Töpfer's radical law (or trained cartographers' decision making in general) and illustrate several advantages of the rule.

Keywords: head/tail division rule, head/tail breaks, heavy-tailed distributions, power law, principles of selection
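
The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not state where the head ends and the tail begins, so the sketch assumes the arithmetic mean as the dividing point. The function name `head_tail_split` and the sample sizes are hypothetical and serve only as illustration.

```python
from statistics import mean


def head_tail_split(sizes):
    """Split sizes into (head, tail) around the arithmetic mean.

    Assumption: the mean is the dividing point. The abstract only states
    that the head holds a low percentage of large things and the tail a
    high percentage of small things.
    """
    if not sizes:
        return [], []
    threshold = mean(sizes)
    head = [s for s in sizes if s > threshold]   # few large objects
    tail = [s for s in sizes if s <= threshold]  # many small objects
    return head, tail


# Hypothetical object sizes (e.g. street lengths): many small, few large.
sizes = [1, 1, 1, 2, 2, 3, 3, 4, 5, 8, 13, 40, 120]
head, tail = head_tail_split(sizes)

retained = head                      # objects kept on the generalized map
head_share = len(head) / len(sizes)  # fraction of objects in the head
```

For these made-up sizes the head holds 2 of the 13 objects, which gives the kind of imbalanced contrast the abstract describes. Whether the tail is eliminated or aggregated is left open here, as it is in the text.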