Topological and Scaling Analysis of Geospatial Big Data
University of Gävle, Faculty of Engineering and Sustainable Development, Department of Industrial Development, IT and Land Management, Land management, GIS. (Geospatial informationsvetenskap)
ORCID iD: 0000-0001-9328-9584
2018 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Geographic information science and systems face challenges in understanding the intrinsic heterogeneity of geographic space, since conventional geospatial analysis is founded mainly on Euclidean geometry and Gaussian statistics. This thesis adopts a new paradigm for geospatial analysis, based on fractal geometry and Paretian statistics. The thesis relies on the third definition of fractal: a set or pattern is fractal if the scaling of far more small things than large ones recurs multiple times. Therefore, the terms fractal and scaling are used interchangeably in this thesis. This new definition of fractal is well described by Paretian statistics, which are characterized mathematically by heavy-tailed distributions. The topology of geographic features is the key prerequisite that enables us to see the fractal or scaling structure of geographic space. In this thesis, topology refers to the relationships among meaningful geographic features (such as natural streets and natural cities).

The thesis conducts topological and scaling analyses of geographic space and the human activities that take place in it, in the context of geospatial big data. It uses massive volunteered geographic information from location-based social media (LBSM) platforms: the global OpenStreetMap database, and countrywide geo-referenced tweets and check-in locations. The thesis develops geospatial big-data processing and modeling techniques and employs complexity science methods, including heavy-tailed distribution detection and head/tail breaks, along with some complex network analysis. Head/tail breaks and the induced ht-index are powerful tools for geospatial big-data analytics and visualization. The derived scaling hierarchies, power-law metrics, and network measures provide quantitative insights into the heterogeneity of geographic space and help us understand how it shapes human activities at city, country, and world scales.

Place, publisher, year, edition, pages
Gävle: Gävle University Press, 2018, p. 73
Series
Studies in the Research Profile Built Environment. Doctoral thesis ; 7
Keywords [en]
Third definition of fractal, scaling, topology, power law, head/tail breaks, ht-index, complex network, geospatial big data, natural cities, natural streets
National Category
Computer and Information Sciences; Earth and Related Environmental Sciences
Identifiers
URN: urn:nbn:se:hig:diva-26197
ISBN: 978-91-88145-24-6 (print)
ISBN: 978-91-88145-25-3 (electronic)
OAI: oai:DiVA.org:hig-26197
DiVA, id: diva2:1187391
Public defence
2018-05-16, Lilla Jadwiga-salen, Kungsbäcksvägen 47, Gävle, 10:00 (English)
Opponent
Supervisors
Available from: 2018-04-24 Created: 2018-03-04 Last updated: 2018-04-25
List of papers
1. Characterizing the Heterogeneity of the OpenStreetMap Data and Community
2015 (English). In: ISPRS International Journal of Geo-Information, ISSN 2220-9964, Vol. 4, no 2, p. 535-550. Article in journal (Refereed). Published
Abstract [en]

OpenStreetMap (OSM) constitutes an unprecedented, free, geographical information source contributed by millions of individuals, resulting in a database of great volume and heterogeneity. In this study, we characterize the heterogeneity of the entire OSM database and historical archive in the context of big data. We consider all users, geographic elements, and user contributions from an eight-year data archive of 692 GB. We rely on nonlinear methods such as power-law statistics and head/tail breaks to uncover and illustrate the underlying scaling properties. All three aspects (users, elements, and contributions) demonstrate striking power laws or heavy-tailed distributions. The heavy-tailed distributions imply that there are far more small elements than large ones, far more inactive users than active ones, and far more lightly edited elements than heavily edited ones. Furthermore, about 500 users in the core group of OSM are highly networked in terms of collaboration.
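The abstract above does not say which power-law fitting procedure was used. As a minimal sketch, assuming the standard continuous maximum-likelihood estimator for a tail with a known lower bound, the exponent of a heavy-tailed sample could be estimated as follows. The function name `powerlaw_alpha_mle`, the synthetic sample, and the choice of `xmin` are illustrative assumptions, not the paper's code or data.

```python
import math
import random


def powerlaw_alpha_mle(values, xmin):
    """Continuous maximum-likelihood estimate of the power-law exponent
    alpha for the tail x >= xmin, with its approximate standard error."""
    tail = [x for x in values if x >= xmin]
    if len(tail) < 2:
        raise ValueError("need at least two values at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal xmin")
    alpha = 1.0 + len(tail) / log_sum
    sigma = (alpha - 1.0) / math.sqrt(len(tail))
    return alpha, sigma


# Synthetic data (an assumption for illustration): inverse-transform
# sampling from a power law with exponent 2.5 and lower bound 1.0.
rng = random.Random(42)
true_alpha, xmin = 2.5, 1.0
sample = [
    xmin * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
    for _ in range(20000)
]

alpha_hat, sigma = powerlaw_alpha_mle(sample, xmin)
print(f"estimated alpha = {alpha_hat:.3f} (std. error about {sigma:.3f})")
```

On real data such as edit counts per user, `xmin` is usually not known in advance and has to be chosen as well, which this sketch leaves out.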

Keywords
OpenStreetMap, big data, power laws, head/tail breaks, ht-index
National Category
Other Computer and Information Science Physical Geography
Identifiers
urn:nbn:se:hig:diva-20223 (URN); 10.3390/ijgi4020535 (DOI); 000358987600006 (); 2-s2.0-84948967039 (Scopus ID)
Available from: 2015-09-09 Created: 2015-09-09 Last updated: 2018-03-13. Bibliographically approved
2. A Socio-Geographic Perspective on Human Activities in Social Media
2017 (English). In: Geographical Analysis, ISSN 0016-7363, E-ISSN 1538-4632, Vol. 49, no 3, p. 328-342. Article in journal (Refereed). Published
Abstract [en]

Location-based social media make it possible to understand social and geographic aspects of human activities. However, previous studies have mostly examined these two aspects separately without looking at how they are linked. This study aims to connect the two aspects by investigating, from a socio-geographic perspective, whether there is any correlation between social connections and users' check-in locations. We constructed three types of networks (a people–people network, a location–location network, and a city–city network) from the former location-based social media platforms Brightkite and Gowalla in the U.S., based on users' check-in locations and their friendships. We adopted complexity science methods such as power-law detection and the head/tail breaks classification method for analysis and visualization. Head/tail breaks recursively partitions data into a few large things in the head and many small things in the tail. By analyzing check-in locations, we found that users' check-in patterns are heterogeneous at both the individual and collective levels. We also discovered that users' first or most frequent check-in locations can represent users' spatial information. The networks constructed from these locations are very heterogeneous, as indicated by the high ht-index. Most importantly, the node degree of the networks correlates highly with the population at locations (mostly with R² being 0.7) or cities (above 0.9). This correlation indicates that the geographic distributions of the social media users relate highly to their online social connections.
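The recursive partition described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `head_tail_breaks`, the toy data, and the 40% cap on the head's share (`head_limit`, a commonly used rule of thumb for keeping the head a minority) are assumptions.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Return the arithmetic means used as class breaks.

    At each step the data are split around their mean into a head
    (values above the mean) and a tail (the rest). The split is repeated
    on the head for as long as the head remains a minority of the data.
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]
        # Stop when nothing lies above the mean or the head is no longer
        # a small minority (assumed threshold: 40% of the current data).
        if not head or len(head) / len(data) > head_limit:
            break
        breaks.append(mean)
        data = head
    return breaks


# Toy data with far more small values than large ones.
check_ins = [1] * 27 + [3] * 9 + [9] * 3 + [27]
breaks = head_tail_breaks(check_ins)
ht_index = len(breaks) + 1  # number of hierarchical levels
print("class breaks:", breaks)
print("ht-index:", ht_index)
```

Each break is simply the mean of the data remaining at that step, so the classes follow the hierarchy inherent in the data and are not imposed as fixed intervals.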

National Category
Other Social Sciences Other Civil Engineering
Identifiers
urn:nbn:se:hig:diva-24868 (URN); 10.1111/gean.12122 (DOI); 000405108800004 (); 2-s2.0-85011710648 (Scopus ID)
Note

Funding agency:

Key Laboratory of Eco Planning & Green Building, Ministry of Education (Tsinghua University), China

Available from: 2017-08-17 Created: 2017-08-17 Last updated: 2018-03-22. Bibliographically approved
3. A smooth curve as a fractal under the third definition
2018 (English). In: Cartographica, ISSN 0317-7173, E-ISSN 1911-9925, Vol. 53, no 3, p. 203-210. Article in journal (Refereed). Published
Abstract [en]

It is commonly believed in the literature that smooth curves, such as circles, are not fractal, and only non-smooth curves, such as coastlines, are fractal. However, this paper demonstrates that a smooth curve can be fractal, under the new, relaxed, third definition of fractal – a set or pattern is fractal if the scaling of far more small things than large ones recurs at least twice. The scaling can be rephrased as a hierarchy, consisting of numerous smallest, a very few largest, and some in between the smallest and the largest. The logarithmic spiral, as a smooth curve, is apparently fractal because it bears the self-similar property, or the scaling of far more small squares than large ones recurs multiple times, or the scaling of far more small bends than large ones recurs multiple times. A half-circle or half-ellipse and the UK coastline (before or after smooth processing) are fractal, if the scaling of far more small bends than large ones recurs at least twice.
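The "at least twice" criterion can be expressed as a simple check. The sketch below assumes that each recurrence of "far more small things than large ones" adds one hierarchical level, so that two recurrences correspond to at least three levels. The function names, the 40% minority threshold, and the bend sizes are hypothetical and are not taken from the paper.

```python
def ht_index(sizes, head_limit=0.4):
    """Count hierarchical levels: 1 plus the number of times the data can
    be split around the mean with the head remaining a minority."""
    levels = 1
    data = list(sizes)
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [s for s in data if s > mean]
        if not head or len(head) / len(data) > head_limit:
            break
        levels += 1
        data = head
    return levels


def is_fractal_third_definition(sizes):
    """True if the scaling pattern recurs at least twice (>= 3 levels)."""
    return ht_index(sizes) >= 3


# Hypothetical bend sizes: one set of equal bends, one scaling set.
equal_bends = [5.0] * 20
scaling_bends = [1] * 64 + [4] * 16 + [16] * 4 + [64]

print(ht_index(equal_bends), is_fractal_third_definition(equal_bends))
print(ht_index(scaling_bends), is_fractal_third_definition(scaling_bends))
```

Whether a given curve passes the check depends on how its bends are delineated and measured, which the sketch takes as given.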

Keywords
Third definition of fractal, head/tail breaks, bends, ht-index, scaling hierarchy
National Category
Other Engineering and Technologies
Identifiers
urn:nbn:se:hig:diva-26164 (URN); 10.3138/cart.53.3.2017-0032 (DOI)
Available from: 2018-02-18 Created: 2018-02-18 Last updated: 2018-10-15. Bibliographically approved
4. Why Topology Matters in Predicting Human Activities
2018 (English). In: Environment and Planning B: Urban Analytics and City Science, ISSN 2399-8083. Article in journal (Refereed). Epub ahead of print
Abstract [en]

Geographic space is best understood through the topological relationship of the underlying streets (note: entire streets rather than street segments), which enables us to see the scaling, fractal, or living structure of far more less-connected streets than well-connected ones. It is this underlying scaling structure that makes human activities or urban traffic predictable, albeit in the sense of collective rather than individual human moving behavior. This power of topological analysis has not yet received the attention it deserves in the literature, as many researchers continue to rely on segment analysis for predicting urban traffic. The segment-analysis-based methods are essentially geometric, with a focus on geometric details such as locations, lengths, and directions, and are unable to reveal the scaling property, which means they cannot be used for predicting human activities. We conducted a series of case studies using London streets and tweet location data, based on related concepts such as natural streets and natural street segments (or street segments for short), and axial lines and axial line segments (or line segments for short). We found that natural streets are the best representation in terms of traffic prediction, followed by axial lines, and that neither street segments nor line segments show a good correlation between network parameters and tweet locations. These findings point to the fact that the reason why axial-line-based space syntax, or this kind of topological analysis in general, works has little to do with individual human travel behavior or the ways that humans conceptualize distances or spaces. Instead, it is the underlying scaling hierarchy of streets – numerous least-connected, a very few most-connected, and some in between the least- and most-connected – that makes human activities or urban traffic predictable.
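As a rough sketch of the topological representation described above, entire streets can be treated as nodes that are linked whenever they intersect, and each street's connectivity (degree) can then be compared with the activity observed on it. The street names, junction identifiers, and tweet counts below are invented for illustration, and the helper names are hypothetical. The study's actual derivation of natural streets and its network parameters is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations


def street_connectivity(street_junctions):
    """Map each street to the set of streets it shares a junction with."""
    streets_at = defaultdict(set)
    for street, junctions in street_junctions.items():
        for junction in junctions:
            streets_at[junction].add(street)
    neighbours = {street: set() for street in street_junctions}
    for streets in streets_at.values():
        for a, b in combinations(sorted(streets), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def r_squared(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Invented toy network: each street lists the junctions it passes through.
street_junctions = {
    "High Street": ["j1", "j2", "j3", "j4"],
    "Mill Lane": ["j1"],
    "Church Road": ["j2", "j5"],
    "Station Road": ["j3"],
    "Park Close": ["j4"],
    "Back Alley": ["j5"],
}
# Invented tweet counts per street.
tweets = {
    "High Street": 120, "Mill Lane": 10, "Church Road": 45,
    "Station Road": 15, "Park Close": 8, "Back Alley": 5,
}

neighbours = street_connectivity(street_junctions)
degree = {street: len(linked) for street, linked in neighbours.items()}
names = sorted(degree)
r2 = r_squared([degree[s] for s in names], [tweets[s] for s in names])
print(degree)
print(f"R squared between connectivity and tweets: {r2:.3f}")
```

Even in this toy network there are far more less-connected streets than well-connected ones, which is the kind of hierarchy the abstract refers to.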

Keywords
Topological analysis, space syntax, segment analysis, natural streets, scaling of geographic space
National Category
Other Engineering and Technologies Social and Economic Geography
Identifiers
urn:nbn:se:hig:diva-26166 (URN); 10.1177/2399808318792268 (DOI); 2-s2.0-85052568351 (Scopus ID)
Available from: 2018-02-18 Created: 2018-02-18 Last updated: 2018-09-24. Bibliographically approved
5. Defining least community as a homogeneous group in complex networks
2015 (English). In: Physica A: Statistical Mechanics and its Applications, ISSN 0378-4371, E-ISSN 1873-2119, Vol. 428, p. 154-160. Article in journal (Refereed). Published
Abstract [en]

This paper introduces a new concept of least community that is as homogeneous as a random graph, and develops a new community detection algorithm from the perspective of homogeneity or heterogeneity. Based on this concept, we adopt head/tail breaks (a newly developed classification scheme for data with a heavy-tailed distribution) and rely on edge betweenness, given its heavy-tailed distribution, to iteratively partition a network into many heterogeneous and homogeneous communities. Surprisingly, the derived communities for any self-organized and/or self-evolved large networks demonstrate very striking power laws, implying that there are far more small communities than large ones. This notion of far more small things than large ones constitutes a new fundamental way of thinking for community detection.

Keywords
Classification, Head/tail breaks, ht-index, k-means, Natural breaks, Scaling, Classification (of information), Iterative methods, Population dynamics, Complex networks
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:hig:diva-19212 (URN); 10.1016/j.physa.2015.02.029 (DOI); 000352328100015 (); 2-s2.0-84923791049 (Scopus ID)
Available from: 2015-04-16 Created: 2015-04-16 Last updated: 2018-03-13. Bibliographically approved
6. Spatial Distribution of City Tweets and Their Densities
2016 (English). In: Geographical Analysis, ISSN 0016-7363, E-ISSN 1538-4632, Vol. 48, no 3, p. 337-351. Article in journal (Refereed). Published
Abstract [en]

Social media outlets such as Twitter constitute valuable data sources for understanding human activities in the virtual world from a geographic perspective. This article examines the spatial distribution of tweets and their densities within cities. The cities are natural cities that are automatically aggregated from a country's small street blocks, so-called city blocks. We adopted street blocks (rather than census tracts) as the basic geographic units and the topological center (rather than the geometric center) to assess how tweets and densities vary from the center to the peripheral border. We found that, within a city from the center to the periphery, the tweets first increase and then decrease, while the densities decrease in general. These increases and decreases fluctuate dramatically, and differ significantly from those obtained when census tracts are used as the basic geographic units. We also found that the decrease of densities from the center to the periphery is less significant, and even disappears, if an arbitrarily defined city border is adopted. These findings prove that natural cities and their topological centers are better than their counterparts (conventionally defined cities and city centers) for geographic research. Based on this study, we believe that tweet densities can be a good surrogate for population densities. If this belief proves to be true, social media data could help resolve the dispute over whether urban population density follows an exponential or a power function.

Keywords
urban-population densities, head/tail breaks
National Category
Civil Engineering
Identifiers
urn:nbn:se:hig:diva-22228 (URN); 10.1111/gean.12096 (DOI); 000380333200006 (); 2-s2.0-84959080817 (Scopus ID)
External cooperation:
Available from: 2016-08-16 Created: 2016-08-16 Last updated: 2018-03-13. Bibliographically approved
7. How complex is a fractal?: Head/tail breaks and fractional hierarchy
2018 (English). In: Journal of Geovisualization and Spatial Analysis, ISSN 2509-8810. Article in journal (Refereed). Epub ahead of print
Abstract [en]

A fractal bears a complex structure that is reflected in a scaling hierarchy, indicating that there are far more small things than large ones. This scaling hierarchy can be effectively derived using head/tail breaks (a clustering and visualization tool for data with a heavy-tailed distribution) and quantified by a head/tail breaks-induced integer, called the ht-index, indicating the number of clusters or hierarchical levels. However, this integer ht-index has been found to be less precise for many fractals at their different phases of development. This paper refines the ht-index as a fraction to measure the scaling hierarchy of a fractal more precisely within a coherent whole, and further assigns a fractional ht-index (the fht-index) to each individual data value of a data series that represents the fractal. We developed two case studies to demonstrate the advantages of the fht-index in comparison with the ht-index. We found that the fractional ht-index, or fractional hierarchy in general, can help characterize a fractal set or pattern in a much more precise manner. The index may help create intermediate map scales between two consecutive map scales.

Keywords
Ht-index, Fractal, Scaling, Complexity, Fht-index
National Category
Natural Sciences
Identifiers
urn:nbn:se:hig:diva-26168 (URN); 10.1007/s41651-017-0009-z (DOI)
Available from: 2018-02-18 Created: 2018-02-18 Last updated: 2018-03-22. Bibliographically approved

Open Access in DiVA

fulltext (9442 kB), 187 downloads
File information
File name: FULLTEXT01.pdf
File size: 9442 kB
Checksum (SHA-512): 9ae28f4ff390c19b08d7c97a9ea885a5a57833c2b60f7a5db75801e6c1cb9de2ea6939838a0f0ea4f5e7036fd58e2e3f76c2486c19604eb762046f7ab92c2684
Type: fulltext
Mimetype: application/pdf
spikblad (27 kB), 8 downloads
File information
File name: SPIKBLAD01.pdf
File size: 27 kB
Checksum (SHA-512): 955dd9ff2d629c8cc8a9a1f33697eff24cfd62ea8f27ca2cb92f15da6d5603b4e00b9488f7b85e7fdafa654f0bae175e7e794d1b60a9158ccb28c4a23218afc9
Type: spikblad
Mimetype: application/pdf

Authority records

Ma, Ding

Total: 187 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 4389 hits