Skip to content

Commit

Permalink
geometry access with .geometry
Browse files Browse the repository at this point in the history
  • Loading branch information
michaeldorman committed Sep 13, 2023
1 parent af2abd9 commit 71f18c9
Show file tree
Hide file tree
Showing 5 changed files with 29 additions and 27 deletions.
20 changes: 11 additions & 9 deletions 01-spatial-data.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -222,16 +222,18 @@ gdf[gdf['name_long'] == 'Egypt'].explore()

A vital column in a `GeoDataFrame` is the geometry column, of class `GeoSeries`.
The geometry column contains the geometric part of the vector layer.
The geometry column can be accessed by name, which typically (e.g., when reading from a file) is `'geometry'`, as in `gdf['geometry']`.
However, the recommendation is to use the fixed `.geometry` property, which refers to the geometry column regardless whether its name is `'geometry'` or any other name.
In the case of the `gdf` object, the geometry column contains `'MultiPolygon'`s associated with each country:

```{python}
gdf['geometry']
gdf.geometry
```

The geometry column also contains the spatial reference information, if any:

```{python}
gdf['geometry'].crs
gdf.geometry.crs
```

Many geometry operations, such as calculating the centroid, buffer, or bounding box of each feature involve just the geometry.
Expand All @@ -243,14 +245,14 @@ gdf.centroid
```

```{python}
gdf['geometry'].centroid
gdf.geometry.centroid
```

Note that `.centroid`, and other similar operators in `geopandas` such as `.buffer` (@sec-buffers) or `.convex_hull`, return only the geometry (i.e., a `GeoSeries`), not a `GeoDataFrame` with the original attribute data. In case we want the latter, we can create a copy of the `GeoDataFrame` and then "overwrite" its geometry (or, we can overwrite the geometries directly in case we don't need the original ones, as in `gdf['geometry']=gdf.centroid`):
Note that `.centroid`, and other similar operators in `geopandas` such as `.buffer` (@sec-buffers) or `.convex_hull`, return only the geometry (i.e., a `GeoSeries`), not a `GeoDataFrame` with the original attribute data. In case we want the latter, we can create a copy of the `GeoDataFrame` and then "overwrite" its geometry (or, we can overwrite the geometries directly in case we don't need the original ones, as in `gdf.geometry=gdf.centroid`):

```{python}
gdf2 = gdf.copy()
gdf2['geometry'] = gdf.centroid
gdf2.geometry = gdf.centroid
gdf2
```

Expand All @@ -259,13 +261,13 @@ Note that the types of geometries contained in a geometry column (and, thus, a v
Accordingly, the `.type` property returns a `Series` (of type `string`), rather than a single value:

```{python}
gdf['geometry'].type
gdf.geometry.type
```

To summarize the occurrence of different geometry types in a geometry column, we can use the **pandas** method called `value_counts`:

```{python}
gdf['geometry'].type.value_counts()
gdf.geometry.type.value_counts()
```

In this case, we see that the `gdf` layer contains only `'MultiPolygon'` geometries.
Expand Down Expand Up @@ -353,13 +355,13 @@ Each element in the geometry column is a geometry object, of class `shapely`.
For example, here is one specific geometry selected by implicit index (that of Canada):

```{python}
gdf['geometry'].iloc[3]
gdf.geometry.iloc[3]
```

We can also select a specific geometry based on the `'name_long'` attribute:

```{python}
gdf[gdf['name_long'] == 'Egypt']['geometry'].iloc[0]
gdf[gdf['name_long'] == 'Egypt'].geometry.iloc[0]
```

The **shapely** package is compatible with the Simple Features standard.
Expand Down
14 changes: 7 additions & 7 deletions 03-spatial-operations.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ canterbury

```{python}
# Does each 'nz_height' point intersect with 'canterbury'?
sel = nz_height.intersects(canterbury['geometry'].iloc[0])
sel = nz_height.intersects(canterbury.geometry.iloc[0])
sel
```

Expand Down Expand Up @@ -112,7 +112,7 @@ Alternatively, we can evaluate other methods, such as `.disjoint` to obtain all

```{python}
# Is each 'nz_height' point disjoint from 'canterbury'?
sel = nz_height.disjoint(canterbury['geometry'].iloc[0])
sel = nz_height.disjoint(canterbury.geometry.iloc[0])
canterbury_height2 = nz_height[sel]
```

Expand Down Expand Up @@ -345,8 +345,8 @@ We can check if any of the points are the same by creating a pairwise boolean ma
Note that the `.to_numpy` method is applied to go from a `DataFrame` to a `numpy` array, for which `.any` gives a global rather than a row-wise summary, which is what we want in this case:

```{python}
m = cycle_hire['geometry'].apply(
lambda x: cycle_hire_osm['geometry'].intersects(x)
m = cycle_hire.geometry.apply(
lambda x: cycle_hire_osm.geometry.intersects(x)
)
m.to_numpy().any()
```
Expand All @@ -360,7 +360,7 @@ Note that we transform the data to a projected CRS (`27700`) to use real buffer
```{python}
crs = 27700
cycle_hire_buffers = cycle_hire.copy().to_crs(crs)
cycle_hire_buffers['geometry'] = cycle_hire_buffers.buffer(20)
cycle_hire_buffers.geometry = cycle_hire_buffers.buffer(20)
z = gpd.sjoin(
cycle_hire_buffers,
cycle_hire_osm.to_crs(crs)
Expand All @@ -377,7 +377,7 @@ We also go back from buffers to points using `.centroid`:
z = z[['id', 'capacity', 'geometry']] \
.dissolve(by='id', aggfunc='mean') \
.reset_index()
z['geometry'] = z.centroid
z.geometry = z.centroid
z
```

Expand Down Expand Up @@ -598,7 +598,7 @@ co
The distance matrix `d` is obtained as follows (technically speaking, this is a `DataFrame`). In plain language, we take the geometry from each each row in `nz_height.iloc[:3, :]`, and apply the `.distance` method on `co` with that row as the argument:

```{python}
d = nz_height.iloc[:3, :].apply(lambda x: co.distance(x['geometry']), axis=1)
d = nz_height.iloc[:3, :].apply(lambda x: co.distance(x.geometry), axis=1)
d
```

Expand Down
6 changes: 3 additions & 3 deletions 04-geometry-operations.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -190,7 +190,7 @@ In the common scenario when the original attributes of the input features need t

```{python}
seine_buff_5km = seine.copy()
seine_buff_5km['geometry'] = seine.buffer(5000)
seine_buff_5km.geometry = seine.buffer(5000)
seine_buff_5km
```

Expand Down Expand Up @@ -469,7 +469,7 @@ What is going on in terms of the geometries? Behind the scenes, `.dissolve` comb
```{python}
us_west = us_states[us_states['REGION'] == 'West']
us_west_union = us_west['geometry'].unary_union
us_west_union = us_west.geometry.unary_union
```
Note that the result is a `shapely` geometry, as the individual attributes are "lost" as part of dissolving. The result is shown in @fig-dissolve2.
Expand All @@ -486,7 +486,7 @@ To dissolve two (or more) groups of a `GeoDataFrame` into one geometry, we can e
```{python}
sel = (us_states['REGION'] == 'West') | (us_states['NAME'] == 'Texas')
texas_union = us_states[sel]
texas_union = texas_union['geometry'].unary_union
texas_union = texas_union.geometry.unary_union
```
or concatenate the two separate subsets:
Expand Down
14 changes: 7 additions & 7 deletions 05-raster-vector.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ To mask the image, i.e., convert all pixels which do not intersect with the `zio
```{python}
out_image_mask, out_transform_mask = rasterio.mask.mask(
src_srtm,
zion['geometry'],
zion.geometry,
crop=False,
nodata=9999
)
Expand Down Expand Up @@ -351,7 +351,7 @@ To count occurrences of categorical raster values within polygons, we can use ma
```{python}
out_image, out_transform = rasterio.mask.mask(
src_nlcd,
zion['geometry'].to_crs(src_nlcd.crs),
zion.geometry.to_crs(src_nlcd.crs),
crop=False,
nodata=9999
)
Expand Down Expand Up @@ -435,14 +435,14 @@ Now, we can rasterize. Rasterization is a very flexible operation: the results d
To illustrate this flexibility we will try three different approaches to rasterization. First, we create a raster representing the presence or absence of cycle hire points (known as presence/absence rasters). In this case, we transfer the value of `1` to all pixels where at least one point falls in. To transform the point `GeoDataFrame` into a generator of `shapely` geometries and the (fixed) values, we use the following expression:

```{python}
((g, 1) for g in cycle_hire_osm_projected['geometry'].to_list())
((g, 1) for g in cycle_hire_osm_projected.geometry.to_list())
```

Therefore, the rasterizing expression is:

```{python}
ch_raster1 = rasterio.features.rasterize(
((g, 1) for g in cycle_hire_osm_projected['geometry'].to_list()),
((g, 1) for g in cycle_hire_osm_projected.geometry.to_list()),
out_shape=shape,
transform=transform
)
Expand All @@ -458,7 +458,7 @@ To count the number of stations, we can use the fixed value of `1` combined with

```{python}
ch_raster2 = rasterio.features.rasterize(
((g, 1) for g in cycle_hire_osm_projected['geometry'].to_list()),
((g, 1) for g in cycle_hire_osm_projected.geometry.to_list()),
out_shape=shape,
transform=transform,
merge_alg=rasterio.enums.MergeAlg.add
Expand Down Expand Up @@ -514,7 +514,7 @@ california
Second, we "cast" the polygon into a `"MultiLineString"` geometry, using the `.boundary` property that `GeoSeries` have:

```{python}
california_borders = california['geometry'].boundary
california_borders = california.geometry.boundary
california_borders
```

Expand Down Expand Up @@ -551,7 +551,7 @@ Compare it to a polygon rasterization, with `all_touched=False` by default, whic

```{python}
california_raster2 = rasterio.features.rasterize(
((g, 1) for g in california['geometry'].to_list()),
((g, 1) for g in california.geometry.to_list()),
out_shape=shape,
transform=transform,
fill=np.nan
Expand Down
2 changes: 1 addition & 1 deletion 06-reproj.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -320,7 +320,7 @@ lonlat2UTM(174.7, -36.9)
Here is another example for London (where we "unpack" the coordinates of the 1^st^ geometry in `lnd_layer` into the `lonlat2UTM` function arguments):

```{python}
lonlat2UTM(*lnd_layer['geometry'].iloc[0].coords[0])
lonlat2UTM(*lnd_layer.geometry.iloc[0].coords[0])
```

Currently, we also have tools helping us to select a proper CRS.
Expand Down

0 comments on commit 71f18c9

Please sign in to comment.