Adds function to find and plot local extrema #3110

kgoebber · 2023-07-09T21:47:00Z

Description Of Changes

This PR adds a function to find local maximum and minimum using the spicy.ndimage.maximum_filter and spicy.ndimage.minimum_filter and a separate function to easily plot extrema on a map. The plotting sets up some sensible defaults while allowing for use of kwargs to set matplotlib.pyplot.Text settings.

I haven't yet incorporated the PathEffects suggestion from #1234, and implemented in #1237, but can easily do so once we are sure about current implementation, if desired. This PR would supersede #1237.

The operationalizes example code that has traditionally lived in the MetPy Example Gallery.

Checklist

Closes Clean up High/Low Plotting Function A Bit #1234
Tests added
Fully documented

tests/plots/test_util.py

src/metpy/calc/tools.py

+        extreme_val = maximum_filter(var.values, nsize, mode='nearest')
+    elif extrema == 'min':
+        extreme_val = minimum_filter(var.values, nsize, mode='nearest')
+    return var.where(extreme_val == var.values)


tests/plots/test_util.py

+@pytest.mark.mpl_image_compare(remove_text=True)
+def test_plot_extrema():
+    """Test plotting of local max/min values."""
+    data = xr.open_dataset(get_test_data('GFS_test.nc', as_file_obj=False))


src/metpy/calc/tools.py

@@ -781,6 +781,37 @@
    return take


+@exporter.export
+def find_local_extrema(var, nsize, extrema):


tests/plots/test_util.py

+
+    # Plot MSLP
+    clevmslp = np.arange(800., 1120., 4)
+    cs2 = ax.contour(mslp.lon, mslp.lat, mslp.metpy.convert_units('hPa'),


tests/plots/test_util.py

dopplershift · 2023-07-21T20:55:20Z

I think the test failure is caused by minimum having a 3 1/2 year old version of xarray, which we could consider updating...but see my incoming review first.

dopplershift

Some interesting design decisions to be made here. See below.

dopplershift · 2023-07-21T20:57:48Z

src/metpy/plots/_util.py

+def plot_local_extrema(ax, extreme_vals, symbol, plot_val=True, **kwargs):
+    """Plot the local extreme (max/min) values of an array.
+
+    The behavior of the plotting will have the symbol horizontal/vertical alignment
+    be center/bottom and any value plotted will be center/top. The text size of plotted
+    values is 0.65 of the symbol size.
+
+    Parameters
+    ----------
+    ax : `matplotlib.axes`
+        The axes which to plot the local extrema
+    extreme_vals : `xarray.DataArray`
+        The DataArray that contains the variable local extrema


This would mark a change, especially for plotting outside of declarative, where we require a DataArray. I'm not sure it's worth it in this case. I think it would be more cohesive with the rest of the API to do:

def plot_local_extrema(ax, x, y, extreme_vals, symbol, plot_val=True, **kwargs):

Am I missing something that xarray is buying us here?

Nothing other than I was creating this with declarative in mind, but I see how dropping this back to not using it would make it more widely applicable and not a pain to deal with in declarative.

That's what i figured.

dopplershift · 2023-07-21T20:58:32Z

src/metpy/plots/_util.py

+    if plot_val:
+        kwargs.pop('verticalalignment')
+    size = kwargs.pop('size')
+    textsize = size * .65


Would we want to allow users to override this? Could make the 0.65 * size the default like:

textsize = kwargs.pop('textsize', size * 0.65)

dopplershift · 2023-07-21T21:14:57Z

src/metpy/calc/tools.py

+@exporter.export
+def find_local_extrema(var, nsize, extrema):
+    r"""Find the local extreme (max/min) values of a 2D array.
+
+    Parameters
+    ----------
+    var : `xarray.DataArray`
+        The variable to locate the local extrema using the nearest method
+        from the maximum_filter or minimum_filter from the scipy.ndimage module.
+    nsize : int
+        The minimum number of grid points between each local extrema.
+    extrema: str
+        The value 'max' for local maxima or 'min' for local minima.
+
+    Returns
+    -------
+    var_extrema: `xarray.DataArray`
+        The values of the local extrema with other values as NaNs
+


Is requiring xarray buying us anything here? (It looks like maximum_filter, etc. work fine when given a DataArray. Also, what about an API where instead of getting back the data field with values masked, just getting back the mask directly? Something like (which also addresses a CodeQL complaint):

from scipy.ndimage import maximum_filter, minimum_filter if extrema == 'max': extreme_val = maximum_filter(var.values, nsize, mode='nearest') elif extrema == 'min': extreme_val = minimum_filter(var.values, nsize, mode='nearest') else: raise ValueError(f'Invalid value for "extrema": {extrema}. Valid options are "max" or "min".') return var == extreme_val

Or alternatively we could return arrays of x,y indices of the locations of the maxima? A benefit of both of these alternatives is that they lend themselves to more use cases and avoid dealing with units.

src/metpy/calc/tools.py

dopplershift · 2023-07-21T21:17:58Z

src/metpy/plots/_util.py

+    stack_vals = extreme_vals.stack(x=[extreme_vals.metpy.x.name, extreme_vals.metpy.y.name])
+    for extrema in stack_vals[stack_vals.notnull()]:
+        x = extrema[extreme_vals.metpy.x.name].values
+        y = extrema[extreme_vals.metpy.y.name].values


I'm guessing this might change based on some of the other discussion points I raised.

tests/plots/test_util.py

dopplershift · 2023-07-24T21:45:35Z

I put together a version of this that uses the ax.scattertext() method we monkeypatch (😬) on Axes (we could instead use our TextCollection class that's used within), as well as a few other cleanups:

def plot_local_extrema(ax, x, y, extrema_mask, symbol, values=None, fmt='{0:.0f}', **kwargs):
    defaultkwargs = {'size': 20, 'clip_on': True, 'clip_box': ax.bbox, 'color': 'black',
                     'fontweight': 'bold',
                     'horizontalalignment': 'center', 'verticalalignment': 'center'}
    kwargs = {**defaultkwargs, **kwargs}
    size = kwargs.pop('size')

    if x.ndim == 1 or y.ndim == 1:
        x, y = np.meshgrid(x, y)
    extreme_x = x[extrema_mask]
    extreme_y = y[extrema_mask]
    symbol, _ = np.broadcast_arrays(symbol, extreme_x)

    if values is None:
        return ax.scattertext(extreme_x, extreme_y, symbol, size=size, **kwargs)
    else:
        extreme_vals = values[extrema_mask]
        valuesize = kwargs.pop('valuesize', size * 0.65)
        p1 = ax.scattertext(extreme_x, extreme_y, symbol, size=size, **kwargs)
        p2 = ax.scattertext(extreme_x, extreme_y, [fmt.format(v) for v in extreme_vals],
                            size=valuesize, loc=(0, -14), **kwargs)
        return p1, p2

I'm torn because on one hand this feels a bit specific/fringe, but any further simplification makes it not too much more useful than scattertext() itself. For instance, fmt and symbol could really be elided together. In some ways the most useful thing about that function is automatically meshgridding the x/y coords. Anyone have any other thoughts?

dopplershift · 2023-07-28T23:05:38Z

After some discussion with @dcamron, one use case we realized that needed to be covered was that the plotting needs to work for the analyzed highs and lows from the WPC surface bulletins, so including the masking is kinda out. Luckily, getting the appropriate x/y indices is pretty easy and seems to work. Here's an implementation that updates scattertext a bit as a the plotting and has the find_local_extrema return indices along each of the dimensions:

import numbers

from matplotlib import cbook
import numpy as np
from metpy.plots._mpl import TextCollection

def find_local_extrema(var, nsize=15, extrema='max'):
    from scipy.ndimage import maximum_filter, minimum_filter

    if extrema == 'max':
        extreme_val = maximum_filter(var, nsize, mode='nearest')
    elif extrema == 'min':
        extreme_val = minimum_filter(var, nsize, mode='nearest')
    else:
        raise ValueError(f'Invalid value for "extrema": {extrema}. '
                         'Valid options are "max" or "min".')
    return (var == extreme_val).nonzero()


def scattertext(ax, x, y, values, loc=(0, 0), fmt='{0:.0f}', **kw):
    # Start with default args and update from kw
    new_kw = {
        'verticalalignment': 'center',
        'horizontalalignment': 'center',
        'transform': ax.transData,
        'clip_on': False}
    new_kw.update(kw)

    # Handle masked arrays
    x, y, values = cbook.delete_masked_points(*np.broadcast_arrays(x, y, values))
    print(x, y, values)

    # If there is nothing left after deleting the masked points, return None
    if x.size == 0:
        return None

    if not isinstance(values[0], str):
        if isinstance(fmt, str):
            fmt = fmt.format
        values = [fmt(v) for v in values]
    
    # Make the TextCollection object
    text_obj = TextCollection(x, y, values, offset=loc, **new_kw)

    # The margin adjustment is a hack to deal with the fact that we don't
    # want to transform all the symbols whose scales are in points
    # to data coords to get the exact bounding box for efficiency
    # reasons.  It can be done right if this is deemed important.
    # Also, only bother with this padding if there is anything to draw.
    xm, ym = ax.margins()
    if xm < 0.05:
        ax.set_xmargin(0.05)

    if ym < 0.05:
        ax.set_ymargin(0.05)

    # Add it to the axes and update range
    ax.add_artist(text_obj)

    # Matplotlib at least up to 3.2.2 does not properly clip text with paths, so
    # work-around by setting to the bounding box of the Axes
    # TODO: Remove when fixed in our minimum supported version of matplotlib
    text_obj.clipbox = ax.bbox

    ax.update_datalim(text_obj.get_datalim(ax.transData))
    ax.autoscale_view()
    return text_obj

and heres the example using it:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import xarray as xr

data = xr.open_dataset(get_test_data('GFS_test.nc', as_file_obj=False))
mslp = data.Pressure_reduced_to_MSL_msl.squeeze().metpy.convert_units('hPa')

max_yind, max_xind = find_local_extrema(mslp.values, 15, 'max')
min_yind, min_xind = find_local_extrema(mslp.values, 15, 'min')

fig = plt.figure(figsize=(8., 8.))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.LambertConformal(central_longitude=-95))

# Plot MSLP
clevmslp = np.arange(800., 1120., 4)
ax.contour(mslp.lon, mslp.lat, mslp,
           clevmslp, colors='k', linewidths=1.25, linestyles='solid', transform=ccrs.PlateCarree())

scattertext(ax, mslp.lon[max_xind], mslp.lat[max_yind], 'H', size=20, color='red',
            fontweight='bold', transform=ccrs.PlateCarree())
scattertext(ax, mslp.lon[min_xind], mslp.lat[min_yind], 'L', size=20, color='blue',
            fontweight='bold', transform=ccrs.PlateCarree())
scattertext(ax, mslp.lon[min_xind], mslp.lat[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0,-15), transform=ccrs.PlateCarree())


ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)

@kgoebber How does that seem?

kgoebber · 2023-08-01T22:13:33Z

I think this is okay, but I attempted to try it with some NAM data (using geopotential heights) and I am running into an issue with plotting. Is there something in the TextCollection piece that is making it specifically look for lat/lon values? I get a 'NaN' error...

Test Code:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
from metpy.cbook import get_test_data
import matplotlib.pyplot as plt
import xarray as xr

data = xr.open_dataset(get_test_data('NAM_test.nc', as_file_obj=False)).metpy.parse_cf()
mslp = data.Geopotential_height_isobaric.metpy.sel(vertical=850*units.hPa).squeeze()
dataproj = mslp.metpy.cartopy_crs

max_yind, max_xind = find_local_extrema(mslp.values, 15, 'max')
min_yind, min_xind = find_local_extrema(mslp.values, 15, 'min')

fig = plt.figure(figsize=(8., 8.))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.LambertConformal(central_longitude=-95))

# Plot MSLP
clevmslp = np.arange(0., 2000., 30)
ax.contour(mslp.x, mslp.y, mslp,
           clevmslp, colors='k', linewidths=1.25, linestyles='solid', transform=ccrs.PlateCarree())

scattertext(ax, mslp.x[max_xind], mslp.y[max_yind], 'H', size=20, color='red',
            fontweight='bold', transform=dataproj)
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], 'L', size=20, color='blue',
            fontweight='bold', transform=dataproj)
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0,-15), transform=ccrs.PlateCarree())


ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)

dopplershift · 2023-08-01T22:23:33Z

Well I can get a good plot if I correct your call to contour() and your last call to scattertext to use dataproj rather than ccrs.PlateCarree():

ax.contour(mslp.x, mslp.y, mslp,
           clevmslp, colors='k', linewidths=1.25, linestyles='solid', transform=dataproj)
...
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0,-15), transform=dataproj)

Though this is the image:

so there's definitely something weird with the values, not to mention the highs near New Orleans.

EDIT: Ok, so the values are due to using heights, so that makes sense. Still not sure what's up with some of those close locations.

dopplershift · 2023-08-01T22:39:29Z

Ok, so the "too close" points are a flaw in the algorithm. Essentially we're using a filter to smear the extrema within the kernel, then using == to find where the original source datapoint is. Unfortunately, there's nothing prohibiting multiple points within that area having the same value.

Thoughts:

Filter found points and take the centroid of any extrema that are clustered
A different technique?

kgoebber · 2023-08-02T14:21:32Z

D'oh, missed those transforms...shouldn't work that hard at the end of a day...

Okay, so there could be two potential options:

Institute one smoothing pass with the nine point filter to help ensure unique values to find the extrema points based on the average of the surrounding points, then give back just the boolean array so that the actual gridded values are retrieved at the end.
Institute some form of reduce_point_density to eliminate close by, same value points.

In a test with implementing option 1, it did in fact eliminate the double "H" by New Orleans. There is still a chance that you would end up with two same values close by (or at some other random point). This option is nice in that it doesn't create another setting the user needs to think about. We're only using the smoothing to help isolate the "correct" grid point in a unique way, but not giving back any altered values. Below is the modified function for finding the local extrema.

def find_local_extrema(var, nsize=15, extrema='max'):
    from scipy.ndimage import maximum_filter, minimum_filter
    from metpy.calc import smooth_n_point

    smooth_var = smooth_n_point(var, 9, 1)
    if extrema == 'max':
        extreme_val = maximum_filter(smooth_var, nsize, mode='nearest')
    elif extrema == 'min':
        extreme_val = minimum_filter(smooth_var, nsize, mode='nearest')
    else:
        raise ValueError(f'Invalid value for "extrema": {extrema}. '
                         'Valid options are "max" or "min".')
    return (smooth_var == extreme_val).nonzero()

A downside to the smoothing solution is that it doesn't always get the original extreme values as the extrema smoothed value may be a surrounding point to the actual point. Clearly not ideal. Likely off by at most a rounding level of error in reporting out.

kgoebber · 2023-08-02T14:49:43Z

For what it's worth, here is what GEMPAK does...

https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dghilo.c
https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dgclmp.c
https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dgclcn.c

To summarize: its finds min/max points within a radius, then looks for clusters, and keeps the center of the cluster and removes the rest.

dopplershift · 2023-08-02T19:30:04Z

So for reference, this is the same as skimage's peak_local_max. Looks like in 0.18 they addressed the "return multiple close points" problem by picking one arbitrarily.

I'm peaking at this post which gives something with some scary math ("Homology") behind it, but seems like a straightforward method that gives the peaks as if the water level around them was dropping. I don't want to overthink this one, but now that I've seen the post my brain won't give it up.

ThomasMGeo · 2023-08-07T16:58:00Z

Scipy find_peaks with the distance argument might also work? Will code up an example.

dopplershift · 2023-08-07T17:39:16Z

@ThomasMGeo The problem with find_peaks is that it only works with 1D input. I have an implementation of the persistent homology approach that I just need to finish with:

An implementation of find_local_extrema (handling max/min and choosing the number of peaks)
Tests

ThomasMGeo · 2023-08-07T17:58:25Z

Yep, after fiddling with it, came to the same conclusion :)

sgdecker · 2023-10-30T19:03:18Z

I would amend the implementation so that the highs/lows at the grid edge are omitted, in line with what GEMPAK does. I believe the for loops in the GEMPAK code are performing the search on a grid that excludes the first and last row/column.

dopplershift · 2023-12-22T20:55:30Z

Punting from this (now December) release because there are some API decisions I don't want to rush on plotting and I'm still struggling with auto-thresholding of the persistent homology approach. Not worth holding up for this.

kgoebber · 2023-12-22T21:53:18Z

Sounds good. Happy to test implementation in the new year to help solidify choices for implementation.

kgoebber requested a review from a team as a code owner July 9, 2023 21:47

kgoebber requested review from dopplershift and removed request for a team July 9, 2023 21:47

github-advanced-security bot found potential problems Jul 9, 2023

View reviewed changes

kgoebber force-pushed the find_plot_extrema branch from 9258464 to 8320849 Compare July 10, 2023 00:24

github-advanced-security bot found potential problems Jul 10, 2023

View reviewed changes

kgoebber force-pushed the find_plot_extrema branch 3 times, most recently from 357fbc8 to f4091d6 Compare July 10, 2023 15:02

github-advanced-security bot found potential problems Jul 10, 2023

View reviewed changes

tests/plots/test_util.py Fixed Show fixed Hide fixed

kgoebber force-pushed the find_plot_extrema branch 2 times, most recently from ab9a619 to b70542b Compare July 10, 2023 16:03

kgoebber force-pushed the find_plot_extrema branch 2 times, most recently from 42d2cc4 to 6aa26f9 Compare July 17, 2023 22:06

dopplershift reviewed Jul 21, 2023

View reviewed changes

kgoebber added 2 commits July 24, 2023 09:14

add function to find and plot local extrema

3e7f2de

make recommended changes

d4ca452

kgoebber force-pushed the find_plot_extrema branch from df3c561 to d4ca452 Compare July 24, 2023 14:18

dopplershift added this to the September 2023 milestone Aug 30, 2023

dopplershift added Type: Feature New functionality Area: Plots Pertains to producing plots Area: Calc Pertains to calculations labels Dec 4, 2023

dopplershift modified the milestones: December 2023, January 2024 Dec 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds function to find and plot local extrema #3110

Adds function to find and plot local extrema #3110

kgoebber commented Jul 9, 2023

dopplershift commented Jul 21, 2023

dopplershift left a comment

dopplershift Jul 21, 2023

kgoebber Jul 22, 2023

dopplershift Jul 24, 2023

dopplershift Jul 21, 2023

dopplershift Jul 21, 2023

dopplershift Jul 21, 2023

dopplershift commented Jul 24, 2023 •

edited

Loading

dopplershift commented Jul 28, 2023

kgoebber commented Aug 1, 2023 •

edited

Loading

dopplershift commented Aug 1, 2023 •

edited

Loading

dopplershift commented Aug 1, 2023

kgoebber commented Aug 2, 2023 •

edited

Loading

kgoebber commented Aug 2, 2023

dopplershift commented Aug 2, 2023

ThomasMGeo commented Aug 7, 2023

dopplershift commented Aug 7, 2023

ThomasMGeo commented Aug 7, 2023

sgdecker commented Oct 30, 2023

dopplershift commented Dec 22, 2023

kgoebber commented Dec 22, 2023

Adds function to find and plot local extrema #3110

Are you sure you want to change the base?

Adds function to find and plot local extrema #3110

Conversation

kgoebber commented Jul 9, 2023

Description Of Changes

Checklist

dopplershift commented Jul 21, 2023

dopplershift left a comment

Choose a reason for hiding this comment

dopplershift Jul 21, 2023

Choose a reason for hiding this comment

kgoebber Jul 22, 2023

Choose a reason for hiding this comment

dopplershift Jul 24, 2023

Choose a reason for hiding this comment

dopplershift Jul 21, 2023

Choose a reason for hiding this comment

dopplershift Jul 21, 2023

Choose a reason for hiding this comment

dopplershift Jul 21, 2023

Choose a reason for hiding this comment

dopplershift commented Jul 24, 2023 • edited Loading

dopplershift commented Jul 28, 2023

kgoebber commented Aug 1, 2023 • edited Loading

dopplershift commented Aug 1, 2023 • edited Loading

dopplershift commented Aug 1, 2023

kgoebber commented Aug 2, 2023 • edited Loading

kgoebber commented Aug 2, 2023

dopplershift commented Aug 2, 2023

ThomasMGeo commented Aug 7, 2023

dopplershift commented Aug 7, 2023

ThomasMGeo commented Aug 7, 2023

sgdecker commented Oct 30, 2023

dopplershift commented Dec 22, 2023

kgoebber commented Dec 22, 2023

dopplershift commented Jul 24, 2023 •

edited

Loading

kgoebber commented Aug 1, 2023 •

edited

Loading

dopplershift commented Aug 1, 2023 •

edited

Loading

kgoebber commented Aug 2, 2023 •

edited

Loading