Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds function to find and plot local extrema #3110

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kgoebber
Copy link
Collaborator

@kgoebber kgoebber commented Jul 9, 2023

Description Of Changes

This PR adds a function to find local maximum and minimum using the spicy.ndimage.maximum_filter and spicy.ndimage.minimum_filter and a separate function to easily plot extrema on a map. The plotting sets up some sensible defaults while allowing for use of kwargs to set matplotlib.pyplot.Text settings.

I haven't yet incorporated the PathEffects suggestion from #1234, and implemented in #1237, but can easily do so once we are sure about current implementation, if desired. This PR would supersede #1237.

The operationalizes example code that has traditionally lived in the MetPy Example Gallery.

Checklist

@kgoebber kgoebber requested a review from a team as a code owner July 9, 2023 21:47
@kgoebber kgoebber requested review from dopplershift and removed request for a team July 9, 2023 21:47
tests/plots/test_util.py Fixed Show fixed Hide fixed
extreme_val = maximum_filter(var.values, nsize, mode='nearest')
elif extrema == 'min':
extreme_val = minimum_filter(var.values, nsize, mode='nearest')
return var.where(extreme_val == var.values)

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable

Local variable 'extreme_val' may be used before it is initialized.
@pytest.mark.mpl_image_compare(remove_text=True)
def test_plot_extrema():
"""Test plotting of local max/min values."""
data = xr.open_dataset(get_test_data('GFS_test.nc', as_file_obj=False))

Check warning

Code scanning / CodeQL

File is not always closed

File is opened but is not closed.
@@ -781,6 +781,37 @@
return take


@exporter.export
def find_local_extrema(var, nsize, extrema):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.

# Plot MSLP
clevmslp = np.arange(800., 1120., 4)
cs2 = ax.contour(mslp.lon, mslp.lat, mslp.metpy.convert_units('hPa'),

Check notice

Code scanning / CodeQL

Unused local variable

Variable cs2 is not used.
@kgoebber kgoebber force-pushed the find_plot_extrema branch 3 times, most recently from 357fbc8 to f4091d6 Compare July 10, 2023 15:02
tests/plots/test_util.py Fixed Show fixed Hide fixed
@kgoebber kgoebber force-pushed the find_plot_extrema branch 2 times, most recently from ab9a619 to b70542b Compare July 10, 2023 16:03
@kgoebber kgoebber force-pushed the find_plot_extrema branch 2 times, most recently from 42d2cc4 to 6aa26f9 Compare July 17, 2023 22:06
@dopplershift
Copy link
Member

I think the test failure is caused by minimum having a 3 1/2 year old version of xarray, which we could consider updating...but see my incoming review first.

Copy link
Member

@dopplershift dopplershift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some interesting design decisions to be made here. See below.

Comment on lines 288 to 300
def plot_local_extrema(ax, extreme_vals, symbol, plot_val=True, **kwargs):
"""Plot the local extreme (max/min) values of an array.

The behavior of the plotting will have the symbol horizontal/vertical alignment
be center/bottom and any value plotted will be center/top. The text size of plotted
values is 0.65 of the symbol size.

Parameters
----------
ax : `matplotlib.axes`
The axes which to plot the local extrema
extreme_vals : `xarray.DataArray`
The DataArray that contains the variable local extrema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would mark a change, especially for plotting outside of declarative, where we require a DataArray. I'm not sure it's worth it in this case. I think it would be more cohesive with the rest of the API to do:

def plot_local_extrema(ax, x, y, extreme_vals, symbol, plot_val=True, **kwargs):

Am I missing something that xarray is buying us here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing other than I was creating this with declarative in mind, but I see how dropping this back to not using it would make it more widely applicable and not a pain to deal with in declarative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what i figured.

if plot_val:
kwargs.pop('verticalalignment')
size = kwargs.pop('size')
textsize = size * .65
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want to allow users to override this? Could make the 0.65 * size the default like:

textsize = kwargs.pop('textsize', size * 0.65)

Comment on lines 784 to 804
@exporter.export
def find_local_extrema(var, nsize, extrema):
r"""Find the local extreme (max/min) values of a 2D array.

Parameters
----------
var : `xarray.DataArray`
The variable to locate the local extrema using the nearest method
from the maximum_filter or minimum_filter from the scipy.ndimage module.
nsize : int
The minimum number of grid points between each local extrema.
extrema: str
The value 'max' for local maxima or 'min' for local minima.

Returns
-------
var_extrema: `xarray.DataArray`
The values of the local extrema with other values as NaNs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is requiring xarray buying us anything here? (It looks like maximum_filter, etc. work fine when given a DataArray. Also, what about an API where instead of getting back the data field with values masked, just getting back the mask directly? Something like (which also addresses a CodeQL complaint):

    from scipy.ndimage import maximum_filter, minimum_filter

    if extrema == 'max':
        extreme_val = maximum_filter(var.values, nsize, mode='nearest')
    elif extrema == 'min':
        extreme_val = minimum_filter(var.values, nsize, mode='nearest')
    else:
        raise ValueError(f'Invalid value for "extrema": {extrema}. Valid options are "max" or "min".')

    return var == extreme_val

Or alternatively we could return arrays of x,y indices of the locations of the maxima? A benefit of both of these alternatives is that they lend themselves to more use cases and avoid dealing with units.

src/metpy/calc/tools.py Show resolved Hide resolved
Comment on lines 337 to 340
stack_vals = extreme_vals.stack(x=[extreme_vals.metpy.x.name, extreme_vals.metpy.y.name])
for extrema in stack_vals[stack_vals.notnull()]:
x = extrema[extreme_vals.metpy.x.name].values
y = extrema[extreme_vals.metpy.y.name].values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this might change based on some of the other discussion points I raised.

tests/plots/test_util.py Outdated Show resolved Hide resolved
@dopplershift
Copy link
Member

dopplershift commented Jul 24, 2023

I put together a version of this that uses the ax.scattertext() method we monkeypatch (😬) on Axes (we could instead use our TextCollection class that's used within), as well as a few other cleanups:

def plot_local_extrema(ax, x, y, extrema_mask, symbol, values=None, fmt='{0:.0f}', **kwargs):
    defaultkwargs = {'size': 20, 'clip_on': True, 'clip_box': ax.bbox, 'color': 'black',
                     'fontweight': 'bold',
                     'horizontalalignment': 'center', 'verticalalignment': 'center'}
    kwargs = {**defaultkwargs, **kwargs}
    size = kwargs.pop('size')

    if x.ndim == 1 or y.ndim == 1:
        x, y = np.meshgrid(x, y)
    extreme_x = x[extrema_mask]
    extreme_y = y[extrema_mask]
    symbol, _ = np.broadcast_arrays(symbol, extreme_x)

    if values is None:
        return ax.scattertext(extreme_x, extreme_y, symbol, size=size, **kwargs)
    else:
        extreme_vals = values[extrema_mask]
        valuesize = kwargs.pop('valuesize', size * 0.65)
        p1 = ax.scattertext(extreme_x, extreme_y, symbol, size=size, **kwargs)
        p2 = ax.scattertext(extreme_x, extreme_y, [fmt.format(v) for v in extreme_vals],
                            size=valuesize, loc=(0, -14), **kwargs)
        return p1, p2

I'm torn because on one hand this feels a bit specific/fringe, but any further simplification makes it not too much more useful than scattertext() itself. For instance, fmt and symbol could really be elided together. In some ways the most useful thing about that function is automatically meshgridding the x/y coords. Anyone have any other thoughts?

@dopplershift
Copy link
Member

After some discussion with @dcamron, one use case we realized that needed to be covered was that the plotting needs to work for the analyzed highs and lows from the WPC surface bulletins, so including the masking is kinda out. Luckily, getting the appropriate x/y indices is pretty easy and seems to work. Here's an implementation that updates scattertext a bit as a the plotting and has the find_local_extrema return indices along each of the dimensions:

import numbers

from matplotlib import cbook
import numpy as np
from metpy.plots._mpl import TextCollection

def find_local_extrema(var, nsize=15, extrema='max'):
    from scipy.ndimage import maximum_filter, minimum_filter

    if extrema == 'max':
        extreme_val = maximum_filter(var, nsize, mode='nearest')
    elif extrema == 'min':
        extreme_val = minimum_filter(var, nsize, mode='nearest')
    else:
        raise ValueError(f'Invalid value for "extrema": {extrema}. '
                         'Valid options are "max" or "min".')
    return (var == extreme_val).nonzero()


def scattertext(ax, x, y, values, loc=(0, 0), fmt='{0:.0f}', **kw):
    # Start with default args and update from kw
    new_kw = {
        'verticalalignment': 'center',
        'horizontalalignment': 'center',
        'transform': ax.transData,
        'clip_on': False}
    new_kw.update(kw)

    # Handle masked arrays
    x, y, values = cbook.delete_masked_points(*np.broadcast_arrays(x, y, values))
    print(x, y, values)

    # If there is nothing left after deleting the masked points, return None
    if x.size == 0:
        return None

    if not isinstance(values[0], str):
        if isinstance(fmt, str):
            fmt = fmt.format
        values = [fmt(v) for v in values]
    
    # Make the TextCollection object
    text_obj = TextCollection(x, y, values, offset=loc, **new_kw)

    # The margin adjustment is a hack to deal with the fact that we don't
    # want to transform all the symbols whose scales are in points
    # to data coords to get the exact bounding box for efficiency
    # reasons.  It can be done right if this is deemed important.
    # Also, only bother with this padding if there is anything to draw.
    xm, ym = ax.margins()
    if xm < 0.05:
        ax.set_xmargin(0.05)

    if ym < 0.05:
        ax.set_ymargin(0.05)

    # Add it to the axes and update range
    ax.add_artist(text_obj)

    # Matplotlib at least up to 3.2.2 does not properly clip text with paths, so
    # work-around by setting to the bounding box of the Axes
    # TODO: Remove when fixed in our minimum supported version of matplotlib
    text_obj.clipbox = ax.bbox

    ax.update_datalim(text_obj.get_datalim(ax.transData))
    ax.autoscale_view()
    return text_obj

and heres the example using it:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
import matplotlib.pyplot as plt
import xarray as xr

data = xr.open_dataset(get_test_data('GFS_test.nc', as_file_obj=False))
mslp = data.Pressure_reduced_to_MSL_msl.squeeze().metpy.convert_units('hPa')

max_yind, max_xind = find_local_extrema(mslp.values, 15, 'max')
min_yind, min_xind = find_local_extrema(mslp.values, 15, 'min')

fig = plt.figure(figsize=(8., 8.))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.LambertConformal(central_longitude=-95))

# Plot MSLP
clevmslp = np.arange(800., 1120., 4)
ax.contour(mslp.lon, mslp.lat, mslp,
           clevmslp, colors='k', linewidths=1.25, linestyles='solid', transform=ccrs.PlateCarree())

scattertext(ax, mslp.lon[max_xind], mslp.lat[max_yind], 'H', size=20, color='red',
            fontweight='bold', transform=ccrs.PlateCarree())
scattertext(ax, mslp.lon[min_xind], mslp.lat[min_yind], 'L', size=20, color='blue',
            fontweight='bold', transform=ccrs.PlateCarree())
scattertext(ax, mslp.lon[min_xind], mslp.lat[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0,-15), transform=ccrs.PlateCarree())


ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)

@kgoebber How does that seem?

@kgoebber
Copy link
Collaborator Author

kgoebber commented Aug 1, 2023

I think this is okay, but I attempted to try it with some NAM data (using geopotential heights) and I am running into an issue with plotting. Is there something in the TextCollection piece that is making it specifically look for lat/lon values? I get a 'NaN' error...

Test Code:

import cartopy.crs as ccrs
import cartopy.feature as cfeature
from metpy.cbook import get_test_data
import matplotlib.pyplot as plt
import xarray as xr

data = xr.open_dataset(get_test_data('NAM_test.nc', as_file_obj=False)).metpy.parse_cf()
mslp = data.Geopotential_height_isobaric.metpy.sel(vertical=850*units.hPa).squeeze()
dataproj = mslp.metpy.cartopy_crs

max_yind, max_xind = find_local_extrema(mslp.values, 15, 'max')
min_yind, min_xind = find_local_extrema(mslp.values, 15, 'min')

fig = plt.figure(figsize=(8., 8.))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.LambertConformal(central_longitude=-95))

# Plot MSLP
clevmslp = np.arange(0., 2000., 30)
ax.contour(mslp.x, mslp.y, mslp,
           clevmslp, colors='k', linewidths=1.25, linestyles='solid', transform=ccrs.PlateCarree())

scattertext(ax, mslp.x[max_xind], mslp.y[max_yind], 'H', size=20, color='red',
            fontweight='bold', transform=dataproj)
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], 'L', size=20, color='blue',
            fontweight='bold', transform=dataproj)
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0,-15), transform=ccrs.PlateCarree())


ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.COASTLINE)

@dopplershift
Copy link
Member

dopplershift commented Aug 1, 2023

Well I can get a good plot if I correct your call to contour() and your last call to scattertext to use dataproj rather than ccrs.PlateCarree():

ax.contour(mslp.x, mslp.y, mslp,
           clevmslp, colors='k', linewidths=1.25, linestyles='solid', transform=dataproj)
...
scattertext(ax, mslp.x[min_xind], mslp.y[min_yind], mslp.values[min_yind, min_xind],
            size=12, color='blue', fontweight='bold', loc=(0,-15), transform=dataproj)

Though this is the image:

image

so there's definitely something weird with the values, not to mention the highs near New Orleans.

EDIT: Ok, so the values are due to using heights, so that makes sense. Still not sure what's up with some of those close locations.

@dopplershift
Copy link
Member

Ok, so the "too close" points are a flaw in the algorithm. Essentially we're using a filter to smear the extrema within the kernel, then using == to find where the original source datapoint is. Unfortunately, there's nothing prohibiting multiple points within that area having the same value.

Thoughts:

  1. Filter found points and take the centroid of any extrema that are clustered
  2. A different technique?

@kgoebber
Copy link
Collaborator Author

kgoebber commented Aug 2, 2023

D'oh, missed those transforms...shouldn't work that hard at the end of a day...

Okay, so there could be two potential options:

  1. Institute one smoothing pass with the nine point filter to help ensure unique values to find the extrema points based on the average of the surrounding points, then give back just the boolean array so that the actual gridded values are retrieved at the end.
  2. Institute some form of reduce_point_density to eliminate close by, same value points.

In a test with implementing option 1, it did in fact eliminate the double "H" by New Orleans. There is still a chance that you would end up with two same values close by (or at some other random point). This option is nice in that it doesn't create another setting the user needs to think about. We're only using the smoothing to help isolate the "correct" grid point in a unique way, but not giving back any altered values. Below is the modified function for finding the local extrema.

def find_local_extrema(var, nsize=15, extrema='max'):
    from scipy.ndimage import maximum_filter, minimum_filter
    from metpy.calc import smooth_n_point

    smooth_var = smooth_n_point(var, 9, 1)
    if extrema == 'max':
        extreme_val = maximum_filter(smooth_var, nsize, mode='nearest')
    elif extrema == 'min':
        extreme_val = minimum_filter(smooth_var, nsize, mode='nearest')
    else:
        raise ValueError(f'Invalid value for "extrema": {extrema}. '
                         'Valid options are "max" or "min".')
    return (smooth_var == extreme_val).nonzero()

download

A downside to the smoothing solution is that it doesn't always get the original extreme values as the extrema smoothed value may be a surrounding point to the actual point. Clearly not ideal. Likely off by at most a rounding level of error in reporting out.

@kgoebber
Copy link
Collaborator Author

kgoebber commented Aug 2, 2023

For what it's worth, here is what GEMPAK does...

https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dghilo.c
https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dgclmp.c
https://github.com/Unidata/gempak/blob/main/gempak/source/diaglib/dg/dgclcn.c

To summarize: its finds min/max points within a radius, then looks for clusters, and keeps the center of the cluster and removes the rest.

@dopplershift
Copy link
Member

So for reference, this is the same as skimage's peak_local_max. Looks like in 0.18 they addressed the "return multiple close points" problem by picking one arbitrarily.

I'm peaking at this post which gives something with some scary math ("Homology") behind it, but seems like a straightforward method that gives the peaks as if the water level around them was dropping. I don't want to overthink this one, but now that I've seen the post my brain won't give it up.

@ThomasMGeo
Copy link

Scipy find_peaks with the distance argument might also work? Will code up an example.

@dopplershift
Copy link
Member

@ThomasMGeo The problem with find_peaks is that it only works with 1D input. I have an implementation of the persistent homology approach that I just need to finish with:

  1. An implementation of find_local_extrema (handling max/min and choosing the number of peaks)
  2. Tests

@ThomasMGeo
Copy link

Yep, after fiddling with it, came to the same conclusion :)

@dopplershift dopplershift added this to the September 2023 milestone Aug 30, 2023
@sgdecker
Copy link
Contributor

I would amend the implementation so that the highs/lows at the grid edge are omitted, in line with what GEMPAK does. I believe the for loops in the GEMPAK code are performing the search on a grid that excludes the first and last row/column.

@dopplershift dopplershift added Type: Feature New functionality Area: Plots Pertains to producing plots Area: Calc Pertains to calculations labels Dec 4, 2023
@dopplershift
Copy link
Member

Punting from this (now December) release because there are some API decisions I don't want to rush on plotting and I'm still struggling with auto-thresholding of the persistent homology approach. Not worth holding up for this.

@kgoebber
Copy link
Collaborator Author

Sounds good. Happy to test implementation in the new year to help solidify choices for implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Calc Pertains to calculations Area: Plots Pertains to producing plots Type: Feature New functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clean up High/Low Plotting Function A Bit
4 participants