Ability to specify cutoff #57

stevenpawley · 2017-06-07T23:36:56Z

Hello,

Thanks for your work on the very useful PyKrige. One feature that I haven't found is the ability to specify the variogram cutoff, i.e. the distance up to which the variogram is calculated. Currently it appears that the variogram in PyKrige is calculated across the full distance range of the data, whereas a typical cutoff is 1/3 of the diagonal distance of the data (gstat's default). Being able to specify this is important for many datasets. Apologies if this already exists and I've missed it, but otherwise this would be a useful addition.

tomvansteijn · 2017-06-11T21:35:47Z

I second this request. The option "n_closest_points" is available for ordinary kriging (ok.py), but not for universal kriging. It would be useful to be able to specify a cutoff distance for both ordinary kriging and universal kriging. Indeed, as in gstat.

rth · 2017-06-11T21:53:17Z

> One feature that I haven't found is the ability to specify the variogram cutoff, i.e. the distance up to which the variogram is calculated.

I believe that was discussed in issue #41; @basaks might know more about this.

> The option "n_closest_points" is available for ordinary kriging (ok.py), but not for universal kriging. It would be useful to be able to specify a cutoff distance for both ordinary kriging and universal kriging.

This should be possible once step 4 of the refactoring in issue #31 is done. Unfortunately I am not able to work on this issue at the moment, so unless somebody takes it over, I'm not sure when it would be done. In this particular case a simpler solution could be to add to uk.py the code relevant to n_closest_points (mostly this section, and possibly only the backend='loop') from ok.py. A pull request on this would be welcome, @tomvansteijn!

bsmurphy · 2017-06-13T00:14:28Z

@stevenpawley, I'm not exactly sure what you mean by variogram cutoff... Do you mean localized variogram estimation like in issue #41, or specifying the maximum lag distance used in estimating the variogram model, or kriging with a moving window (so only using a certain number of nearest points, or points within a certain distance) as in @tomvansteijn's comment?

Either way, @tomvansteijn's suggestion is definitely something that fits in with the long-term goals in issue #31, but I also won't have time to work on this (or any of the other refactors) for several more weeks (sorry for losing momentum on those, @rth). PRs are of course always welcome in the meantime!

stevenpawley · 2017-06-13T03:59:30Z

Hi Benjamin, by cutoff I mean the maximum lag distance over which the variogram is calculated when the variogram function parameters are fitted automatically. Even with weight=True, autofitting in some cases leads to a poor variogram model fit. This is because the fit is influenced by the wild oscillations in semivariance that some datasets exhibit at very large lag distances. The default cutoff in gstat is 1/3 of the diagonal distance of the dataset, and I think it is usually recommended not to calculate the semivariance of point pairs that exceed 1/2 the maximum distance of the data, due to this phenomenon.
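For illustration, the gstat-style default described above (one third of the diagonal of the data's bounding box) could be computed along these lines. This is only a sketch: `default_cutoff` and its `fraction` parameter are hypothetical names, not part of PyKrige or gstat.

```python
import numpy as np


def default_cutoff(x, y, fraction=1.0 / 3.0):
    """Hypothetical helper: a fraction of the bounding-box diagonal of the data."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Diagonal of the axis-aligned box that spans all data locations.
    diagonal = np.hypot(x.max() - x.min(), y.max() - y.min())
    return fraction * diagonal


# Four points spanning a 3 x 4 box, so the diagonal is 5.
x = np.array([0.0, 3.0, 1.0, 2.0])
y = np.array([0.0, 4.0, 2.0, 1.0])
cutoff = default_cutoff(x, y)
half_cutoff = default_cutoff(x, y, fraction=0.5)
```

Passing `fraction=0.5` would give a diagonal-based variant of the looser "half the maximum distance" rule of thumb.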


bsmurphy · 2017-06-13T05:10:02Z

I agree it would be useful to have more flexibility to tune the auto variogram fitting routine. Currently, in the most recent version here (which is different than the most recent version on PyPI), the weighting forces lags ~> 70% of the max to go to zero (see comments here). This is hard-coded at the moment, but it would be easy enough to include a kwarg to allow the user more control. And maybe it makes sense to also enable this kind of weighting by default, with that arbitrary 70% set to 30% or 50% (I hadn't heard the 1/2 the max distance rule of thumb before, but I have certainly seen datasets where an auto fit to all lags would be very bad). @stevenpawley, if you'd like to take out a PR that'd be great, otherwise I'll add some tweaks as I get the chance in the coming weeks...
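To make the idea of a tunable weighting concrete, here is a minimal sketch of a logistic down-weighting of large lags in which the hard-coded 70% becomes an argument. It assumes a weighting of this general shape; the function name and both parameters are hypothetical and are not PyKrige's API.

```python
import numpy as np


def lag_weights(lags, center_fraction=0.7, width_fraction=0.1):
    """Hypothetical logistic weights that fade out lags beyond a chosen fraction.

    The weight is 0.5 at ``center_fraction`` of the lag range and falls from
    0.9 to 0.1 across ``+/- width_fraction`` of the range around that point.
    """
    lags = np.asarray(lags, dtype=float)
    lag_min = lags.min()
    lag_range = lags.max() - lag_min
    # ln(9) makes the logistic pass through 0.9 and 0.1 at +/- one width.
    steepness = np.log(9.0) / (width_fraction * lag_range)
    center = lag_min + center_fraction * lag_range
    return 1.0 / (1.0 + np.exp(-steepness * (center - lags)))


lags = np.linspace(0.0, 100.0, 11)
weights_70 = lag_weights(lags)                       # fade out around 70% of the range
weights_50 = lag_weights(lags, center_fraction=0.5)  # stricter: fade out around 50%
```

The weights would then multiply the residuals between the model and the experimental semivariance during fitting, so distant lags contribute little to the fit.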

stevenpawley · 2017-06-21T14:14:14Z

Hi Benjamin, thanks for this. I'm away with work at the moment, but I can certainly take a look at this when I get back. Steve


smholsen · 2018-12-01T11:11:39Z

Has anybody found a workable solution for adding this functionality yet?

Edit:
Is there actually more to this than simply modifying the initialization of dmax in the function _initialize_variogram_model in core.py?

E.g.

    dmax = cutoff_distance if cutoff_distance else np.amax(d)

I am a bit out of my domain of expertise here, but from what I gather this seems to provide expected results.
The distances are then binned as follows:

    dmin = np.amin(d)
    dd = (dmax - dmin) / nlags
    bins = [dmin + n * dd for n in range(nlags)]
    dmax += 0.001
    bins.append(dmax)

And then, for each lag, the semivariance is updated:

    for n in range(nlags):
        # This 'if... else...' statement ensures that there are data
        # in the bin so that numpy can actually find the mean. If we
        # don't test this first, then Python kicks out an annoying warning
        # message when there is an empty bin and we try to calculate the mean.
        if d[(d >= bins[n]) & (d < bins[n + 1])].size > 0:
            lags[n] = np.mean(d[(d >= bins[n]) & (d < bins[n + 1])])
            semivariance[n] = np.mean(g[(d >= bins[n]) & (d < bins[n + 1])])
        else:
            lags[n] = np.nan
            semivariance[n] = np.nan

This means that the g values whose corresponding d exceeds cutoff_distance fall outside every bin and are ignored.

Are there any flaws in the logic described above?
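The pieces quoted above can be combined into a standalone sketch that runs outside PyKrige. `binned_semivariance` and the `cutoff_distance` argument are hypothetical names used only for illustration; the binning itself follows the snippets in this comment.

```python
import numpy as np


def binned_semivariance(d, g, nlags=6, cutoff_distance=None):
    """Hypothetical standalone version of the binning, with an optional cutoff."""
    d = np.asarray(d, dtype=float)
    g = np.asarray(g, dtype=float)
    # Use the cutoff as the upper bin edge when given, else the largest distance.
    dmax = np.amax(d) if cutoff_distance is None else cutoff_distance
    dmin = np.amin(d)
    dd = (dmax - dmin) / nlags
    bins = [dmin + n * dd for n in range(nlags)]
    bins.append(dmax + 0.001)  # small pad so d == dmax lands in the last bin
    lags = np.full(nlags, np.nan)
    semivariance = np.full(nlags, np.nan)
    for n in range(nlags):
        in_bin = (d >= bins[n]) & (d < bins[n + 1])
        if in_bin.any():  # empty bins stay NaN
            lags[n] = d[in_bin].mean()
            semivariance[n] = g[in_bin].mean()
    return lags, semivariance


# Synthetic data: all pairwise distances and semivariances for 50 random points.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 100.0, size=(50, 2))
z = rng.normal(size=50)
i, j = np.triu_indices(len(z), k=1)
d = np.linalg.norm(xy[i] - xy[j], axis=1)
g = 0.5 * (z[i] - z[j]) ** 2

lags, semivariance = binned_semivariance(d, g, nlags=6, cutoff_distance=40.0)
```

Pairs farther apart than the cutoff (plus the small pad) match no bin, so they never enter the averages. One caveat of this sketch: a cutoff smaller than the minimum pair distance would produce a negative bin width, so a real implementation would probably want to validate the argument.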

MuellerSeb · 2020-01-27T07:03:44Z

Related to #97.

MuellerSeb · 2020-04-04T17:13:33Z

Since we will use the variogram estimation routines of GSTools in the future, we will discuss things like this here: #136
There, we refactor the variogram estimation submodule ATM: GeoStat-Framework/GSTools#55
Closing for now. Feel free to re-open or (better) discuss in the linked issue.

bsmurphy mentioned this issue Feb 5, 2018

[Refactoring] N-dimenstional Kriging #31

Closed

9 tasks

bsmurphy mentioned this issue May 4, 2018

Maximal distance in variogram estimation #97

Closed

bsmurphy added this to the v2.0 milestone May 4, 2018

MuellerSeb added enhancement help wanted new feature labels Jan 27, 2020

MuellerSeb mentioned this issue Jan 27, 2020

GeoStat-Framework integration: PyKrige v2 #136

Open

10 tasks

MuellerSeb mentioned this issue Mar 26, 2020

Adaptation to sklearn #143

Open

MuellerSeb closed this as completed Apr 4, 2020
