Ability to specify cutoff #57

Closed
stevenpawley opened this issue Jun 7, 2017 · 9 comments

@stevenpawley

Hello,

Thanks for your work on the very useful PyKrige. One feature that I haven't found is the ability to specify the variogram cutoff, i.e. the distance up to which the variogram is calculated. Currently it appears that the variogram in PyKrige is calculated across the full distance range of the data, whereas a typical cutoff is 1/3 of the diagonal of the data's bounding box (gstat's default). Being able to specify this is important for many datasets. Apologies if this already exists and I've missed it, but otherwise this would be a useful addition.
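
For reference, the gstat-style default mentioned above can be sketched in a few lines. This is only an illustration: `default_cutoff` and its `fraction` parameter are hypothetical names, not part of PyKrige or gstat.

```python
import numpy as np

def default_cutoff(x, y, fraction=1.0 / 3.0):
    """Hypothetical helper: cutoff as a fraction of the bounding-box diagonal."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Length of the diagonal of the box spanning the data.
    diagonal = np.hypot(x.max() - x.min(), y.max() - y.min())
    return fraction * diagonal

# Bounding box is 3 x 4, so the diagonal is 5 and the cutoff is 5/3.
cutoff = default_cutoff([0.0, 3.0], [0.0, 4.0])
```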

@tomvansteijn

I second this request. The option "n_closest_points" is available for ordinary kriging (ok.py), but not for universal kriging. It would be useful to be able to specify a cutoff distance for both ordinary kriging and universal kriging. Indeed, as in gstat.

@rth
Contributor

rth commented Jun 11, 2017

> One feature that I haven't found is the ability to specify the variogram cutoff, i.e. the distance up to which the variogram is calculated.

I believe that was discussed in issue #41; @basaks might know more about this.

> The option "n_closest_points" is available for ordinary kriging (ok.py), but not for universal kriging. It would be useful to be able to specify a cutoff distance for both ordinary kriging and universal kriging.

This should be possible once step 4 of the refactoring in issue #31 is done. Unfortunately, I am not able to work on this issue at the moment, so unless somebody takes it over, I'm not sure when it will be done. In this particular case, a simpler solution could be to add to uk.py the code relevant to n_closest_points from ok.py (mostly this section, and possibly only the backend='loop' part). A pull request on this would be welcome, @tomvansteijn!

@bsmurphy
Contributor

@stevenpawley, I'm not exactly sure what you mean by variogram cutoff... Do you mean localized variogram estimation as in issue #41, specifying the maximum lag distance used in estimating the variogram model, or kriging with a moving window (so only using a certain number of nearest points, or points within a certain distance) as in @tomvansteijn's comment?

Either way, @tomvansteijn's suggestion is definitely something that fits in with the long-term goals in issue #31, but I also won't have time to work on this (or any of the other refactors) for several more weeks (sorry for losing momentum on those, @rth). PRs are of course always welcome in the meantime!
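
To make the moving-window interpretation concrete, here is a minimal sketch of picking the data points used to krige a single target location, either by count, by distance, or both. `select_neighbors`, `n_closest`, and `max_distance` are hypothetical names for illustration; this is not how ok.py implements n_closest_points.

```python
import numpy as np

def select_neighbors(points, target, n_closest=None, max_distance=None):
    """Hypothetical helper: indices of the data points used for one target."""
    points = np.asarray(points, dtype=float)
    target = np.asarray(target, dtype=float)
    # Euclidean distance from every data point to the target location.
    dist = np.linalg.norm(points - target, axis=1)
    order = np.argsort(dist, kind="stable")  # nearest first
    if max_distance is not None:
        order = order[dist[order] <= max_distance]  # distance window
    if n_closest is not None:
        order = order[:n_closest]  # keep at most n_closest points
    return order
```

Note that this is a different thing from limiting the lags used to fit the variogram: the window restricts which points enter the kriging system, not which pairs enter the variogram estimate.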

@stevenpawley
Author

stevenpawley commented Jun 13, 2017 via email

@bsmurphy
Contributor

I agree it would be useful to have more flexibility to tune the automatic variogram fitting routine. Currently, in the most recent version here (which differs from the most recent version on PyPI), the weighting drives the weights of lags beyond roughly 70% of the maximum lag toward zero (see comments here). This is hard-coded at the moment, but it would be easy enough to add a kwarg to give the user more control. And maybe it makes sense to also enable this kind of weighting by default, with that arbitrary 70% set to 30% or 50% (I hadn't heard the 1/2 the max distance rule of thumb before, but I have certainly seen datasets where an auto fit to all lags would be very bad). @stevenpawley, if you'd like to open a PR that'd be great; otherwise I'll add some tweaks as I get the chance in the coming weeks...
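
As an illustration of what such a user-controlled weighting could look like, here is a minimal sketch using a logistic drop-off. It is not the code currently in the repository: `lag_weights`, `fraction`, and `width` are hypothetical names, and the sketch assumes the lags are not all equal.

```python
import numpy as np

def lag_weights(lags, fraction=0.7, width=0.1):
    """Hypothetical logistic down-weighting of long lags.

    Weights are close to their maximum for short lags and fall toward zero
    around `fraction` of the lag range; `width` (also a fraction of the
    range) controls how sharp the drop is.
    """
    lags = np.asarray(lags, dtype=float)
    lag_min = lags.min()
    lag_range = lags.max() - lag_min  # assumed to be > 0
    midpoint = lag_min + fraction * lag_range
    steepness = 1.0 / (width * lag_range)
    weights = 1.0 / (1.0 + np.exp(steepness * (lags - midpoint)))
    return weights / weights.sum()  # normalize so the weights sum to 1
```

Setting `fraction=0.3` or `fraction=0.5` would correspond to the tighter defaults suggested above; the weights could then multiply the residuals in a least-squares variogram fit.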

@stevenpawley
Author

stevenpawley commented Jun 21, 2017 via email

@smholsen

smholsen commented Dec 1, 2018

Has anybody found a workable solution for adding this functionality yet?

Edit:
Is there actually more to this than simply modifying the initialization of dmax in the function _initialize_variogram_model in core.py?

E.g.

    dmax = cutoff_distance if cutoff_distance else np.amax(d)

I am a bit out of my domain of expertise here, but from what I gather this seems to give the expected results.
The distances are then binned as follows:

    dmin = np.amin(d)
    dd = (dmax - dmin) / nlags
    bins = [dmin + n * dd for n in range(nlags)]
    dmax += 0.001
    bins.append(dmax)

Then, for each lag, the semivariance is computed:

    for n in range(nlags):
        # This 'if... else...' statement ensures that there are data
        # in the bin so that numpy can actually find the mean. If we
        # don't test this first, then Python kicks out an annoying warning
        # message when there is an empty bin and we try to calculate the mean.
        if d[(d >= bins[n]) & (d < bins[n + 1])].size > 0:
            lags[n] = np.mean(d[(d >= bins[n]) & (d < bins[n + 1])])
            semivariance[n] = np.mean(g[(d >= bins[n]) & (d < bins[n + 1])])
        else:
            lags[n] = np.nan
            semivariance[n] = np.nan

This means that the g values whose corresponding d > cutoff_distance are ignored.

Are there any flaws in the logic described above?
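
For anyone who wants to try the idea outside of core.py, the binning above can be sketched as a standalone function. This is an assumption-laden illustration, not PyKrige's API: `binned_semivariance` and the `cutoff_distance` parameter are hypothetical, and `d` and `g` are taken to be 1-D arrays of pair distances and the matching semivariance values, as in the quoted code.

```python
import numpy as np

def binned_semivariance(d, g, nlags=6, cutoff_distance=None):
    """Sketch of lag binning with an optional cutoff (hypothetical parameter)."""
    d = np.asarray(d, dtype=float)
    g = np.asarray(g, dtype=float)
    dmin = np.amin(d)
    # Test against None so that a falsy value is not silently ignored.
    dmax = np.amax(d) if cutoff_distance is None else float(cutoff_distance)
    if dmax <= dmin:
        raise ValueError("cutoff_distance must exceed the smallest pair distance")
    edges = np.linspace(dmin, dmax, nlags + 1)
    edges[-1] += 0.001  # as in the quoted code: include pairs exactly at dmax
    lags = np.full(nlags, np.nan)
    semivariance = np.full(nlags, np.nan)
    for n in range(nlags):
        in_bin = (d >= edges[n]) & (d < edges[n + 1])
        if in_bin.any():  # empty bins stay NaN
            lags[n] = d[in_bin].mean()
            semivariance[n] = g[in_bin].mean()
    return lags, semivariance
```

Two details differ from the one-line change: the cutoff is tested with `is None` rather than by truthiness, and a cutoff at or below the smallest pair distance raises an error instead of producing empty or inverted bins. Pairs beyond the cutoff fall outside every bin and so do not contribute.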

@MuellerSeb
Member

Related to #97.

@MuellerSeb
Member

Since we will use the variogram estimation routines of GSTools in the future, we will discuss things like this here: #136
We are currently refactoring the variogram estimation submodule there: GeoStat-Framework/GSTools#55
Closing for now. Feel free to re-open or (better) discuss in the linked issue.
