Spatch requirements for reuse in other libraries #1
To add a few other aspects that we discussed
Some general principles I'd personally like to see implemented:
There has to be a way to make a backend selection decision that takes into account more than just the types of the arguments. For example, when a large image is passed as a NumPy array, it can be worth transferring it to the GPU, doing the work there and transferring the result back. Whether the conversion is worth it might depend on the particular operation/function/algorithm, though.
Similarly, it should be possible to pass "small arguments" as Python lists instead of having to make them arrays of the same type as "the thing" you want to operate on. This is mostly a UX/convenience thing for interactive use, so you need a way to pick the backend while ignoring those arguments.
For something like scikit-learn you need to be able to do dispatching for stateful classes. The call to …
Another interesting thing is how to make sure that input validation is the same for different backends, in the sense that if you pass some invalid combination of arguments (or arguments plus constructor arguments in the case of a class), you should get the same exception regardless of the backend. Having to duplicate the validation code is probably not great, so we need some ideas on how to do this. You can probably take this one step further and say that no matter what the backend, the possible exceptions and when they are raised should always be the same. Enforce this via testing?!
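A minimal sketch of two of these principles, under loud assumptions: `HostArray` and `DeviceArray` are hypothetical stand-ins for NumPy and CuPy arrays, and `register`/`smooth` are illustrative names, not the API of spatch or any existing library. Validation runs once in the frontend, so every backend raises the same exception for the same invalid input, and only the `image` argument picks the backend, so `weights` can stay a plain Python list.

```python
from typing import Any, Callable, Dict, List, Sequence


class HostArray(list):
    """Hypothetical stand-in for a NumPy array."""


class DeviceArray(list):
    """Hypothetical stand-in for a CuPy array."""


# Hypothetical registry: type of `image` -> backend implementation.
_IMPLS: Dict[type, Callable[..., Any]] = {}


def register(array_type: type) -> Callable:
    def decorator(impl: Callable) -> Callable:
        _IMPLS[array_type] = impl
        return impl
    return decorator


def _validate(weights: Sequence[float]) -> None:
    # Shared validation lives in the frontend: written once, and every
    # backend therefore raises exactly the same exceptions.
    if len(weights) == 0:
        raise ValueError("weights must not be empty")
    if sum(weights) == 0:
        raise ValueError("weights must not sum to zero")


def smooth(image: Sequence[float], weights: Sequence[float]) -> Any:
    _validate(weights)
    # Only `image` decides the backend; `weights` is ignored for dispatching
    # and may be a plain list whatever the type of `image`.
    impl = _IMPLS.get(type(image))
    if impl is None:
        raise TypeError(f"no implementation for {type(image).__name__}")
    return impl(image, [float(w) for w in weights])


def _moving_average(image: Sequence[float], weights: List[float]) -> List[float]:
    # Normalized weighted moving average over fully overlapping windows.
    k, total = len(weights), sum(weights)
    return [
        sum(w * x for w, x in zip(weights, image[i:i + k])) / total
        for i in range(len(image) - k + 1)
    ]


@register(HostArray)
def _smooth_host(image, weights):
    return HostArray(_moving_average(image, weights))


@register(DeviceArray)
def _smooth_device(image, weights):
    # A real GPU backend would run its own kernel here.
    return DeviceArray(_moving_average(image, weights))
```

For example, `smooth(HostArray([1, 2, 3, 4]), [1, 1])` and `smooth(DeviceArray([1, 2, 3, 4]), [1, 1])` both give `[1.5, 2.5, 3.5]` in their own array type, and `smooth(x, [])` raises the same `ValueError` for either type.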
I agree with logging, but I think I'd prefer it off by default. In my experience, users typically ignore noisy output unless they requested it.
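A minimal sketch of dispatch logging that is off by default, using only the standard `logging` module; the logger name `example_dispatch` and the helpers `log_dispatch`/`enable_dispatch_logging` are hypothetical. Messages are emitted at DEBUG level and the library installs only a `NullHandler`, so nothing is printed until the user opts in.

```python
import logging

# Hypothetical logger name; a real library would use its own package name.
logger = logging.getLogger("example_dispatch")
# Libraries conventionally add only a NullHandler and leave output
# configuration to the application.
logger.addHandler(logging.NullHandler())


def log_dispatch(func_name: str, backend_name: str) -> None:
    # DEBUG is below the default WARNING threshold, so this is silent
    # unless someone lowers the level.
    logger.debug("dispatching %s to backend %r", func_name, backend_name)


def enable_dispatch_logging(level: int = logging.DEBUG) -> None:
    """Opt in to seeing dispatch decisions on stderr."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
```

Using the logging hierarchy also lets users route or filter these messages with their existing logging configuration instead of a library-specific switch.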
Just want to note that, because floating-point addition is not associative, GPU backends are unlikely to ever produce numerically identical results, but they should agree within some tolerance. More generally, there may be APIs in the frontend library that have historically provided results with some property (e.g. a consistent/stable ordering) that was never actually promised as part of the API but that users have come to rely on. Dispatching will potentially force frontends to become more explicit about what exactly is being promised about results. I think most of what @betatim suggested sounds good and is consistent with the other proposals discussed above.
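A small sketch of why bit-for-bit comparison across backends is too strict: the same three numbers summed in two orders, as a sequential CPU loop and a tree-shaped (pairwise) reduction of the kind parallel hardware tends to use. The helper names are hypothetical; a backend test suite would compare within a tolerance, here via `math.isclose`.

```python
import math
from typing import Sequence


def sequential_sum(values: Sequence[float]) -> float:
    # Left-to-right accumulation: ((a + b) + c) + ...
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values: Sequence[float]) -> float:
    # Tree-shaped reduction: split in half, sum each half, combine.
    if len(values) <= 2:
        return sequential_sum(values)
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


def assert_backend_close(expected: float, actual: float, rel_tol: float = 1e-12) -> None:
    """Hypothetical conformance check: equal up to a relative tolerance."""
    if not math.isclose(expected, actual, rel_tol=rel_tol):
        raise AssertionError(f"{actual!r} differs from {expected!r}")


values = [0.1, 0.2, 0.3]
cpu, gpu_like = sequential_sum(values), pairwise_sum(values)
print(cpu == gpu_like)  # False: 0.6000000000000001 vs 0.6
assert_backend_close(cpu, gpu_like)  # passes
```

The right `rel_tol` is itself part of what a frontend would have to promise per function.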
To some degree, I think this is the elephant in the room, together with the point of reproducibility.
All of the above makes sense; what currently bothers me most is that there are two possible meanings of a "cucim" backend:
But when you think of
I see logging as a very solvable issue. For example, we could keep counters of which backend was called for which function, so that the issue template can say: Run your code and then give us the information from
Verbose print-out on dispatching by default seems strange to me. But I think it can make sense in a scenario where
could make sense to me and should be the choice of scikit-image (even if it is probably hard to enforce that choice).
The only reason I can see for this would be to help users by having a stub backend that says: "Oh, you passed a CuPy array, maybe you want to install a backend for that?" But you can probably hook that into the failure of the array-conversion step, if wanted.
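A minimal sketch of the counter idea and of hooking a helpful hint into the conversion step instead of shipping a stub backend. All names (`record_call`, `dispatch_report`, `to_python_list`, `total`) are hypothetical, and a plain Python list stands in for the default backend's native array type.

```python
from collections import Counter
from typing import Any

# (function name, backend name) -> number of calls in this session.
_CALLS: Counter = Counter()


def record_call(func_name: str, backend_name: str) -> None:
    _CALLS[(func_name, backend_name)] += 1


def dispatch_report() -> str:
    """Plain-text summary a user could paste into an issue template."""
    lines = [
        f"{func} -> {backend}: {count}"
        for (func, backend), count in sorted(_CALLS.items())
    ]
    return "\n".join(lines) if lines else "no dispatched calls recorded"


def to_python_list(arr: Any) -> list:
    # Stand-in for the default backend's array-conversion step.
    if isinstance(arr, (list, tuple, range)):
        return list(arr)
    # Conversion failed: point at the likely fix instead of a bare error.
    package = type(arr).__module__.split(".")[0]
    raise TypeError(
        f"cannot convert {type(arr).__name__} (from {package!r}); "
        f"you may need to install a backend that supports {package} arrays"
    )


def total(values: Any) -> float:
    data = to_python_list(values)
    # Only successful calls are counted.
    record_call("total", "default")
    return sum(data)
```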
What do you think of an additional option: "if cuCIM is installed, it is used as a backend"? Maybe at first the user would have to activate it (or dispatching in general) via something like
In scikit-learn the current thinking is that users will likely install only one additional backend, but maybe more than one. For now, users have to opt in to dispatching. As part of opting in, they can specify their preferred order in which backends are tried. Each backend is asked "do you want to handle this?", and the first one to say "yes" gets the call. If none says yes, the default backend handles it (i.e. the code that is currently in scikit-learn). This way a backend can accept or decline work based on the input type, input size and algorithm hyperparameters. An open question is how to determine the order in which backends are tried when the user opts in to dispatching without specifying one, or in the future when dispatching is on by default.
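A minimal stdlib sketch of the ordered "do you want to handle this?" protocol described above. `Backend`, `wants` and `dispatch` are hypothetical names, not scikit-learn's actual API, and `dispatch` returns the chosen backend's name alongside the result only so the choice is visible here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Sequence, Tuple


@dataclass
class Backend:
    name: str
    # Answers "do you want to handle this?" given the function name and the
    # call's arguments, so it can look at types, sizes and hyperparameters.
    wants: Callable[[str, tuple, dict], bool]
    implementations: Dict[str, Callable[..., Any]] = field(default_factory=dict)


def dispatch(func_name: str, default_impl: Callable[..., Any],
             backends: Sequence[Backend], *args: Any, **kwargs: Any) -> Tuple[str, Any]:
    # `backends` is in the user's preference order; the first backend that
    # implements the function and says yes gets the call.
    for backend in backends:
        impl = backend.implementations.get(func_name)
        if impl is not None and backend.wants(func_name, args, kwargs):
            return backend.name, impl(*args, **kwargs)
    # Nobody said yes: fall back to the library's own code.
    return "default", default_impl(*args, **kwargs)


def total(values, method="fast"):
    return sum(values)


def gpu_wants(func_name, args, kwargs):
    # Hypothetical policy: only worth the transfer for large inputs, and
    # this pretend GPU code does not implement method="exact".
    return len(args[0]) >= 1_000 and kwargs.get("method", "fast") != "exact"


gpu = Backend("gpu", gpu_wants, {"total": total})  # stand-in implementation
```

With this, `dispatch("total", total, [gpu], list(range(10)))` falls back to `"default"`, while a 2000-element input goes to `"gpu"` unless `method="exact"` is passed.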
I think it would be confusing for a user if there were two cuCIM backends. Why do you need that? One backend could handle both cases you described, no? But it would have to keep track of the input type so that it can do the "back to NumPy" conversion if needed.
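A minimal sketch of one backend covering both cases by remembering the input type. `HostArray`/`DeviceArray` are hypothetical stand-ins for NumPy/CuPy arrays, and `accept_host_inputs` and `double` are illustrative names, with `double` as a placeholder for real GPU code.

```python
import functools
from typing import Any, Callable


class HostArray(list):
    """Hypothetical stand-in for a NumPy array."""


class DeviceArray(list):
    """Hypothetical stand-in for a CuPy array."""


def accept_host_inputs(kernel: Callable[[DeviceArray], DeviceArray]) -> Callable[[Any], Any]:
    """Wrap a device-only kernel so it also accepts host arrays.

    The wrapper remembers whether the input came from the host, moves it to
    the device if so, and converts the result back ("back to NumPy").
    """
    @functools.wraps(kernel)
    def wrapper(x: Any) -> Any:
        came_from_host = isinstance(x, HostArray)
        result = kernel(DeviceArray(x) if came_from_host else x)
        return HostArray(result) if came_from_host else result
    return wrapper


@accept_host_inputs
def double(x: DeviceArray) -> DeviceArray:
    # Placeholder for a real GPU kernel.
    return DeviceArray(2 * v for v in x)
```

Whether host inputs should be accepted at all then becomes a policy switch inside one backend rather than a second backend.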
cuCIM follows CuPy's policy of not automatically casting from NumPy arrays. An error will be raised on NumPy inputs to cuCIM functions.
That was supposed to be the last point, but it wasn't very clear :). I would see it as scikit-image's job to make guidelines, and I think it could go either way depending on how much confidence we have in the mechanism (and confidence can grow over time). Note that I think these worries are unnecessary when we are dealing with, e.g., CuPy inputs that scikit-image currently doesn't support (or supports only in niche places). I do think such a backend should just always auto-activate (via entry points).
In an ideal world (but I'm happy not to aim too high if it starts getting in the way), I might make it a two-stage process:
I don't think starting with a type-based approach makes implementation significantly harder. The question below remains identical.
A backend can keep track of the input type, i.e. I could write a
So it is not much of a problem to have a cuCIM backend (and maybe that is enough):
But if you look at cuGraph, for example, if you do
But the point is, you can't do both without some user choice, and that choice could look very different. If you go "all-in" on type dispatching, you could even do:
using a separate mechanism in case you ever care about types rather than backends (the last is what some type-dispatching libraries do, IIRC).
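For comparison, pure type dispatching in the standard library: `functools.singledispatch` picks an implementation from the type of the first argument alone. This is a swapped-in illustration, not the snippet discussed above; `HostArray`/`DeviceArray` and `negate` are hypothetical. Because the input type alone selects the implementation, a size-based choice such as moving a large NumPy array to the GPU would have to happen inside an implementation or through a separate, user-visible mechanism.

```python
from functools import singledispatch


class HostArray(list):
    """Hypothetical stand-in for a NumPy array."""


class DeviceArray(list):
    """Hypothetical stand-in for a CuPy array."""


@singledispatch
def negate(image):
    # Fallback used for any type without a registered implementation.
    raise TypeError(f"no implementation registered for {type(image).__name__}")


@negate.register
def _(image: HostArray):
    return HostArray(-v for v in image)


@negate.register
def _(image: DeviceArray):
    # A real implementation would call GPU code here.
    return DeviceArray(-v for v in image)
```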
I was discussing with @JoOkuma, @rlratzel, and @lagru how the NetworkX dispatching could be generalized to support reuse inside scikit-image for dispatching to cuCIM. @JoOkuma put together a prototype in scikit-image/scikit-image#7466. Some of the thoughts that came up in this discussion (some of this functionality seems to be supported already, some is new; I'm just collecting all of it here for completeness):