cyclic foreign vars #177

feefladder · 2021-03-20T20:44:57Z

Closes Explicit cyclic foreign vars #164 Apply transitive reduction on graph of processes? #120
Tests added
Passes black . && flake8
Fully documented, including whats-new.rst for all changes and api.rst for new API

feefladder · 2021-03-20T21:01:39Z

proof-of-concept PR for allowing user-defined custom dependencies,
still needs:

tests,
documentation
solution for on_demand variables
(optionally) a way to quickly add chains of succesive processes. (this could be a wrapper that builds a dependency dict, an open question is how to find corresponding variable names)

the arrows from inout still also show up if the process that uses the inout var is set to explicitly depend on it

feefladder · 2021-03-23T00:19:18Z

So, we probably still have to move the check to a place where we can actually view the model.

For the checking, I used a dictionary of {p_name:{dep1,de2,dep3}}, that is also very useful for any other graph inspection. Do you think it is a small enough datastructure to be kept? (this allows for very concise/faster code in other functions.

feefladder · 2021-03-23T10:19:35Z

xsimlab/dot.py

+    def add_inout_arrows(self):
+        for p_name, p_obj in self.model._processes.items():
+            p_cls = type(p_obj)
+            for var_name, var in variables_dict(p_cls).items():
+                # test if the variable is inout
+                if (
+                    var.metadata["intent"] == VarIntent.INOUT
+                    and var.metadata["var_type"] != VarType.FOREIGN
+                ):
+                    target_keys = _get_target_keys(p_obj, var_name)
+
+                    # now again cycle through all processes to see if there is a variable with the same reference
+                    for p2_name, p2_obj in self.model._processes.items():
+                        p2_cls = type(p2_obj)
+
+                        # skip this if it is a dependent process or the process itself
+                        if (
+                            p_name in self.model.dependent_processes[p2_name]
+                            or p_name == p2_name
+                        ):
+                            continue
+
+                        for var2_name, var2 in variables_dict(p2_cls).items():
+                            # if the variable is
+                            target2_keys = _get_target_keys(p2_obj, var2_name)
+                            if len(set(target_keys) & set(target2_keys)):
+                                edge_ends = p_name, p2_name
+                                self.g.edge(
+                                    *edge_ends, weight="200", **INOUT_EDGE_ATTRS
+                                )


need some work still, would be much easier if we have the deps_dict

benbovy

Thank you @joeperdefloep for tackling this, that's awesome!!

This is a very important update in xarray-simlab, so I'd suggest to break this PR into several ones if you don't mind, at least one for each of the following:

add user-defined process dependencies
update how the graph is shown (dot.py)
implement and/or expose graph reduction

It will be much easier for me to review this (it's gonna take me some time too).

benbovy · 2021-03-23T11:10:22Z

xsimlab/model.py

        return self._dep_processes

+    def _check_inout_vars(self):


It's gonna take a little while for me to digest all of this :-).

of course, we can have multiple variables that have an inout somewhere, so we make two dicts with inout an in variables and sets of processes in which that variable is in/inout, so we can run the algorithm on every variable separately.
from now, for one variable:
so a good graph should look like:

inout1->inout2->inout3 / \ ^ \ ^ \ / \ / \ / \ in1 in2 in3 in4

in_ps = {in1,in2,in3,in4}, io_ps = {inout1,inout2,inout3}
let's say we start at inout2:

io_stack = [inout2]

since inout1 is in deps_dict[inout2], and not in verified_ios=[], we add it to the stack

io_stack =[inout2,inout1]

since there are no inouts in deps_dict[inout1], we are at the root node.
a. all in variables in deps_dict[inout1] are safe, so delete them: in_ps = {in2,in3,in4}
b. add to verified_ios = [inout1]
c. pop: io_stack=[inout2]

inout1 is in deps_dict[inout2] = {inout1} == set(verified_ios) so we have a linear graph in inout variables
a. child in variables: {in2}
b. check for in2 if verified_io[-1] = inout1 is in deps_dict[in2], e.g. in2 comes after inout1
c. add to verified_ios = [inout1,inout2]
d. remove child in variables: in_ps = {in3,in4}
e. pop: io_stack = []

now start again from inout3:

io_stack = [inout3]

{inout1,inout2} in deps_dict[inout3], equals set(verified_ios), so safe (same as 10)
a. for all child in processes: {in3} check if verified_ios[-1]=inout2 in deps_dict[in3] -> good
b. verified_ios=[io1,io2,io3]
b. remove child in variables: in_ps={in4}
c. pop io_stack=[]

now start from inout1: in verified_ios, so skip

final check:

all remaining in_ps={in4}, check for inout4 in in_deps[in4]->good

I think the errors will become more clear if I write some tests

benbovy · 2021-03-23T11:30:26Z

xsimlab/model.py

+
+        return descendants
+
+    def transitive_reduction(self):


As I said in #164 (comment), I don't think we should apply transitive reduction on the graph, as it may loose some information that is potentially useful to know. Consider the following graph of processes:

/->-\ A->B->C

It's good to know that if we remove B there will still be a dependency between A and C.

Let me show a real-world example taken from fastscape:

without reduction:

with reduction:

While the reduced graph looks much cleaner, there's a lot of hidden relationships and it's hard to figure out that, e.g., bedrock depends directly on init_bedrock.

I'd rather keep this feature disabled by default and allow enabling it via several options:

a model visualization option: model.visualize(reduce_graph=True)

a Model method, e.g., reduced_model = model.reduce_graph() (let's keep models immutable)

a Model constructor option, e.g., model = xs.Model({...}, reduce_graph=True).

now it's an option at model creation

benbovy · 2021-03-23T11:49:22Z

xsimlab/model.py

@@ -1108,7 +1293,7 @@ def drop_processes(self, keys):
        processes_cls = {
            k: type(obj) for k, obj in self._processes.items() if k not in keys
        }
-        return type(self)(processes_cls)
+        return type(self)(processes_cls, self._custom_dependencies)


I suspect this won't be as simple as that, but we can look at this in follow-up PRs if we're sure that errors are raised in case of any inconsistency or ambiguity.

Thank you @joeperdefloep for tackling this, that's awesome!!

This is a very important update in xarray-simlab, so I'd suggest to break this PR into several ones if you don't mind, at least one for each of the following:

* add user-defined process dependencies * update how the graph is shown (`dot.py`) * implement and/or expose graph reduction

graph reduction (now) heavily depends on the deps_dict (and is really just a few lines), so I don't think that shoudl be split.

how the graph is shown turns out to become more difficult: the arrows should be only from the topmost inout variable to the lowest in variables. For this we need the deps_dict and all inout/in variables (the same data as for the sorting algorithm, so we even need to perform a DFS)

benbovy · 2021-03-23T14:28:53Z

graph reduction (now) heavily depends on the deps_dict (and is really just a few lines), so I don't think that shoudl be split.

Besides the implementation, we still have to decide how to expose this feature, whether to enable it or not by default, etc.

Also, it's not yet very clear to me what is the motivation behind the dashed arrows for inout variables.

Honestly, it's too much for me to follow and discuss in a single PR.

Adding user-defined dependencies is a already a big feature that has a high potential to make the framework more flexible, so I'd really appreciate if we could progress incrementally to make sure that we get the design right and the feature robust.

jvail · 2021-03-23T15:12:49Z

Also, it's not yet very clear to me what is the motivation behind the dashed arrows for inout variables.

I feel guilty having started this particular feature discussion :) So let me give a practical example from the model I am working on (it's a bit messy right now - but work-in-progress):

The process topology (and others) depend on the result of the arch_dev process (t minus one). And imho it would be a good improvement if that could be visualized somehow.

feefladder · 2021-03-23T16:32:36Z

graph reduction (now) heavily depends on the deps_dict (and is really just a few lines), so I don't think that shoudl be split.

Besides the implementation, we still have to decide how to expose this feature, whether to enable it or not by default, etc.

Also, it's not yet very clear to me what is the motivation behind the dashed arrows for inout variables.

Honestly, it's too much for me to follow and discuss in a single PR.

Adding user-defined dependencies is a already a big feature that has a high potential to make the framework more flexible, so I'd really appreciate if we could progress incrementally to make sure that we get the design right and the feature robust.

Ok. I get that, I can move the reduction to another PR, as well as maybe adding custom dependencies.
Then we have:

custom dependencies
dependent on the deps_dict:
- strict checking
- graph reduction
maybe: dotted arrows

This will quite test my Git skills....

@jvail no worries, the main part of this PR is about enabling (and verifying) inout variables, that was already an issue.
The dotted arrows part will need some more work and is actually quite complex, not sure if I'll find time for that

benbovy · 2021-03-23T16:38:57Z

Thanks for the illustration @jvail.

I was getting confused because we (I feel guilty too) involve two separate problems in the discussion here and in #164:

we have a quantity that we want to use and possibly update during the same timestep (or simulation stage) in one or more processes executed with some specific constraints on the workflow.
we have a quantity a which depends on quantity b, but b depends on a computed at the previous timestep or simulation stage.

Problem 1 is what I thought @joeperdefloep you describe in your #164 (comment) (as you mentioned Fastscape's SurfaceToErode which is a workaround to that specific problem). The introduction of user-defined process dependencies (+ allow intent='inout' for foreign variables) would solve this problem.

Problem 2 is what @jvail you initially describe in #164. The current workaround is to declare an inout variable for a, define an input value as the initial value at t = -1, and make sure that it is not updated during the same step/stage than when it is accessed to compute b. However, this approach is a bit fragile. With the introduction of user-defined process dependencies, you will instead need to provide an explicit dependency when building your model so that you can safely compute b before updating a during the same step/stage.

That said, for problem 2 you'll still need an inout variable (+ input value) for a. If you want to define a with intent='out', then the cleanest way to do this is to introduce a whole new concept of a 3rd dimension (or some kind of step/stage unroll) for the graph of processes in xarray-simlab (cf. your #164 (comment)). I don't think it's worth the extra complexity, though (the intent concept is also invariant of the simulation stage). Alternatively, we could add an exception to the variable intent heuristics (cf. declare a variable as seemingly_cyclic) but I don't really see much advantage of this approach compared to the inout + input value workaround.

feefladder · 2021-03-23T16:42:05Z

@jvail the difficulty:
imagine the graph:

  inout1->inout2->inout3
   /   \ /         \
 in1   in2          in3

then, we want arrows only from inout3 to in1, so we need some sort of sorting algorithm for that. For all inout variables. That is quite the algorithm.

feefladder · 2021-03-23T17:05:17Z

@benbovy I think I will close this PR until I have properly split it

benbovy · 2021-03-23T17:28:16Z

Yeah sorry @joeperdefloep a fresh start in new branches is probably the best thing to do.

What we need first is a PR that

extends the Model API so that we can provide explicit process dependencies
implements the logic in _ModelBuilder to merge those dependencies with the implicit depencies (i.e., those retrieved from variable intents)

In the same PR or the next one:

allow intent='inout' for foreign variables
update graph validation in _ModelBuilder: ensure that explicit dependencies exist for all ambiguous cases

Then in follow-up PRs:

implement graph reduction/optimization/simplification and expose that through some API (if better graph visualization is the only motivation I would even suggest to expose that only as an option to Model.visualize for now)
revisit how model output snapshots are saved
update the docs: we really need better docs about how process ordering is done in xarray-simlab
better handling of explicit deps propagation: maybe some smart dependencies reconstructions when explicit deps are broken after model.update_processes or model.drop_processes

Joeperdefloep added 4 commits March 19, 2021 18:52

some docs on avoiding cycles

ae181e0

show_inout_arrows and example notebook

7cbf487

working! moved as_variable_key to utils

6389f2f

fixed failing tests, need tests still

e7f76fd

feefladder closed this Mar 20, 2021

fixed arrows

de9d19e

feefladder reopened this Mar 21, 2021

feefladder mentioned this pull request Mar 21, 2021

Explicit cyclic foreign vars #164

Open

solved checking of inout vars, also transitive reduction

7ef2a51

some refactoring

4388932

feefladder commented Mar 23, 2021

View reviewed changes

improved refactoring

c0571d2

benbovy reviewed Mar 23, 2021

View reviewed changes

benbovy mentioned this pull request Mar 23, 2021

More flexible API for model creation and customization #62

Open

fixed drop_processes ++DFS YAY!

34bb82c

feefladder closed this Mar 23, 2021

feefladder mentioned this pull request Mar 24, 2021

Custom dependencies #178

Open

4 tasks

aufdenkampe mentioned this pull request Jan 24, 2024

Multi-index issue: simulations batches and xarray recent versions #193

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cyclic foreign vars #177

cyclic foreign vars #177

feefladder commented Mar 20, 2021 •

edited

Loading

feefladder commented Mar 20, 2021 •

edited

Loading

feefladder commented Mar 23, 2021

feefladder Mar 23, 2021

benbovy left a comment

benbovy Mar 23, 2021

feefladder Mar 23, 2021

benbovy Mar 23, 2021

feefladder Mar 23, 2021

benbovy Mar 23, 2021

feefladder Mar 23, 2021

benbovy commented Mar 23, 2021

jvail commented Mar 23, 2021 •

edited

Loading

feefladder commented Mar 23, 2021 •

edited

Loading

benbovy commented Mar 23, 2021 •

edited

Loading

feefladder commented Mar 23, 2021

feefladder commented Mar 23, 2021

benbovy commented Mar 23, 2021

cyclic foreign vars #177

cyclic foreign vars #177

Conversation

feefladder commented Mar 20, 2021 • edited Loading

feefladder commented Mar 20, 2021 • edited Loading

feefladder commented Mar 23, 2021

feefladder Mar 23, 2021

Choose a reason for hiding this comment

benbovy left a comment

Choose a reason for hiding this comment

benbovy Mar 23, 2021

Choose a reason for hiding this comment

feefladder Mar 23, 2021

Choose a reason for hiding this comment

benbovy Mar 23, 2021

Choose a reason for hiding this comment

feefladder Mar 23, 2021

Choose a reason for hiding this comment

benbovy Mar 23, 2021

Choose a reason for hiding this comment

feefladder Mar 23, 2021

Choose a reason for hiding this comment

benbovy commented Mar 23, 2021

jvail commented Mar 23, 2021 • edited Loading

feefladder commented Mar 23, 2021 • edited Loading

benbovy commented Mar 23, 2021 • edited Loading

feefladder commented Mar 23, 2021

feefladder commented Mar 23, 2021

benbovy commented Mar 23, 2021

feefladder commented Mar 20, 2021 •

edited

Loading

feefladder commented Mar 20, 2021 •

edited

Loading

jvail commented Mar 23, 2021 •

edited

Loading

feefladder commented Mar 23, 2021 •

edited

Loading

benbovy commented Mar 23, 2021 •

edited

Loading