Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cyclic foreign vars #177

Closed
wants to merge 9 commits into from
Closed

Conversation

feefladder
Copy link
Contributor

@feefladder feefladder commented Mar 20, 2021

@feefladder
Copy link
Contributor Author

feefladder commented Mar 20, 2021

proof-of-concept PR for allowing user-defined custom dependencies,
still needs:

  • tests,
  • documentation
  • solution for on_demand variables
  • (optionally) a way to quickly add chains of succesive processes. (this could be a wrapper that builds a dependency dict, an open question is how to find corresponding variable names)

the arrows from inout still also show up if the process that uses the inout var is set to explicitly depend on it

@feefladder feefladder closed this Mar 20, 2021
@feefladder
Copy link
Contributor Author

So, we probably still have to move the check to a place where we can actually view the model.

For the checking, I used a dictionary of {p_name:{dep1,de2,dep3}}, that is also very useful for any other graph inspection. Do you think it is a small enough datastructure to be kept? (this allows for very concise/faster code in other functions.

Comment on lines +111 to +140
def add_inout_arrows(self):
for p_name, p_obj in self.model._processes.items():
p_cls = type(p_obj)
for var_name, var in variables_dict(p_cls).items():
# test if the variable is inout
if (
var.metadata["intent"] == VarIntent.INOUT
and var.metadata["var_type"] != VarType.FOREIGN
):
target_keys = _get_target_keys(p_obj, var_name)

# now again cycle through all processes to see if there is a variable with the same reference
for p2_name, p2_obj in self.model._processes.items():
p2_cls = type(p2_obj)

# skip this if it is a dependent process or the process itself
if (
p_name in self.model.dependent_processes[p2_name]
or p_name == p2_name
):
continue

for var2_name, var2 in variables_dict(p2_cls).items():
# if the variable is
target2_keys = _get_target_keys(p2_obj, var2_name)
if len(set(target_keys) & set(target2_keys)):
edge_ends = p_name, p2_name
self.g.edge(
*edge_ends, weight="200", **INOUT_EDGE_ATTRS
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need some work still, would be much easier if we have the deps_dict

Copy link
Member

@benbovy benbovy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @joeperdefloep for tackling this, that's awesome!!

This is a very important update in xarray-simlab, so I'd suggest to break this PR into several ones if you don't mind, at least one for each of the following:

  • add user-defined process dependencies
  • update how the graph is shown (dot.py)
  • implement and/or expose graph reduction

It will be much easier for me to review this (it's gonna take me some time too).

return self._dep_processes

def _check_inout_vars(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's gonna take a little while for me to digest all of this :-).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course, we can have multiple variables that have an inout somewhere, so we make two dicts with inout an in variables and sets of processes in which that variable is in/inout, so we can run the algorithm on every variable separately.
from now, for one variable:
so a good graph should look like:

  inout1->inout2->inout3
  /  \    ^   \    ^ \
 /    \  /     \  /   \
in1   in2      in3    in4

in_ps = {in1,in2,in3,in4}, io_ps = {inout1,inout2,inout3}
let's say we start at inout2:

  1. io_stack = [inout2]
  2. since inout1 is in deps_dict[inout2], and not in verified_ios=[], we add it to the stack
  3. io_stack =[inout2,inout1]
  4. since there are no inouts in deps_dict[inout1], we are at the root node.
    a. all in variables in deps_dict[inout1] are safe, so delete them: in_ps = {in2,in3,in4}
    b. add to verified_ios = [inout1]
    c. pop: io_stack=[inout2]
  5. inout1 is in deps_dict[inout2] = {inout1} == set(verified_ios) so we have a linear graph in inout variables
    a. child in variables: {in2}
    b. check for in2 if verified_io[-1] = inout1 is in deps_dict[in2], e.g. in2 comes after inout1
    c. add to verified_ios = [inout1,inout2]
    d. remove child in variables: in_ps = {in3,in4}
    e. pop: io_stack = []

now start again from inout3:

  1. io_stack = [inout3]
  2. {inout1,inout2} in deps_dict[inout3], equals set(verified_ios), so safe (same as 10)
    a. for all child in processes: {in3} check if verified_ios[-1]=inout2 in deps_dict[in3] -> good
    b. verified_ios=[io1,io2,io3]
    b. remove child in variables: in_ps={in4}
    c. pop io_stack=[]

now start from inout1: in verified_ios, so skip

final check:

  1. all remaining in_ps={in4}, check for inout4 in in_deps[in4]->good

I think the errors will become more clear if I write some tests


return descendants

def transitive_reduction(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said in #164 (comment), I don't think we should apply transitive reduction on the graph, as it may loose some information that is potentially useful to know. Consider the following graph of processes:

 /->-\
A->B->C

It's good to know that if we remove B there will still be a dependency between A and C.

Let me show a real-world example taken from fastscape:

without reduction:

image

with reduction:

image

While the reduced graph looks much cleaner, there's a lot of hidden relationships and it's hard to figure out that, e.g., bedrock depends directly on init_bedrock.

I'd rather keep this feature disabled by default and allow enabling it via several options:

  • a model visualization option: model.visualize(reduce_graph=True)
  • a Model method, e.g., reduced_model = model.reduce_graph() (let's keep models immutable)
  • a Model constructor option, e.g., model = xs.Model({...}, reduce_graph=True).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it's an option at model creation

xsimlab/model.py Outdated
@@ -1108,7 +1293,7 @@ def drop_processes(self, keys):
processes_cls = {
k: type(obj) for k, obj in self._processes.items() if k not in keys
}
return type(self)(processes_cls)
return type(self)(processes_cls, self._custom_dependencies)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this won't be as simple as that, but we can look at this in follow-up PRs if we're sure that errors are raised in case of any inconsistency or ambiguity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @joeperdefloep for tackling this, that's awesome!!

This is a very important update in xarray-simlab, so I'd suggest to break this PR into several ones if you don't mind, at least one for each of the following:

* add user-defined process dependencies

* update how the graph is shown (`dot.py`)

* implement and/or expose graph reduction

graph reduction (now) heavily depends on the deps_dict (and is really just a few lines), so I don't think that shoudl be split.

how the graph is shown turns out to become more difficult: the arrows should be only from the topmost inout variable to the lowest in variables. For this we need the deps_dict and all inout/in variables (the same data as for the sorting algorithm, so we even need to perform a DFS)

@benbovy
Copy link
Member

benbovy commented Mar 23, 2021

graph reduction (now) heavily depends on the deps_dict (and is really just a few lines), so I don't think that shoudl be split.

Besides the implementation, we still have to decide how to expose this feature, whether to enable it or not by default, etc.

Also, it's not yet very clear to me what is the motivation behind the dashed arrows for inout variables.

Honestly, it's too much for me to follow and discuss in a single PR.

Adding user-defined dependencies is a already a big feature that has a high potential to make the framework more flexible, so I'd really appreciate if we could progress incrementally to make sure that we get the design right and the feature robust.

@jvail
Copy link

jvail commented Mar 23, 2021

Also, it's not yet very clear to me what is the motivation behind the dashed arrows for inout variables.

I feel guilty having started this particular feature discussion :) So let me give a practical example from the model I am working on (it's a bit messy right now - but work-in-progress):

model-graph

The process topology (and others) depend on the result of the arch_dev process (t minus one). And imho it would be a good improvement if that could be visualized somehow.

@feefladder
Copy link
Contributor Author

feefladder commented Mar 23, 2021

graph reduction (now) heavily depends on the deps_dict (and is really just a few lines), so I don't think that shoudl be split.

Besides the implementation, we still have to decide how to expose this feature, whether to enable it or not by default, etc.

Also, it's not yet very clear to me what is the motivation behind the dashed arrows for inout variables.

Honestly, it's too much for me to follow and discuss in a single PR.

Adding user-defined dependencies is a already a big feature that has a high potential to make the framework more flexible, so I'd really appreciate if we could progress incrementally to make sure that we get the design right and the feature robust.

Ok. I get that, I can move the reduction to another PR, as well as maybe adding custom dependencies.
Then we have:

  • custom dependencies
  • dependent on the deps_dict:
    • strict checking
    • graph reduction
  • maybe: dotted arrows

This will quite test my Git skills....

@jvail no worries, the main part of this PR is about enabling (and verifying) inout variables, that was already an issue.
The dotted arrows part will need some more work and is actually quite complex, not sure if I'll find time for that

@benbovy
Copy link
Member

benbovy commented Mar 23, 2021

Thanks for the illustration @jvail.

I was getting confused because we (I feel guilty too) involve two separate problems in the discussion here and in #164:

  1. we have a quantity that we want to use and possibly update during the same timestep (or simulation stage) in one or more processes executed with some specific constraints on the workflow.
  2. we have a quantity a which depends on quantity b, but b depends on a computed at the previous timestep or simulation stage.

Problem 1 is what I thought @joeperdefloep you describe in your #164 (comment) (as you mentioned Fastscape's SurfaceToErode which is a workaround to that specific problem). The introduction of user-defined process dependencies (+ allow intent='inout' for foreign variables) would solve this problem.

Problem 2 is what @jvail you initially describe in #164. The current workaround is to declare an inout variable for a, define an input value as the initial value at t = -1, and make sure that it is not updated during the same step/stage than when it is accessed to compute b. However, this approach is a bit fragile. With the introduction of user-defined process dependencies, you will instead need to provide an explicit dependency when building your model so that you can safely compute b before updating a during the same step/stage.

That said, for problem 2 you'll still need an inout variable (+ input value) for a. If you want to define a with intent='out', then the cleanest way to do this is to introduce a whole new concept of a 3rd dimension (or some kind of step/stage unroll) for the graph of processes in xarray-simlab (cf. your #164 (comment)). I don't think it's worth the extra complexity, though (the intent concept is also invariant of the simulation stage). Alternatively, we could add an exception to the variable intent heuristics (cf. declare a variable as seemingly_cyclic) but I don't really see much advantage of this approach compared to the inout + input value workaround.

@feefladder
Copy link
Contributor Author

@jvail the difficulty:
imagine the graph:

  inout1->inout2->inout3
   /   \ /         \
 in1   in2          in3

then, we want arrows only from inout3 to in1, so we need some sort of sorting algorithm for that. For all inout variables. That is quite the algorithm.

@feefladder
Copy link
Contributor Author

@benbovy I think I will close this PR until I have properly split it

@feefladder feefladder closed this Mar 23, 2021
@benbovy
Copy link
Member

benbovy commented Mar 23, 2021

Yeah sorry @joeperdefloep a fresh start in new branches is probably the best thing to do.

What we need first is a PR that

  • extends the Model API so that we can provide explicit process dependencies
  • implements the logic in _ModelBuilder to merge those dependencies with the implicit depencies (i.e., those retrieved from variable intents)

In the same PR or the next one:

  • allow intent='inout' for foreign variables
  • update graph validation in _ModelBuilder: ensure that explicit dependencies exist for all ambiguous cases

Then in follow-up PRs:

  • implement graph reduction/optimization/simplification and expose that through some API (if better graph visualization is the only motivation I would even suggest to expose that only as an option to Model.visualize for now)
  • revisit how model output snapshots are saved
  • update the docs: we really need better docs about how process ordering is done in xarray-simlab
  • better handling of explicit deps propagation: maybe some smart dependencies reconstructions when explicit deps are broken after model.update_processes or model.drop_processes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Explicit cyclic foreign vars
3 participants