Comments on Chemostat and ASTroCAT prototypes #1
Hi Benoît, thank you very much for your help! Sorry for the ambiguous questions; your reply definitely sent me in the right direction. I have written up a new chemostat model draft that fixes a few of the problems the previous one had, and I am fairly sure this could be the basic structure of all models in the package. You can have a look here:

I have thought quite a bit about added functionality that xsimlab does not yet provide, but that I would like the phydra package to provide at some point:

1. Solving the model via solve_ivp() or odeint(), i.e. without an explicit time loop.
2. Providing diagnostics or visualizations that show all fluxes and the components they act on in a clear overview (without ModelSetup and connections for inheritance/variables, just the plain fluxes connecting state variables with their direction), to display and potentially debug the system of ODEs/PDEs that describes the model, ideally also with units.

I think 1. is already an issue you opened on the xsimlab repo, whilst 2. might just be a functionality of my package. Perhaps both can be built into xsimlab... I have a feeling that they are technically related (i.e. both deal with ODEs). My rough idea to solve 1. + 2. within phydra now would be to include a

The key "problem" underlying this is that I am only dealing with differential equations, and self.run_step() in @xs.process is of course not specifically optimized for this case. Perhaps a self.differential() method could be created instead? I think this would be possible, similar to how I define my "Fluxes" in this example. The way I understand it, the method could easily be kept backwards compatible with self.run_step() and explicit time steps. A possible self.differential() method would need to
Sorry, I lack the knowledge to spin this further without spending quite some time prototyping. Does this make any sense to you? If so, I'd be happy to work further on developing the ideas. I have also not yet had an idea for a useful way to define a "Flux" parent class, because like the dimensions of each variable, each flux needs a more specific assignment of "groups" in the child class. But even without any of these wish-list features, the functionality of the example chemostat model is enough to warrant finishing a first version and publishing that. I'd be happy to hear any suggestions you might have on improving it! Finally, to spin further on the dimensionality question, but more specifically here:
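(Coming back to point 1. above: a rough sketch, not phydra or xsimlab code, of what solving such a model without an explicit time loop could look like with scipy's solve_ivp. All names and parameter values are made up for illustration:)

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical flux functions; in the package these would live in
# separate "Flux" processes and be collected automatically.
def nutrient_uptake(N, P, mu_max=1.0, k_N=0.5):
    return mu_max * N / (k_N + N) * P

def dilution(C, C_in, d=0.1):
    return d * (C_in - C)

def dydt(t, y):
    """Right-hand side assembled as the sum of fluxes per state variable (y = [N, P])."""
    N, P = y
    uptake = nutrient_uptake(N, P)
    dN = dilution(N, C_in=1.0) - uptake
    dP = uptake + dilution(P, C_in=0.0)  # P is washed out by dilution
    return [dN, dP]

solution = solve_ivp(dydt, t_span=(0, 100), y0=[1.0, 0.1],
                     t_eval=np.linspace(0, 100, 500))
```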
@benbovy I have worked the structure out a bit more and written a 2-dimensional model prototype. Now that I have a better practical understanding of the xsimlab structure, I see why the questions I wrote in the last comment are quite vague and rather complicated to solve. I will write the first draft of the package as is, and only then start thinking about implementing some of the additional functionality I mentioned. I'd be very happy to hear your comments about the 2D model implementation. Something I am not entirely clear on yet is how to achieve flexible dimensionality. I will start moving my prototype code from the notebooks to a
I see that you have used an approach that is opposite to the one I used in my gist, i.e., your component classes declare their
I like the analogy between model construction vs. model runtime in xarray-simlab and compile time vs. runtime in compiled languages. Model construction is like compilation: it performs some operations and returns a static Model object. So, by design, it's not possible to change the model structure during runtime (i.e., from model input values). Variable dimensions are part of the model structure; they are static and hard-coded. You could still use some constants to define them in order to have more DRY code, but you can't define them from model input values. One way to achieve flexible process classes is using class factories, e.g., something like:

```python
import attr
import xsimlab as xs


@xs.process
class BasePhyto:
    # ...
    pass


def make_phyto_cls(new_cls_name, arg1, arg2):
    attrib_dict = dict(attr.fields_dict(BasePhyto))
    attrib_dict.update(...)  # some logic to re-define some variables

    new_cls = attr.make_class(
        new_cls_name,
        attrib_dict,
        bases=(BasePhyto,),
        init=False,
        repr=False
    )

    return xs.process(new_cls)


my_model = xs.Model({
    'p1': make_phyto_cls('Phyto1', arg1, arg2),
    'p2': make_phyto_cls('Phyto2', arg1, arg2)
})
```

This is used internally in xarray-simlab's _ProcessBuilder. Alternatively, you could add a dimension for
Thanks so much for the explanations! Class factories might just be what I was looking for; I'm currently testing some implementations. I have not tried it so far, but I will probably need to add a few wrapper functions to xsimlab for my specific application. Regarding the way that states are handled in the 2D chemostat prototype, this was inspired in particular by the advection example model presented in the xsimlab docs. I'll try to explain my reasoning below. It is not perfectly worked out yet, but I hope it explains a bit more of what I am trying to achieve.
This process collects all other processes acting on the property
So like the advection model, the marine ecosystem models I want to build can be set up on a grid (i.e. the

Each

Now I think the way the grid is built is quite straightforward; what I am not certain of yet is how to structure the

Ideally it would be possible to define separate fluxes for each dimensional level:

All my current ideas require some level of custom labeling/routing of fluxes, so that processes need to be rewritten for different setups, instead of having a modular toolbox where models can be assembled from simple components, only with different parameters (which might also just be a bit too idealistic a goal). I need to work on this a bit more, but there was one more specific question that I had:
Thank you for pointing me in the direction of class factories @benbovy. I was just working on implementing a minimal example, but
returns this error:
I have never used

EDIT:
out: unfortunately this seems only to be able to rename the class, not change the variables.
I could solve the previous problem simply by using a function that creates a new subclass of a
I am not sure if that is the best way to do it, but it does the job. This function is stored in a new utility module of phydra.

EDIT: I just found that there are issues with using

The last weeks I have been working a lot more on the core code, and one important thing I realized is that I need some kind of solver backend that handles solving these models with an implicit time step (since I am only dealing with ODEs & discretized PDEs). The solution I found is the Python GEKKO package, which provides an interface to the APMonitor modeling language. It provides a lot of solving tools for ODE systems; I am specifically using the sequential dynamic simulation mode (IMODE=7).

There are certainly more elegant ways to interface it with xarray-simlab. What I have done for now is: create defaultdicts in a context process that are passed around and added to by the other processes and that contain the specific model components as GEKKO objects (i.e. state variables, intermediates & equations). Model evaluation happens within the first time step of clock=[0,1], and output is returned from the state variables in the context defaultdict to xs.variables with the correct dimensions in the finalize_step() of each component process. A first prototype is shown in this notebook. GEKKO compiles the model code before solving, which should also mean faster solving times (I haven't run any meaningful tests yet).

Unfortunately this loses some of the control & benefits of xarray-simlab's discrete time-step implementation, but I feel it makes more sense for these types of models to use a different and more appropriate solving method. Any input or help would be very much appreciated. I need to finish a first version of phydra within the next few days, so I have to move forward with what I have.
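For reference, a stripped-down sketch of what such a context process could look like. This is only a rough reading of the approach described above, assuming xarray-simlab's xs.any_object helper and the GEKKO package; the names GekkoCore, phydra_SVs and phydra_fluxes follow the prototype notebook but are not a finalized interface:

```python
from collections import defaultdict

import xsimlab as xs
from gekko import GEKKO


@xs.process
class GekkoCore:
    """Context process owning the GEKKO model instance and the shared
    containers that component processes add state variables and fluxes to."""
    m = xs.any_object(description='GEKKO model instance shared via xs.foreign')

    def initialize(self):
        self.m = GEKKO(remote=False)
        self.m.options.IMODE = 7  # sequential dynamic simulation
        # custom containers attached to the model object and filled by the
        # component processes during their initialize/run_step stages
        self.m.phydra_SVs = defaultdict(list)
        self.m.phydra_fluxes = defaultdict(list)
```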
A few comments regarding process class factories:

Looking at your comment, you can't use a dictionary of

In your last comment, the class closure seems to work but it's not very nice: see this gist, in which I wrote an alternative solution. This solution does not use
EDIT: I assume the question in this message is irrelevant, because there is an open issue about this exact functionality on the xsimlab repo.

@benbovy, thanks a lot for the explanation! Initializing components with the

Another problem that I just came across relates to the simulation stages that are built into xsimlab by default. Is there any way to modify them for my specific implementation?
do not allow me to define some additional fluxes, because run_step (2.) is already the stage at which the full model equations are assembled and the gekko context solves the model (the clock is [0,1], and evaluation happens automatically in the right order with the current process structure), and finalize_step (3.) is the stage where the gekko output is stored back in xsimlab variables. I would need an additional step between (1.) and (2.), or a way to force a certain process to be evaluated at the beginning of run_step (2.) before the model equations are assembled within the

More specifically, what I am trying to do: I need to initialize a defaultdict in Process A that I can dynamically add terms to in other processes (e.g. Process B). Process B inherits the defaultdict via xs.foreign() from Process A. Now I initialize Process A, then Process B, and then I need to "initialize" Process A again with all items in the defaultdict. When I add the code to

Is there some way to easily add to or modify SimulationStage in process.py? Or another way this could be achieved?
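For concreteness, a minimal sketch of the Process A / Process B pattern described above (hypothetical names, assuming xs.any_object for the shared container); the missing piece is a stage in which Process A could react after Process B has added its terms:

```python
from collections import defaultdict

import xsimlab as xs


@xs.process
class ProcessA:
    """Owns a shared container that other processes append flux terms to."""
    fluxes = xs.any_object(description='collects flux terms per state variable')

    def initialize(self):
        self.fluxes = defaultdict(list)

    def run_step(self):
        # by the time run_step is reached, the equations are already being
        # assembled and solved, so terms added later come too late
        pass


@xs.process
class ProcessB:
    fluxes = xs.foreign(ProcessA, 'fluxes')

    def initialize(self):
        # runs after ProcessA.initialize, but there is no built-in stage
        # between initialize and run_step where ProcessA could pick this up
        self.fluxes['P'].append('some flux term')
```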
@benbovy This notebook highlights an error I receive by using the

The error message points quite obviously to where the problem lies, that

Concerning my last message, I realized there is already an open issue on the xsimlab repo. For now, I am working with the limited simulation stages.

EDIT: I found a working solution (at least as a prototype) that also allows flexible initialization of an index var, with a specific variable name supplied:
I am not sure if this use of classes & attributes is good practice.
That's right,
While it seems to work, it looks very hacky. I do think there are cleaner solutions. For example, you could avoid using
Thank you for the feedback! From a user perspective I would expect the zarr storage to instead use the label supplied to the dims keyword for storage, but I assume from your wording that this behaviour

My current solution for flexible dimensions:

I might be misunderstanding xarray-simlab design principles, but I am looking for a solution to flexibly create linkages and dimensions between variables in processes after model setup that provides a straightforward user interface. I am reluctant to simply base this on inheritance, because a custom subclass of a process written for each state variable in an ecosystem model seems rather cumbersome for an end user that might not be familiar with object-oriented Python programming, particularly when dealing with complex food webs and multiple functional groups of state variables. Combined with the caveat that

```python
class FunctionalGroup:
    """
    undecorated class to create xs.process functional group
    containing multiple state variables that share fluxes
    """
    label = xs.variable(intent='out', description='the label supplied at model initialisation')
    value = xs.variable(intent='out', dims=('not_initalized', 'time'),
                        description='stores output in dimensions supplied at model init')
    num = xs.variable(intent='in', description='number of state variables within group')
    initVal = xs.variable(intent='inout', description='initial value of component')

    m = xs.foreign(GekkoCore, 'm')  # this contains the GEKKO model instance (for assembly and solving)

    def initialize(self):
        self.label = self.__xsimlab_name__
        print(f"functional group {self.label} is initialized with size {self.num}")

        # this creates a numbered index with the label:
        if self.num == 1:
            index_list = [f"{self.label}"]
        else:
            index_list = [f"{self.label}-{i}" for i in range(self.num)]
        setattr(self, self.label, index_list)

        # self.m.SV creates a state variable in the GEKKO model
        # that is stored in a custom defaultdict shared by all processes
        self.m.phydra_SVs[self.label] = np.array([
            self.m.SV(name=f"{self.label}_{i}", value=self.initVal) for i in range(self.num)])
        self.value = [sv.value for sv in self.m.phydra_SVs[self.label]]

    def run_step(self):
        print(f"assembling equations for {self.label}")
        # This is the special step needed for the gekko backend, where the equations
        # are assembled from all fluxes collected during the initialize stage
        gk_array = self.m.phydra_SVs[self.label]
        fluxes = self.m.phydra_fluxes[self.label]
        for i in range(self.num):
            self.m.Equation(gk_array[i].dt() == sum([flux[i] for flux in fluxes]))

    @classmethod
    def setup(cls, dim_label):
        """create a copy of the process class with user-specified name and dimension label"""
        # have to create a copy of the class, so that the classmethod does not modify the base class
        new_cls = type(cls.__name__ + dim_label, cls.__bases__, dict(cls.__dict__))

        # add a new index with the variable name of the label (to avoid Zarr storage conflicts)
        new_dim = xs.index(dims=dim_label, groups='SV_index')
        setattr(new_cls, dim_label, new_dim)

        # modify dimensions
        new_cls.value.metadata['dims'] = ((dim_label, 'time'),)

        # return initialized xsimlab process
        return xs.process(new_cls)
```

This would then be called at model setup, like so:

```python
MODEL = xs.Model({'core': GekkoCore,
                  'solver': GekkoSequentialSolve,
                  'time': Time,
                  'N': StateVariable,
                  'P': FunctionalGroup.setup('P'),  # same label needs to be supplied for now (todo)
                  # ...
                  })
```
Instead of hacking the flexibility like this, I could write a sub-process for each model component, but this seems to introduce unnecessary complexity to the library and requires a user to write a new process for every new component with a unique dimension and label. For now, I will probably have to stick to my hacky code of adding a custom index via a

The only way I could think of solving this using xarray-simlab's built-in functionality would involve initializing all state variables separately (perhaps with a wrapper process factory called at model creation and setup) and aggregating these sub-processes to a functional group via

In reference to the GitHub issue 52 on the xarray-simlab repo:

As shown in the code snippet above, my current code structure requires using the

It would be very useful if there was a way I could use something like the following:

```python
@xs.process
class Component:
    # x = xs.variable()

    @xs.initialize(level=0)
    def initialize_state_variables(self):
        # initializing state variables
        ...

    @xs.initialize(level=1)
    def initialize_equations(self):
        # initializing equations
        ...
```

in order to assemble the model equations from all components and fluxes previously initialized. Then Phydra could have the option to use the

This could probably be solved by adding another process for each SV (or functional group) that assembles the equations, but this seems again to unnecessarily complicate the user interface. I am not sure if my intuition here is misguided or if this type of flexibility is better solved in another way.
@benbovy I have completely redesigned the framework over the last months, and am now using a model core class stored as an

Now the package works well even with multi-dimensional ODE-based models, and I feel confident closing this issue down and opening new & more specific issues in its stead. The code is far from clean and not documented yet, so now comes the hard work of polishing and making it ready for publication.

I don't know why you stopped responding to my emails, but I assume there is a good reason; if it is some technical issue (e.g. your email has changed), this is my last try to reach out to you. I definitely appreciate all the advice you gave me at the start of the project, and this work wouldn't be possible without xarray-simlab as a starting point and framework. Thank you!
Impressive work @ben1post! I replied to your last email.
Some answers on questions in https://github.com/ben1post/phydra/blob/master/prototypes/01_Chemostat_MODEL.ipynb and https://github.com/ben1post/phydra/blob/master/prototypes/02_ASTroCAT_MODEL.ipynb
(note: I had only a brief look and I need to look at it more in depth)
Instead of having separate process classes for each instance or having lists of lists, you could add those instances as a new dimension. It is compatible with the xarray data model, and I think it's cleaner code as you would deal with "flat" numpy arrays instead of nested lists or many process classes. It would look something like:
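A minimal sketch of that idea (illustrative process, variable and dimension names only, assuming a p_number instance dimension):

```python
import xsimlab as xs


@xs.process
class Phytoplankton:
    """One process class for all phytoplankton instances, stacked along
    a dedicated 'p_number' dimension instead of one class per instance."""
    growth_rate = xs.variable(dims='p_number', intent='in')
    half_saturation = xs.variable(dims='p_number', intent='in')
    biomass = xs.variable(dims='p_number', intent='inout')

    def run_step(self):
        # all instances are handled at once as a flat numpy array
        ...
```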
The downside is that if you want to set a different number of instances for each component, then in each component class you'll need to re-declare all the variables accepting the x_number dimension, with different labels for each component class, e.g., p_number, z_number, etc. It is not possible to set dimension labels for variables programmatically from other variable values. But that's not a big deal I think, only a few redundant lines of code.

If you want to easily plot the model outputs for all instances of all components, there are a lot of convenient functions in xarray that can be used to re-arrange the data as a post-processing task. Alternatively, you can create another process class that flattens the data, like I've done in my gist https://gist.github.com/benbovy/f1898ee97d62c6df46976bc93002a14b with the class AllComponents (which could be easily updated to support multiple instances per component).

I suggest implementing all component processes (i.e., uptake, etc.) in separate process classes instead of as internal methods in the component classes, like I've done in my gist. Not only is this much more flexible if you want to drop/replace specific processes, but the model graph visualization will show both the components and the processes (although all of them as graph nodes, without differentiating component classes and process classes). I find the graph shown in this gist pretty clear.
Could you provide a more detailed example?
Not sure I understand this one. xarray-simlab already stores all this information in output datasets.
If you want to use lmfit for model calibration, no problem: writing a residual function (as shown in lmfit's docs) that simply wraps a model run might do the job, e.g.:
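A rough sketch of such a residual function; the variable names ('growth__mu_max', 'phyto__biomass'), the setup dataset in_ds and the observations array obs are only illustrative:

```python
import lmfit

# `model` is the xarray-simlab Model object and `in_ds` a setup Dataset
# created with xs.create_setup(...); `obs` holds the observed values to fit against.
def residual(params, in_ds, model, obs):
    # update the model inputs with the current parameter values
    ds = in_ds.xsimlab.update_vars(
        model=model,
        input_vars={'growth__mu_max': params['mu_max'].value},
    )
    out_ds = ds.xsimlab.run(model=model)
    return (out_ds['phyto__biomass'].values - obs).ravel()

params = lmfit.Parameters()
params.add('mu_max', value=1.0, min=0.0)
result = lmfit.minimize(residual, params, args=(in_ds, model, obs))
```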