Cell methods: "within"|"over" "days"|"months" and time axis (Section 7.4) #372
Replies: 41 comments
-
@larsbarring I may be wrong, but I think that usage is wrong. I think the proper cell_methods should be "time: maximum". If this is a regular time sequence where the bounds for each time step are the beginning and end of each day, then there is no mean over days and the maximum is assumed to be within the bounds.
-
Since these are monthly files, I think what is being requested is the mean over all days of a month of the maximum temperature reached each day. I think the cell-methods is correct, but probably Note that these climatology bounds are the same as the bounds of the month itself, so I'm not absolutely sure that the Also note that if cell_methods were set to "time: maximum", the user would expect that the value recorded would be the absolute maximum occurring during the month (rather than the mean of daily maxima), so this would be incorrect. |
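A small numeric illustration of the distinction, as a sketch only: the hourly temperatures below are invented, and the three-day "month" and variable names are assumptions made purely for the example.

```python
from statistics import mean

# Hypothetical sub-daily temperatures (K) for a three-day period; values are invented.
hourly_by_day = {
    1: [280.0, 284.0, 282.0],
    2: [281.0, 290.0, 283.0],
    3: [279.0, 287.0, 280.0],
}

# "time: maximum within days time: mean over days":
# take each day's maximum, then average those maxima over the days.
daily_maxima = [max(values) for values in hourly_by_day.values()]
mean_of_daily_maxima = mean(daily_maxima)

# "time: maximum" over the whole cell: the single highest value in the period.
absolute_maximum = max(v for values in hourly_by_day.values() for v in values)

print(mean_of_daily_maxima)  # 287.0
print(absolute_maximum)      # 290.0
```

The two numbers differ, which is why labelling the mean of daily maxima as "time: maximum" would mislead a reader of the file.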
-
Well, it seems that the conventions text is not quite as clear as it should be, as both @JimBiardCics and @taylor13 think the example is wrong (in different ways?). The example was a random pick that I downloaded from ESGF CMIP6 (it was not a 'local produce'), so I am afraid that I will have a hard time retracing my steps to find exactly which one. However, as I wrote, I have come across many CMIP5/6 files having the same metadata construct. To me it seems -- but I could be wrong -- that the example shows the typical usage in CMIP5/6. So the question is, how should the CF text be interpreted, and is the typical CMIP5/6 use in line with that interpretation? For me the interpretation of the example is not difficult in itself:
tells me that the data is the mean -- over some period of time given by the time bounds variable -- of the maximum within days (days having their usual interpretation of hours in the interval [00..24[, as nothing else is explained in a comment within parentheses). Freely admitting that I am a newcomer to some of the more advanced aspects of the CF Conventions, I do not immediately see the need to have a
-
@taylor13 you are right! This data is, in fact, monthly average Tmax values (the CMIP "Amon", which indicates atmospheric monthly data, in the variable name is what gives that away), so the cell_methods would be right if the bounds variable was indicated with an attribute named climatology. I agree with @larsbarring that the language in the document is not clear about this, but I've been down this road in the past and I recall being assured by a CF master (@JonathanGregory, I believe) that the "within" and "over" cell_method terms are not for use except in climatologies. It is clear from the document that a climatological bounds is indicated using the In my opinion, the CF document suffers from quite a lot of imprecise language and lack of specificity for a standard. We keep running into "but that's what we meant" in a number of different areas. There is a Trac ticket #82 that was intended to address this very question. It's still open and hasn't progressed for three years.
-
Thanks @JimBiardCics! In fact I had an email [list] conversation with @JonathanGregory a couple of years back on a related issue where Trac ticket #82 was mentioned. For my purpose back then I came up with what felt a bit like an ad hoc solution. Nevertheless there are a couple of things that keep nagging me regarding all this:
-
Dear Lars, Jim, Karl
I agree that this example shows that we didn't think quite carefully enough
about whether or when climatology bounds are truly needed. If we have
cell_methods = "time: maximum within days time: mean over days"
with climatology bounds of 1850-1-1 00:00 to 1850-2-1 00:00 we mean (according
to section 7.4) that "maximum within days" is applied for the time interval
00:00 to 00:00 within each day i.e. the entire day, and the values are meaned
over all the days within the interval i.e. all the days in Jan 1850. If the
bounds are 1850-1-1 00:00 to 1850-1-31 06:00, we mean that the maximum is
calculated for each day within the interval 00:00-06:00, and the maxima are
meaned over all days of the month. The rule (which I can't see stated in the
text) is that the first climatology bound is the beginning of the first
interval, and the second is the end of the last interval.
In the first case, the entire month is considered in calculating the statistic.
As Karl said, the climatology bounds therefore mean the same thing as ordinary
time bounds would do. In the second case, the two elements of the climatology
bounds imply that the statistic is calculated from 31 noncontiguous time
intervals viz. 00:00-06:00 on 1850-1-1, 00:00-06:00 on 1850-1-2, etc. Ordinary
time bounds describe the beginning and end of a single continuous interval of
time. Climatology bounds may describe a set of discontinuous intervals.
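
The rule for "within days ... over days" can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the conventions: `expand_within_days` is a hypothetical helper, it uses the standard-library `datetime` (proleptic Gregorian calendar only), and it treats an end time-of-day equal to the start time-of-day as meaning a full 24-hour interval.

```python
from datetime import datetime, timedelta

def expand_within_days(first, last):
    """List the daily sub-intervals implied by climatology bounds (first, last).

    Hypothetical helper: `first` is taken as the start of the first interval
    and `last` as the end of the last interval.
    """
    day = timedelta(days=1)
    # Time-of-day offsets of the two bounds.
    start_offset = first - first.replace(hour=0, minute=0, second=0, microsecond=0)
    end_offset = last - last.replace(hour=0, minute=0, second=0, microsecond=0)
    # Length of the interval within each day; zero is read as a whole day.
    length = (end_offset - start_offset) % day
    if length == timedelta(0):
        length = day
    intervals = []
    start = first
    while start + length <= last:
        intervals.append((start, start + length))
        start += day
    return intervals

# Bounds covering the whole of January 1850: 31 contiguous whole-day intervals.
full = expand_within_days(datetime(1850, 1, 1), datetime(1850, 2, 1))
# Bounds ending at 06:00 on 31 January: 31 noncontiguous 00:00-06:00 intervals.
part = expand_within_days(datetime(1850, 1, 1), datetime(1850, 1, 31, 6))
print(len(full), len(part))  # 31 31
```

In the first case the intervals join up into one continuous month; in the second they do not, which is the situation ordinary time bounds cannot describe.
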
The noncontiguous case is maybe more common for the annual cycle e.g.
cell_methods = "time: maximum within years time: mean over years"
for 1850-1-1 to 1859-2-1. This says to calculate the maximum within the entire
interval of each January from 1850 to 1859, then calculate the mean of these
ten values.
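
The "within years ... over years" case can be sketched the same way. Again this is only an illustration: `expand_within_years` is a hypothetical helper, it assumes the standard-library `datetime` calendar, and it does not handle bounds that fall on 29 February.

```python
from datetime import datetime

def expand_within_years(first, last):
    """List the per-year intervals implied by climatology bounds (first, last).

    Hypothetical helper: `first` starts the first interval, `last` ends the
    last one. Bounds on 29 February are not handled (replace() would raise).
    """
    # End of the interval that belongs to the first year.
    first_end = last.replace(year=first.year)
    if first_end <= first:
        # The interval crosses a year boundary (e.g. December to March).
        first_end = last.replace(year=first.year + 1)
    n_years = last.year - first_end.year + 1
    return [
        (first.replace(year=first.year + k),
         first_end.replace(year=first_end.year + k))
        for k in range(n_years)
    ]

# Bounds 1850-1-1 to 1859-2-1: ten Januaries, one per year.
januaries = expand_within_years(datetime(1850, 1, 1), datetime(1859, 2, 1))
print(len(januaries))  # 10
```
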
Although the climatological bounds do have a different sort of meaning from
ordinary time bounds in the noncontiguous case, perhaps we don't really need
to use a different attribute for them. The possibility that the bounds *might*
refer to a set of discontinuous intervals is implied by the cell_methods.
Maybe we should use the presence of within/over days/years in the cell_methods
as the flag for climatological time, and use the ordinary bounds attribute
for climatological time bounds, since it's clear enough, as Lars says. That
simplification would be backwards-compatible if we continued to allow the
climatology attribute for climatological time, perhaps deprecated (so that the
CF checker gives a warning).
Best wishes
Jonathan
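
The idea of using within/over days/years in the cell_methods as the flag for climatological time could be sketched as below. This is a minimal illustration, not an existing CF tool: the function names are hypothetical, the regular expression is an assumption about how such a check might be written, and it only recognises entries on an axis literally named `time`.

```python
import re

# One "time: <method> within|over days|years" entry of a cell_methods string.
# Assumption: the axis is named "time"; real files may use other names.
_CLIM_ENTRY = re.compile(r"\btime:\s*(\w+)\s+(within|over)\s+(days|years)\b")

def climatological_entries(cell_methods):
    """Return (method, keyword, unit) tuples for within/over days/years entries."""
    return _CLIM_ENTRY.findall(cell_methods)

def looks_climatological(cell_methods):
    """True if the string contains at least one within/over days/years entry."""
    return bool(climatological_entries(cell_methods))

print(climatological_entries("time: maximum within days time: mean over days"))
# [('maximum', 'within', 'days'), ('mean', 'over', 'days')]
print(looks_climatological("time: maximum"))  # False
```
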
-
Dear Jonathan, Thanks for this explanation. I agree that the continuous and non-overlapping, or non-continuous time axis is a key here. In your first example with a continuous time axis: In the second example the presence of the climatology attribute, or not, will make the distinction between the situation when a cell methods is applied to a set of noncontiguous time periods and the situation when the same cell methods is applied to a sequence of overlapping time intervals. Hence I would like to suggest that i/ the This would clarify and clean up the meaning of the Kind regards,
-
Dear Lars
I agree with your analysis of the distinction and that the noncontiguous
implied intervals look like they're overlapping if you don't interpret them
as climatological. Nonetheless, I would go further than you. I do think
we can rely on the presence of "within" and "over" in the cell_methods to tell
us it's climatological time, and we could thus get rid of the need for the
climatology attribute. Do you or others see any pitfall in this simplification?
Best wishes
Jonathan
-
Dear all, Yes, I think this is how we should have originally indicated climatologies (no need for a "climatology" attribute with the bounds). And it would appear that implementing it now as the preferred method could be made backward compatible, but any software that identifies climatological data by looking for the "climatology" attribute would have to be updated. I hope @davidhassel will comment on whether this would be a problem for the data model. best,
-
I fear that the last few comments have left me a bit confused. Would someone please summarize the current understanding?
-
What is being suggested, I think, is:
-
Well, actually what I was suggesting was to keep the To me there are two reasons for keeping the To sum up, the Kind regards, |
-
@larsbarring @taylor13 Thanks for your clarifications! I think Lars' suggestion about decoupling the use of |
-
Very interesting thread, thanks @larsbarring for bringing this up. We, at ECMWF, use the construction One typical use case is "monthly means of daily means" where we define the following
-
From my reading of the conventions, "within" must precede "over" in We can discuss whether the conventions need to be extended to include the kind of description used in the ECMWF file. It might not be obvious, but "climatology" as used in section 7.4 of the conventions extends the concept of climatology beyond the most common use case involving multiple years of data (e.g., 30-year climatology). In CF a climatology can refer to data from a single month, for example: |
-
Hello All, this area of the convention causes a lot of confusion, so I agree with suggestions that the explanations could be improved. My provisional answers to Jim's 4 questions are: (1) Use for non-climatological variables: yes, if you consider monthly mean Tmax as non-climatological; (2 and 3) I’m not sure about the premise of these questions. I'll give more detail below, but I feel that the cell methods string should be used to give broad, notional information (using Jim’s words) and the climatology attribute can be used to add more detailed information. (4) The status quo is, as Jonathan explained, clearly restricted to the 3 listed forms. I support retaining this restriction to an explicit list, but I can see the case for extending the list. Jonathan has said that section 7.4 only applies to "climatological data", but I don't believe that monthly mean daily maximum temperature, which motivated Lars's query, is a variable that would be considered as a climatological variable outside the CF Convention. Jim has pointed out that the Perhaps the section would be clearer if we stressed from the start that (currently) 3 forms are supported and turned these 3 supported forms of the cell methods string into 3 subsections:
The redundant repetition in |
-
Getting back to the question posed by @larsbarring and his example of cell_method = "time: mean within hours (5 minute interval) time: maximum over hours" — I may be wrong, but I don't see any sense in which this example would be interpreted as a climatology. My working definition for a climatology is "a mean of a measure over a set of intervals that represent roughly equivalent parts of multiple diurnal or annual cycles." (There can be other meanings, but I think this covers the great majority of cases.) Using this rubric:
The connection to diurnal and seasonal cycles is why days and years are currently the only valid objects for within and over. The |
-
Hi Jim, I would be happy with that definition of what constitutes "a climatology", but it looks to me as though this would exclude Example 7.14 from the convention, which is monthly maximum daily precipitation totals (the same as your last example except that it is over contiguous days rather than hours, and deals with precipitation rather than temperature). Do you also see Example 7.14 as being outside your definition of a climatology? Example 7.14 is very close to the CMIP example that Lars is asking about, which shows, I think, that the latter falls within the intended scope of section 7.4. There are also broader interpretations of "Climatological Statistics". All the formal definitions of climatology that I could find simply define it as a synonym for "climate science". Some institutions even include variables such as monthly mean temperature under the heading of climatological data. I think this latter usage is more common in the UK than in the US, so there may be a difference in usage between the two countries.
-
Hi Martin, Jim, Yes, example 7.14 is indeed similar (in principle the same) as the CMIP example I used in the initial post, implying that it falls under section 7.4. I can (easily) accept this, but then I do think that it is confusing that monthly mean temperature is not treated in the same way (i.e. having a Maybe the can of worms that I seem to have opened is difficult to recan using [only] the existing CF mechanisms. Either we end up with certain CMIP5/6 datasets being not fully consistent with CF, or the CF attribute |
-
@martinjuckes Example 7.14 pushes my definition pretty hard. It is based on diurnal cycles, but it takes a maximum rather than a mean over the longer time interval. I didn't, for simplicity's sake, try to stretch my definition to include different operations for the longer time interval, so that last bit doesn't bother me too much. Having said that, I think a time series like Example 7.14 probably isn't really a climatology. If what we mean by climatology is a baseline profile that we can use to study long-term change over time, then the example fails the test, doesn't it? (If I understand correctly, a climatology could be spatial rather than temporal, but the CF convention and this discussion are about temporal climatologies.) So you are right, we already have a conventions example of the "climatology mechanism" being used for what I believe to be non-climatological data. And if the CMIP6 dataset had used a |
-
@larsbarring At the moment CMIP5/6 datasets that look like your original example are not CF-compliant. In fact, even if we change the convention, they won't be compliant with the version of CF they declare themselves to be following via the |
-
@larsbarring : I think there may be different views on compliance here. I don't understand the basis for the assertion of non-compliance from @JimBiardCics . The files in question are considered as error free by the CF Checker.
-
@martinjuckes This is part of the on-going imprecision problem in CF. There is no mention of the 'within x over y' formalism in any area other than the Climatological Statistics section.
-
@martin Maybe the following points will illustrate the problem:
-
Hi @larsbarring , I noticed the discussion you refer to, but I don't agree with the statement that accepting these files as compliant renders the The CF interpretation of the There is plenty of room for debate here, but it is clearly possible to interpret these files as complying with the convention as it stands and retain the |
-
Hi @martinjuckes , Yes, I find what you wrote constructive and going in the right direction. Some specific comments:
Yes, I certainly agree with this. But what implications does this have regarding when/where cell method constructions like
Trac ticket cf-convention/cf-conventions#82 is all about this, isn't it? So, a positive outcome of this discussion might be that this ticket is revived, and brought to acceptance with any relevant new ideas from this thread taken on board. A more detailed explanation along the lines of what you outline in your comment would go a long way towards clarifying section 7.4. Finally, several of the recent comments mention Example 7.14. It shows how the convention can be used, rather than how it typically would be used, and so does little to help a novice user understand how to apply the mechanisms. As such it is not a particularly good example; it is more confusing than enlightening.
-
Hi Lars, thanks, and well done for spotting the link with Trac ticket cf-convention/cf-conventions#82. My initial response was to try to keep that discussion separate but, on reflection, I think it is necessary and beneficial to consider it jointly with this issue. It may provide an alternative (perhaps clearer) means of expressing Concerning the multi-step operations use-case (Trac ticket cf-convention/cf-conventions#82), I support the idea of adding something to the conventions, but I would formulate it slightly differently. As it stands,
This is slightly different from the suggestion made by Jonathan in the Trac ticket: the approach would not allow the syntax to express the interval of the input data for the first method (mean in the above example). This construct can be introduced independently of
The relevance to this ticket is that This approach is more flexible than the With this approach, we would use the There would still be a use for the In brief, there is the potential for all the essential characteristics of the averaging periods of a climatology to be expressed within the If this isn't shot down (it would be nice if there was a simpler solution), I suggest we discuss it at the next CF meeting. |
-
Hello, The timings and order of the breakout groups for the CF meeting next week have now been set (see http://cfconventions.org/Meetings/2020-Workshop.html), and the discussion of this issue will be on Wednesday 10 June from 17:30-19:00 UTC, in parallel with three other topics. Thanks.
-
Notes from the 2020 CF Workshop Breakout session on Cell Methods are now available. Highlights:
-
A use case for cell_methods involving climatology: We have an archive of regional climate model outputs. To improve their usability for impacts users, we generate various aggregations at different frequencies. We provide daily, monthly, and seasonal timeseries data, as well as monthly and seasonal climatology data. (To be explicit: a ten-year monthly timeseries file would contain 120 timesteps, one value for each month in sequential order; a ten-year monthly climatology file would contain 12 timesteps, each an average of the values for that month over all ten years.) The workflow for generating these files is chained. To generate a monthly climatology, we first calculate a monthly average timeseries from a daily timeseries by averaging together daily values for each month, then calculate a monthly climatology by averaging together monthly values across multiple years. If the data variable is something like tasmin or tasmax (daily minimum or maximum temperature), some models will output that directly, but in other cases we may need to calculate it from hourly values. The cell_methods attribute for tasmax monthly climatology thus ends up looking like this: As best I understand the spec, this is completely CF-compliant and correct, although it requires some human interpretation to understand that it means we started with daily maximum values, averaged them to monthly values, then averaged those to monthly climatology. The optional |
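
The chained workflow described in this comment (daily maxima to monthly means to a monthly climatology) can be sketched in plain Python. This is a toy illustration, not the archive's actual processing code: the function names and the sample values are invented, and real data would normally be handled with array tooling rather than dictionaries.

```python
from collections import defaultdict
from statistics import mean

def monthly_mean_timeseries(daily):
    """daily: iterable of (year, month, day, value) -> {(year, month): mean}."""
    groups = defaultdict(list)
    for year, month, _day, value in daily:
        groups[(year, month)].append(value)
    return {key: mean(values) for key, values in sorted(groups.items())}

def monthly_climatology(monthly):
    """monthly: {(year, month): value} -> {month: mean over all years}."""
    groups = defaultdict(list)
    for (_year, month), value in monthly.items():
        groups[month].append(value)
    return {month: mean(values) for month, values in sorted(groups.items())}

# Invented daily-maximum values (K): two days in each of two Januaries.
daily_tasmax = [
    (2001, 1, 1, 270.0), (2001, 1, 2, 274.0),
    (2002, 1, 1, 276.0), (2002, 1, 2, 280.0),
]
monthly = monthly_mean_timeseries(daily_tasmax)
climatology = monthly_climatology(monthly)
print(monthly)      # {(2001, 1): 272.0, (2002, 1): 278.0}
print(climatology)  # {1: 275.0}
```

The timeseries keeps one value per (year, month), while the climatology collapses the years and keeps one value per calendar month, matching the 120-versus-12 timestep distinction drawn above.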
-
This Discussion was opened as issue #197 in the conventions repository
In section 7.4 the use of cell method constructs "within years", "over days" and similar is explained in the context of a climatological time axis. From this I get the impression that these constructs are only allowed with a climatological time axis. But I guess that this is not the correct interpretation?
I am asking because I have come across numerous CMIP5/CMIP6 files of monthly tasmin/tasmax suggesting that the construct can be used also in connection with a 'normal' time axis. Here is an example from a CMIP6 file
I suggest it would be useful to clarify when/where the cell method constructs "within"|"over" "days"|"years" can be used.
Lars