Observation Model Resolution Mismatch

Tim Hoar edited this page Jan 19, 2021 · 1 revision

Representativeness Discussion

Ed note: this is a question on a topic we have encountered many times. As such, both the question and answer have been lightly edited to provide a broader context.

... potential issues related to differing resolutions. The footprint of the observation data is incredibly small compared to the model’s resolution, and we are concerned that this might be introducing some spurious noise into our satellite assimilations (an issue that doesn’t appear to be present in the perfect model assimilations, where the observation has the same resolution as the model grid). Is there a localization setting that can alleviate this issue?

We’re also wondering if either of you have had experience with assimilating observation data with very small footprints into a model with a larger footprint, or if you know of any work related to this question.

The problem you describe is called 'representativeness error' or 'representativity error'. The idea is that the forecast model is unable to represent all of the things going on in the real system that impact the observations. The most common problem of this type occurs because models are on a grid that can only resolve phenomena that are larger than a certain scale; something like 4 to 10 grid intervals is the lower bound for the numerical methods commonly used for earth system models. All observations that I am aware of are impacted by all scales that occur in the physical system, although many observations apply some sort of implicit spatial (and sometimes temporal) smoothing. That smoothing is related to what you call a footprint (statisticians call this 'support').

The most appropriate way to deal with this issue would be to account for it in the forward operators that map the state vector for a given ensemble member to the expected observation from the instrument. Since the most we know about the representativeness error is its statistics, this would involve adding a random draw from some estimated representativeness error distribution to each forward operator. However, that approach has not generally been used for earth system DA to date. Instead, it is possible to show that one can get the same results in the case of linear/Gaussian prior distributions by adding additional uncertainty to the observation error distribution. Even this approach is almost always simplified. Almost all earth system ensemble filter applications assume that observational error distributions are Gaussian, and representativeness error is taken into account by increasing the variance of the specified observational error. In DART, this means that the observational error variance specified in obs_sequence files is increased to try to account for this.
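As a minimal sketch of the variance-inflation idea, the Python below assumes that the instrument error and the representativeness error are independent and Gaussian, so that their variances simply add, and then applies the textbook scalar Gaussian update. The function names and the numbers are hypothetical illustrations and are not part of DART.

```python
def total_obs_error_variance(instrument_var, rep_var):
    """Combine instrument and representativeness error variances.

    Assumes the two error sources are independent, so variances add.
    """
    if instrument_var < 0 or rep_var < 0:
        raise ValueError("variances must be non-negative")
    return instrument_var + rep_var


def scalar_gaussian_update(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance for a scalar Gaussian prior and likelihood."""
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


# Hypothetical numbers: prior N(0, 1), observed value 2, instrument variance 1.
naive = scalar_gaussian_update(0.0, 1.0, 2.0, 1.0)
# Same observation, with an assumed representativeness error variance of 2.
inflated = scalar_gaussian_update(
    0.0, 1.0, 2.0, total_obs_error_variance(1.0, 2.0)
)
print(naive)     # (1.0, 0.5)
print(inflated)  # (0.5, 0.75)
```

With the larger specified variance, the same observation pulls the state half as far and leaves more posterior spread, which is the intended effect of inflating the observation error.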

For your particular question, you point out that the footprint of the obs is small compared to the model grid spacing. Of course, many conventional observations of the earth system are basically delta functions. An instantaneous reading from a thermistor gives the temperature at a single point and time (more or less). I think the first-order solution to your problem is then to mimic what is done in other applications, which is to increase the specified observation error variance.

There is another issue that is subtly related. Observations from something like ICESat may have a small footprint, but there may also be a lot of them relative to the model grid. This means that any correlated error in the observations will be applied to the model many times unless the observations are thinned in some way or averaged. The averaging approach is referred to as 'superob-ing' in the atmospheric DA literature. It is important to note that although the observations are averaged, one should not reduce the error variance of the resulting averaged observation as would be appropriate if there were no correlated error. The bottom line is that the first-order solution to dense observations may also involve a larger error variance for the obs than would be used naively.
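To illustrate why averaging does not buy the usual variance reduction when errors are correlated, here is a small sketch. It assumes every observation in the group has the same error variance and that every pair shares the same error correlation; both are simplifying assumptions, and the function name `superob` is hypothetical. Under those assumptions the error variance of the mean of n observations is error_var * (1 + (n - 1) * error_corr) / n.

```python
def superob(values, error_var, error_corr):
    """Average a group of observations into one 'superob'.

    Assumes equal error variance for every observation and a single
    pairwise error correlation `error_corr` between all of them.
    Returns the averaged value and the error variance of that average.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    if not 0.0 <= error_corr <= 1.0:
        raise ValueError("error_corr must be between 0 and 1 in this sketch")
    mean = sum(values) / n
    var_of_mean = error_var * (1.0 + (n - 1) * error_corr) / n
    return mean, var_of_mean


obs = [1.0, 2.0, 3.0, 4.0]          # hypothetical values in one grid cell
print(superob(obs, 4.0, 0.0))       # uncorrelated errors: (2.5, 1.0)
print(superob(obs, 4.0, 1.0))       # fully correlated errors: (2.5, 4.0)
```

With uncorrelated errors the variance drops by a factor of n, but as the correlation approaches 1 the averaged observation is no more certain than a single one, which is the conservative choice described above.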

You suggested changing the localization as a solution to this problem and that is also an appropriate thing to try. The correlation between a real-world observation and a nearby model state variable is expected to be less (in absolute value) than the correlation between a synthetic observation (perfect model) and the state variable because the synthetic observation doesn't have all that unrepresented 'noise'. This means that it is often appropriate to localize more tightly for optimal performance in a real data assimilation than in an OSSE. There is some theoretical guidance on how much reduction needs to be made, but it is hard to get enough data to apply those methods and people usually just tune the localization empirically.
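As an illustration of what "localizing more tightly" does, the sketch below uses the fifth-order piecewise rational function of Gaspari and Cohn (1999), a widely used compactly supported taper. The discussion above does not say which localization function is in use, so treat this as an assumed example; `half_width` is a hypothetical parameter name for the distance at which the taper has fallen to about 0.21, and the weight is exactly zero beyond twice that distance.

```python
def gaspari_cohn(distance, half_width):
    """Gaspari-Cohn localization weight for a given separation.

    Returns 1 at zero distance and 0 at or beyond 2 * half_width.
    """
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    r = abs(distance) / half_width
    if r >= 2.0:
        return 0.0
    if r <= 1.0:
        return (-0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3
                - (5.0 / 3.0) * r**2 + 1.0)
    return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3
            + (5.0 / 3.0) * r**2 - 5.0 * r + 4.0 - 2.0 / (3.0 * r))


# Weight applied to a state variable one distance unit from the observation,
# for a broad and a tight localization (units are arbitrary here).
for half_width in (2.0, 1.0, 0.5):
    print(half_width, gaspari_cohn(1.0, half_width))
```

Shrinking the half-width reduces the weight given to the same observation-state pair, so the regression increment driven by a noisy sample correlation is damped more strongly. In practice the half-width would be tuned empirically, as noted above.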

The bottom line is that the fact that your observations are from a satellite track is not the fundamental problem. Instead, you share a problem with almost everybody who does earth system DA.