1 degree, CAM6, ensemble reanalysis for CESM experiments (2011 through 2019): DATM, hindcasts, model evaluation
This page describes the data sets available at https://rda.ucar.edu/datasets/ds345.0 (register if you need to, then "Go to" 345.0 and open the "Data Access" tab).
They are the result of a multiyear ensemble reanalysis that uses DART as the assimilation software to assimilate several million observations per day into CAM6. This produces 80 equally likely CESM states that are consistent with the actual weather and with CAM6 physics.
- Consistent: each state agrees with both the observed weather and CAM6 physics.
- Explicitly uses the 2 main sources of uncertainty (observation errors and model ensemble spread) to balance the information in the observations against that in the model hindcast.
- Much of the data is available every 6 hours and spans 2011-2019.
- The ensemble provides realistic variability; the ensemble spreads embody the uncertainties in the model and the observations.
- This unique combination of a large ensemble and long time span can be used in many novel ways.
- These include:
- realistic atmospheric forcing of all surface components in ensemble simulations and assimilations, with justifiable variability.
- real-world initial conditions for CAM (CLM, CICE, and MOSART?) ensembles with justifiable spread.
- model improvement through direct comparison with observations in all seasons over 9 years.
- The data have been packaged in conveniently sized files for easy download.
- The files are organized by CESM component (cpl, atm, esp, ...).
- These appear on the RDA web page as the following PRODUCTS.
The file and directory name parts used in the descriptions below have these meanings:
- YYYY or YY: the year of the data (or its last 2 digits)
- MM: the month of the data
- DD: the day of the month
- SSSSS: seconds of the day
- INST: the instance (CESM terminology), i.e. the ensemble member, from 1,...,80, usually padded with 0s to 4 digits
- .tar: the file can be unpacked with 'tar -x -f file_name.tar', or individual files in file_name.tar can be extracted
- .tgz: the file should be unpacked using 'tar -x -z -f file_name.tgz'
- .gz: the file should be decompressed using 'gunzip file_name.gz'
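The name parts above can be decoded mechanically. The sketch below is a hypothetical helper (`decode_name` is not part of DART or the RDA); it assumes every file name starts with the case name used throughout this page and ends with a date stamp of the form YYYY, YYYY-MM, or YYYY-MM-DD-SSSSS, followed by one of the extensions listed.

```python
import re

# Case name shared by the ds345.0 file names listed on this page.
CASE = "f.e21.FHIST_BGC.f09_025.CAM6assim.011"

# <CASE>.<body>.<date>.<ext>, where <date> is YYYY, YYYY-MM or YYYY-MM-DD-SSSSS.
NAME_RE = re.compile(
    r"^" + re.escape(CASE) + r"\.(?P<body>.+?)\."
    r"(?P<date>\d{4}(?:-\d{2}(?:-\d{2}-\d{5})?)?)"
    r"\.(?P<ext>nc\.gz|nc|tar|tgz|bin)$"
)
# INST: a zero-padded 4-digit member number, after "_" or at the start of the body.
INST_RE = re.compile(r"(?:^|_)(\d{4})(?=\.|$)")


def decode_name(name):
    """Split a ds345.0 file name into the parts defined above (hypothetical helper)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognized ds345.0 file name: {name}")
    parts = {"body": m["body"], "ext": m["ext"], "inst": None,
             "year": None, "month": None, "day": None, "seconds": None}
    inst = INST_RE.search(m["body"])
    if inst:
        parts["inst"] = int(inst.group(1))  # None for allinst, mean, sd files
    # Fill year, month, day, seconds from as many date fields as are present.
    for key, value in zip(("year", "month", "day", "seconds"), m["date"].split("-")):
        parts[key] = int(value)
    return parts
```

For example, `decode_name(CASE + ".cpl_0042.ha2x1d.2015.nc.gz")` reports member 42 and year 2015, with no month, day, or seconds.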
- CESM 2.1 release, also used for CMIP 6.
- Atmosphere: CAM6.0.34
- 0.9 degree latitude x 1.25 degree longitude, 32 levels.
- Land: CLM 5.0 BGC-CROP version, same grid as CAM.
- SST: specified daily at 0.25 degree resolution, from AVHRR.
- CICE: sea ice coverage specified from the SST file; the rest is prognostic.
- MOSART river model.
- Aerosols, greenhouse gases, volcanic forcing: from CESM; historical data through 2014, CMIP6 scenarios after that.
The Data Assimilation Research Testbed options enabled:
- ensemble adjustment Kalman filter
- enhanced adaptive inflation (see the DART documentation or El Gharamti 2019)
- horizontal and vertical localization of observations (0.30 radian full radius Gaspari-Cohn)
- 80 members
- sampling error correction
- Model State: PS, T, U, V, Q, CLDLIQ, CLDICE, stored in CAM "initial" files.
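The horizontal localization listed above can be illustrated with the Gaspari-Cohn (1999) fifth-order taper. This sketch assumes that "0.30 radian full radius" means the weight reaches zero at an angular separation of 0.30 radians, i.e. a half-width of 0.15 radians; the function names are illustrative, not DART routines, and vertical localization is not shown.

```python
import math

HALF_WIDTH = 0.15  # assumed: half of the 0.30 radian full radius


def great_circle_radians(lat1, lon1, lat2, lon2):
    """Angular separation in radians between two points given in degrees (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def gaspari_cohn(dist, half_width=HALF_WIDTH):
    """Gaspari-Cohn weight: 1 at dist=0, falling to 0 at dist >= 2*half_width."""
    r = abs(dist) / half_width
    if r <= 1.0:
        # -r^5/4 + r^4/2 + 5r^3/8 - 5r^2/3 + 1, in Horner form
        return (((-0.25 * r + 0.5) * r + 0.625) * r - 5.0 / 3.0) * r * r + 1.0
    if r < 2.0:
        # r^5/12 - r^4/2 + 5r^3/8 + 5r^2/3 - 5r + 4 - 2/(3r)
        return ((((r / 12.0 - 0.5) * r + 0.625) * r + 5.0 / 3.0) * r - 5.0) * r + 4.0 - 2.0 / (3.0 * r)
    return 0.0
```

An observation's increment to a state variable is multiplied by this weight, so observations more than 0.30 radians (about 1900 km on the Earth's surface) away have no effect.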
We use wind and temperature observations from airplanes, radiosondes, and satellites, and GPS refractivity observations (basically, density). There are several million observations per day.
The ds345.0 site has a table with the following PRODUCTS, which are the section headers here. Within each section, the item starting with a CESM "component" (e.g. "cplINST") represents a set of subdirectories in the RDA that contain the data files. Files and directories with INST contain data for one member; for the others, INST is not relevant.
- Used to force hindcasts of surface components (CLM, POP (MOM), CICE, MOSART, CISM, WW3), including data assimilation experiments, via CESM's Data Atmosphere mode (DATM).
- 1 year, 1 member (INST) per file.
- The time frequency varies among files as appropriate.
cplINST
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cpl_INST.ha2x1d.YYYY.nc.gz
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cpl_INST.ha2x1h.YYYY.nc.gz
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cpl_INST.ha2x1hi.YYYY.nc.gz
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cpl_INST.ha2x3h.YYYY.nc.gz
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cpl_INST.hr2x.YYYY.nc.gz
- (NOTE: file names can be sorted by clicking the "File Name" column header.)
- The full ensemble ("allinst") of model states from the end of the hindcast ("forecast"), possibly inflated by DART's adaptive inflation algorithm ("preassim").
- They are available at 00Z every Monday, with extra dates at the beginning and near the end of each month.
- The ensemble means of these ensembles are available in "External System Processing (DART) Files".
- The files contained in each tar file are NetCDF, with CESM gridded data.
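The "preassim" ensembles above have been inflated. Prior inflation is commonly written as moving each member away from the ensemble mean by a factor sqrt(lambda), where lambda is the inflation value (the "priorinf" fields described below). This sketch uses a single scalar lambda on a list of member values at one grid point; DART's adaptive inflation varies lambda in space and time and is not reproduced here.

```python
import math
import statistics


def inflate(members, lam):
    """Scale deviations from the ensemble mean by sqrt(lam); the mean is unchanged."""
    if lam <= 0:
        raise ValueError("inflation must be positive")
    mean = statistics.fmean(members)
    factor = math.sqrt(lam)
    return [mean + factor * (x - mean) for x in members]
```

Because only the deviations are scaled, the ensemble variance is multiplied by lambda while the mean stays the same, which is why the forecast and preassim means are interchangeable.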
atmYYYYMM
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cam_allinst.e.preassim.YYYY-MM-DD-SSSSS.tar
- OR
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cam_allinst.e.forecast.YYYY-MM-DD-SSSSS.tar
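Since the full-ensemble files are written at 00Z every Monday, the names to look for in one month can be listed in advance. This is a sketch: the function name is hypothetical, the "extra dates at the beginning and near the end of each month" are not predictable from this rule, and whether a year uses "forecast" or "preassim" must be checked against the RDA listing.

```python
import datetime as dt

CASE = "f.e21.FHIST_BGC.f09_025.CAM6assim.011"


def monday_allinst_tars(year, month, kind="forecast"):
    """Names of the 00Z Monday cam_allinst tar files for one month (extra dates not included)."""
    day = dt.date(year, month, 1)
    names = []
    while day.month == month:
        if day.weekday() == 0:  # Monday
            names.append(f"{CASE}.cam_allinst.e.{kind}.{day:%Y-%m-%d}-00000.tar")
        day += dt.timedelta(days=1)
    return names
```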
The data are available every 6 hours within month-long files. The NetCDF files in the first 8 files listed below are on the CAM6 grid and contain only the model state and associated metadata. The file types decode as follows:
- mean: ensemble mean
- sd: standard deviation (ensemble spread)
- forecast: = prior = model state after the hindcast and before assimilation
- preassim: = prior = "forecast" after inflation has been applied to the ensemble. NOTE: the forecast mean is equivalent to the preassim mean because the inflation preserves the ensemble mean.
- output: = posterior = "analysis" = model state after the assimilation
- priorinf: the spatially and temporally varying inflation values used to inflate the ensemble
espYYYYMM
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.dart.e.cam_preassim_mean.YYYY-MM.tar
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.dart.e.cam_preassim_sd.YYYY-MM.tar
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.dart.i.cam_output_mean.YYYY-MM.tar
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.dart.i.cam_output_sd.YYYY-MM.tar
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.dart.rh.cam_output_priorinf_mean.YYYY-MM.tar
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.dart.rh.cam_output_priorinf_sd.YYYY-MM.tar
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.dart.rh.cam_preassim_priorinf_mean.YYYY-MM.tar
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.dart.rh.cam_preassim_priorinf_sd.YYYY-MM.tar
- Diags_NTrS_YYYY-MM.tgz
Observation-space diagnostic plots: month-long time series and monthly summary vertical profiles of the RMSE, bias, and total spread of the model estimates of the observations relative to the actual observations. These are derived from the "obs_seq_final" files (see below) using DART's obs_diag.f90 program and Matlab scripts. The existing files illustrate the product for 3 regions:
- Northern hemisphere: latitude > 20
- Tropics: -20 < latitude < 20
- Southern hemisphere: latitude < -20
Other regions can be defined and these files reprocessed.
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cam_obs_seq_final.YYYY-MM.tgz
Files containing the actual observations and the ensemble of model estimates of the observations. There is 1 file at each assimilation time. These are binary files which can be processed by obs_diag.f90 into NetCDF files, which are further processed and displayed using DART's Matlab scripts (or your preferred software). Different regions than those used to generate Diags_NTrS_YYYY-MM.tgz can be specified when running obs_diag.f90.
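The regional statistics in the Diags_NTrS files can be illustrated on already-extracted numbers. This sketch does not read obs_seq_final files or reproduce obs_diag.f90: it takes plain lists of observations, ensemble-mean estimates, and latitudes, applies the 3 regions listed above (latitudes of exactly +/-20 fall in none of them, as the strict inequalities imply), and assumes bias is defined as ensemble mean minus observation. Total spread is not computed.

```python
import math


def region(lat):
    """Region of one observation; None for latitudes exactly at +/-20."""
    if lat > 20:
        return "NH"
    if lat < -20:
        return "SH"
    if -20 < lat < 20:
        return "Tropics"
    return None


def regional_stats(obs, ens_means, lats):
    """Count, bias (assumed: ensemble mean - obs) and RMSE per region."""
    sums = {}
    for y, xbar, lat in zip(obs, ens_means, lats):
        key = region(lat)
        if key is None:
            continue
        n, s, ss = sums.get(key, (0, 0.0, 0.0))
        d = xbar - y
        sums[key] = (n + 1, s + d, ss + d * d)
    return {k: {"n": n, "bias": s / n, "rmse": math.sqrt(ss / n)}
            for k, (n, s, ss) in sums.items()}
```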
CLM history files, each containing a year of output from 1 member (INST) with variables grouped as follows.
- h0 = 'TSA','ER','EFLX_LH_TOT','HR'
- h1 = 'CPHASE', 'GPP', 'GRAINC_TO_FOOD', 'GSSHALN', 'GSSUNLN', 'NPP', 'NPP_NUPTAKE', 'PLANT_NDEMAND', 'QVEGT', 'TLAI'
NOTE: in 2011 these variables may be grouped differently in the files, and different variables may be present.
lndINST
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.clm2_INST.h0.YYYY.nc.gz
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.clm2_INST.h1.YYYY.nc.gz
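To find which lndINST file holds a given CLM variable, the h0/h1 groups above can be turned into a lookup. This sketch is hypothetical and encodes only the groupings listed on this page, which may not hold for 2011.

```python
CASE = "f.e21.FHIST_BGC.f09_025.CAM6assim.011"

# Variable groups as listed above (2011 may differ).
CLM_GROUPS = {
    "h0": ["TSA", "ER", "EFLX_LH_TOT", "HR"],
    "h1": ["CPHASE", "GPP", "GRAINC_TO_FOOD", "GSSHALN", "GSSUNLN",
           "NPP", "NPP_NUPTAKE", "PLANT_NDEMAND", "QVEGT", "TLAI"],
}


def clm_file_for(variable, inst, year):
    """Name of the lndINST history file that holds `variable` for one member and year."""
    for tape, names in CLM_GROUPS.items():
        if variable in names:
            return f"{CASE}.clm2_{inst:04d}.{tape}.{year}.nc.gz"
    raise KeyError(f"{variable} is not in the h0/h1 lists on this page")
```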
- All the file types required to (re)start a single instance CAM6 hindcast.
- Some of the files have been compressed using gzip.
- Any sized ensemble (up to 80) can be selected from this set.
restYYYY-MM
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.INST.alltypes.YYYY-MM-DD-SSSSS.tar
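Because the 80 members are equally likely, any subset can seed a smaller ensemble. A minimal sketch (hypothetical helper) that lists the alltypes tar files for chosen instance numbers at one date stamp (YYYY-MM-DD-SSSSS):

```python
CASE = "f.e21.FHIST_BGC.f09_025.CAM6assim.011"


def restart_tars(date, members):
    """alltypes tar names for the given instance numbers (1..80) at one date stamp."""
    names = []
    for inst in members:
        if not 1 <= inst <= 80:
            raise ValueError(f"instance {inst} is outside 1..80")
        names.append(f"{CASE}.{inst:04d}.alltypes.{date}.tar")
    return names
```

For example, `restart_tars("2015-01-05-00000", range(1, 21))` lists the files for a 20-member ensemble.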
The files in "CESM Flux Coupler (cpl7) Files : cplINST" are ready for use by CESM in DATM mode. Each surface component needs a different subset of the 5 files (ha2x1d, ha2x1h, ha2x1hi, ha2x3h, hr2x), which are available for each year. These are often referred to as data "stream files".
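When preparing a DATM ensemble, it helps to list the stream files needed for each member and year before downloading. This is a sketch with a hypothetical helper; which of the 5 streams a given component needs is not stated here, so the subset is left as a parameter.

```python
CASE = "f.e21.FHIST_BGC.f09_025.CAM6assim.011"
STREAMS = ("ha2x1d", "ha2x1h", "ha2x1hi", "ha2x3h", "hr2x")


def stream_files(members, years, streams=STREAMS):
    """Yield cplINST stream file names for every member, year and requested stream."""
    for inst in members:
        for year in years:
            for stream in streams:
                yield f"{CASE}.cpl_{inst:04d}.{stream}.{year}.nc.gz"
```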
Each "alltypes" file (CESM Restart Files including Initial Files : restYYYY-MM) contains a set of files that can be used to (re)start a single-instance (member) hindcast using CAM6 and/or another compset:
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cam_INST.i.YYYY-MM-DD-SSSS.nc
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cam_INST.rs.YYYY-MM-DD-SSSS.nc
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cam_INST.r.YYYY-MM-DD-SSSS.nc.gz
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cpl_INST.r.YYYY-MM-DD-SSSS.nc.gz
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.docn_INST.rs1.YYYY-MM-DD-SSSS.bin
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.clm2_INST.r.YYYY-MM-DD-SSSS.nc.gz
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.clm2_INST.rh1.YYYY-MM-DD-SSSS.nc
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.clm2_INST.rh0.YYYY-MM-DD-SSSS.nc
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.clm2_INST.h1.YYYY-MM-DD-SSSS.nc
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.clm2_INST.h0.YYYY-MM-DD-SSSS.nc
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cice_INST.r.YYYY-MM-DD-SSSS.nc.gz
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.mosart_INST.r.YYYY-MM-DD-SSSS.nc
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.mosart_INST.rh0.YYYY-MM-DD-SSSS.nc
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.mosart_INST.h0.2019-09.nc
The files represent the CESM Earth system at the time in the file name, but they could also be used to start hindcasts on other dates, preferably in the same season. Ensemble hindcasts (and assimilations) can be started by using more than one INST set.
NOTE: the CAM initial files (.i.) contain the CAM6 "model state" (see DART), which has been updated by the assimilation. The CAM6 restart file (.r.) has not been updated, so it is slightly inconsistent with the initial file. This may affect how you choose to start your hindcast: as a "startup" or a "branch" run.
The restart files ending in .gz need to be decompressed with gunzip.
These do not include rpointer files, which are easily made by CESM.
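The tar and gunzip steps can be scripted with Python's standard library. This sketch (hypothetical helper `unpack_alltypes`) extracts one alltypes tar into a directory and then decompresses every .gz member in place, removing the .gz file as gunzip does; `tarfile.open` detects compression automatically, so it also handles .tgz files. It does not create rpointer files.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def unpack_alltypes(tar_path, dest):
    """Extract a tar into dest, gunzip every *.gz file, and return the resulting file names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # safer extraction where supported
        else:
            tar.extractall(dest)
    for gz in sorted(dest.rglob("*.gz")):
        target = gz.with_suffix("")  # drop only the trailing ".gz"
        with gzip.open(gz, "rb") as src, open(target, "wb") as out:
            shutil.copyfileobj(src, out)
        gz.unlink()  # like gunzip, remove the compressed copy
    return sorted(p.name for p in dest.rglob("*") if p.is_file())
```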
In addition to evaluating CAM6's performance directly against the observations, it is often useful to examine the "increments" (analysis minus forecast) added to the model state variables by the assimilation process. These (ensemble mean) increments can be calculated by taking the difference of 2 files (with the same date) extracted from "External System Processing (DART) Files : espYYYYMM":
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.dart.e.cam_preassim_mean.YYYY-MM.tar
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.dart.i.cam_output_mean.YYYY-MM.tar
"Preassim" is replaced by "forecast" in the later years. They are equivalent for the ensemble means.
The difference "output mean - preassim mean" shows how CAM6 was corrected to be closer to the observations as a function of location, time, and variable. Snapshots and time series of these differences can reveal important bias patterns. These mean increments are available every 6 hours for 9 years.
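The increment calculation is a point-by-point subtraction followed, for bias studies, by a time average. This sketch uses plain lists in place of one field (e.g. T) read from the mean files; reading the NetCDF files themselves is left out, and the function names are hypothetical.

```python
import statistics


def mean_increment(output_mean, preassim_mean):
    """Increment at each grid point: output (analysis) mean minus preassim (forecast) mean."""
    if len(output_mean) != len(preassim_mean):
        raise ValueError("fields must be on the same grid")
    return [a - f for a, f in zip(output_mean, preassim_mean)]


def time_mean_increment(increments_by_time):
    """Average a sequence of 6-hourly increment fields point by point."""
    return [statistics.fmean(values) for values in zip(*increments_by_time)]
```

A persistently positive time-mean temperature increment at a location means the assimilation keeps warming the forecast there, which points to a cold bias in CAM6 at that location.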
If you're interested in the increments applied to each ensemble member, those can be calculated on a weekly basis, but they require a bit more work. The full ensemble of the forecast files is in "CESM Atmosphere (CAM6.0) Files : atmYYYYMM":
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.cam_allinst.e.forecast.YYYY-MM-DD-SSSSS.tar
Note that only the files labeled "forecast" should be used for this purpose, because the "preassim" files contain fields that have been "inflated", which makes them non-physical. It's possible to deflate them, but that's beyond the scope of this document. The corresponding ensemble of "output" files is in the cam_INST.i. files of a restart file set in "Initial Conditions for Hindcasts". These files can't be differenced directly because their contents differ, so either extract the fields of interest first, or read both files with a program and difference the contents.
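Once the same field has been extracted from both sets, the per-member increments are differences matched by INST. This sketch assumes the field values are already in lists keyed by instance number (the extraction step is not shown, and the helper name is hypothetical); it refuses to pair members that appear in only one set.

```python
def member_increments(forecast, output):
    """Per-member increments output[inst] - forecast[inst] for one extracted field."""
    unmatched = set(forecast) ^ set(output)
    if unmatched:
        raise ValueError(f"members present in only one set: {sorted(unmatched)}")
    increments = {}
    for inst in sorted(forecast):
        if len(forecast[inst]) != len(output[inst]):
            raise ValueError(f"member {inst}: fields are on different grids")
        increments[inst] = [a - f for a, f in zip(output[inst], forecast[inst])]
    return increments
```

Since subtraction is linear, the average of these member increments equals the ensemble-mean increment computed from the mean files on the same date.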
The ensemble spread ("sd" = standard deviation) resulting from these assimilations can be seen directly in the files in "External System Processing (DART) Files : espYYYYMM":
- f.e21.FHIST_BGC.f09_025.CAM6assim.011.dart.i.cam_output_sd.YYYY-MM.tar
This spread results from the balance between the model's 6-hour error growth and the constraining influence of the available observations. Small spread results from some combination of large numbers of (good) observations and high fidelity in the model physics and dynamics.
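For the weekly full ensembles, the spread can also be recomputed from the individual members, e.g. to compare forecast and analysis spread. This sketch assumes the sample (n-1) normalization; check it against the cam_output_sd files before relying on it. Each member is a list of values for one field on the same grid points.

```python
import statistics


def ensemble_spread(members):
    """Ensemble standard deviation at each grid point (sample, n-1 normalization assumed)."""
    return [statistics.stdev(values) for values in zip(*members)]
```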
- Jeff Anderson notes:
- The ensemble means every 6 hours basically sample the climatological distribution as a function of season. The 80 member ensembles once a week add in information about the local model attractor structure given the observational constraints. Together, they describe both large-scale and small-scale characteristics of states related to the model attractor when it is constrained to be close to observations. This is the CMIP6 model configuration, so you can also get access to samples from the model's climatological attractor, unconstrained by atmospheric observations, from an identical model for any of the CMIP6 forcing experiments.
- ________ U. Utah (for forcing CLM)
- Yonghan Choi, KPRI (Arctic forecasting and DA)
- Bill Lipscomb, NCAR (forcing land ice model)
- Xueli Hou? (forcing CLM)
- Faycal Iamraoui (PostDoc at Harvard University)
- ESP; SMYLE? (first phase finished before more years could be available)
- Chris Riedel? (forcing CICE assimilations)