I. What CMIP5 data are available to conduct my analysis?
The CMIP5 directory tree can quickly seem confusing. Moreover, CMIP5 files are split by frequency, model and institute over a given time period. "Aggregations" free you from this constraint: an aggregation is an OpenDAP link/URL that virtually concatenates all files of the same variable into a single dataset along the time dimension. We highly recommend using aggregations for your analyses.
The IPSL uses an ESGF (Earth System Grid Federation) datanode to publish and distribute CMIP5 data, located on the CICLAD filesystem, to its private research community. The aggregations are generated during the publication process using THREDDS catalogs.
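To give an idea of what an aggregation URL looks like, the sketch below assembles the CMIP5 Data Reference Syntax (DRS) facets into an OpenDAP path. The hostname and the exact path layout after "thredds/dodsC" are hypothetical placeholders, not the actual IPSL datanode layout; check the real THREDDS catalog for the published form:

```python
# Sketch: build an OpenDAP aggregation URL from CMIP5 DRS facets.
# The host and the layout after "thredds/dodsC" are hypothetical
# placeholders -- consult the actual THREDDS catalog for the real layout.

def aggregation_url(institute, model, experiment, frequency,
                    realm, table, ensemble, variable,
                    host="esgf.example.fr"):
    """Assemble a CMIP5-DRS-ordered aggregation URL (illustrative only)."""
    facets = ["cmip5", "output1", institute, model, experiment,
              frequency, realm, table, ensemble, variable]
    return "http://{0}/thredds/dodsC/{1}.aggregation".format(
        host, ".".join(facets))

url = aggregation_url("IPSL", "IPSL-CM5A-LR", "historical",
                      "mon", "atmos", "Amon", "r1i1p1", "tas")
print(url)
```

Opening such a URL with an OpenDAP-aware client then behaves exactly as if all the per-period files had been concatenated into one.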
To avoid dealing with this file splitting when starting your analysis, we developed a Python command-line tool (called find_agg.py) that finds and lists the available CMIP5 aggregations at IPSL in a fast and researcher-friendly way. You just have to fill in a template with your required variables, experiments and ensembles. The result is two lists:
- a) A list of all available CMIP5 aggregations satisfying all of your requirements,
- b) A list of the data missing from the CICLAD filesystem, which you can request by submitting a SYNDA template.
find_agg.py is fully documented and available for download on GitHub: http://cmip5-find-agg.readthedocs.org/en/latest/.
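As an illustration, such a template pairs the requested variables with experiments and ensembles. The section names and syntax below are entirely hypothetical, shown only to convey the idea; the real template format is described in the find_agg.py documentation:

```
# Hypothetical request template -- not the actual find_agg.py syntax.
[variables]
tas = Amon
pr  = Amon

[experiments]
historical
rcp85

[ensembles]
r1i1p1
```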
II. Unusable aggregations: what about the time axis?
Several aggregations appear unusable or corrupt, leading to intrinsic errors and a great loss of information for your analysis. Indeed, the CMIP5 project does not sufficiently control time axis conventions such as the reference date, formatting, etc. Consequently, time management in CMIP5 files is left to the discretion of each data provider, even though selecting the appropriate time period is a crucial step in climate studies.
We are working on a script, to be included in the SYNDA download workflow, that will deliver data with a proper time axis to users.
This script will check the correctness of the time axis by:
- Using the same time reference for all the split files of a variable,
- Deleting unnecessary time boundaries for instantaneous time axes,
- Rebuilding the time axis, and the associated boundaries if necessary, free of timestep errors,
- Checking the format and the consistency with the period encoded in the filename.
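The last two steps above can be sketched as follows. This is a minimal illustration, not the actual SYNDA script: it assumes a daily-frequency variable on a standard Gregorian calendar, whereas real CMIP5 models also use 360-day and no-leap calendars that the standard library cannot represent:

```python
# Sketch: rebuild a clean, evenly spaced daily time axis from a reference
# date, then check it against the YYYYMMDD-YYYYMMDD period in the filename.
# Assumes a standard Gregorian calendar; CMIP5 360-day or no-leap
# calendars require a dedicated calendar-aware library.
from datetime import datetime, timedelta

def rebuild_time_axis(reference, n_steps, step=timedelta(days=1)):
    """Return n_steps evenly spaced dates starting at `reference`."""
    return [reference + i * step for i in range(n_steps)]

def consistent_with_filename(axis, filename):
    """Check that the axis spans the period encoded in the filename."""
    period = filename.rsplit("_", 1)[1].split(".")[0]  # e.g. "20060101-20061231"
    start, end = (datetime.strptime(d, "%Y%m%d") for d in period.split("-"))
    return axis[0] == start and axis[-1] == end

axis = rebuild_time_axis(datetime(2006, 1, 1), 365)
print(consistent_with_filename(
    axis, "tas_day_MODEL_rcp45_r1i1p1_20060101-20061231.nc"))  # True
```

A rebuilt axis that fails this filename check is a strong sign that the original timesteps were erroneous.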
III. How to run standard jobs on the data?
We are currently exploring several approaches, using:
- The cdb_query Python package,
- Web Processing Service developments at DKRZ.