Running a global climate model on a brand new NERSC supercomputer, Cori.
When I arrived at UCI this summer, I began a research project that required running CESM on Edison. I’d never used a supercomputer before, and I’d never run a global climate model (although I had a fair amount of coding experience). In a matter of about six months, I’ve learned quite a bit about how both those things work and about how exciting it is to run climate experiments. I’ve also learned (and am still learning) that there are a million small mistakes and bugs that can occur along the way. The latest adventure? Running CESM on NERSC’s new supercomputer, Cori.
This changeover should have been pretty seamless, particularly because the folks at NERSC made a working version of CESM available on Cori (at /project/projectdirs/ccsm1/collections/cesm1_2_2/). My current research uses a slightly older version of CESM, but I copied over the machine files from the above directory, and with a few changes (really just one) the model built successfully! The key on my end was to remove the line in config_machines.xml that set CESMSCRATCHROOT, which my older version apparently does not recognize. Before I removed it, the build stopped with the error message: set_machine: invalid id CESMSCRATCHROOT in machine corip1.
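For anyone who would rather script that change, here is a minimal Python sketch of the idea. It assumes each machine's settings are stored as child elements of a machine entry identified by a MACH attribute; the sample XML, the scratch path, and the drop_setting helper are all made up for illustration, and the real config_machines.xml may be laid out differently depending on your CESM version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for config_machines.xml; the real file
# has many more entries and its exact layout differs between CESM versions.
SAMPLE = """<config_machines>
  <machine MACH="corip1">
    <DESC>NERSC Cori (illustrative entry)</DESC>
    <CESMSCRATCHROOT>/path/to/scratch</CESMSCRATCHROOT>
    <OS>CNL</OS>
  </machine>
</config_machines>"""


def drop_setting(xml_text, machine, setting):
    """Return (new_xml, n_removed) with `setting` removed from `machine`."""
    root = ET.fromstring(xml_text)
    removed = 0
    for mach in root.iter("machine"):
        if mach.get("MACH") != machine:
            continue
        # Iterate over a copy so children can be removed safely.
        for child in list(mach):
            if child.tag == setting:
                mach.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


cleaned, n_removed = drop_setting(SAMPLE, "corip1", "CESMSCRATCHROOT")
print(n_removed)
print(cleaned)
```

One caveat: parsing and re-writing a file this way drops any XML comments in it, so for a one-line change, deleting the line in a text editor is just as good.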
With the model building, the next step was to actually try a run. Here I ran into an error that stumped me for days but turned out to be a simple typo on my end. If you set ‘REST_N’ in env_run.xml to zero (hopefully only by accident), the run will submit but then fail with a seg fault reported in the cesm.log file; seq_timemgr_mod.F90 issues a call to SHR_SYS_ABORT and everything stops. Luckily, it’s an easy fix – just set REST_N to a number greater than zero.
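A quick check before submitting could catch this kind of typo. The sketch below is only an illustration: it assumes env_run.xml stores each variable as an entry element with id and value attributes, and the sample XML and the check_positive helper are hypothetical rather than anything shipped with CESM.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the <entry id="..." value="..."/> style; the real
# env_run.xml is much longer and may be laid out differently in your version.
SAMPLE = """<config_definition>
  <entry id="STOP_N" value="5" />
  <entry id="REST_N" value="0" />
</config_definition>"""


def check_positive(xml_text, var_id):
    """Return a problem description for `var_id`, or None if it looks fine."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        if entry.get("id") != var_id:
            continue
        raw = (entry.get("value") or "").strip()
        try:
            number = int(raw)
        except ValueError:
            # Values such as "$STOP_N" refer to other variables; this sketch
            # does not resolve them, so it leaves them alone.
            return None
        if number <= 0:
            return f"{var_id} is {number}; it should be greater than zero"
        return None
    return f"{var_id} not found"


print(check_positive(SAMPLE, "REST_N"))
```

To run it against a real case directory, read the contents of env_run.xml into a string and pass that in place of SAMPLE.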
I’m still working through a couple of issues on Cori: I can’t seem to set which repo account to use when submitting a run, and wait times in the queue are long. That being said, Cori’s runs have so far been just as fast as Edison’s, and its stability seems to have improved over the last few weeks. On top of that, the support from NERSC as they try to resolve these problems has been incredible.