I've just updated from 3.6 to 3.7 and found a bug in the quicksave code. I had quicksave files turned off in roms.in, but a few of my 512 processors were getting caught in the quicksave code anyway. This prevented the next output files (averages or restart, depending on whether the AVERAGES option was on) from successfully leaving NetCDF definition mode, and the model crashed on an MPI broadcast call from netcdf_enddef.
The problem is with the variable LwrtQCK. It is allocated in mod_scalars but never initialised. If LdefQCK is true (i.e. the user has quicksave turned on), LwrtQCK is given a value in the IF block on lines 168-223 of output.F. But if LdefQCK is false, LwrtQCK is never given a value. On most of my processors the uninitialised value happened to be false, but on a few it was true (I'm not sure how uninitialised values work for logicals; it might be platform dependent. I know it's just something random for floats). So in the following "IF LwrtQCK(ng)" block (lines 229-244 of output.F), a few of the processors were entering that block and trying to set up a quicksave file on their own. Then clearly something was overflowing and crashing. (Or, if I set NQCK=0 in roms.in rather than just having all Qout(*)=F, it was crashing on a divide by zero when it tried to take modulo NQCK.)
I fixed this by setting LwrtQCK(ng)=.FALSE. at the beginning of the quicksave code in output.F. If LdefQCK is true, this value is still updated as needed.
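To make the pattern concrete, here is a minimal standalone sketch of the idea. It is not the actual output.F code: the program name, the single-grid setup, the loop over three time steps, and the simplified write condition are all assumptions made for illustration. The nQCK > 0 guard is also my own addition, not part of the fix described above.

```fortran
! Minimal sketch only; NOT the ROMS source. Names mirror the ROMS
! variables, but the logic is a simplified, assumed stand-in.
program quicksave_switch
  implicit none
  integer, parameter :: Ngrids = 1   ! assumed: a single grid
  logical :: LdefQCK(Ngrids)         ! user switch: quicksave enabled?
  logical :: LwrtQCK(Ngrids)         ! per-step switch: write quicksave now?
  integer :: nQCK(Ngrids)            ! quicksave interval in time steps
  integer :: ng, iic

  LdefQCK = .false.                  ! quicksave turned off by the user
  nQCK    = 0                        ! interval left at zero

  do ng = 1, Ngrids
    do iic = 1, 3
      ! The fix: give the switch a defined value before anything reads it.
      LwrtQCK(ng) = .false.

      if (LdefQCK(ng)) then
        ! Assumed guard (not in the original fix): skip MOD when the
        ! interval is zero, since MOD with a zero divisor is undefined.
        if (nQCK(ng) > 0) then
          LwrtQCK(ng) = (mod(iic-1, nQCK(ng)) == 0)
        end if
      end if

      ! With the initialisation above, this branch is only taken when
      ! the LdefQCK block explicitly set the switch to true.
      if (LwrtQCK(ng)) then
        print *, 'step', iic, ': writing quicksave record'
      else
        print *, 'step', iic, ': no quicksave output'
      end if
    end do
  end do
end program quicksave_switch
```

Without the LwrtQCK(ng) = .false. line, the final IF would read an undefined logical whenever LdefQCK is false, which is the situation that let a few ranks into the quicksave block.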
Quicksave bug
- Posts: 54
- Joined: Tue Jun 28, 2016 2:08 pm
- Location: CCRC (UNSW), ARCCSS, ACE CRC
- arango
- Site Admin
- Posts: 1368
- Joined: Wed Feb 26, 2003 4:41 pm
- Location: DMCS, Rutgers University
Re: Quicksave bug
Yes, thank you. It is always a good strategy to initialize all switches and variables. I have updated the repository. For more information, check the following trac ticket.