History file archiving intervals: bug in output.F?

Bug reports, work arounds and fixes

Moderators: arango, robertson

kearneyb10k
Posts: 14
Joined: Tue Oct 16, 2018 4:26 am
Location: University of Washington, JISAO

History file archiving intervals: bug in output.F?

#1 Unread post by kearneyb10k »

I'm trying to diagnose a bug related to file archiving time periods. In my particular case this is manifesting as an error in the history file counts, but I think the same issue could possibly arise in all the multi-file-allowing output files (history, average, etc.).

In short, my application appears to be miscalculating when to create new files, such that too many time steps are placed in the first file. This then causes a crash under certain restart conditions because the restarted model expects a different number of output files than had been created by the initial run.

I have tracked down *what* is going wrong, but not quite the *why* of it.

In ROMS/Nonlinear/output.F:

I've marked executed lines with a + as it applies to the first time this code is called (iic = 1, ntstart = 1, nrrec = 0, nHIS=1008, ndefHIS=10080). The "! KK:" comments indicate the values on this first call in my application.

Code: Select all

  !
  !  Create output history NetCDF file or prepare existing file to
  !  append new data to it.  Also,  notice that it is possible to
  !  create several files during a single model run.
  !

+     IF (LdefHIS(ng)) THEN
+       IF (ndefHIS(ng).gt.0) THEN
+          IF (idefHIS(ng).lt.0) THEN
+            idefHIS(ng)=((ntstart(ng)-1)/ndefHIS(ng))*ndefHIS(ng)          ! KK: idefHIS(ng) = 0
+            IF (idefHIS(ng).lt.iic(ng)-1) THEN                             
-              idefHIS(ng)=idefHIS(ng)+ndefHIS(ng)                         
+            END IF
+          END IF
+          IF ((nrrec(ng).ne.0).and.(iic(ng).eq.ntstart(ng))) THEN
-            IF ((iic(ng)-1).eq.idefHIS(ng)) THEN
-              HIS(ng)%load=0                  ! restart, reset counter
-              Ldefine=.FALSE.                 ! finished file, delay
-            ELSE                              ! creation of next file
-              Ldefine=.TRUE.
-              NewFile=.FALSE.                 ! unfinished file, inquire
-            END IF                            ! content for appending
-            idefHIS(ng)=idefHIS(ng)+nHIS(ng)  ! restart offset
+          ELSE IF ((iic(ng)-1).eq.idefHIS(ng)) THEN
+            idefHIS(ng)=idefHIS(ng)+ndefHIS(ng)                            ! KK: idefHIS(ng) = 10080
+            IF (nHIS(ng).ne.ndefHIS(ng).and.iic(ng).eq.ntstart(ng)) THEN
+              idefHIS(ng)=idefHIS(ng)+nHIS(ng)  ! multiple record offset   ! KK: idefHIS(ng) = 11088
+            END IF
+            Ldefine=.TRUE.
+            NewFile=.TRUE.
+          ELSE
-            Ldefine=.FALSE.
+          END IF
...
+          IF (Ldefine) THEN
...
-            ifile=(iic(ng)-1)/ndefHIS(ng)+1
...

Based on this logic, the code will not create a second file until step 11088 (ndefHIS+nHIS). That is indeed what happens, leading to a first file with 11 output time steps rather than the 10 I specified.

However, this does not match when the value of ifile increases; that occurs, as expected, between steps 10080 and 10081 (ndefHIS and ndefHIS+1). The file-creation index and ifile-switching index continue to be offset from each other by nHIS for the rest of the simulation. So if I stop and then restart a simulation between steps 10080 and 11088 (or between any ndefHIS*x+[0 nHIS] interval), the restart crashes (see error below).

It seems the quick fix here would be to comment out the "! multiple record offset" line, which seems to be the culprit in incorrectly shifting the first idefHIS value. But I assume this line was there for a reason, and I might be breaking something. Can anyone clarify whether that reason applies to a forward nonlinear model simulation? This seems too obvious an error for me to be the first one to encounter it... but then the logic in this section seems much more complicated than I would expect, so perhaps I'm misunderstanding something?
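
To make the effect of that line concrete, the fresh-start branch of the snippet above can be transcribed into a few lines of Python. This is only a sketch of the counter arithmetic quoted in this post: the function names and the `apply_offset` switch are made up for illustration, idefHIS is assumed to start out negative, and nrrec is assumed to be 0 so the restart branch never fires. It says nothing about whether the line is needed elsewhere in ROMS.

```python
def creation_steps(nhis, ndefhis, last_step, apply_offset=True):
    """Steps (iic-1) at which a fresh run (nrrec=0) would set Ldefine and NewFile."""
    ntstart = 1
    idef = -1                      # assumed initial (negative) value of idefHIS
    created = []
    for iic in range(ntstart, last_step + 2):
        if idef < 0:
            # Non-negative operands, so // matches Fortran integer division
            idef = ((ntstart - 1) // ndefhis) * ndefhis
            if idef < iic - 1:
                idef += ndefhis
        if iic - 1 == idef:
            idef += ndefhis
            if apply_offset and nhis != ndefhis and iic == ntstart:
                idef += nhis       # the "multiple record offset" line
            created.append(iic - 1)
    return created


def ifile_switch_steps(ndefhis, last_step):
    """Steps (iic-1) at which ifile=(iic-1)/ndefHIS+1 takes a new value."""
    return list(range(0, last_step + 1, ndefhis))


with_offset = creation_steps(1008, 10080, 25000)
without_offset = creation_steps(1008, 10080, 25000, apply_offset=False)
switches = ifile_switch_steps(10080, 25000)
print("file creation, offset applied:", with_offset)
print("file creation, offset removed:", without_offset)
print("ifile switches:               ", switches)
```

With nHIS=1008 and ndefHIS=10080, the creation steps with the offset applied stay nHIS steps behind the ifile switches after the first file; with the offset removed the two lists coincide.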


In case it helps, I also added a bunch of extra print statements to my standard output to double-check what was going on. Here are the last few time steps of the initial run, stopping as I requested after step 10944:

Code: Select all

     10943 1990-03-31 23:50:00.00  2.368000E-03  2.111055E+04  2.111055E+04  1.062594E+16
                     (174,025,30)  8.967006E-02  0.000000E+00  5.323323E+00  5.439568E+00
 LdefHIS = T idefHIS =    11088 iic     =    10944 ndefHIS =    10080 nrrec   =  0 ntstart =        1 Ldefine = F ifile   =        2
     10944 1990-04-01 00:00:00.00  2.343951E-03  2.111048E+04  2.111048E+04  1.062592E+16
                     (174,025,30)  6.630270E-02  0.000000E+00  4.540270E+00  5.565688E+00
 LdefHIS = T idefHIS =    11088 iic     =    10945 ndefHIS =    10080 nrrec   =  0 ntstart =        1 Ldefine = F ifile   =        2
 

And here is the printout (and error) seen on restart:

Code: Select all

 TIME-STEP YYYY-MM-DD hh:mm:ss.ss  KINETIC_ENRG   POTEN_ENRG    TOTAL_ENRG    NET_VOLUME
                     C => (i,j,k)       Cu            Cv            Cw         Max Speed

     10944 1990-04-01 00:00:00.00  2.343942E-03  2.111048E+04  2.111048E+04  1.062592E+16
                     (170,025,30)  1.730109E-01  2.271564E-01  0.000000E+00  5.565688E+00
 LdefHIS = T idefHIS =    21168 iic     =    10945 ndefHIS =    10080 nrrec   = -1 ntstart =    10945 Ldefine = T ifile   =        2
   NewFile = F
      DEF_HIS     - inquiring history      file, Grid 01: bgcdebug_phys/Out/bgcdebug_phys_his_00002.nc
 Found Error: 02   Line: 8458     Source: ROMS/Modules/mod_netcdf.F, netcdf_open

 NETCDF_OPEN - unable to open existing NetCDF file:
               bgcdebug_phys/Out/bgcdebug_phys_his_00002.nc
               call from:  ROMS/Utility/def_his.F
 No such file or directory
 
You can see the idefHIS values for the same time step are different in the initial run vs the restart, and the ifile value always reflects the latter's logic.
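
The two printouts can be cross-checked against the quoted logic with a short Python transcription of the first call of each run (where iic equals ntstart). This is a sketch under stated assumptions, not the ROMS code: the function name is hypothetical, idefHIS is assumed to be negative on entry, and the ifile formula is evaluated unconditionally here for comparison, whereas the quoted snippet only evaluates it inside a branch that is partly elided.

```python
def first_output_call(ntstart, nrrec, nhis, ndefhis):
    """Mimic the quoted block on the first call of a run (iic == ntstart).

    Returns (idefHIS, Ldefine, NewFile, ifile); NewFile is None when the
    quoted code leaves it untouched.
    """
    iic = ntstart
    new_file = None
    # Non-negative operands, so // matches Fortran integer division
    idef = ((ntstart - 1) // ndefhis) * ndefhis
    if idef < iic - 1:
        idef += ndefhis
    if nrrec != 0 and iic == ntstart:
        if iic - 1 == idef:
            ldefine = False        # finished file, delay creation of next file
        else:
            ldefine = True
            new_file = False       # unfinished file, inquire for appending
        idef += nhis               # restart offset
    elif iic - 1 == idef:
        idef += ndefhis
        if nhis != ndefhis and iic == ntstart:
            idef += nhis           # multiple record offset
        ldefine = True
        new_file = True
    else:
        ldefine = False
    ifile = (iic - 1) // ndefhis + 1
    return idef, ldefine, new_file, ifile


initial = first_output_call(ntstart=1, nrrec=0, nhis=1008, ndefhis=10080)
restart = first_output_call(ntstart=10945, nrrec=-1, nhis=1008, ndefhis=10080)
print("initial run:", initial)
print("restart run:", restart)
```

For the initial run this gives idefHIS = 11088, and for the restart it gives idefHIS = 21168 with Ldefine = T, NewFile = F and ifile = 2, which matches the values in the two printouts above.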

arango
Site Admin
Posts: 1368
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: History file archiving intervals: bug in output.F?

#2 Unread post by arango »

Thank you for the detailed information. Yes, this logic is tricky. First, what version of ROMS are you using? We have corrected issues in this logic in the past. It is very subtle and requires lots of time in the TotalView debugger. We need to be sure that you are using the latest version of the ROMS files def_avg.F, def_his.F, and def_qck.F. If you are, then something in your configuration is breaking the logic.

kearneyb10k
Posts: 14
Joined: Tue Oct 16, 2018 4:26 am
Location: University of Washington, JISAO

Re: History file archiving intervals: bug in output.F?

#3 Unread post by kearneyb10k »

I'm using a fork of Kate Hedstrom's ice-enhanced version of the code, with some additional biological model development layered on top (see https://github.com/beringnpz/roms/tree/main). I try to keep up with the latest major updates; I regularly pull in changes from Kate's upstream repo and peruse major updates in the trunk version that haven't yet made it there. But of course a lot of differences remain, especially related to some of the most recent overhauls.

However, I did double-check that the relevant parts of output.F are the same between my version and the main-trunk clone I keep locally. I also scanned all recent tickets for anything that touched output.F, def_his, and its brethren, and I didn't see anything that seemed specifically related to how these file counters are calculated. I will note that my version is missing the safeguard error message you mention in ticket 865, which does seem to refer to an error similar to the one I'm seeing. From what I can tell, though, that would just lead to ROMS throwing a slightly more informative error without actually getting around the problem.

kate
Posts: 4091
Joined: Wed Jul 02, 2003 5:29 pm
Location: CFOS/UAF, USA

Re: History file archiving intervals: bug in output.F?

#4 Unread post by kate »

Note that my repo hasn't changed in well over a year. I'm not keeping up!

kearneyb10k
Posts: 14
Joined: Tue Oct 16, 2018 4:26 am
Location: University of Washington, JISAO

Re: History file archiving intervals: bug in output.F?

#5 Unread post by kearneyb10k »

Following up, any clues on the logic here? It's primarily this line that is problematic in my case:

Code: Select all

IF (nHIS(ng).ne.ndefHIS(ng).and.iic(ng).eq.ntstart(ng)) THEN
  idefHIS(ng)=idefHIS(ng)+nHIS(ng)  ! multiple record offset
END IF

(And yes, I've confirmed that the output.F logic is identical in my version of the code and the officially supported trunk code.)

Does anyone recall what "multiple record offset" means? What sort of situation is it trying to handle?

arango
Site Admin
Posts: 1368
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: History file archiving intervals: bug in output.F?

#6 Unread post by arango »

It is difficult to diagnose your problem because you are using a version of ROMS that we don't support, and we are blind to how your version departs from the code we release. In addition, there is not enough information here to diagnose the problem, nor do I have the time for it. It is not only the routine output.F that affects the logic, but there are important parameters in the derived-type structure T_IO that control the behavior of the multi-file option. If I recall correctly, the HIS(ng)%load variable plays an essential role here. It is used in several routines and drivers to get the correct behavior. In particular, inp_decode.F, read_phypar.F, wrt_*.F, and output.F, among others.

The History and Quicksave output NetCDF files are unique because the first file in the series has an additional record for saving the initial conditions. You should have 11 records in your file _0001.nc, then ten records in your file _0002.nc. One option that you may have is to create a file per output record. Of course, it will generate many files, which may affect efficiency and processing.
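
As a worked example of that record count, the following Python sketch lays out the record steps per file. It assumes the values from the original post (NHIS=1008, NDEFHIS=10080), that a record is written every NHIS steps starting with the initial conditions at step 0, and that every file after the first holds NDEFHIS/NHIS records. The function name is hypothetical and the layout is only an illustration of the description above, not output from ROMS.

```python
def records_per_file(nhis, ndefhis, nfiles):
    """List the time steps of the records held by each file in the series."""
    per_file = ndefhis // nhis
    files = []
    step = 0
    for n in range(nfiles):
        # The first file carries one extra record for the initial conditions
        count = per_file + 1 if n == 0 else per_file
        steps = [step + i * nhis for i in range(count)]
        files.append(steps)
        step = steps[-1] + nhis
    return files


for number, steps in enumerate(records_per_file(1008, 10080, 2), start=1):
    print(f"file {number}: {len(steps)} records, steps {steps[0]} to {steps[-1]}")
```

Under these assumptions the first file spans steps 0 to 10080 (11 records) and the second spans steps 11088 to 20160 (10 records).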

I am afraid that this is the only guidance I can provide for a version of ROMS that we do not distribute. I don't know what it contains or how it differs from the version we distribute, and it is outdated. I understand this is frustrating for a user, but the reality is that it is out of our control. We cannot debug all the different versions of ROMS out there, nor request that stagnant versions of ROMS be updated or removed from distribution.

By the way, our group uses the multifile option extensively, and it works for us.

kearneyb10k
Posts: 14
Joined: Tue Oct 16, 2018 4:26 am
Location: University of Washington, JISAO

Re: History file archiving intervals: bug in output.F?

#7 Unread post by kearneyb10k »

Okay, I have now reproduced the issue in the main trunk version of ROMS. I used the bio_toy Fennel application as my starting point, and made only a few minor tweaks to the input to replicate the starting time and archiving interval options from my primary app. Same basic scenario as described earlier: Run 1 ran to Apr 1, 1990 (which falls in the 10th week post-initialization) and finished cleanly. I then restarted Run 2 using the restart file, and it crashed due to not finding the expected history file:

Code: Select all

 NL ROMS/TOMS: started time-stepping: (Grid: 01 TimeSteps: 000000010945 - 000000023184)

  GET_2DFLD_NF90   - surface u-wind component,                            1990-04-01 12:00:00.00
                      (Grid=01, Rec=91, Index=2, File: bio_toy_frc_8990.nc)
                      (Tmin=      32867.5000 Tmax=      33236.5000)   t =      32962.5000
                      (Min =  1.26288188E+00 Max =  1.26288188E+00)   regrid = F
  GET_2DFLD_NF90   - surface v-wind component,                            1990-04-01 12:00:00.00
                      (Grid=01, Rec=91, Index=2, File: bio_toy_frc_8990.nc)
                      (Tmin=      32867.5000 Tmax=      33236.5000)   t =      32962.5000
                      (Min =  2.44336629E+00 Max =  2.44336629E+00)   regrid = F
  GET_2DFLD_NF90   - surface air pressure,                                1990-04-01 12:00:00.00
                      (Grid=01, Rec=91, Index=2, File: bio_toy_frc_8990.nc)
                      (Tmin=      32867.5000 Tmax=      33236.5000)   t =      32962.5000
                      (Min =  1.00800769E+03 Max =  1.00800769E+03)   regrid = F
  GET_2DFLD_NF90   - net solar shortwave radiation flux,                  1990-04-01 12:00:00.00
                      (Grid=01, Rec=91, Index=2, File: bio_toy_frc_8990.nc)
                      (Tmin=      32867.5000 Tmax=      33236.5000)   t =      32962.5000
                      (Min =  5.22951057E-05 Max =  5.22951057E-05)   regrid = F
  GET_2DFLD_NF90   - surface air temperature,                             1990-04-01 12:00:00.00
                      (Grid=01, Rec=91, Index=2, File: bio_toy_frc_8990.nc)
                      (Tmin=      32867.5000 Tmax=      33236.5000)   t =      32962.5000
                      (Min =  1.09922962E+01 Max =  1.09922962E+01)   regrid = F
  GET_2DFLD_NF90   - surface air relative humidity,                       1990-04-01 12:00:00.00
                      (Grid=01, Rec=91, Index=2, File: bio_toy_frc_8990.nc)
                      (Tmin=      32867.5000 Tmax=      33236.5000)   t =      32962.5000
                      (Min =  8.84156189E-01 Max =  8.84156189E-01)   regrid = F

 TIME-STEP YYYY-MM-DD hh:mm:ss.ss  KINETIC_ENRG   POTEN_ENRG    TOTAL_ENRG    NET_VOLUME
                     C => (i,j,k)       Cu            Cv            Cw         Max Speed

     10944 1990-04-01 00:00:00.00  4.557692E-03  1.036783E+03  1.036787E+03  2.764606E+11
                         (1,1,28)  1.400216E-02  1.546656E-02  0.000000E+00  3.144560E-01
  DEF_HIS_NF90     - inquiring history file,           Grid 01: fenneltest_his_0002.nc
 Found Error: 02   Line: 8960     Source: ROMS/Modules/mod_netcdf.F, netcdf_open

 NETCDF_OPEN - unable to open existing NetCDF file:
               fenneltest_his_0002.nc
               call from:  ROMS/Utility/def_his.F, def_his_nf90
               No such file or directory                                                       
 Found Error: 03   Line: 2613     Source: ROMS/Utility/def_his.F, def_his_nf90

 DEF_HIS_NF90 - unable to open history NetCDF file: fenneltest_his_0002.nc
 Found Error: 03   Line: 79       Source: ROMS/Utility/def_his.F
 Found Error: 03   Line: 186      Source: ROMS/Nonlinear/output.F
 Found Error: 03   Line: 525      Source: ROMS/Nonlinear/main3d.F
 Found Error: 03   Line: 298      Source: ROMS/Drivers/nl_roms.h, ROMS_run

Elapsed wall CPU time for each process (seconds):

 Node   #    0 CPU:       0.017
 Total:                   0.017


Here's a quick diff of my input file for Run 1 compared to the default roms_bio_toy_fennel.in:

Code: Select all

<      VARNAME = ../roms_trunk/ROMS/External/varinfo.yaml
---
>      VARNAME = ../External/varinfo.yaml
224,226c224,226
<       NTIMES == 10944
<           DT == 600.0d0
<      NDTFAST == 40
---
>       NTIMES == 1600
>           DT == 540.0d0
>      NDTFAST == 30
231,232c231,232
<   NTIMES_ANA == 1440                               ! analysis interval
<   NTIMES_FCT == 1440                               ! forecast interval
---
>   NTIMES_ANA == 1600                               ! analysis interval
>   NTIMES_FCT == 1600                               ! forecast interval
254c254
<         NRST == 18
---
>         NRST == 80
262,263c262,263
<         NHIS == 1008
<      NDEFHIS == 10080
---
>         NHIS == 80
>      NDEFHIS == 0
267,268c267,268
<         NAVG == 1008
<      NDEFAVG == 10080
---
>         NAVG == 80
>      NDEFAVG == 0
270,271c270,271
<         NDIA == 1008
<      NDEFDIA == 10080
---
>         NDIA == 80
>      NDEFDIA == 0
422,424c422,424
<       DSTART =  32886.0d0                  ! days
<   TIDE_START =  -693962.0d0                ! days
<     TIME_REF =  19000101.0d0               ! yyyymmdd.dd
---
>       DSTART =  0.0d0                      ! days
>   TIDE_START =  0.0d0                      ! days
>     TIME_REF =  20010101.5d0               ! yyyymmdd.dd
964c964
<      ININAME == Data/bio_toy_ini_fennel_19900115.nc
---
>      ININAME == Data/bio_toy_ini_fennel.nc
1052c1052
<      FRCNAME == Data/bio_toy_frc_8990.nc
---
>      FRCNAME == Data/bio_toy_frc.nc
1056,1068c1056,1068
<      DAINAME == fenneltest_dai.nc
<      GSTNAME == fenneltest_gst.nc
<      RSTNAME == fenneltest1_rst.nc
<      HISNAME == fenneltest_his.nc
<      QCKNAME == fenneltest_qck.nc
<      TLMNAME == fenneltest_tlm.nc
<      TLFNAME == fenneltest_tlf.nc
<      ADJNAME == fenneltest_adj.nc
<      AVGNAME == fenneltest_avg.nc
<      HARNAME == fenneltest_har.nc
<      DIANAME == fenneltest_dia.nc
<      STANAME == fenneltest1_sta.nc
<      FLTNAME == fenneltest1_flt.nc
---
>      DAINAME == roms_dai.nc
>      GSTNAME == roms_gst.nc
>      RSTNAME == roms_rst.nc
>      HISNAME == roms_his.nc
>      QCKNAME == roms_qck.nc
>      TLMNAME == roms_tlm.nc
>      TLFNAME == roms_tlf.nc
>      ADJNAME == roms_adj.nc
>      AVGNAME == roms_avg.nc
>      HARNAME == roms_har.nc
>      DIANAME == roms_dia.nc
>      STANAME == roms_sta.nc
>      FLTNAME == roms_flt.nc

And a diff from Run 1 to Run 2:

Code: Select all

<       NTIMES == 12240
---
>       NTIMES == 10944
252c252
<        NRREC == -1
---
>        NRREC == 0
964c964
<      ININAME == ./fenneltest1_rst.nc
---
>      ININAME == Data/bio_toy_ini_fennel_19900115.nc
1058c1058
<      RSTNAME == fenneltest2_rst.nc
---
>      RSTNAME == fenneltest1_rst.nc
1067,1068c1067,1068
<      STANAME == fenneltest2_sta.nc
<      FLTNAME == fenneltest2_flt.nc
---
>      STANAME == fenneltest1_sta.nc
>      FLTNAME == fenneltest1_flt.nc
Regarding the tweaked input files, I changed the initialization file to use the same starting date as my app (Jan 15, 1990). I was also having some trouble with the climatological forcing so I altered that file to be non-climatological and run from Dec 1989 - Dec 1990.

I don't see a way to attach files to posts in this forum, but I can provide all inputs necessary to reproduce this issue if needed. Thanks.

arango
Site Admin
Posts: 1368
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: History file archiving intervals: bug in output.F?

#8 Unread post by arango »

What will happen if you do a single run without restarting? If ROMS creates the multifile series correctly, the issue is in the restart mechanism, which is even more complicated in this case. The issue would then no longer be in output.F but in get_state.F, def_his.F, and check_multifile.F. I think that the values of NTIMES, NHIS, and NDEFHIS are more crucial because they must be consistent with each other for the restart to handle the multifile correctly. I believe that in your case, NTIMES has to be an exact multiple of NDEFHIS for you to change NTIMES to a different value during the restart. Notice that all the MOD operations need to give us zero.

I added this issue to my very long TODO list, but it is a very low priority, and I will have to reproduce it, requiring lots of TotalView debugging. My time is full, and I will probably dedicate some time before the summer.

wilkin
Posts: 922
Joined: Mon Apr 28, 2003 5:44 pm
Location: Rutgers University

Re: History file archiving intervals: bug in output.F?

#9 Unread post by wilkin »

It is hard to tell from those diffs, but I suspect you might have confused ROMS in the restart process.

It looks like you are asking ROMS to write 10 records to a history file before starting a new file, but you are restarting only part way through that sequence. ROMS might be looking for file 0002 to have been created, but it's not there. So, your short-run debug strategy is throwing errors, and if you just let it run from scratch you'd be fine. Hard to know. Try just making single-record history files and not be too clever.

That said, you have some odd choices, like ...

TIDE_START = -693962
What does it mean to shift the phase of the tidal harmonics by 1989 days?

DSTART = 32886
Why shift the time counter by 90 years?
John Wilkin: DMCS Rutgers University
71 Dudley Rd, New Brunswick, NJ 08901-8521, USA. ph: 609-630-0559 jwilkin@rutgers.edu

arango
Site Admin
Posts: 1368
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: History file archiving intervals: bug in output.F?

#10 Unread post by arango »

I added information in wikiROMS to be used as a guideline when designing the multi-file strategy for a ROMS application. Unfortunately, this is all I can do now in my free time.

kearneyb10k
Posts: 14
Joined: Tue Oct 16, 2018 4:26 am
Location: University of Washington, JISAO

Re: History file archiving intervals: bug in output.F?

#11 Unread post by kearneyb10k »

I understand that this isn't a priority for the main trunk code, so I'll just go ahead and implement the fix on my end for now. But I'll add a few responses to the queries raised in case it helps anyone else down the road.

First, it may help to clarify my context somewhat. I am running many of my simulations using the "extra-resources" queue on a shared cluster; jobs in that queue can use nodes that are currently inactive, with the caveat that the job will be canceled and returned to the queue if the owner of the nodes starts using them. I have written my own python module* to handle the input setup necessary to do this robustly (checking where the simulation was when it was killed, shifting the ININAME variable to the appropriate restart file, adjusting NTIMES to reach the proper end date, etc.). I have successfully used this method for many years with an older version of ROMS, but the output.F logic has changed considerably since then, and we're now hitting this restart issue. The key point here is that I cannot control exactly where the simulation might be when it requires a restart.

*The romscom module is still a WIP, but I welcome others to play around with it if they like. I've been meaning to update the example to use one from the standard ROMS test cases (and now that I have the trunk version of bio_toy up and running I guess I can add that to my to-do list...)

(arango)
What will happen if you do a single run without restarting?
Well, that would be ideal. And yes, it runs fine; the bug is due to the mismatch between the ifile and idefhis logic that only occurs when restarting.

(arango)
I believe that in your case, NTIMES has to be an exact multiple of NDEFHIS for you to change the NTIMES to a different value during the restart
I'm not sure I follow this logic. I understand that NRST needs to be a factor of NHIS[/AVG/QCK] (and it is), that NHIS should be a factor of NDEFHIS (again, it is), and that the initialization time should align with NRST (and because we are starting from a restart file, that is always the case). But looking at the code, NTIMES doesn't appear to factor into the logic of whether to increment the file counter, switch the Ldefine flag, or any of that. Stopping mid-NHIS[/AVG/QCK] can result in small artifacts in that single archiving record (especially in average files) due to the non-perfect-restart nature of ROMS, but that's a side effect I'm aware of and willing to accept. It shouldn't affect the ability to restart.
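
The divisibility relationships mentioned in this exchange are easy to tabulate. The following Python sketch uses the Run 1 values from the bio_toy input diff above (NTIMES=10944, NRST=18, NHIS=1008, NDEFHIS=10080); the function name is hypothetical, and the sketch only reports remainders without claiming which of them ROMS actually requires to be zero.

```python
def interval_remainders(ntimes, nrst, nhis, ndefhis):
    """Return the remainders of the interval relationships discussed above."""
    return {
        "NHIS % NRST": nhis % nrst,             # NRST a factor of NHIS?
        "NDEFHIS % NHIS": ndefhis % nhis,       # NHIS a factor of NDEFHIS?
        "NTIMES % NRST": ntimes % nrst,         # run ends on a restart record?
        "NTIMES % NDEFHIS": ntimes % ndefhis,   # the relationship under debate
    }


for name, remainder in interval_remainders(10944, 18, 1008, 10080).items():
    print(f"{name:18s} = {remainder}")
```

For these values the first three remainders are zero and only NTIMES % NDEFHIS is nonzero (864), which is the one relationship the two posters disagree about.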

(wilkin)
Try just making single record history files and not be too clever.
This could be Plan B, but I'd rather not have to fall back on that (and surely, trying to take advantage of a standard ROMS option isn't particularly "clever", is it?) We're running many 30-100-year sims, and the extra run time and storage created by a ten-fold increase in writing (across HIS, AVG, and DIAG files) is noticeable, and management of that many files is similarly undesirable.


(wilkin)
DSTART = 32886
Why shift the time counter by 90 years?
We use a fixed TIME_REF for all of our simulations (1900-01-01 00:00:00). That decision was before my time, but it seems reasonable and is the same convention used by a lot of CF standard datasets. And based on other input files I've browsed, it seems to be common practice in the ROMS community? My original example was a hindcast with forcing data starting in 1990, hence the 90-year DSTART shift.
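
For readers checking the dates, the arithmetic behind these settings can be reproduced with the Python standard library. This sketch assumes that model time is TIME_REF plus DSTART days plus step times DT, and uses the Run 1 values from the bio_toy diff above (TIME_REF=19000101, DSTART=32886 days, DT=600 s); the function name is hypothetical.

```python
from datetime import datetime, timedelta


def model_date(time_ref, dstart_days, step, dt_seconds):
    """Calendar date of a time step, assuming time = TIME_REF + DSTART + step*DT."""
    return time_ref + timedelta(days=dstart_days) + timedelta(seconds=step * dt_seconds)


ref = datetime(1900, 1, 1)
print("step     0:", model_date(ref, 32886.0, 0, 600.0))
print("step 10944:", model_date(ref, 32886.0, 10944, 600.0))
```

Step 0 lands on 1990-01-15 and step 10944 on 1990-04-01 00:00:00, consistent with the time stamps in the printouts earlier in the thread.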

(wilkin)
TIDE_START = -693962
What does it mean to shift the phase of the tidal harmonics by 1989 days?
No idea. I'm not a tide person, and I inherited that parameter (and the tidal forcing dataset) from previous sims. I'll look into it to make sure it's reasonable for our current applications, but meanwhile, that's irrelevant to the problem at hand.

arango
Site Admin
Posts: 1368
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: History file archiving intervals: bug in output.F?

#12 Unread post by arango »

Okay, you don't understand how the multifile option works in ROMS. You CANNOT whatsoever change the values of NTIMES or DSTART to accommodate a supercomputer queue whose time limits don't let your simulation finish, forcing you to restart. I suspected this was your problem, so I wrote guidelines in wikiROMS.

The multifile option depends on the timestep counter iic. Therefore, the MOD Fortran operations depend on NTIMES, since it defines a unique sequence of multi-files. You need to think about it and decide for how long you want to run your application and stick to that value of NTIMES (say, six months, one year, or ten years). That's what I have done in my long triple-nested applications. I ran for three years with multiple files for three nested grids, but it had fixed NTIMES values for each nested grid.

If you decide to run longer than you estimated, you need to start a new series! No other way about it.

Anyway, you are free to follow or not follow our recommendations.

Dan_chan
Posts: 38
Joined: Wed Apr 17, 2019 2:37 am
Location: IAP, UCAS

Re: History file archiving intervals: bug in output.F?

#13 Unread post by Dan_chan »

I am sorry to post my error here, but I ran into a similar error when ROMS inquires about the averages file:

Code: Select all

TIME-STEP YYYY-MM-DD hh:mm:ss.ss  KINETIC_ENRG   POTEN_ENRG    TOTAL_ENRG    NET_VOLUME
                     C => (i,j,k)       Cu            Cv            Cw         Max Speed

    394560 1993-01-01 00:00:00.00  4.052343E-03  2.939434E+02  2.939475E+02  2.416712E+13
                     (127,076,31)  1.508281E-02  3.146365E-02  0.000000E+00  6.601196E-01
  DEF_AVG_NF90     - inquiring average file,           Grid 01: /data/zhengf/chenxd/myroms/output/baroclinic_tide/test2_avg_5km_0037.nc
 Found Error: 02   Line: 8961     Source: ROMS/Modules/mod_netcdf.F, netcdf_open

 NETCDF_OPEN - unable to open existing NetCDF file:
               /data/zhengf/chenxd/myroms/output/baroclinic_tide/test2_avg_5km_0037.nc
               call from:  ROMS/Utility/def_avg.F, def_avg_nf90
               No such file or directory                                                      
 Found Error: 03   Line: 2113     Source: ROMS/Utility/def_avg.F, def_avg_nf90

 DEF_AVG_NF90 - unable to open averages NetCDF file: /data/zhengf/chenxd/myroms/output/baroclinic_tide/test2_avg_5km_0037.nc
 Found Error: 03   Line: 85       Source: ROMS/Utility/def_avg.F
 Found Error: 03   Line: 393      Source: ROMS/Nonlinear/output.F
 Found Error: 03   Line: 526      Source: ROMS/Nonlinear/main3d.F
 Found Error: 03   Line: 299      Source: ROMS/Drivers/nl_roms.h, ROMS_run

I am using ROMS/TOMS version 4.1 and trying to restart from rst.nc:

Code: Select all

       NRREC == -1
    LDEFOUT  == T
      DSTART =  0.0d0                      ! days
  TIDE_START =  0.0d0                      ! days
    TIME_REF =  19900101.0d0                      ! yyyymmdd.dd

The latest ocean_time in rst.nc is 19930101. If I change DSTART to 1096.0d0, the restart works well and writes the averages starting again from avg_0001.nc.

BUT, I am now trying to add tidal forcing, and both TIDE_START = 0.0d0 and TIDE_START = 1096.0d0 give bad u and v velocities. I read post #97 in [TPXO to ROMS EXPRESS (tide extraction software) - Ocean Modeling Discussion (myroms.org)](viewtopic.php?p=25164&hilit=tide#p25164):
The safest approach to start, to avoid any confusion with these time coordinates, would be to use a TIME_REF that matches t0 in the TPXO function, and then TIDE_START and DSTART both = 0. Any phase mismatch then would point to something other than these switches. _wilkin
So, if I need to change DSTART from 1096.0d0 back to 0.0d0, how can I avoid the error above when restarting?
