4dvar run hangs using internal lapack module

stef
Posts: 196
Joined: Tue Mar 13, 2007 6:38 pm
Location: Independent researcher

4dvar run hangs using internal lapack module

#1 Post by stef

I have a toy test case using RBL4DVAR & RPCG with 1 outer loop and 7 inner loops. The run hangs in inner loop 6, during the computation of the Lanczos vectors.

This occurs when using the 'develop' branch at git commit b7a47408ba22 ([1]), which uses the internal lapack routines in lapack_mod.F ([2]).

When I comment out the `USE lapack_mod` import in rpcg_lanczos.F (the line changed in [3]) and link against LAPACK/ScaLAPACK, the run completes fine.

The Lanczos vector data printed after inner loop 6 is identical with either library:

Code:

 (001,006): Lanczos vectors - cg_delta, cg_beta, zwork:

      001     2.02339198E+02      0.00000000E+00      4.33773431E+00
      002     3.19486738E+02      1.58615485E+02     -4.35794193E+00
      003     1.55982729E+02      1.18894086E+02      5.92353177E+00
      004     7.02759117E+01      6.48835208E+01     -6.25483043E+00
      005     1.68914552E+01      1.17606909E+01      4.69559661E+00
      006     8.48813486E+00      3.22482999E+00     -1.78410970E+00
      007     1.08467152E+01      4.66695559E+00      0.00000000E+00
      008                         0.00000000E+00

It's not a huge problem for me at the moment, because it seems to work using lapack/scalapack.

Am I making a configuration mistake, or is this a bug? Let me know if/how I should provide further information. I don't understand much of the linear algebra numeric stuff yet, still focusing on the basics of assimilation.

I'm using Linux with the following packages:

blas 3.12.0-3
cblas 3.12.0-3
glibc 2.38-7
gcc-fortran 13.2.1-3
lapack 3.12.0-3
scalapack-v2.2.0.tar.gz

Thanks for your help!

References:

[1] git commit b7a47408ba22:

Code:

commit b7a47408ba224a4703d9122c33995eee3c5ed06c (origin/develop)
Author: Hernan G. Arango <arango@marine.rutgers.edu>
Date:   Sat Feb 24 21:57:43 2024 -0500

    src:trac:965 (#28)
    
    https://www.myroms.org/projects/src/ticket/965

[2] https://www.myroms.org/projects/src/ticket/959


[3]

--- a/ROMS/Utility/rpcg_lanczos.F
+++ b/ROMS/Utility/rpcg_lanczos.F
@@ -46,7 +46,7 @@
# ifdef DISTRIBUTE
USE distribute_mod, ONLY : mp_bcastf, mp_bcasti, mp_bcastl
# endif
-! USE lapack_mod, ONLY : DSTEQR
+ USE lapack_mod, ONLY : DSTEQR
USE strings_mod, ONLY : FoundError
!
implicit none
@@ -975,7 +975,7 @@
! eigenvalues of the tridiagonal matrix. If applicable, the
! eigenpairs are computed by master thread only.
!
- CALL dsteqr ('I', innLoop-1, cg_Ritz(1,outLoop), zwork(1,1),&
+ CALL DSTEQR ('I', innLoop-1, cg_Ritz(1,outLoop), zwork(1,1),&
& zgv, Ninner, work, info)
IF (info.ne.0) THEN
WRITE (stdout,*) ' RPCG_LANCZOS - Error in DSTEQR:', &

stef

Re: 4dvar run hangs using internal lapack module

#2 Post by stef

I should add that in the successful run, the next lines are

Code:

 (001,007): New Ritz eigenvalues and their accuracy,  RitzMaxErr =  1.00000E-01

      001   3.6293220E+00   3.6446995E-03  converged     (Good=001)
      002   8.5684099E+00   9.1391967E-03  converged     (Good=002)
      003   2.1050775E+01   2.1416478E-03  converged     (Good=003)
      004   8.2159724E+01   5.3967321E-05  converged     (Good=004)
      005   1.9458954E+02   4.4177871E-06  converged     (Good=005)
      006   4.6346639E+02   1.0041930E-07  converged     (Good=006)

... (output omitted)

Hope this helps.
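
The two printouts can be cross-checked without any LAPACK build. The sketch below assumes that the tridiagonal matrix handed to DSTEQR has cg_delta(1:6) on its diagonal and cg_beta(2:6) on its off-diagonal; that is an inference from the tables in this thread, not something confirmed against rpcg_lanczos.F. It uses Sturm-sequence bisection in pure Python instead of the implicit QL/QR iteration that DSTEQR uses, and the helper names `count_below` and `tridiag_eigenvalues` are made up for this illustration.

```python
# Values copied from the thread: cg_delta(1:6) and cg_beta(2:6) after inner loop 6.
# Assumption: these are the diagonal and off-diagonal passed to DSTEQR.
DIAG = [2.02339198e+02, 3.19486738e+02, 1.55982729e+02,
        7.02759117e+01, 1.68914552e+01, 8.48813486e+00]
OFFDIAG = [1.58615485e+02, 1.18894086e+02, 6.48835208e+01,
           1.17606909e+01, 3.22482999e+00]

# Ritz values printed by the successful run at (001,007).
RITZ_FROM_LOG = [3.6293220e+00, 8.5684099e+00, 2.1050775e+01,
                 8.2159724e+01, 1.9458954e+02, 4.6346639e+02]


def count_below(diag, offdiag, x):
    """Count eigenvalues smaller than x via the Sturm sequence (LDL^T pivots)."""
    count = 0
    pivot = 1.0
    for i, d in enumerate(diag):
        coupling = offdiag[i - 1] ** 2 / pivot if i > 0 else 0.0
        pivot = d - x - coupling
        if pivot == 0.0:
            pivot = -1e-300  # nudge off an exact zero to avoid dividing by it
        if pivot < 0.0:
            count += 1
    return count


def tridiag_eigenvalues(diag, offdiag, tol=1e-10):
    """Return all eigenvalues in ascending order, found by bisection."""
    n = len(diag)
    # Gershgorin discs give an interval that encloses every eigenvalue.
    radius = [0.0] * n
    for i, e in enumerate(offdiag):
        radius[i] += abs(e)
        radius[i + 1] += abs(e)
    lower = min(d - r for d, r in zip(diag, radius))
    upper = max(d + r for d, r in zip(diag, radius))
    eigenvalues = []
    for k in range(n):
        lo, hi = lower, upper
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if count_below(diag, offdiag, mid) > k:
                hi = mid  # at least k+1 eigenvalues lie below mid
            else:
                lo = mid
        eigenvalues.append(0.5 * (lo + hi))
    return eigenvalues


if __name__ == "__main__":
    computed = tridiag_eigenvalues(DIAG, OFFDIAG)
    for k, (mine, logged) in enumerate(zip(computed, RITZ_FROM_LOG), start=1):
        print(f"{k:03d}  {mine:.7E}  {logged:.7E}  diff={mine - logged:+.2E}")
```

If the assumption about the matrix layout holds, the computed values should agree with the Ritz table to roughly the printed precision, which would suggest the input to DSTEQR is well-behaved and the hang lies in the eigensolver call itself rather than in the data.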

arango
Site Admin
Posts: 1368
Joined: Wed Feb 26, 2003 4:41 pm
Location: DMCS, Rutgers University

Re: 4dvar run hangs using internal lapack module

#3 Post by arango

You are using a very new version of gfortran. This looks like a compiler bug, which would not surprise me. If you check the trac ticket, you will notice that we are trying to modernize the routines used from this legacy library, since NCEP is complaining about the GOTOs.

stef

Re: 4dvar run hangs using internal lapack module

#4 Post by stef

Good to know. It was not clear from the trac ticket (at least to me) that NCEP was complaining about the GOTOs. Do you mean their code doesn't compile because of the GOTOs?

stef

Re: 4dvar run hangs using internal lapack module

#5 Post by stef

Oh, I just saw the comment in lapack_mod.F. That explains it: it seems to be more a question of code style and easier future maintenance.

! This module includes modernized versions of the selected routines !
! from thw Linear Algebra Package (LAPACK) library, which are used !
! in ROMS 4D-Var algorithms. !
! !
! Adapted from LAPACK library version 2.0 !
! !
! NOTES: !
! !
! - The LAPACK library was written originally in Fortran-77 and has !
! been extensively tested and used in various compilers and !
! applications. We are modernizing a few of the functions used by !
! ROMS at the request of NOAA and NCEP to remove the undesirable !
! GOTOs statements with the modern capabilities of the Fortran !
! standard (1995 and 2003).
