Different upwelling results using different compilers
I have been using the ocean_upwelling.in file to test an issue I was having with a more complicated model, in which switching between different Fortran compilers significantly impacts my results. For the upwelling case I am building one executable with ifort/mpiifort and the other with gfortran/mpif90 (both with -O2, because -O3 causes a segmentation fault for the ifort build).
When running the resulting oceanM files the salinity data is consistent, but the temperature difference is on the order of 10^-6 after only one timestep! This is surprising, as I would expect the difference to build up from an initial difference near machine precision, but that does not seem to be the case (I guess because a lot of calculations are performed before even one step is taken?).
If anyone has any ideas/questions please let me know!
Re: Different upwelling results using different compilers
My best guess is that there is a literal constant like 0.1 that doesn't have a _r8 to force it into being double precision. You can test this by using a compiler flag like "-r8" (making sure of course to see exactly what it does). You might also investigate exactly what "-O2" means to your compilers. Does this happen at "-O0"? If those agree, does one match the "-O2" numbers?
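For illustration, here is a minimal standalone Fortran sketch of the bare-literal issue (r8 is declared locally so the snippet compiles on its own; in ROMS it would come from mod_kinds):
Code:
program literal_precision
  implicit none
  integer, parameter :: r8 = selected_real_kind(12,300)
  real(r8) :: a, b

  a = 0.1          ! bare literal: a single-precision value widened to double
  b = 0.1_r8       ! kinded literal: the full double-precision value of 0.1

  print *, 'a     = ', a
  print *, 'b     = ', b
  print *, 'a - b = ', a - b   ! about 1.5e-9: pure single-precision roundoff
end program literal_precision
Compiled as-is the difference is nonzero; compiled with -r8 (ifort) or -fdefault-real-8 (gfortran) the bare literal is promoted and the difference should vanish.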
Re: Different upwelling results using different compilers
I'd say that Kate's best guess is correct, since a 32-bit representation of a number is only accurate to about 7 decimal places.
I'm curious why you're getting a segmentation fault with -O3 on ifort; might want to try compiling with -traceback to see where the code breaks.
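A quick standalone check of that 7-digit limit, using only the standard iso_fortran_env kinds (nothing ROMS-specific here):
Code:
program precision_check
  use iso_fortran_env, only: real32, real64
  implicit none
  real(real32) :: t32
  real(real64) :: t64

  t64 = 14.123456789_real64
  t32 = real(t64, real32)      ! keep only ~7 significant digits

  print *, 'precision(real32) =', precision(t32)           ! 6
  print *, 'precision(real64) =', precision(t64)           ! 15
  print *, 'difference        =', real(t32, real64) - t64  ! order 1.e-7
end program precision_check
A roundoff of that size in a temperature-like value is consistent in magnitude with the 10^-6 difference reported above.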
Re: Different upwelling results using different compilers
I tried rebuilding the ifort executable with -r8 and the gfortran executable with -fdefault-real-8, and the results were the same as before (neither build's output changed as a result of adding the flags).
As for the segmentation fault with -O3 under ifort: that was occurring while trying to run a different set of inputs (for Chesapeake Bay) and does not occur for the upwelling case, so it would seem to be a separate issue.
To be thorough I also checked the effect of switching between -O3 and -O2 for the different builds; this too made no difference.
Re: Different upwelling results using different compilers
Oh, I forgot to mention that the ifort build will run with a 001x016 partition (I'm running on 16 CPUs) but will blow up if I try to partition the 16 differently. I'm not sure if this gives any clues...
Re: Different upwelling results using different compilers
But why do you want to run the UPWELLING test case on 16 processors? As distributed, this application has only 41x80x16 points. It is overkill to run this on 16 processors. If you want to play with a lot of CPUs, use the BENCHMARK application: 512x64x30, 1024x128x30, or 2048x512x30. Notice that in this application the horizontal dimensions are powers of two, so you can distribute and balance all the tile partitions equally.
We keep getting this type of parallel overkill in this forum. There seems to be a misunderstanding about ROMS coarse-grained parallelization, ROMS spatial discretization, and tile size. If you look carefully, ROMS writes this information to standard output:
Code:
Tile partition information for Grid 01: 0041x0080x0016 tiling: 002x002
 tile   Istr   Iend   Jstr   Jend    Npts
    0      1     21      1     40   13440
    1     22     41      1     40   12800
    2      1     21     41     80   13440
    3     22     41     41     80   12800
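For reference, the tiling shown above is controlled by the NtileI and NtileJ parameters in the input script. As a sketch only (one possible layout, not a recommendation), a balanced 16-core decomposition of the 1024x128x30 BENCHMARK grid could look like:
Code:
! 8 x 2 tiles of 128 x 64 horizontal points each, with no remainder
     NtileI == 8          ! I-direction partition
     NtileJ == 2          ! J-direction partition
Because 1024/8 and 128/2 divide evenly, every tile gets the same number of interior points.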
Re: Different upwelling results using different compilers
My mistake! I am new to ROMS and had chosen UPWELLING because of its boundary conditions; I wasn't aware that the number of CPUs should be chosen to correspond to the grid dimensions.
Thank you for the response and I will play around with the benchmark tests.
Re: Different upwelling results using different compilers
With regard to choosing the number of processors to use:
You always want to make sure that the memory requirements for the job fit within the physical memory of the system you are using.
On a stand-alone machine you might as well compile the code under OpenMP (shared-memory model), assuming of course that the computer you're using has enough memory for the job. You should not specify more cores than exist on the stand-alone system. If you don't have enough physical memory on a single computer, then you will have to run the job on a cluster and use MPI.
When compiling under MPI to run on a cluster, using too many cores (thereby increasing the number of nodes required) can actually increase model execution time due to the overhead required to pass information from node to node. For very large jobs you want to make sure that you select enough cores so that the amount of memory required per core multiplied by the number of cores per node does not exceed the physical memory available on each node.
If you are compiling under MPI on a stand-alone machine with multiple cores, specifying more cores than are physically available on the machine can also lead to problems; this should be avoided.
I'm not sure exactly how ROMS stores the arrays, but IF they are mostly held in common blocks then the Linux command:
size -d oceanM
should provide an estimate of the amount of memory required per MPI task.
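As a rough standalone sketch of that per-core bookkeeping (the halo width Nghost and the array count Narr3d below are illustrative assumptions, not values taken from the ROMS source):
Code:
program tile_memory_estimate
  implicit none
  integer, parameter :: Lm = 1024, Mm = 128, N = 30   ! BENCHMARK-sized grid
  integer, parameter :: NtileI = 8, NtileJ = 2        ! 16 MPI tasks
  integer, parameter :: Nghost = 3, Narr3d = 50       ! assumed halo width and 3-D array count
  real :: words, mbytes

  ! points per tile, including halo rows/columns on each side
  words  = real(Lm/NtileI + 2*Nghost) * real(Mm/NtileJ + 2*Nghost) * real(N)
  mbytes = words * Narr3d * 8.0 / 1.0e6               ! 8 bytes per double-precision word

  print '(a,f8.1,a)', 'Approximate 3-D state memory per MPI task: ', mbytes, ' MB'
end program tile_memory_estimate
Multiplying that per-task figure by the number of tasks placed on a node gives the comparison against the node's physical memory described above.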
Re: Different upwelling results using different compilers
"I'm not sure exactly how ROMS stores the arrays but IF they are mostly held in common blocks..."
The myroms.org code stores nothing in common blocks. The large arrays are dynamically allocated.
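The practical consequence for the size command mentioned above is that it only measures the executable's static text/data/bss segments; allocatable arrays, as in the minimal sketch below, live on the heap at run time and never show up in that count.
Code:
program allocatable_demo
  use iso_fortran_env, only: real64
  implicit none
  real(real64), allocatable :: temp(:,:,:)

  allocate(temp(1024, 128, 30))   ! about 31 MB on the heap, invisible to "size -d"
  temp = 0.0_real64
  print *, 'allocated elements:', size(temp)
  deallocate(temp)
end program allocatable_demo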