Time-dependent salinity boundary condition and MPI

kawase
Posts: 7
Joined: Tue Aug 31, 2004 4:40 pm
Location: University of Washington

#1 post by kawase

I'm running ROMS 2.1 for an embayment with time-dependent river discharge (point sources) and time-dependent external (boundary) salinity. The time stamps of the discharge and boundary-condition data coincide (monthly).

When I run this setup on a cluster with MPI, the river discharge is correctly interpolated between the past and future values, but the external salinity seems to be interpolated between the past(?) value and zero. This recurs after each boundary-condition update, so the boundary salinity looks like a sawtooth in time.

This problem does not appear with single-thread or OpenMP executables running an identical setup, either on the same computer or on others. Our cluster is built from Xserve G5s with IBM xlf; I've tried both LAM 7.1.2 and OpenMPI 1.1 and get the same problem with each.

Superficially, it looks as if the time indexing got stuck, so that the updated value always goes into one of the two time-level records while the other record never gets updated. But I don't understand (1) why this happens to the b.c. and not to the river discharge, since both are read in with get_ngfld, and (2) why it happens only with MPI.
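The "one record never gets updated" hypothesis can be illustrated with a small Python sketch. This is not ROMS code: the class `BoundaryRecords`, its past/future slots, and the injected fault are hypothetical, chosen only to show how a never-written record turns linear time interpolation into a sawtooth. With monthly data held constant at 30, correct bookkeeping returns 30 at every time, while a buffer whose future slot keeps its initial zero ramps from the newly read value toward zero within each interval and jumps back at each update.

```python
class BoundaryRecords:
    """Hypothetical two time-level buffer holding a 'past' and a 'future' record."""

    def __init__(self, buggy=False):
        self.t = [None, None]  # [past, future] time stamps
        self.v = [0.0, 0.0]    # [past, future] values, zero-initialized
        self.buggy = buggy

    def update(self, t_new, v_new):
        # The old future record becomes the new past record.
        self.t = [self.t[1], t_new]
        if self.buggy:
            # Injected fault (illustrative only): the new value lands in the past
            # slot, so the future slot is never written and stays at zero.
            self.v[0] = v_new
        else:
            self.v = [self.v[1], v_new]

    def value_at(self, t):
        t0, t1 = self.t
        w = (t - t0) / (t1 - t0)  # linear weight: 0 at past, 1 at future
        return (1.0 - w) * self.v[0] + w * self.v[1]


def simulate(buggy, salinity=30.0, period=30.0, months=3):
    """Sample the interpolated value at the start and middle of each interval."""
    rec = BoundaryRecords(buggy=buggy)
    rec.update(0.0, salinity)  # prime the first record
    samples = []
    for m in range(1, months + 1):
        rec.update(m * period, salinity)  # read the next monthly record
        t0 = (m - 1) * period
        for frac in (0.0, 0.5):
            samples.append(rec.value_at(t0 + frac * period))
    return samples


print(simulate(buggy=False))  # [30.0, 30.0, 30.0, 30.0, 30.0, 30.0]
print(simulate(buggy=True))   # [30.0, 15.0, 30.0, 15.0, 30.0, 15.0]
```

In the faulty case the value restarts at 30 after every update and decays toward 0 before the next one, which is the sawtooth shape described in the post.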

Thank you for any help and pointers.

Mitsuhiro

kawase

#2 post by kawase

Update - Thanks to David Darr, I had an opportunity to compile and run the setup on a Linux/Opteron cluster with a PGI compiler and LAM. The model ran without the problem mentioned.

I suspect this is an IBM compiler problem; next, I will try Absoft Pro Fortran on our Apple cluster.

If anyone is running a similar setup (time-dependent tracer boundary condition from a netCDF file, Apple cluster, IBM compiler, LAM or OpenMPI), I'd appreciate hearing from you, even if only to say "we are seeing no such problem here".

Mitsuhiro

longmtm
Posts: 55
Joined: Tue Dec 13, 2005 7:23 pm
Location: Univ of Maryland Center for Environmental Science

#3 post by longmtm

We have similar problems using MPI. Results from the sequential and MPI runs differ, especially at the open boundary, and the results also change when we use a different number of partitions with MPI. We used the IBM cpp compiler and mvapich 1.2.6.

We have a 20-node Linux cluster. If you don't mind, I'd like to try your runs on my machine and send back the results to see whether the same problems occur. Please write back to me at wenlong@hpl.umces.edu

Thanks,

Wen Long

kawase

#4 post by kawase

Oh, thanks! And sorry for the late reply, it's summer ;)

I was intrigued to hear that your results depended on the domain partitioning. In our case, I tried several different domain decompositions and the results did not depend on them: the boundary value problem persisted no matter how I divided up the domain (within the possibilities of the grid).

Some of the files needed to run our model are rather large (~1 GB), although not all of the information in them is used. Maybe I could extract the relevant portions and send them to you along with our cppdefs.h and code modifications.

Mitsuhiro.

jcwarner
Posts: 1204
Joined: Wed Dec 31, 2003 6:16 pm
Location: USGS, USA

#5 post by jcwarner

To Wen Long: a note on tiling.
We take extreme measures to ensure that the model behaves identically whether the domain is tiled or not. A simulation tiled with one configuration should yield identical results to a simulation tiled in a different manner (or with only 1 tile). If it does not, then there is something wrong.
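The tiling-invariance property can be illustrated with a toy Python sketch. It assumes a 1D explicit diffusion stencil in place of the ROMS equations and uses list slicing in place of an MPI halo exchange; `step_global`, `step_tiled` and `run` are hypothetical names. Because every tile evaluates the same expression on the same operands in the same order, the tiled result matches the untiled one exactly, not just to round-off.

```python
def step_global(u, kappa=0.25):
    """One explicit diffusion step on the whole domain; end points held fixed."""
    n = len(u)
    out = u[:]
    for i in range(1, n - 1):
        out[i] = u[i] + kappa * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return out


def step_tiled(u, ntiles, kappa=0.25):
    """Same step, but each tile reads a local copy padded with one halo cell on
    each side that has a neighbour (a stand-in for an MPI halo exchange)."""
    n = len(u)
    out = u[:]
    for k in range(ntiles):
        lo, hi = k * n // ntiles, (k + 1) * n // ntiles  # tile owns [lo, hi)
        glo, ghi = max(lo - 1, 0), min(hi + 1, n)          # tile plus halo cells
        local = u[glo:ghi]
        # Only physical domain edges (0 and n-1) are held fixed, never tile edges.
        for i in range(max(lo, 1), min(hi, n - 1)):
            j = i - glo  # index of global point i in the local array
            out[i] = local[j] + kappa * (local[j - 1] - 2.0 * local[j] + local[j + 1])
    return out


def run(u0, steps, ntiles=None):
    """Advance u0 by `steps` steps, untiled if ntiles is None."""
    u = u0
    for _ in range(steps):
        u = step_global(u) if ntiles is None else step_tiled(u, ntiles)
    return u


u0 = [float((i * i) % 7) for i in range(20)]
for k in (1, 2, 3, 4, 7):
    print(k, run(u0, 5, k) == run(u0, 5))  # prints True for every tiling
```

A tile that read stale halo values, or that applied a boundary condition at its own edge instead of at the physical domain edge, would break this equality, which is the kind of discrepancy that points to a problem in how boundary information is applied.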

If the MPI and serial simulations differ, especially at open boundaries, then there might be a problem with the way the boundary information is being applied. Rivers can be tricky.

If you can reproduce the problem with a simple example, that would be helpful.

-john

kawase

Resolved

#6 post by kawase

I forgot to mention that this problem has been resolved. It was indeed an IBM compiler issue, specifically the optimization flag -O4 (D'oh!). Downgrading to -O3 eliminated the problem.

Thanks to Bohyun Bahng for spotting this.

Mitsuhiro
