I have been debugging on and off for several weeks what appears to be a bug (or my stupidity) in nesting in a large domain spherical model run with nesting. I have a 1/7 degree idealized North Atlantic domain with a 1/21 degree western boundary region nested entirely within it. The northern, southern, and western boundaries of the refined grid are masked; only the eastern boundary of the refined grid is open to the ocean. The equation of state is linear, and set so rho=S-T. The depth is everywhere uniform at 4000 meters.
When I run the model with no forcing of any kind, and uniform density, the model blows up in less than a day when two way nesting is enabled (coarse step 123). When either grid is run alone, or with only one way coupling, the model is stable. These results are not sensitive to the magnitude of horizontal or vertical mixing, and dramatic reductions of time step do not reduce the model time needed for the model to go unstable (i.e. halving dt roughly doubles the number of timesteps needed for the model to go unstable). The instability is the same when the model is compiled with gfortran or ifort.
When the model is run with two nested grids with Cartesian coordinates and uniform grid spacing (with a factor of three refinement between coarse and fine), the model is stable.
I have checked all the grid metrics, and they appear to be correct. The coarse grid was made with John Warner’s mat2roms_mw.m, and the fine grid extracted with it by a stock version of coarse2fine.m. The error persists with model revision 782, downloaded Sept 8 2015.
The earliest problematic symptoms are shown when NESTING_DEBUG is enabled. Along the Western Boundary, the ratio of fine to coarse grid transport on the eastern boundary is 9.99276761E-01. When the model is run with the nested rectangular grids, the ratio is always 1. This transport mismatch is reflected in a visible wave in free surface moving across the coarse grid away from the location of the eastern edge of the fine grid in the first few timesteps (the high spatial-frequency noise in the nesting region appears to be truncation error in fine2coarse() and its feedback to the fine grid). This wave, shown below, is not present in the Cartesian grids or with one way coupling (the noise is present in all model runs). The picture is from the 3 rd coarse time step, the 9 th fine time step.

Over time, a persistent zeta anomaly develops along the grid boundary that is strongest in the north, where the curvature of the grid is also strongest. These anomalies then feedback to the other fields. (However, it is puzzling why the curvature of the earth should effect the calculation of DU_avg2, since for this North/South and East/West oriented grid, the zonal grid spacing dy (and thus pn) does not vary meridionally. However, due to the way that coarse2fine.m interpolates in space, pn is not exactly uniform meridionally.)
I am at a loss as to how to debug this issue further. I have looked at put_refine2d until my head hurts, and the code that computes DU_avg2. I would appreciate any suggestions.
I have included the grid files, contact file and a zip files with the .in, .h, and ana_* files needed to run the simplest case. They will run against a clean version of the ROMS code. I have included a build.bash file for a mac running gfortran, but there is nothing custom in it besides the usual library and compiler changes, and the change of the model name to JPBASIN.
I would very much appreciate any help.
Jamie