Adjoint blow-up if misfit forcing is too far after initial time

Message

stef · #1 Unread post by **stef** » Thu Feb 08, 2024 9:45 am

I am making simple experiments with 4dvar and sometimes I get blow-ups in the first iteration of the adjoint model (it doesn't actually terminate during the adjoint, but the output is clearly nonsensical).

In one experiment I use:
* ARGO profiles from a single float, with one profile every 3 days
* assimilation window of 3 days
* initial conditions that have drifted away from current observations, e.g. a climatology

If I shift the start of the assimilation window far before the observation time, the adjoint model sometimes explodes in the first iteration. If I shift the window such that the adjoint forcing is close to the inital time, it runs fine. I think the closer the observation is to the start, the more it becomes "3dvar-like", i.e. the adjoint back-propagation becomes shorter and theoretically vanishes if the initial time coincides with the observation time, in which case only 3dvar is left.

Question: Is the blow-up of the adjoint model a misconfiguration, or is this normal when an assimilative model is spun up from a climatology?

In case it is normal, does it sometimes cause problems when a new instrument appears during an assimilation?

Background: I use strong constraint and the deprecated primal I4DVAR for debugging, because I can modify NADJ to write more freqently during the first inner loop. I found this good for debugging, since the growing instabilities make the final adjoint output of the first inner loop already completely nonsensical (i.e. during the adjoint run the instabilities spread from the vicinity of the misfit forcing to remote regions and are much stronger there).

jivica · #2 Unread post by **jivica** » Mon Feb 12, 2024 1:31 am

Hi Stef,
we need more info about your experiment (do you alter your boundary conditions, forcing, how frequently, and/or only the inital state, how you computed your model error cov, obs erros?) to help you more.
Do you start your application from a balanced run (i.e. some restart of already spun simulation)?
Just a few thoughts to think about;
Adjoint is all about perturbations and in that sense, you shouldn't start too far from the prior (you are not going along the basic principles of 4D-Var). In reality, you fix obvious problems within your system first and then you apply DA.
In your example, you provide one vertical transect (say ~30 points vertically) only once within the assimilation window and hope for some visible improvement while your system is (just a wild guess) 100x100x30x7 variables?
Departures from your background state will be hard if you have "B" matrix playing, remember you have 2 parts in the cost function.
If you are using 4DVar then you are assimilating at the exact space-time location, not sure what you mean with 3DVar type?
If you want to emulate 3DVar (not sure there is an option inside the current code) you might consider extending the obs time for more timesteps/duration.
In case you want to play with a similar example (not sure you are right now) then start from the already good solution, sample the system at some location, and perturb the initial state (taking care you are not introducing funny stuff i.e. inversion etc).
From the sampled data create synthetic observation you'll assimilate later, this is the truth you want to use and help you get to the original state you used to sample obs. This concept is well known - ocean sensitivity experiment and you can find vast literature.

Regarding the blow-up, as I said, ADJ and TLM are working in the perturbation space, and be sure you are within the basic principles of DA (we are evaluating only the first-order perturbations as in Taylor expansion, TLM is linearized hence L in name).

Cheers,
Ivica

stef · #3 Unread post by **stef** » Mon Feb 12, 2024 10:27 am

Hi Ivica,

Thanks for the suggestions! Yes I haven't provided enough details about the run, I will try to write it up in a blog post. My present set up is more of a test case for learning 4dvar than a realistic application, i.e. an addition to the tutorial excercises. Alongside I am reading the literature on theory.

That said, the far and distant objective is to do tropical cyclone/storm surge forecasting, so I pick a test case within that context. I should probably take a look at the ROMS example cases around the U.S. Coast, but instead I use the Bay of Bengal. I'm starting from a CMEMS NEMO simulation and do 5 days without any simulation to let the initial condition adjust a little bit. I'm aware that for real operational DA this may not make sense, but for experimenting/learning it should be fine (?)

The map below shows the domain. The track is from Cyclone Sitrang, but it's a distraction because the problems I'm currently looking at occur before the storm. Also, for storm DA the mixed layer is important, and the BoB is special because of the barrier layers etc. I'm not sure if the implications of the salinity-dominated mixed-layer depth in the BoB are well understood in terms of DA (in particular the assumption of temperature-dominated stratification in the balance equation), and I still have to read a lot about it. Looking at this would be current research and I think/hope my problem is much more basic than that.

Unfortunately my workstation keeps freezing for unkown reasons (bios update ... very bad idea) and I'm in the process of temporarily moving my workflow to my laptop. So I'm currently a bit limited in terms of experiments, but I was able to produce some figures.

Figure 1 is the domain (again, ignore the TC track).

Figures 2 and 3 are ARGO profiles, the first figure is from the float inside the domain, the second one from the one outside . I have satellite data too, but I intentionally would like to work on one instrument only until I sufficiently understand what's going on.

Black lines are the instrument.
Green lines are EN4 climatology.
Red lines are NEMO-CMEMS.
Blue lines are ROMS free run starting from Sept. 30, initialized from NEMO-CMEMS.

There are large temperature misfits in the thermocline.

*) I adjust initial conditions only.
*) I don't use the balance operator.
*) the background error cov. is the simple full variance of CMEMS over 10 years, without removing any trend, seasonal cycle or interannual variations of the seasonal anomalies. Does this yield a too large error cov. because I leave all the long-term signals in? But even if so, how could this cause a blow-up in the first adjoint iteration? Via preconditioning perhaps? I don't think I use preconditioning, I have Lprecond=F. Besides that, would it make any sense to precondition without balancing? In Moore et al. 2011 part I. it says that preconditioning is applied to the unbalanced increment, so I don't think it would?

If you are using 4DVar then you are assimilating at the exact space-time location, not sure what you mean with 3DVar type?

What I meant was if all observations occur at the initial time of a 4-dvar run, the 4-dvar run becomes much like a free-running forward model starting from a 3-dvar assimilated initial condition. Because there are no data assimilated other than at initial time.

In case you want to play with a similar example (not sure you are right now) then start from the already good solution, sample the system at some location, and perturb the initial state (taking care you are not introducing funny stuff i.e. inversion etc).
From the sampled data create synthetic observation you'll assimilate later, this is the truth you want to use and help you get to the original state you used to sample obs. This concept is well known - ocean sensitivity experiment and you can find vast literature.

Yes, that's great! I have done a similar thing by creating artifical obs. by sampling the model, and then perturb the observations. But only in pervious experiments with along-track satellite altimetry (which turned out to be too advanced for me at the moment). But I haven't thought about perturbing the initial conditions!

Thanks again for your help!

: map_romsbob.png (94.58 KiB) Viewed 27672 times

stef · #4 Unread post by **stef** » Mon Feb 12, 2024 3:41 pm

Hmm, regarding what I wrote above about preconditioning:

In i4dvar.f90 (as noted in the first post I use I4DVAR for debugging), there is a segment

Code: Select all

!  Convert observation cost function gradient, GRADx(Jo), from model
!  space (x-space) to minimization space (v-space)

which is executed after the first adjoint iteration, independent of the Lprecond switch. The Lprecond switch appears in cgradient.f90 only and refers to 'second-stage' preconditioning only. This is explained in Moore et al. 2011 chapter 6.2.

So apparently I do use the background error cov. to precondition, even though I don't use the balance operator. I have to read more to check if that makes sense. In any case, this is done only after the adjoint iteration to compute the gradient in 'x-space', so I doubt it's the cause of the blow-up?

arango · #5 Unread post by **arango** » Mon Feb 12, 2024 4:49 pm

Why do you use the primal formulation I4DVAR? That algorithm works, but it is deprecated. Please use the dual formulation RBL4DVAR. It is faster! Check the WC13 test cases for RBL4DVAR. The dual formulation works on the space span by the observations and allows impacts and sensitivities. We also coded the (4DVAR)^T. That is the adjoint of the full 4DVAR algorithm for impacts and sensitivities, including the forecast.

Also, it is tough to make linear codes blow up. The TLM and ADM kernels start from the rest. The ADM is forced by the adjoint of the misfit between the observation and the prior. Work on the convolution parameters for the background error covariance.

stef · #6 Unread post by **stef** » Mon Feb 12, 2024 5:32 pm

Thanks for the reply!

I used RBL4DVAR with RPCG, as in Exercise 3 of the tutorial. It worked in some cases, but then I got blow-ups and didn't now how to investigate the causes. That's when I switched to I4DVAR and modified the NADJ to write out the adjoint solutions more frequently (I'm aware that this may break the code when modifying this, but I terminated those debug runs anyway after the first iteration of the adjoint code. It was just to look what the adjoint model does.)

Also, it is tough to make linear codes blow up.

That's very useful information! I have zero experience. So there must be some more fundamental problem.

Work on the convolution parameters for the background error covariance.

Ok, thanks for the hint!

arango · #7 Unread post by **arango** » Mon Feb 12, 2024 9:39 pm

You cannot change NADJ in I4DVAR since the primal formulation is only a strong constraint algorithm (the model is assumed to be perfect). However, RBL4DVAR is both strong- and weak-constraint. But I suggest staying in the strong-constraint regime since modeling the model error, Q, is not trivial. NADJ should be the same size as ntimes, so the analysis adjustment due to data assimilation is for the initial state vector.

stef · #8 Unread post by **stef** » Tue Feb 13, 2024 8:01 am

But I suggest staying in the strong-constraint regime since modeling the model error, Q, is not trivial.

Yes, I'm trying to look at the simplest case possible.

NADJ should be the same size as ntimes.

That's what I thought after I first tried to change it. Because if the input is lower than that, it 'automatically' resets itself to ntimes. In read_phypar.f90:

Code: Select all

!  If strong constraint, write only final adjoint solution since only
!  we are estimating initial conditions.
!
        nADJ(ng)=ntimes(ng)

However, after I got the blow-ups I looked at the final adjoint solution and it was complete nonsense. So much that it was not obvious that it actually had anything to do with the observation misfit. Then I thought: How can I verify in the output files that the adjoint forcing (i.e. the misfit) is applied at the correct horizontal location?

I couldn't find any other way, so I decided to uncomment the line above (being aware that it may break the code), to write out the adjoint more frequently. Then after I had a couple of adjoint timesteps in the netcdf file, I terminated that run, because I was aware that I may have broken the rest of the code by modifying that line. Looking at the more frequent adjoint output helped me, in the sense that I could see that the initial misfit forcing was indeed located at the correct horizontal position, and that the problems at least started at the correct location. This was not visible from the final adjoint solution.

So from my beginner perspective, there is value in writing out the adjoint more frequently, and while I'm aware that uncommenting the line above breaks the code, I do it anyways because I do not see another option. I terminate after the first adjoint iteration, because the rest of the model run doesn't make sense any more.

Is there a way to track the initial adjoint output more frequently to an auxiliary field, without breaking the subsequent model run?

In case there isn't, I could modify the code more sensibly by creating a custom auxiliary output field, right? And leave nADJ(ng) untouched. This wouldn't break the model, right?

Would this also be possible for the dual code? I read that in the dual formulation (e.g. RBL4DVAR), there is the variable transformation from x to 'w', and if I understood correctly, 'w' has no straightforward physical interpretation. So I'm not sure if there is any value in writing out the adjoint more often (I mean to a different auxiliary field) in the dual formulation code.

stef · #9 Unread post by **stef** » Tue Feb 13, 2024 5:09 pm

Sorry may I ask a question about the Tutorial in this thread (it may be related to my problem discussed here):

I noticed that the tutorial does not seem to work with git branch 'main'.

Currently 'main' is at 21d8ab47ad4b (from June 2023).

The debugger complains about what looks like related to the bug reported here:

viewtopic.php?t=6374

It's about state vector variables' indices. Hernan writes on the thread the he made some updates, but I assume these were for branch 'develop'.

Are you backporting these fixes to 'main', or should 'develop' be used for the tutorial?

arango · #10 Unread post by **arango** » Tue Feb 13, 2024 5:13 pm

Yes, use the ROMS version on GitHub. We will no longer update all the SVN versions after January 1, 2025. I fixed all the indices in the code.

stef · #11 Unread post by **stef** » Wed Feb 14, 2024 7:27 am

Yes, use the ROMS version on GitHub. We will no longer update all the SVN versions after January 1, 2025. I fixed all the indices in the code.

Hmm, not sure if I understand. Which ROMS version on github, the 'develop' branch or the 'main' branch? Which revision should we use with the currently available tutorial?

Thanks again for all the help!

arango · #12 Unread post by **arango** » Wed Feb 14, 2024 2:19 pm

I updated all the versions of ROMS in the svn and git repositories, but I recommend the GitHub version since it has branches of other developments that we are working on and will be merged to the default develop branch in the future. We will stop updating the other repositories on January 1, 2025, but keep them for historical reasons.

robertson · Wed Feb 14, 2024 3:09 pm

stef wrote: Wed Feb 14, 2024 7:27 am Hmm, not sure if I understand. Which ROMS version on github, the 'develop' branch or the 'main' branch? Which revision should we use with the currently available tutorial?

You should use the default develop branch on GitHub for pretty much everything. Think of develop as the latest svn revision and main as the most recent tagged revision (currently 'roms-4.1').

We also suggest that you use ROMS test cases from github (https://github.com/myroms/roms_test.git) as the testcases available in the svn repository will also no longer be updated starting January 1^st, 2025. As long as you keep both repositories (roms and roms_test) updated the tutorials should continue to work.

stef · #14 Unread post by **stef** » Wed Feb 14, 2024 3:36 pm

Oh, I did not see the roms_test repository on github! On the 4dvar tutorial page still mentions svn:

https://www.myroms.org/wiki/4DVar_Tutorial_Introduction
(accessed 2024/02/14)

I might have missed some announcements...

As long as you keep both repositories (roms and roms_test) updated the tutorials should continue to work.

Excellent! I'll retire every ROMS-related svn repo I have on my computer.

Thanks for clarifying this again.

arango · #15 Unread post by **arango** » Wed Feb 14, 2024 5:02 pm

I updated the tutorial information in WikiROMS. Thank you! We are usually swamped and will take time to update the information. We also have minimal information and instructions in the GitHub repositories wiki. Some users are unfamiliar with Git and GitHub, and we are given plenty of time to learn.

Ocean Modeling Discussion

Adjoint blow-up if misfit forcing is too far after initial time

Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time

Re: Adjoint blow-up if misfit forcing is too far after initial time