MPI - tiling on POWER8
This might be an issue that has already been answered, at least partially, or it might just be a dumb question, but I will ask anyway since I do not have the answer. While trying to figure out the optimal tiling for my specific domain, I found that the results depended on NtileI but not on NtileJ. For example, 8x5 and 8x4 give the same results, but 5x8 and 4x8 do not.

I then downloaded the latest version of the code from scratch (no update), in case I had missed an update or done something wrong, and ran roms_benchmark3.in while playing with the tiling. I found exactly the same problem. For example, a run with 8x4 gives the same results as 4x8. An 8x5 run gives the same results as an 8x4 run, but 4x8 does not give the same results as 5x8. After many runs, it seems that I can change NtileJ to any value while keeping NtileI fixed and the results remain the same; if I change NtileI while keeping NtileJ fixed, I get different results.

The code was compiled with the IBM compiler and runs under MPICH on a POWER8 (2 nodes, 10 cores, 8 threads per core) under Linux CentOS. I wonder if anybody has experienced the same problem on a POWER8 or any other type of machine. Thanks for your help.

Y
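For readers less familiar with the setup: ROMS tiling is set in the standard input file (e.g., roms_benchmark3.in) through NtileI and NtileJ, and for distributed-memory runs their product must match the number of MPI processes. A minimal, illustrative excerpt for the 8x4 case mentioned above (values and comments are indicative only):

```
! Domain decomposition parameters (illustrative values for an 8x4 run).
! For distributed-memory (MPI) runs, NtileI * NtileJ must equal the
! number of processes passed to mpirun/mpiexec.

      NtileI == 8                        ! I-direction partition
      NtileJ == 4                        ! J-direction partition
```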
- arango, Site Admin, DMCS, Rutgers University
Re: MPI - tiling on POWER8
What do you mean by the same results: profiling time or the state solution?

Usually, it is much more efficient to use fewer partitions in the I-direction (NtileI) and more in the J-direction (NtileJ) because of vectorization. It also depends on the computer architecture.
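A rough way to see the vectorization argument: ROMS arrays store i as the fastest-varying index, so the inner loops run over the tile's I-extent, which is roughly Lm/NtileI. For a fixed number of tiles, partitioning mostly in J keeps that inner loop long. A small sketch, using a hypothetical 512x64 interior grid for illustration:

```python
# Illustrative sketch (not ROMS code): approximate tile extents for an
# Lm x Mm interior grid. The I-extent is the unit-stride inner-loop length
# in ROMS kernels, so longer is generally better for vectorization.
def tile_extents(Lm, Mm, NtileI, NtileJ):
    # ROMS splits the interior roughly evenly; remainders are ignored here.
    return Lm // NtileI, Mm // NtileJ

Lm, Mm = 512, 64                    # hypothetical grid size
for ni, nj in [(8, 4), (4, 8)]:
    di, dj = tile_extents(Lm, Mm, ni, nj)
    print(f"NtileI={ni} x NtileJ={nj}: tile ~{di} x {dj} points, "
          f"inner-loop length ~{di}")
```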
Re: MPI - tiling on POWER8
Thanks, Hernan. Sorry for not being clear in my posting. By "results" I meant the state solution. For example, in my setup (which has only one open boundary, at the north of the domain, i.e., the Y-direction), the difference in zeta reaches 10^-4 after 12 hours of simulation and keeps increasing. In the benchmark3 case, the difference in zeta is very small (machine precision) after 200 iterations, but it also keeps increasing.

I have been following the discussion about tiling over the years. I run my simulations on several architectures, and each time I test what gives the best timing for the given project and architecture. I also did some benchmarking on the IBM POWER8, since what IBM considers best, i.e., 8 threads per core instead of 4, is not an advantage for ROMS. I have to admit that I did not look as carefully at other architectures as I am doing now; I looked at timing, not at the results. I do not have time right now to explore this further on other architectures, so I decided to post on the forum to ask about the problem.
- Location: Institute of Atmospheric Physics, Chinese Academy of Sciences
Re: MPI - tiling on POWER8
Hello, Spitz.

I am a new user of ROMS and I have experienced a similar problem to yours. For instance, when I run a model starting at 20010101_0000 with 40 CPUs, NtileI=5, NtileJ=8, the model stops with a "blowing up" error at 20010118_1030. However, when I keep 40 CPUs but change to NtileI=4, NtileJ=10, it stops with a "blowing up" error at 20010302_1130. I have tried other CPU counts with different NtileI*NtileJ combinations, and it seems the model runs longer when NtileI is an even number and NtileJ is an odd multiple of 5. I cannot draw a conclusion about the best choice of NtileI*NtileJ, and I do not know why the results differ. Have you solved the problem? Do you have any other suggestions for me?

Hope for your reply, thanks a lot.
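As an aside on the constraint implicit above: with a fixed MPI process count, NtileI and NtileJ are only free up to a factorization of that count, so the candidate tilings for 40 processes can be listed directly (a small illustrative sketch):

```python
# Enumerate the legal NtileI x NtileJ decompositions for a fixed number of
# MPI processes; the point of the thread is that the solution should not
# depend on which one is chosen.
nprocs = 40
tilings = [(i, nprocs // i) for i in range(1, nprocs + 1) if nprocs % i == 0]
print(tilings)
# [(1, 40), (2, 20), (4, 10), (5, 8), (8, 5), (10, 4), (20, 2), (40, 1)]
```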
Re: MPI - tiling on POWER8
I would say that any time the answer changes for different tilings, you have a bug. To debug these things, run both tilings for just a timestep and plot the differences between solutions. Which field changes first? If you run 1x4 and 4x1, do you get a grid pattern in the diffs of three vertical stripes and three horizontal stripes? Then I usually turn to the debugger to watch the dueling runs up close.
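A minimal sketch of that first step, assuming the two runs wrote history files named roms_his_8x4.nc and roms_his_4x8.nc (hypothetical names) and that only a single time step was taken:

```python
# Difference the free surface (zeta) from two otherwise identical runs that
# used different tilings, and look at the spatial pattern of the difference.
import numpy as np
import netCDF4
import matplotlib.pyplot as plt

ncA = netCDF4.Dataset("roms_his_8x4.nc")     # hypothetical 8x4 output
ncB = netCDF4.Dataset("roms_his_4x8.nc")     # hypothetical 4x8 output

zetaA = ncA.variables["zeta"][-1, :, :]      # last record, free surface
zetaB = ncB.variables["zeta"][-1, :, :]
diff = zetaA - zetaB

print("max |zeta difference| =", np.abs(diff).max())

# A striped pattern aligned with tile boundaries points at halo exchanges;
# differences that first appear along an open boundary point at the boundary
# code or the analytical forcing instead.
plt.pcolormesh(diff, cmap="RdBu_r")
plt.colorbar(label="zeta difference (m)")
plt.title("zeta difference after one step: 8x4 minus 4x8")
plt.show()
```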
Re: MPI - tiling on POWER8
I have not solved the tiling problem. As Kate says, if the answer changes for different tilings, there is a bug. I believe there must be a problem either with the distributed ROMS code, the POWER8 MPI, or xlf; where exactly is the question, and it will require some digging. The differing answers I got were with roms_benchmark3.in and an unmodified code (with all the updates), as well as with my own setup, so I am confident that it is not due to my test case. I do not have any architecture other than the POWER8 to check whether it is a problem inherent to POWER8 or simply an indexing problem in the ROMS code. I am not aware of anybody having looked carefully at this issue on other architectures.
- arango, Site Admin, DMCS, Rutgers University
Re: MPI - tiling on POWER8
We have not seen parallel bugs in the distributed ROMS code for years. In the past, such parallel bugs have been associated with the user's implementation of the analytical functions. If there is an issue with the BENCHMARK application, we need to look at its analytical functions. I am very busy at the moment, but I will take a look when I get a chance.