ROMS runtime error
-
- Posts: 27
- Joined: Mon Jan 27, 2014 9:50 pm
- Location: Indian Institute of Science
ROMS runtime error
Hi ROMS users,
I have been trying to run a ROMS application, but after I submit my (parallel) job to the cluster, it exits from the queue after running for 1 second. I checked the log file and it displays the following message:
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 15
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
I checked with my cluster admin and there is nothing wrong with the job submit script. Please help.
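For readers hitting the same message: signal 15 is SIGTERM, which means the process was told to stop by something outside it (the MPI launcher, the batch system, or an admin), rather than crashing on its own. A minimal sketch to see this locally, assuming a bash or dash shell; the `sleep` process is just a stand-in for the model executable:

```shell
#!/bin/sh
# Look up the name of signal 15 (prints TERM in bash and dash).
sig_name=$(kill -l 15)
echo "signal 15 is: $sig_name"

# Stand-in for the model: start a dummy process and terminate it from outside.
sleep 30 &
pid=$!
kill -TERM "$pid"

# Collect the exit status of the killed process.
# bash and dash report 128 + signal number for a process killed by a signal.
status=0
wait "$pid" || status=$?
echo "exit status after SIGTERM: $status"
```

If a job dies this way with no model output at all, the launcher or scheduler logs are usually a better place to look than the model itself.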
Re: ROMS runtime error
What does the ROMS output look like? Did you get any?
-
- Posts: 27
- Joined: Mon Jan 27, 2014 9:50 pm
- Location: Indian Institute of Science
Re: ROMS runtime error
I didn't get any output. The log file is supposed to show something like this:
--------------------------------------------------------------------------------
Model Input Parameters: ROMS/TOMS version 3.7
Wednesday - November 8, 2017 - 2:50:07 PM
--------------------------------------------------------------------------------
Instead, all I get is the message
=====================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 15
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
The cluster admin says the job submission script is submitting the job successfully; the problem is with running the model.
Re: ROMS runtime error
ROMS is normally quite verbose about what went wrong with it. It starts writing to stdout very early in the job. You're saying it gets killed even before that, there's no ROMS output at all. Very odd.
-
- Posts: 27
- Joined: Mon Jan 27, 2014 9:50 pm
- Location: Indian Institute of Science
Re: ROMS runtime error
Yes Kate. That's exactly what's happening. Please help.
Re: ROMS runtime error
Did it used to run? Are you sure you have the right syntax on the ROMS execute line in your script? Can we see that?
-
- Posts: 27
- Joined: Mon Jan 27, 2014 9:50 pm
- Location: Indian Institute of Science
Re: ROMS runtime error
Please find my job submit script in the attachment. It used to run, albeit with a slightly different execute line.
- Attachments
-
- submit_new.sh
- (486 Bytes) Downloaded 274 times
Re: ROMS runtime error
Try taking out the blank line #2. I don't know if you're allowed to have a blank line there. Some queueing systems end the batch commands on the first blank line.
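One way to check a script for this is sketched below. It only reports blank lines that sit before the last `#PBS` directive; whether such a line actually ends directive parsing depends on the queueing system, so treat the result as a hint. The function name `blank_lines_in_directives` and the demo script contents are made up for illustration and are not taken from the attached submit_new.sh:

```shell
#!/bin/sh
# Hypothetical helper: print the line numbers of blank lines that occur
# before the last "#PBS" directive in a job script.
blank_lines_in_directives() {
    awk '
        /^[[:space:]]*$/ { blank[++n] = NR }   # remember every blank line
        /^#PBS/          { last = NR }         # remember the last directive
        END {
            for (i = 1; i <= n; i++)
                if (blank[i] < last) print blank[i]
        }
    ' "$1"
}

# Made-up example script with a blank line 2, only for demonstration.
demo=$(mktemp)
printf '%s\n' '#!/bin/bash' '' '#PBS -N UPWELLING' '#PBS -l nodes=1:ppn=8' \
    'cd "$PBS_O_WORKDIR"' > "$demo"

found=$(blank_lines_in_directives "$demo")
echo "blank lines before the last #PBS directive: ${found:-none}"
rm -f "$demo"
```

To check a real script, call `blank_lines_in_directives submit_new.sh` and remove any lines it reports.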
-
- Posts: 27
- Joined: Mon Jan 27, 2014 9:50 pm
- Location: Indian Institute of Science
Re: ROMS runtime error
I removed the blank lines, but the log is still the same. I ran tracejob on the job ID; the output is as follows:
[casparga@tyrone-cluster upwelling]$ tracejob 78891
/var/spool/torque/server_priv/accounting/20171111: Permission denied
Job: 78891.tyrone-cluster
11/11/2017 12:29:47 S enqueuing into batch, state 1 hop 1
11/11/2017 12:29:47 S dequeuing from batch, state QUEUED
11/11/2017 12:29:47 S enqueuing into idqueue, state 1 hop 1
11/11/2017 12:29:47 S Job Queued at request of casparga@tyrone-cluster, owner = casparga@tyrone-cluster, job name = UPWELLING, queue = idqueue
11/11/2017 12:29:47 S Job Modified at request of Scheduler@tyrone-cluster
11/11/2017 12:29:47 S Exit_status=2 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:00
11/11/2017 12:29:47 L Job Run
11/11/2017 12:29:47 S Job Run at request of Scheduler@tyrone-cluster
11/11/2017 12:29:47 S Not sending email: User does not want mail of this type.
11/11/2017 12:29:47 S Not sending email: User does not want mail of this type.
11/11/2017 12:29:47 S dequeuing from idqueue, state COMPLETE
11/11/2017 12:29:47 M scan_for_terminated: job 78891.tyrone-cluster task 1 terminated, sid=7807
11/11/2017 12:29:47 M job was terminated
11/11/2017 12:29:47 M obit sent to server
11/11/2017 12:29:47 M removed job script
Re: ROMS runtime error
Have you talked to your supercomputer people? I don't think this is anything to do with ROMS. Maybe you should let it send you email if it can be more verbose.
/var/spool/torque/server_priv/accounting/20171111: Permission denied
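For reference, turning on job email under Torque/PBS is done with directives near the top of the submit script. This is a sketch only: the address is a placeholder, and whether mail is actually delivered depends on how the cluster is configured.

```shell
#PBS -m abe                 # mail on abort (a), begin (b) and end (e)
#PBS -M user@example.org    # placeholder address, replace with your own
#PBS -j oe                  # merge stderr into stdout so errors land in one log
```

The tracejob lines "Not sending email: User does not want mail of this type" suggest that mail was not requested for those events in that submission.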
-
- Posts: 27
- Joined: Mon Jan 27, 2014 9:50 pm
- Location: Indian Institute of Science
Re: ROMS runtime error
Sorry for the late reply. I told my cluster admin that there is no problem with the model. After resolving an issue with passwordless login, I tried to run the model and got the following output in the log file:
[mpiexec@tyrone-node16] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:184): assert (!closed) failed
[mpiexec@tyrone-node16] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:74): unable to send SIGUSR1 downstream
[mpiexec@tyrone-node16] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@tyrone-node16] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
[mpiexec@tyrone-node16] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
Is there a problem with the compilation of the model?
Re: ROMS runtime error
You can search the web for answers to things like this. Here's one match which might be useful.