Have you ever wondered what it means when your job ends in the UNKNOWN state? Or why your job never seems to finish (state STARTED), even though you restarted the server?
This short blog post is about the different batch states and the transitions between them.
Let’s take a look at the BatchStatus state transitions when nobody interacts with the job.
The BatchStatus applies to both jobs and steps. While a job transitions from STARTING to STARTED pretty quickly, the separation of those two states makes more sense for steps. If you split your data into 40 partitions but only have ten threads to work on them, ten StepExecutions will transition directly from STARTING to STARTED, while the remaining 30 StepExecutions will stay in BatchStatus STARTING until a thread becomes free.
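To make the 40-partitions-on-ten-threads example tangible, here is a small simulation in plain Java. It is only a sketch and does not use Spring Batch: the class PartitionStatusSketch, its Status enum and the method snapshotWhileBusy are made up for illustration, and only the status names are borrowed from BatchStatus.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Toy model, not Spring Batch code; only the status names are borrowed from BatchStatus.
class PartitionStatusSketch {

    enum Status { STARTING, STARTED, COMPLETED }

    /** Returns {STARTING count, STARTED count}, observed while every worker thread is busy. */
    static int[] snapshotWhileBusy(int partitions, int threads) {
        AtomicReferenceArray<Status> statuses = new AtomicReferenceArray<>(partitions);
        for (int i = 0; i < partitions; i++) {
            statuses.set(i, Status.STARTING);            // every partition is registered up front
        }
        CountDownLatch workersBusy = new CountDownLatch(Math.min(partitions, threads));
        CountDownLatch release = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int starting = 0;
        int started = 0;
        try {
            for (int i = 0; i < partitions; i++) {
                final int id = i;
                pool.submit(() -> {
                    statuses.set(id, Status.STARTED);    // a thread picked this partition up
                    workersBusy.countDown();
                    try {
                        release.await();                 // simulate long-running work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    statuses.set(id, Status.COMPLETED);
                });
            }
            workersBusy.await();                         // all threads are now occupied
            for (int i = 0; i < partitions; i++) {
                if (statuses.get(i) == Status.STARTING) starting++;
                if (statuses.get(i) == Status.STARTED) started++;
            }
            release.countDown();                         // let the running partitions finish
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdownNow();                          // no-op if the pool already terminated
        }
        return new int[] {starting, started};
    }
}
```

Calling PartitionStatusSketch.snapshotWhileBusy(40, 10) yields 30 partitions in STARTING and 10 in STARTED, which matches the situation described above.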
The states COMPLETED and FAILED are pretty clear. COMPLETED indicates a successful job or step, and FAILED indicates a failure that stopped the job or step. In both cases the framework part of the execution went fine, and in the case of the state FAILED a restart is possible.
So what’s UNKNOWN then? A job or step ends up in the UNKNOWN state when an error occurs while saving the batch job metadata. Since the chunk has normally already been committed at that point, Spring Batch cannot tell whether all data is consistent, so to avoid problems on restart, the job or step finishes in the UNKNOWN state.
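The difference between the three end states can be sketched in a few lines of plain Java. This is a simplified model and not how Spring Batch is implemented internally: StepOutcomeSketch, its Status enum and the two Runnable parameters are hypothetical names that merely separate "doing and committing the business work" from "saving the bookkeeping".

```java
// Toy model, not Spring Batch code; only the status names are borrowed from BatchStatus.
class StepOutcomeSketch {

    enum Status { COMPLETED, FAILED, UNKNOWN }

    /**
     * Runs the business work first and the metadata save second,
     * then derives the final status from which of the two failed.
     */
    static Status run(Runnable processAndCommitChunk, Runnable saveMetadata) {
        try {
            processAndCommitChunk.run();      // business data is committed in here
        } catch (RuntimeException e) {
            return Status.FAILED;             // the framework itself is fine, a restart is possible
        }
        try {
            saveMetadata.run();               // bookkeeping about the execution
        } catch (RuntimeException e) {
            return Status.UNKNOWN;            // data is committed, but the bookkeeping is lost
        }
        return Status.COMPLETED;
    }
}
```

The sketch deliberately ignores that a failed execution also has to persist its own metadata. The point is only the ordering: once the chunk is committed and the metadata save fails afterwards, nobody can say for sure where a restart should resume.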
So now let’s take a look at manual intervention.
While in state STARTING or STARTED, a job can be stopped manually, for example via JobOperator or thread interruption. The BatchStatus of the job is then set to STOPPING, and steps check for this state at chunk boundaries to stop themselves. When all steps are stopped, the state of the job is set to STOPPED as well. This means that if a step is looping inside a chunk, or if the server is shut down while the batch is stopping, the job will stay in state STOPPING forever.
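The "check at chunk boundaries" behaviour is easy to picture as a loop. The following plain-Java sketch is an assumption-laden model, not Spring Batch code: StoppableStepSketch, requestStop and run are invented names, and the choice to report COMPLETED when the last chunk finishes is a simplification of this model.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model, not Spring Batch code; only the status names are borrowed from BatchStatus.
class StoppableStepSketch {

    enum Status { STARTED, STOPPING, STOPPED, COMPLETED }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.STARTED);
    private int chunksProcessed = 0;

    /** A stop request only flips the status; it does not interrupt the running chunk. */
    void requestStop() {
        status.compareAndSet(Status.STARTED, Status.STOPPING);
    }

    int chunksProcessed() {
        return chunksProcessed;
    }

    /** Processes up to totalChunks chunks and checks for a stop request before each one. */
    Status run(int totalChunks, Runnable chunkWork) {
        for (int i = 0; i < totalChunks; i++) {
            if (status.get() == Status.STOPPING) {   // chunk boundary: the only place we look
                status.set(Status.STOPPED);
                return Status.STOPPED;
            }
            chunkWork.run();                         // a stop request in here has to wait
            chunksProcessed++;
        }
        status.set(Status.COMPLETED);                // simplification of this model
        return Status.COMPLETED;
    }
}
```

If requestStop is called while the second of five chunks is running, that chunk still finishes, and the step returns STOPPED before the third one starts. If chunkWork never returns, the status check is never reached again, which is exactly how an execution ends up stuck in STOPPING.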
Stopped jobs may be restarted. To prevent anyone from doing so, it’s possible to abandon a job. This can be done from the JobOperator and from the CommandLineJobRunner. The state of the job is immediately set to ABANDONED, and no restart is possible anymore.
To answer the question from the intro: whenever the server is stopped or the process is killed, the job stays in the state it’s in, so the state may be STARTED although the job isn’t running anymore. The only solution then is to stop and abandon it.
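The manual transitions, including the "stop and abandon" clean-up for an execution that is recorded as STARTED but no longer running, can be summed up as a tiny state machine. Again, this is a plain-Java sketch under assumptions, not the Spring Batch API: ManualTransitionSketch and its methods are hypothetical, and the rule that abandoning is only accepted from STOPPING, STOPPED or FAILED is a choice made for this model. In a real application the corresponding operations would go through JobOperator.

```java
// Toy model, not Spring Batch code; only the status names are borrowed from BatchStatus.
class ManualTransitionSketch {

    enum Status { STARTING, STARTED, STOPPING, STOPPED, COMPLETED, FAILED, ABANDONED, UNKNOWN }

    /** A stop request is only meaningful for executions that count as running. */
    static Status stop(Status current) {
        switch (current) {
            case STARTING:
            case STARTED:
                return Status.STOPPING;
            default:
                throw new IllegalStateException("Cannot stop an execution in state " + current);
        }
    }

    /** Abandoning is rejected while the execution still counts as running (model assumption). */
    static Status abandon(Status current) {
        switch (current) {
            case STOPPING:
            case STOPPED:
            case FAILED:
                return Status.ABANDONED;
            default:
                throw new IllegalStateException("Cannot abandon an execution in state " + current);
        }
    }

    /** Per the text: FAILED and STOPPED executions may be restarted, ABANDONED and UNKNOWN may not. */
    static boolean isRestartable(Status current) {
        return current == Status.FAILED || current == Status.STOPPED;
    }

    /** Clean-up for an execution that is recorded as running although the process is gone. */
    static Status cleanUpOrphan(Status recorded) {
        return abandon(stop(recorded));
    }
}
```

For an orphaned execution, cleanUpOrphan(Status.STARTED) walks through STOPPING and ends in ABANDONED, after which isRestartable returns false. Since no step is left to notice the STOPPING flag, the execution would never reach STOPPED on its own, so abandoning is what actually closes it.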
I hope this short blog post helps you understand the lifecycle of jobs and steps. By the way, JSR-352 adopted these states from Spring Batch, so this information applies there as well.