Spring Batch and JSR-352 (Batch Applications for the Java Platform) – Differences

JSR-352 is final and included in JEE7; the first implementation is available in Glassfish 4. JSR-352 adopts the programming model of Spring Batch almost one-to-one. Just take a look at the domain and component vocabulary:

Spring Batch            | JSR-352       | Comment
Job                     | Job           |
Step                    | Step          |
Chunk                   | Chunk         |
Item                    | Item          |
ItemReader / ItemStream | ItemReader    | JSR-352’s ItemReader includes Spring Batch’s ItemStream capabilities
ItemProcessor           | ItemProcessor |
ItemWriter / ItemStream | ItemWriter    | JSR-352’s ItemWriter includes Spring Batch’s ItemStream capabilities
JobInstance             | JobInstance   |
JobExecution            | JobExecution  |
StepExecution           | StepExecution |
JobExecutionListener    | JobListener   |
StepExecutionListener   | StepListener  |
Listeners               | Listeners     | We have the same listeners in SB and JSR-352

Those are the most important components and names, but you can continue this list and you’ll only find minor differences. The configuration in XML for a simple job looks very much the same as well:

Spring Batch:

<job id="myJob">
    <step id="myStep">
        <tasklet>
            <chunk
                reader="reader"
                writer="writer"
                processor="processor"
                commit-interval="10" />
        </tasklet>
    </step>
</job>

JSR-352:

<job id="myJob">
    <step id="myStep">
        <chunk item-count="2">
            <reader ref="reader"/>
            <processor ref="processor"/>
            <writer ref="writer"/>
        </chunk>
    </step>
</job>

All in all, it’s a very good thing from either point of view. The Java community gets a standard derived from the most popular open source batch framework, which in turn will implement the standard in its next release. People using Spring Batch can be confident that, if Spring Batch is ever abandoned, there are other implementations with the exact same programming model, and it’s (quite) easy to switch. People using the implementations shipped by JEE7 server vendors can be confident that the programming model has been validated for years.

Though the programming model is pretty much the same, there are still some differences between the JSR-352 specification and the current Spring Batch implementation. Today I wanna talk about three of them, and I’m very curious about how Michael Minella and Co. will solve those differences.

Scoping

The following paragraph is taken from the JSR-352 specification.

11.1 Batch Artifact Lifecycle

All batch artifacts are instantiated prior to their use in the scope in which they are declared in the Job XML and are valid for the life of their containing scope. There are three scopes that pertain to artifact lifecycle: job, step, and step-partition.
One artifact per Job XML reference is instantiated. In the case of a partitioned step, one artifact per Job XML reference per partition is instantiated. This means job level artifacts are valid for the life of the job. Step level artifacts are valid for the life of the step. Step level artifacts in a partition are valid for the life of the partition.
No artifact instance may be shared across concurrent scopes. The same instance must be used in the applicable scope for a specific Job XML reference.

So, we’re gonna have three scopes in implementations of the JSR-352: job, step and step-partition. In Spring Batch we currently have two scopes: singleton and step. Since partitioning differs more substantially between Spring Batch and the JSR-352, I will exclude it here and just compare the scopes job and step with the scopes singleton and step. In Spring Batch everything is singleton by default, and if we want step scope, we need to set it explicitly on the batch artifact. A job scope does not exist. A very practical consequence is that you can’t inject job parameters into components that are not in step scope. In JSR-352, all components inside or referenced by a <job /> definition get job scope, and all components inside or referenced by a <step /> definition get step scope. You cannot change that behaviour, which means, for example, that you cannot have components in singleton scope.
All in all, I prefer the JSR-352 way of dealing with scopes. Since many batch components have state and job parameters need to be injected here and there, you end up giving step scope to almost every component inside a step. Step scope would therefore be a sensible default, and not being able to use singleton scope wouldn’t really be a limitation. A job scope would make sense in general, but it has been discussed in the Spring Batch community several times and has always been declined for not adding much value. This is still true, since the only component that cannot use step scope to access job parameters is the JobExecutionListener, and the methods of this component always receive arguments that include the job parameters. So while the JSR-352 way is a little more straightforward and cleaner, it’s not a game changer. It’s more or less about a nicer default scope for steps and a job scope that’s not really necessary.
Anyway, if Spring Batch wants to implement the JSR-352, there will be some changes. The JSR-352’s JobListener (the equivalent of the JobExecutionListener in Spring Batch) definitely needs a job scope, because otherwise it would have no way of accessing job parameters: its beforeJob and afterJob methods don’t take arguments, so job parameters need to be injected, and step scope is not available at that point of processing the job. EDIT: Sometimes reality is faster than writing blog posts: Spring Batch 2.2.1 has been released, and it introduces a job scope.
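
The lifecycle rule quoted from section 11.1 can be pictured as a cache keyed by scope and Job XML reference. The following is only a rough plain-Java sketch of that idea. It uses neither the JSR-352 API nor Spring Batch, and the class ArtifactScopeSketch, its methods and the scope key strings are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the 11.1 lifecycle rule: one artifact instance
// per Job XML reference per scope (job, step, or step-partition).
final class ArtifactScopeSketch {

    private final Map<String, Object> instances = new HashMap<>();

    // scopeKey identifies the running scope, e.g. "job1", "job1/step1"
    // or "job1/step1/partition-0" (naming scheme invented for this sketch).
    @SuppressWarnings("unchecked")
    <T> T get(String scopeKey, String ref, Supplier<T> factory) {
        return (T) instances.computeIfAbsent(scopeKey + "#" + ref, k -> factory.get());
    }

    // Called when a scope ends; its artifacts are dropped and never reused.
    void endScope(String scopeKey) {
        instances.keySet().removeIf(k -> k.startsWith(scopeKey + "#"));
    }

    public static void main(String[] args) {
        ArtifactScopeSketch scopes = new ArtifactScopeSketch();
        Object r1 = scopes.get("job1/step1/partition-0", "reader", Object::new);
        Object r2 = scopes.get("job1/step1/partition-0", "reader", Object::new);
        Object r3 = scopes.get("job1/step1/partition-1", "reader", Object::new);
        System.out.println(r1 == r2); // true: same scope, same reference
        System.out.println(r1 == r3); // false: concurrent partitions never share an instance
    }
}
```

A singleton scope in this picture would be a single key shared by all jobs, which is exactly what the spec rules out.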

Chunk processing

The following illustration is taken from the final release of the specification. You can see that one item is read, then processed, then the next item is read and processed, and finally all processed items are written in one action.
[Figure: chunk-oriented processing, as shown in the JSR-352 specification]
Ironically, this picture is copied from the Spring Batch reference documentation, but it has never been implemented like that. Chunk based processing in Spring Batch works like this:
[Figure: chunk processing in Spring Batch]
First, all items for the chunk are read, then processed, then written. If processing in Spring Batch stays like this, it doesn’t conform to the JSR-352 spec, but why does it make a difference? It makes a difference because the spec introduces a time-limit attribute on the chunk element, which specifies the number of seconds of reading and processing after which a chunk is complete. My guess is that in Spring Batch it will specify the number of seconds of reading after which a chunk is complete, because changing that behaviour would be too complex and wouldn’t bring much value.
For batches that mostly do writing (and I know a lot of them) the time-limit attribute is not very helpful anyway.
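
To make the difference in ordering concrete, here is a minimal plain-Java sketch. It uses neither the JSR-352 API nor Spring Batch; the class ChunkOrderSketch and its two methods are invented for illustration and only record the order of calls for a single chunk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: logs the order of read/process/write calls for ONE chunk.
final class ChunkOrderSketch {

    // Ordering as drawn in the spec: read one item, process it, repeat, then write all.
    static List<String> interleaved(List<String> input, int itemCount) {
        List<String> log = new ArrayList<>();
        List<String> processed = new ArrayList<>();
        Iterator<String> reader = input.iterator();
        while (processed.size() < itemCount && reader.hasNext()) {
            String item = reader.next();
            log.add("read " + item);
            log.add("process " + item);
            processed.add(item.toUpperCase()); // stand-in for real processing
        }
        log.add("write " + processed);
        return log;
    }

    // Ordering as described for Spring Batch: read all items, then process all, then write.
    static List<String> phased(List<String> input, int itemCount) {
        List<String> log = new ArrayList<>();
        List<String> items = new ArrayList<>();
        Iterator<String> reader = input.iterator();
        while (items.size() < itemCount && reader.hasNext()) {
            String item = reader.next();
            log.add("read " + item);
            items.add(item);
        }
        List<String> processed = new ArrayList<>();
        for (String item : items) {
            log.add("process " + item);
            processed.add(item.toUpperCase()); // stand-in for real processing
        }
        log.add("write " + processed);
        return log;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c");
        System.out.println(interleaved(input, 2)); // [read a, process a, read b, process b, write [A, B]]
        System.out.println(phased(input, 2));      // [read a, read b, process a, process b, write [A, B]]
    }
}
```

A time check placed in the loop condition of the first variant would cover reading and processing, while the same check in the second variant would only cover reading, which is the gap described above.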

Properties

The JSR-352 introduces an interesting concept of dealing with properties. On almost any level of the job XML you may define your own properties, and then you can access them for substitution in property definitions that are defined after the first property AND belong to the hierarchy where the first property was defined. This example is taken from the spec:

   <job id="job1">
      <properties>
         <property name="filestem" value="postings"/>
      </properties>
      <step id="step1">
         <chunk>
            <properties>
               <property name="infile.name" value="#{jobProperties['filestem']}.txt"/>
            </properties>
         </chunk>
      </step>
   </job>

The resolution for infile.name would be postings.txt. If you want to access the property in some component that’s referenced inside the chunk, for example the ItemReader, you need to inject it with a special annotation BatchProperty:

@Inject @BatchProperty(name="infile.name") 
String fileName;

Until now we just saw how to define our own properties in the job XML, but the spec offers some more sources for properties. This is the complete list:

  1. jobParameters – specifies to use a named parameter from the job parameters.
  2. jobProperties – specifies to use a named property from among the job’s properties.
  3. systemProperties – specifies to use a named property from the system properties.
  4. partitionPlan – specifies to use a named property from the partition plan of a partitioned step.
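
The substitution mechanics can be illustrated with a small plain-Java sketch. This is not how any JSR-352 implementation actually resolves properties: the class PropertySubstitutionSketch and its resolve method are hypothetical, and the sketch only handles the simple #{source['name']} form from the example above (no defaults, no nesting).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of substitution expressions such as #{jobProperties['filestem']}.
final class PropertySubstitutionSketch {

    // group(1) is the source name, group(2) the property name.
    private static final Pattern EXPR = Pattern.compile("#\\{(\\w+)\\['([^']+)'\\]\\}");

    // sources maps a source name (e.g. "jobProperties") to its name/value pairs.
    static String resolve(String value, Map<String, Map<String, String>> sources) {
        Matcher m = EXPR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Map<String, String> source = sources.get(m.group(1));
            String resolved = (source == null) ? null : source.get(m.group(2));
            // Assumption of this sketch: an unresolved expression becomes the empty string.
            m.appendReplacement(out, Matcher.quoteReplacement(resolved == null ? "" : resolved));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> jobProperties = new HashMap<>();
        jobProperties.put("filestem", "postings");
        Map<String, Map<String, String>> sources = new HashMap<>();
        sources.put("jobProperties", jobProperties);
        // prints postings.txt
        System.out.println(resolve("#{jobProperties['filestem']}.txt", sources));
    }
}
```

The other three sources (jobParameters, systemProperties, partitionPlan) would simply be further entries in the sources map of this sketch.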

This system reflects a somewhat different philosophy of dealing with properties. In a Spring application, properties are normally read from a file and/or system properties with a little help from the PropertyPlaceholderConfigurer and then used in bean definitions. In Spring Batch you may additionally access job parameters and the job and step execution contexts (the latter would be the location for partition plan parameters) in bean definitions. The JSR-352 does not specify any way of reading properties from an external file; instead, the job XML itself seems to be the property file. That’s not very useful, so I guess every implementation will have its own solution for reading properties from an external file.
Anyway, the possibility to define properties directly in the job XML and to build them up in a hierarchical way is new to Spring Batch and has to be implemented for the JSR-352. Using @Inject @BatchProperty for injecting properties into a bean is new as well, but it does more or less the same thing as the @Value annotation does today, so the implementation shouldn’t be much of a problem.

Conclusion

Though the programming models in JSR-352 and Spring Batch are pretty much the same, there are some small differences between the spec and the implementation of Spring Batch. I’m curious about the way these differences are dealt with. Exciting times for batch programmers!

Comments

  • John Williamson

    29 July 2013 by John Williamson

    A couple minor technical questions:
    * Does JSR-352 support reader/processor/writer attributes on the chunk element as shown in the listing above? I thought they were sub-elements.
    * Isn’t commit-interval called item-count in the JSR, given the way you are using it in the same listing?

  • Simon

    I have checked the source code of Spring Batch v2.2.1. The job scope is not yet officially supported.

  • Binh Thanh Nguyen

    17 December 2013 by Binh Thanh Nguyen

    Thanks, nice post
