Spring Batch 2.2 – JavaConfig Part 1: A comparison to XML

This is the first part of a series on Java based configuration in Spring Batch. Spring Batch 2.2 will be out in a few weeks (update: it was released on June 6), and it will include a Java DSL for Spring Batch with its own @Enable annotation. In Spring Core I prefer Java based configuration over XML, but Spring Batch has a really good XML namespace. Is the Java based approach really better? Time to take a deep look into the new features!
In this first post I will introduce the Java DSL and compare it to the XML version, but there's more to come. In future posts I will talk about JobParameters, ExecutionContexts and StepScope, profiles and environments, job inheritance, modular configurations, and partitioning and multi-threaded steps, everything regarding Java based configuration, of course. You can find the JavaConfig code examples on Github. If you want to know when a new blog post is available, just follow me on Twitter (@TobiasFlohre) or Google+.

Back in the days – a simple configuration in XML

Before we start looking at the new Java DSL, I'll introduce you to the job we'll translate into Java based configuration. It's a common use case: not trivial, but simple enough to understand in a reasonable amount of time. The job imports partner data (name, email address, gender) from a file into a database. Each line in the file is one record, and the properties are delimited by commas. We use the FlatFileItemReader to read the data from the file, and the JdbcBatchItemWriter to write the data to the database.
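An excerpt of partner-import.csv might look like this (hypothetical sample data; the column order name, gender, email matches the tokenizer settings shown below, where the gender column is skipped):

John Doe,m,john.doe@example.org
Jane Doe,f,jane.doe@example.org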
We split the configuration into two parts: the infrastructure configuration and the job configuration. It always makes sense to do that, because you may want to switch the infrastructure configuration between environments (test, production), and you may have more than one job configuration.
An infrastructure configuration in XML for a test environment looks like this:

<context:annotation-config/>
 
<batch:job-repository/>
 
<jdbc:embedded-database id="dataSource" type="HSQL">
	<jdbc:script location="classpath:org/springframework/batch/core/schema-hsqldb.sql"/>
	<jdbc:script location="classpath:schema-partner.sql"/>
</jdbc:embedded-database>
 
<bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
	<property name="dataSource" ref="dataSource" />
</bean>
 
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
	<property name="jobRepository" ref="jobRepository" />
</bean>

Note that we create our domain database tables here as well (schema-partner.sql), and note that everything happens in an in-memory database. That's a perfect scenario for JUnit integration tests.
Now let’s take a look at the job configuration:

<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader">
	<property name="resource" value="classpath:partner-import.csv"/>
	<property name="lineMapper" ref="lineMapper"/>
</bean>
<bean id="lineMapper" class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
	<property name="lineTokenizer">
		<bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
			<property name="names" value="name,email"/>
			<property name="includedFields" value="0,2"/>
		</bean>
	</property>
	<property name="fieldSetMapper">
		<bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
			<property name="targetType" value="de.codecentric.batch.domain.Partner"/>
		</bean>
	</property>
</bean>
 
<bean id="processor" class="de.codecentric.batch.LogItemProcessor"/>
 
<bean id="writer" class="org.springframework.batch.item.database.JdbcBatchItemWriter">
	<property name="sql" value="INSERT INTO PARTNER (NAME, EMAIL) VALUES (:name,:email)"/>
	<property name="dataSource" ref="dataSource"/>
	<property name="itemSqlParameterSourceProvider">
		<bean class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider"/>
	</property>
</bean>
 
<batch:job id="flatfileJob">
	<batch:step id="step">			
		<batch:tasklet>
			<batch:chunk reader="reader" processor="processor" writer="writer" commit-interval="3" />
		</batch:tasklet>
	</batch:step>
</batch:job>

Note that we use almost exclusively standard Spring Batch components, the exceptions being the LogItemProcessor and, of course, our domain class Partner.
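Because both the job meta data and our domain table live in the in-memory database, such a test boots quickly and leaves nothing behind. A minimal sketch of an integration test, assuming the two XML configurations above live in files named infrastructure.xml and flatfile-job.xml (hypothetical names) and that JUnit 4 and spring-test are on the classpath:

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = {"classpath:infrastructure.xml", "classpath:flatfile-job.xml"})
public class FlatfileJobIntegrationTest {

	@Autowired
	private JobLauncher jobLauncher;

	@Autowired
	private Job job;

	@Test
	public void jobShouldComplete() throws Exception {
		// launch the job once; the in-memory database is rebuilt for every test run
		JobExecution execution = jobLauncher.run(job, new JobParameters());
		assertEquals(BatchStatus.COMPLETED, execution.getStatus());
	}
}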

Java – and only Java

Now it’s time for the Java based configuration style. You can find all the examples used in this blog post series here.

Infrastructure configuration

First, we'll take a look at the infrastructure configuration. Following one of the patterns I described here, I provide an interface for the InfrastructureConfiguration to make it easier to switch implementations for different environments:

public interface InfrastructureConfiguration {
 
	@Bean
	public abstract DataSource dataSource();
 
}

Our first implementation will be one for testing purposes:

@Configuration
@EnableBatchProcessing
public class StandaloneInfrastructureConfiguration implements InfrastructureConfiguration {
 
	@Bean
	public DataSource dataSource(){
		EmbeddedDatabaseBuilder embeddedDatabaseBuilder = new EmbeddedDatabaseBuilder();
		return embeddedDatabaseBuilder.addScript("classpath:org/springframework/batch/core/schema-drop-hsqldb.sql")
				.addScript("classpath:org/springframework/batch/core/schema-hsqldb.sql")
				.addScript("classpath:schema-partner.sql")
				.setType(EmbeddedDatabaseType.HSQL)
				.build();
	}
 
}

All we need here is our DataSource and the small annotation @EnableBatchProcessing. If you're familiar with Spring Batch, you know that the minimum for running jobs is a PlatformTransactionManager, a JobRepository and a JobLauncher, plus a DataSource if you want to persist job metadata. All we have right now is a DataSource, so what about the rest? The annotation @EnableBatchProcessing creates those components for us. It takes the DataSource and creates a DataSourceTransactionManager working on it, it creates a JobRepository working with the transaction manager and the DataSource, and it creates a JobLauncher using the JobRepository. In addition it registers the StepScope for use on batch components and a JobRegistry to find jobs by name.
Of course you’re not always happy with a DataSourceTransactionManager, for example when running inside an application server. We’ll cover that in a future post. The usage of the StepScope will be covered in a future post as well.
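A quick way to convince yourself of what @EnableBatchProcessing registers is to bootstrap the configuration and pull the components out of the context. A minimal sketch; the Main class is a hypothetical addition:

import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;

public class Main {

	public static void main(String[] args) {
		AnnotationConfigApplicationContext context =
				new AnnotationConfigApplicationContext(StandaloneInfrastructureConfiguration.class);
		// both beans were registered by @EnableBatchProcessing, we didn't define them ourselves
		JobLauncher jobLauncher = context.getBean(JobLauncher.class);
		JobRepository jobRepository = context.getBean(JobRepository.class);
		System.out.println(jobLauncher.getClass().getName());
		System.out.println(jobRepository.getClass().getName());
		// as soon as a job configuration class is added to the context,
		// a job can be started with jobLauncher.run(job, jobParameters)
		context.close();
	}

}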
I left out two new components that are registered in the application context as well: a JobBuilderFactory and a StepBuilderFactory. Of course we may autowire all of those components into other Spring components, and that’s what we’re gonna do now in our job configuration with the JobBuilderFactory and the StepBuilderFactory.

Job configuration

@Configuration
public class FlatfileToDbJobConfiguration {
 
	@Autowired
	private JobBuilderFactory jobBuilders;
 
	@Autowired
	private StepBuilderFactory stepBuilders;
 
	@Autowired
	private InfrastructureConfiguration infrastructureConfiguration;
 
	@Bean
	public Job flatfileToDbJob(){
		return jobBuilders.get("flatfileToDbJob")
				.listener(protocolListener())
				.start(step())
				.build();
	}
 
	@Bean
	public Step step(){
		return stepBuilders.get("step")
				.<Partner,Partner>chunk(3)
				.reader(reader())
				.processor(processor())
				.writer(writer())
				.listener(logProcessListener())
				.build();
	}
 
	@Bean
	public FlatFileItemReader<Partner> reader(){
		FlatFileItemReader<Partner> itemReader = new FlatFileItemReader<Partner>();
		itemReader.setLineMapper(lineMapper());
		itemReader.setResource(new ClassPathResource("partner-import.csv"));
		return itemReader;
	}
 
	@Bean
	public LineMapper<Partner> lineMapper(){
		DefaultLineMapper<Partner> lineMapper = new DefaultLineMapper<Partner>();
		DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
		lineTokenizer.setNames(new String[]{"name","email"});
		lineTokenizer.setIncludedFields(new int[]{0,2});
		BeanWrapperFieldSetMapper<Partner> fieldSetMapper = new BeanWrapperFieldSetMapper<Partner>();
		fieldSetMapper.setTargetType(Partner.class);
		lineMapper.setLineTokenizer(lineTokenizer);
		lineMapper.setFieldSetMapper(fieldSetMapper);
		return lineMapper;
	}
 
	@Bean
	public ItemProcessor<Partner,Partner> processor(){
		return new LogItemProcessor();
	}
 
	@Bean
	public ItemWriter<Partner> writer(){
		JdbcBatchItemWriter<Partner> itemWriter = new JdbcBatchItemWriter<Partner>();
		itemWriter.setSql("INSERT INTO PARTNER (NAME, EMAIL) VALUES (:name,:email)");
		itemWriter.setDataSource(infrastructureConfiguration.dataSource());
		itemWriter.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<Partner>());
		return itemWriter;
	}
 
	@Bean
	public ProtocolListener protocolListener(){
		return new ProtocolListener();
	}
 
	@Bean
	public LogProcessListener logProcessListener(){
		return new LogProcessListener();
	}
}

Looking at the code you'll find the ItemReader, ItemProcessor and ItemWriter definitions identical to the XML version, just done in Java based configuration. I added two listeners to the configuration, the ProtocolListener and the LogProcessListener.
The interesting part is the configuration of the Step and the Job. In the Java DSL we use builders for building Steps and Jobs. Since every Step needs access to the PlatformTransactionManager and the JobRepository, and every Job needs access to the JobRepository, we use the StepBuilderFactory to create a StepBuilder that already uses the configured JobRepository and PlatformTransactionManager, and the JobBuilderFactory to create a JobBuilder that already uses the configured JobRepository. Those factories are there for our convenience; it would be totally okay to create the builders ourselves, as sketched below.
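For illustration, a sketch of what the factories do under the hood, assuming jobRepository and transactionManager references are at hand; reader() and writer() are the bean methods from above:

// a Step built without the StepBuilderFactory
Step step = new StepBuilder("step")
		.repository(jobRepository)
		.transactionManager(transactionManager)
		.<Partner,Partner>chunk(3)
		.reader(reader())
		.writer(writer())
		.build();

// a Job built without the JobBuilderFactory
Job job = new JobBuilder("flatfileToDbJob")
		.repository(jobRepository)
		.start(step)
		.build();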
Now that we have a StepBuilder, we can call all kinds of methods on it to configure our Step, from setting the chunk size through reader, processor and writer to listeners and much more. Just explore it for yourself. Note that the type of the builder may change in your builder chain according to your needs. For example, when calling the chunk method, you switch from a StepBuilder to a parameterized SimpleStepBuilder<I,O>, because from that point on the builder knows that you want to build a chunk-based Step. The StepBuilder doesn't have methods for adding a reader or writer, but the SimpleStepBuilder does. Because the SimpleStepBuilder is type-safe regarding the item type, you need to parameterize the call to the chunk method, as is done in the example with the item type Partner. Normally you won't notice the switching of builder types when constructing a builder chain, but it's good to know how it works.
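To make that switch visible, here's the step() factory method from above in a deliberately verbose form with the intermediate builder types spelled out (a sketch for illustration, not how you'd normally write it):

	@Bean
	public Step step(){
		// the generic builder, created with JobRepository and transaction manager preset
		StepBuilder stepBuilder = stepBuilders.get("step");
		// chunk switches the builder type to the parameterized SimpleStepBuilder
		SimpleStepBuilder<Partner,Partner> chunkBuilder = stepBuilder.<Partner,Partner>chunk(3);
		// reader, processor and writer are only available on the SimpleStepBuilder
		return chunkBuilder
				.reader(reader())
				.processor(processor())
				.writer(writer())
				.listener(logProcessListener())
				.build();
	}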
The same holds for the JobBuilder and the configuration of Jobs. You can define all kinds of properties important for the Job, and you may define a Step flow with multiple Steps; again, the type of the builder may change in your builder chain according to your needs. In our example we define a simple Job with one Step and one JobExecutionListener.
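For illustration, here's a sketch of a Job with two Steps; the second step bean step2() is a hypothetical addition, and the builder type switches from JobBuilder to SimpleJobBuilder on the start call:

	@Bean
	public Job twoStepJob(){
		return jobBuilders.get("twoStepJob")
				.start(step())      // JobBuilder -> SimpleJobBuilder
				.next(step2())      // step2() is a hypothetical second Step bean
				.build();
	}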

Connecting infrastructure and job configuration

One more thing about the job configuration: we need the DataSource in the JdbcBatchItemWriter, but we defined it in the infrastructure configuration. That's a good thing, because it's very low-level, and of course we don't want to define something like that in the job configuration. So how do we get the DataSource? We know that we'll start the application context with an infrastructure configuration and one or more job configurations, so one option would be to autowire the DataSource directly into the job configuration. I didn't do that, because I believe that minimizing autowiring magic is important in the enterprise world, and I could do better. Instead of injecting the DataSource I injected the InfrastructureConfiguration itself and get the DataSource from there. Now it's a thousand times easier to understand where the DataSource comes from when looking at the job configuration. Note that the InfrastructureConfiguration is an interface, so we don't bind the job configuration to a specific infrastructure configuration. Still, there will be only two or three implementations, and it's easy to see which one is used under which circumstances.
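For example, a second implementation for an application server environment could look up an existing DataSource via JNDI. A sketch, with an assumed JNDI name:

@Configuration
@EnableBatchProcessing
public class JndiInfrastructureConfiguration implements InfrastructureConfiguration {

	@Bean
	public DataSource dataSource() {
		// the JNDI name is an assumption for illustration
		return new JndiDataSourceLookup().getDataSource("java:comp/env/jdbc/partnerDataSource");
	}

}

Note that @EnableBatchProcessing would still set up a DataSourceTransactionManager here; the application server case with a different transaction manager is exactly what the future post mentioned above will cover.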

Fault-tolerant steps: skipping and retrying items

If you want to use skip and/or retry functionality, you'll need to activate fault-tolerance on the builder, which is done with the method faultTolerant. As explained above, the builder type switches, this time to FaultTolerantStepBuilder, and a bunch of new methods appear, like skip, skipLimit, retry, retryLimit and so on. A Step configuration may look like this:

	@Bean
	public Step step(){
		return stepBuilders.get("step")
				.<Partner,Partner>chunk(3)
				.reader(reader())
				.processor(processor())
				.writer(writer())
				.listener(logProcessListener())
				.faultTolerant()
				.skipLimit(10)
				.skip(UnknownGenderException.class)
				.listener(logSkipListener())
				.build();
	}
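Retry works the same way. A sketch of a retrying Step, assuming that Spring's DeadlockLoserDataAccessException is worth a retry in our scenario:

	@Bean
	public Step stepWithRetry(){
		return stepBuilders.get("stepWithRetry")
				.<Partner,Partner>chunk(3)
				.reader(reader())
				.processor(processor())
				.writer(writer())
				.faultTolerant()
				.retryLimit(3)
				.retry(DeadlockLoserDataAccessException.class)
				.build();
	}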

Conclusion

The Spring Batch XML namespace for configuring jobs and steps is a little more concise than its Java counterpart; that's a plus on that side. The Java DSL has the advantage of type-safety and excellent IDE support for refactoring, auto-completion, finding usages etc. So you may say it's just a matter of taste whether you pick one or the other, but I say it's more than that.
90% of all batch applications reside in the enterprise, at big companies like insurance or financial services firms. Batch applications are at the heart of their business, and they are business-critical. Every such company using Java for batch processing has its own little framework or library around solutions like Spring Batch to adapt it to its needs. And when it comes to building frameworks and libraries, Java based configuration is way ahead of XML. Here are some of the reasons:

  • We want to do some basic configuration in the framework. People add a dependency on our framework library and import those configurations according to their needs. If these configurations were written in XML, they would have a hard time opening them to look at what they are doing. No problem in Java. An important point for transparency and maintainability.
  • There's no navigability in XML. That may be okay as long as you don't have too many XML files and all of them are in your workspace, because then you can take advantage of the Spring IDE support. But a framework library usually should not be added as a project to the workspace. With Java based configuration you can jump right into framework configuration classes. I will talk more about this subject in a future blog post.
  • In a framework you often have requirements the user of the library has to fulfil in order to make everything work, for example the need for a DataSource, a PlatformTransactionManager and a thread pool. The implementations don't matter from the perspective of the framework; they just need to be there. In XML you have to write documentation for the users of the framework, telling them they need to add such-and-such Spring beans under such-and-such names to the ApplicationContext. In Java you just write an interface describing that contract, and people using the library implement it and add it as a configuration class to the ApplicationContext. That's what I did with the interface InfrastructureConfiguration above, and I will talk more about it in a future post; a sketch of a richer contract follows below.
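Such a contract interface might look like this; the PlatformTransactionManager and TaskExecutor requirements are assumptions added for illustration:

public interface FrameworkInfrastructureConfiguration {

	@Bean
	public abstract DataSource dataSource();

	@Bean
	public abstract PlatformTransactionManager transactionManager();

	@Bean
	public abstract TaskExecutor taskExecutor();

}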

All these advantages become even more important when there's not only one common library but a hierarchy of libraries, for example one for the basic stuff and one for a certain division. You really need to be able to navigate through everything to keep it understandable. Java based configuration makes that possible.

Comments

  • chris

    4 June 2013 by chris

    Great article, very well explained even for people not familiar with Spring Batch.

    So great to see this Java DSL coming !
    I hope I’ll have time to try the 2.2 version asap.

    Also, eager to see the post on StepScope as we heavily rely on this to inject jobparameters into reader/processor/writer !

  • C

    In the past it was possible to use the MapJobRepositoryFactoryBean.

    Do you have any idea how to skip the HSQL database and use this class instead? We don’t want to keep track of the job records so we don’t want to have these records in our database.

    I've not been able to get this working the JavaConfig way.

  • Clément H

    Great series! Thanks a lot for writing it only 1 month after the SB 2.2.0 release!

    Nevertheless, I’m sceptical about the Java DSL, because the XML namespace is a really good entry point to understand what a batch is about.

    • Tobias Flohre

      5 July 2013 by Tobias Flohre

      I agree that at first glance the XML namespace is a little bit more intuitive, as it reflects the job-step-chunk hierarchy in a concise way. Part of it is probably just a matter of habit. Anyway, I think the advantages of JavaConfig outweigh that one advantage of the XML namespace, especially when you have a small company-specific framework around Spring Batch.

  • Haidar

    How do you run a batch with the CommandLineJobRunner when there is no applicationContext.xml file? The first argument of the CommandLineJobRunner is the path to the applicationContext.xml or job.xml definition file.

  • Ibai

    Awesome post. Articles like this one let people learn subjects really fast. Thank you for making my life easier.

  • Smita

    InfrastructureConfiguration is not getting autowired into FlatfileToDbJobConfiguration in my project.

  • Smita

    I have a problem with the @Autowired annotation. My Spring Batch project is not able to deploy on a Linux server, but it works properly on my local server. It shows a bean creation exception for the interface which is autowired in the writer class; the implementing class is annotated with @Component. All configurations work on the local server but not on Linux, and all required jars are there. If somebody has knowledge about this issue, please help. The configuration is in XML, and it gives an error only on interfaces.
