Transactions in Spring Batch – Part 1: The Basics

This is the first post in a series about transactions in Spring Batch. The second post is about restarting a batch, cursor-based reading and listeners, and the third one is about skip and retry.

Transactions are important in almost any application, but handling them in batch applications is a little more tricky. In standard online applications you usually have one transaction per user action, and as a developer you normally just have to ensure that your code picks up an existing transaction or creates a new one when there’s none (propagation type REQUIRED). That’s it. Developers of batch applications have many more headaches with transactions. Of course you cannot have just one transaction for the whole batch, because the database couldn’t cope with that, so there have to be commits somewhere in between. A failed batch then doesn’t mean you get the unchanged data back, and when you throw in features like restarting a failed batch, retrying or skipping failing items, you automatically get complicated transaction behaviour. Spring Batch offers the functionality just mentioned, but how does it do that?

Spring Batch is a great framework, and there is a lot of documentation and some good books, but after reading a lot about Spring Batch I still wasn’t sure about everything regarding transactions. In the end, what really helped me understand it was looking into the code and a lot of debugging. So, this is no introduction to Spring Batch. I’m gonna focus just on transactions, and I assume that you’re familiar with transactions in Spring (transaction managers, transaction attributes). And since I have to restrict myself a little bit, I will just talk about single-threaded chunk-oriented processing.

Chunk oriented steps

Let’s start with a picture that will follow us throughout this and the following blog posts, only changed in little details every now and then to focus on a certain subject.
[Figure: Chunk oriented steps]
It already tells a lot about Spring Batch and its transactional behaviour. In chunk-oriented processing we have ItemReaders reading items one after the other, always delivering the next item. When there are no more items, the reader delivers null. Then we have optional ItemProcessors taking one item and delivering one item, which may be of another type. Finally we have ItemWriters taking a list of items and writing them somewhere.
The batch is split into chunks, and each chunk runs in its own transaction. The chunk size is actually determined by a CompletionPolicy, as you can see in the illustration at (1): when the CompletionPolicy is fulfilled, Spring Batch stops reading items and starts processing. By default, if you use the commit-interval attribute on chunk, you get a SimpleCompletionPolicy that is completed when the number of items you specified in the attribute has been read. If you want something more sophisticated you can specify your own CompletionPolicy in the attribute chunk-completion-policy.
This is all quite straightforward: if a RuntimeException is thrown in one of the participating components, the transaction for the chunk is rolled back and the batch fails. Every chunk that has already been committed of course stays in the processed state.
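
To make the cycle a bit more tangible, here is a minimal plain-Java sketch of it. It is not Spring Batch code and not how the framework is implemented: `ChunkLoopSketch`, `InMemoryStore` and `runStep` are names made up for illustration, the fixed `commitInterval` stands in for a SimpleCompletionPolicy, and the in-memory store only mimics a transaction by holding writes back until `commit()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

public class ChunkLoopSketch {

    /** Hypothetical transactional target: written items become visible only on commit. */
    static class InMemoryStore {
        final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();

        void write(List<String> items) { pending.addAll(items); }
        void commit() { committed.addAll(pending); pending.clear(); }
        void rollback() { pending.clear(); }
    }

    /** Returns true if all chunks were committed, false if a chunk failed. */
    static boolean runStep(Supplier<Integer> reader,
                           Function<Integer, String> processor,
                           InMemoryStore store,
                           int commitInterval) {
        boolean exhausted = false;
        while (!exhausted) {
            // One simulated transaction per chunk.
            try {
                // Read item by item until the chunk is full or the reader returns null.
                List<Integer> items = new ArrayList<>();
                while (items.size() < commitInterval) {
                    Integer item = reader.get();
                    if (item == null) {
                        exhausted = true;
                        break;
                    }
                    items.add(item);
                }
                // Process item by item, then hand the whole list to the writer.
                List<String> processed = new ArrayList<>();
                for (Integer item : items) {
                    processed.add(processor.apply(item));
                }
                store.write(processed);
                store.commit();
            } catch (RuntimeException e) {
                // Discard the current chunk; earlier chunks stay committed.
                store.rollback();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Iterator<Integer> source = List.of(1, 2, 3, 4, 5, 6, 7).iterator();
        Supplier<Integer> reader = () -> source.hasNext() ? source.next() : null;
        Function<Integer, String> processor = i -> {
            if (i == 5) throw new IllegalStateException("cannot process item " + i);
            return "item-" + i;
        };
        InMemoryStore store = new InMemoryStore();
        boolean completed = runStep(reader, processor, store, 3);
        // prints: completed=false, committed=[item-1, item-2, item-3]
        System.out.println("completed=" + completed + ", committed=" + store.committed);
    }
}
```

With a commit interval of 3, the first chunk (items 1 to 3) is committed, the second chunk fails on item 5 and is discarded as a whole, and the remaining items are never read.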

Business data and batch job data

As you might know already, Spring Batch brings a set of database table definitions. These tables are used to store data about the jobs and steps and the different job and step execution contexts. This persistence layer is useful for keeping a history on the one hand, and for restarting jobs on the other hand. If you’re thinking of putting these tables in a different database than your business data: don’t. The data stored there is about the state of the job and the steps, with numbers of processed items, start time, end time, a state identifier (COMPLETED, FAILED and so on) and much more. In addition there is a map for each step (the step execution context) and job (the job execution context) which can be filled by any batch programmer. Changes in this data have to be in line with the transaction running on our business data, so if we have two databases we will certainly need a JtaTransactionManager handling both DataSources, and performance will suffer as well. So, if you have a choice, put those tables next to your business data. The following diagram shows at which points of the processing step and job data are persisted. As you can see, it doesn’t happen only inside the chunk transaction, for good reasons: we want to have step and job data persisted in the case of a failure, too.

Note that I use little numbers to mark elements that are explained in a text box. The numbers stay in the following versions of the diagram, while the text box may disappear for readability. It’s always possible to look up the explanation in a previous version of the diagram.

A failed batch

Until now, the diagram just includes successful processing. Let’s take a look at the diagram including a possible failure.

If you didn’t configure skip or retry functionality (we’ll get to that in the next blog posts) and there’s an uncaught RuntimeException somewhere in an element executed inside the chunk, the transaction is rolled back, the step is marked as FAILED and the whole job will fail. Persisting step data in a separate transaction at (5) makes sure that the failure state gets into the database.
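
Here is a small plain-Java sketch of that behaviour, under simplifying assumptions. `FailedStepSketch`, `Database` and `updateStatus` are hypothetical names, not Spring Batch classes, and the snapshot-based `begin()`/`rollback()` only imitates a real transaction. The point it illustrates: business rows and the write count roll back together with the chunk, while the FAILED status is stored afterwards in a transaction of its own.

```java
import java.util.ArrayList;
import java.util.List;

public class FailedStepSketch {

    /** Hypothetical database holding business rows and step metadata side by side. */
    static class Database {
        final List<String> businessRows = new ArrayList<>();
        String stepStatus = "STARTING";
        int writeCount = 0;

        private List<String> rowsAtBegin;
        private int writeCountAtBegin;
        private String statusAtBegin;

        // Remember the state at transaction start so rollback can restore it.
        void begin() {
            rowsAtBegin = new ArrayList<>(businessRows);
            writeCountAtBegin = writeCount;
            statusAtBegin = stepStatus;
        }

        void commit() { rowsAtBegin = null; }

        void rollback() {
            businessRows.clear();
            businessRows.addAll(rowsAtBegin);
            writeCount = writeCountAtBegin;
            stepStatus = statusAtBegin;
        }
    }

    static void runStep(Database db, List<List<String>> chunks) {
        updateStatus(db, "STARTED");
        try {
            for (List<String> chunk : chunks) {
                db.begin(); // chunk transaction: business data and step data together
                try {
                    for (String item : chunk) {
                        if (item.startsWith("bad")) {
                            throw new IllegalStateException("cannot write " + item);
                        }
                        db.businessRows.add(item);
                        db.writeCount++;
                    }
                    db.commit();
                } catch (RuntimeException e) {
                    db.rollback();
                    throw e;
                }
            }
            updateStatus(db, "COMPLETED");
        } catch (RuntimeException e) {
            // Separate transaction, so the failure state survives the rollback.
            updateStatus(db, "FAILED");
        }
    }

    private static void updateStatus(Database db, String status) {
        db.begin();
        db.stepStatus = status;
        db.commit();
    }

    public static void main(String[] args) {
        Database db = new Database();
        runStep(db, List.of(List.of("a", "b"), List.of("c", "bad-d"), List.of("e", "f")));
        // prints: status=FAILED, writeCount=2, rows=[a, b]
        System.out.println("status=" + db.stepStatus
                + ", writeCount=" + db.writeCount + ", rows=" + db.businessRows);
    }
}
```

The second chunk leaves no trace in the business rows or the write count, but the step is still recorded as FAILED.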
When I say that an uncaught RuntimeException causes the rollback, that’s not quite true in every case. We have the option to set no-rollback-exception-classes:

<batch:tasklet>
  <batch:chunk ... />
  <batch:no-rollback-exception-classes>
    <batch:include class="de.codecentric.MyRuntimeException"/>
  </batch:no-rollback-exception-classes>
</batch:tasklet>

Transaction attributes

One more thing for today: if you don’t configure transaction attributes explicitly, you get the defaults. Transaction attributes are propagation type, isolation level and timeout, for example. You may specify those attributes as shown here:

<batch:tasklet>
  <batch:transaction-attributes isolation="READ_COMMITTED" propagation="REQUIRES_NEW" timeout="200"/>
  <batch:chunk reader="myItemReader" writer="myItemWriter" commit-interval="20"/>
</batch:tasklet>

If you don’t specify them, you’ll get the propagation type REQUIRED and the isolation level DEFAULT, which means that the default of the actual database is used. Normally you don’t want to change the propagation type, but it makes sense to think about the isolation level and check the batch job: am I fine with non-repeatable reads? Am I fine with phantom reads? And: what other applications are accessing and changing the database, and do they change the data I’m working on in a way that causes trouble? Could I run into locks? For more information on the different isolation levels, check the Wikipedia article on isolation levels.

Conclusion

In this first article on transactions in Spring Batch I explained the basic reader-processor-writer cycle in chunk oriented steps and where the transactions come into play. We saw what happens when a step fails, how to set transaction attributes and no-rollback-exception-classes and how job and step metadata is updated.
Next on the list will be restart, retry and skip functionality: what are the preconditions? How does the transaction management work with these features? The next blog post in this series is about restart, cursor-based reading and listeners, and the third post is about skip and retry.

Comments

  • Hi Tobias,

    Great Article! Thanks!

    Chunk related query:

    I have a simple spring batch job to read data from file and insert into database.

    For e.g. I have a file with 200 records.

    I have set commit-interval=100.

    The first chunk of 100 gets committed with no errors, and say we encounter some sort of error in the second chunk of 100 records. Is it possible to undo the commit of the first 100 records?

    • Tobias Flohre

      29 September 2012 by Tobias Flohre

      A commit is a commit, if you really wanna undo things, you have to do it manually. Normally it’s better to make your job restartable. If you’re using FlatFileItemReader, it supports restarts that continue with the first item that hasn’t been processed in the first (failed) try. You may also want to take a look at the second post in this series where I talk about restarting jobs.

      • Thanks! I thought we should be able to use ‘chunk-completion-policy’ somehow. As per our requirement, either the whole batch of data is committed or none of it.

        • Tobias Flohre

          30 September 2012 by Tobias Flohre

          You could do that, though I wouldn’t recommend that. Long transactions are always something to avoid. But if you just have a few items in your job you probably won’t run into problems.

          • 1 October 2012 by msns3ka

            I see your point. Well, we are expecting around 8000 lines to be processed. We will go with 10,000 as the commit interval.

            If the above doesn’t suit us, then we will go with the job restart mechanism. The issue is that we don’t want any manual intervention for restarting jobs, but if it is the only solution then we will go with it 🙂

            Thanks for your tips! Nice Article!

            Just wondering if you have ever had multiple loggers for various jobs within a single Spring Batch application?

            http://forum.springsource.org/showthread.php?130629-Multiple-Loggers-for-Spring-batch-application-with-3-different-jobs

            You don’t have to reply here as it is unrelated to this page’s topic 🙂 but if you have any ideas just update the above forum. Thanks very much.

  • 21 December 2012 by benmasi

    Hi, very good article!
    Some questions about transaction:

    1- Suppose we use two different databases, one for Spring Batch metadata and one for business data (just for reading by a JdbcCursorItemReader).
    Do we need XA transactions as you mention? I do not understand, because these are accessed by two different transactions, aren’t they?

    2- What is the configuration if we use two DBs as in my first question on an application server? Do we have to declare 2 JNDI database resources in the server (XA datasources?), retrieve them in Spring, and use the JBoss transaction manager for both (JobRepository and Step involving the resources)?

    I am looking forward to your response, thanks in advance

    • Tobias Flohre

      21 December 2012 by Tobias Flohre

      Hi!

      1) Meta data and business data are accessed by the same transaction. That’s necessary for updating data in the execution context. For example: the JdbcCursorItemReader stores the read count in the execution context for restartability (so that it knows how many items to skip on restart because they are already processed). So when the business data causes a rollback, that counter needs to be rolled back as well.
      So, yes, you need XA transactions.
      2) You answered that question yourself :-). Declare two DataSources in the server, retrieve them via JNDI in your application context, and use a JtaTransactionManager accessing the JBoss transaction manager.

  • 7 January 2013 by benmasi

    Hi, thanks for your answer.
    But I am not sure I understand correctly. 🙁

    1) You say in your answer that “Meta data and business data are accessed by the same transaction” so i “need XA transactions”.

    But in your second Spring Batch article “Transactions in Spring Batch – Part 2: Restart, cursor based reading and listeners”, in the “Cursor based reading” paragraph, you say that “Spring Batch’s JdbcCursorItemReader uses a separate connection for opening the cursor, thereby bypassing the transaction managed by the transaction manager”.

    So where is the truth? Are there really 2 different transactions? If yes, why would I need XA transactions?

    2) In the same paragraph regarding the “Cursor based reading”, you say “In an application server environment we have to do a little bit more to make it work. Normally we get connections from a DataSource managed by the application server, and all of those connections take part in transactions by default. We need to set up a separate DataSource which does not take part in transactions, and only inject it into our cursor based readers. Injecting them anywhere else could cause a lot of damage regarding transaction safety.”
    What do you mean by “set up a separate DataSource”? Is that for the case where we use only one datasource for business data and metadata? And what do you mean by ‘injecting them anywhere else’?
    Is the configuration I suggested in the second question of my first post (and which you confirmed in your previous reply) consistent with this?

    Thanks in advance for your responses!

    • Tobias Flohre

      8 January 2013 by Tobias Flohre

      Hi!
      1) In the case that you are using the JdbcCursorItemReader it bypasses the transaction just for reading. When you are writing maybe later in a writer, the writing is done inside the transaction. There is only one transaction for Meta and business data, only the reading done by the JdbcCursorItemReader will be outside the transaction.
      2) The configuration you suggest is not consistent with this. In your case you need another data source for your business data that’s explicitly set to non-transactional. You inject this data source only into the JdbcCursorItemReader. In all other places (for example a writer) you inject the normal transactional data source for business data. This way you have an XA transaction spanning the writer and the meta data, and non-transactional data access for the reader.

      • 8 January 2013 by benmasi

        Hi!

        So in this configuration:
        – one Job with one Step
        – one database for Spring Batch metadata (db1)
        – the Step has a JdbcCursorItemReader which reads business data from another database (db2)
        – the Step writes business data to a file via FlatFileItemWriter

        I understand that I do not need XA transactions, right? (since only one database is transactional)

        But if I use a database writer instead of the FlatFileItemWriter, then I need XA transactions, is that right? (since the two databases are in the chunk transaction)

        • Tobias Flohre

          8 January 2013 by Tobias Flohre

          Right!
          But check if you need to make the business data source non-transactional. I’m working on Websphere and we need to do that, because otherwise the data source automatically takes part in the transaction when using a JtaTransactionManager (that uses the jBoss transaction manager underneath). I guess it’s the same on JBoss.

  • 20 August 2013 by Stefan Haberl

    Hi Tobias,

    There’s a small configuration error in your last example. The transaction attributes have to be defined on the tasklet rather than the chunk itself.

  • 17 December 2013 by Binh Thanh Nguyen

    Thanks, nice post

  • 3 February 2014 by Mauro Molinari

    Very useful article, thank you! I think this information should really be included in the reference documentation of Spring Batch.

    By the way, a little typo in your last example:
    writer=”myItemReader” (I guess you rather meant “myItemWriter”).

  • 22 February 2014 by Eugene Kogan

    Fantastic article!!! Thank you!

  • Hi

    This is a chunk-related query.
    I have a dynamic commit interval, which I have achieved by using a completion policy.

    I want to roll back the complete chunk when an exception is thrown and proceed to the next chunk automatically. For example, I have 3 chunks. The first chunk has 2 records and is processed successfully. The second chunk has 3 records, and its first record is processed successfully. The 2nd record in the second chunk throws an exception. Now I want to roll back the second chunk completely and proceed to the 3rd chunk automatically, without any manual intervention. Can someone help by throwing some light on this issue?

  • 18 August 2014 by Vince Zamora

    Hi Tobias,
    very good article!
    I have a question:
    I have a job with two steps. In the first step I’m saving information in one table. The second step depends on the first to save information in another table. I need to roll back the first step in case the second step fails. How can I do that?
    I’d appreciate your help!

    • Tobias Flohre

      19 August 2014 by Tobias Flohre

      There’s no way to roll back automatically, since data from the first step is committed. You can clean it up manually (for example in a StepExecutionListener for step 2 that becomes active on failure). Another possibility would be to combine the two steps into one that updates both tables. And a third possibility: in step 1 you save the information into a staging table, which you either throw away on failure of step 2 or copy over to the real table on success of step 2.
      Which solution to choose depends on your use case.

  • 15 June 2015 by Kiran

    Hi,

    I am new to Spring Batch. I have a few questions.

    1. What if I want to open a new transaction in the processor step? Will the data be committed or rolled back in that new transaction method?

    2. What do I have to do if I want to roll back only those records which failed in that chunk?
    For example, if my chunk size is 50 records, 49 succeed and only 1 record fails, I don’t want to roll back the 49 records; I want to roll back only the failed record.

    • Tobias Flohre

      17 June 2015 by Tobias Flohre

      Hi,

      1. Spring Batch does the transaction management for you, so why would you want to open a new transaction in a processor? There’s already one running. If you really open a new transaction there (propagation type REQUIRES_NEW), everything done in that transaction won’t be rolled back by Spring Batch.
      2. A transaction is a transaction, you can only commit or roll back the whole transaction. But you can use the skip functionality explained in the third blog post of this series for achieving something similar.

      • 17 June 2015 by Kiran

        Thanks for your reply. We have some business scenarios where we want to do some DB operations in a new transaction, so that a dirty entity in the current transaction is not affected.

        • Tobias Flohre

          17 June 2015 by Tobias Flohre

          I can think of very few scenarios where that makes sense. Maybe none at all, except for logging a rollback.
          Let’s assume you have a chunk size of 50, so you get one Spring Batch managed transaction for 50 items. Now you open a new transaction for each item in the processor to do something, then item number 34 fails and the Spring Batch managed transaction is rolled back. Your 33 processor transactions are already committed and won’t roll back, and there you have your inconsistency.
