Fork/Join and other Techniques to Improve Performance


In the last few years, single-thread CPU performance has barely improved. The number of cores, on the other hand, keeps increasing: laptops with eight cores are common (okay, that count includes hyperthreading; only four are real cores). Even modern smartphones often have four cores. To make use of these modern beasts, you need parallel programming.

In this article, I use a simple board game, a variant of peg solitaire, as an example for a parallel algorithm and other optimization techniques. The problem to solve is: how many different solutions exist for a board with n pegs on a side? The focus is on different optimization techniques, not only the Fork/Join framework. You may be surprised to find that other techniques are much more efficient for this kind of problem.

Definition of the Problem

Let's start with a more precise definition of the problem. We play on a triangular board. A board with edge length 5 (n = 5) before any move has been made looks like this:

          x
         x x
        x o x
       x x x x
      x x x x x

The middle position of the third row is empty. A legal move is a jump over one adjacent peg into the empty hole directly behind it, in one of the six possible directions. The leapfrogged peg is removed from the board. So the board could look like this after one move:

          x
         x x
        x x x
       x o x x
      x o x x x
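
The move rule can be sketched in code. The class below is a hypothetical illustration, not the Board class from the article's source archive: the name TriangleBoard, the bit layout, and the (row, column) numbering (row r holds columns 0..r) are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a triangular peg board; not the article's Board class. */
final class TriangleBoard {
    // The six jump directions as (row delta, column delta),
    // assuming row r holds the columns 0..r.
    private static final int[][] DIRECTIONS = {
        {0, -1}, {0, 1}, {-1, -1}, {-1, 0}, {1, 0}, {1, 1}
    };

    final int size;   // edge length n (this sketch supports n <= 10)
    final long pegs;  // bit i is set when hole i holds a peg

    TriangleBoard(int size, long pegs) {
        this.size = size;
        this.pegs = pegs;
    }

    /** Full board of the given edge length with one hole left empty. */
    static TriangleBoard start(int size, int emptyRow, int emptyCol) {
        long all = (1L << (size * (size + 1) / 2)) - 1;
        return new TriangleBoard(size, all & ~bit(emptyRow, emptyCol));
    }

    private static long bit(int row, int col) {
        return 1L << (row * (row + 1) / 2 + col);
    }

    private boolean onBoard(int row, int col) {
        return row >= 0 && row < size && col >= 0 && col <= row;
    }

    private boolean hasPeg(int row, int col) {
        return (pegs & bit(row, col)) != 0;
    }

    /** Number of pegs currently on the board. */
    int cardinality() {
        return Long.bitCount(pegs);
    }

    /** All positions reachable with exactly one legal jump. */
    List<TriangleBoard> nextPositions() {
        List<TriangleBoard> result = new ArrayList<>();
        for (int row = 0; row < size; row++) {
            for (int col = 0; col <= row; col++) {
                if (!hasPeg(row, col)) {
                    continue;
                }
                for (int[] d : DIRECTIONS) {
                    int midRow = row + d[0], midCol = col + d[1];
                    int toRow = row + 2 * d[0], toCol = col + 2 * d[1];
                    // The target must be an empty hole on the board and
                    // the hole in between must hold the peg to jump over.
                    if (onBoard(toRow, toCol) && hasPeg(midRow, midCol) && !hasPeg(toRow, toCol)) {
                        long next = (pegs & ~bit(row, col) & ~bit(midRow, midCol)) | bit(toRow, toCol);
                        result.add(new TriangleBoard(size, next));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TriangleBoard start = TriangleBoard.start(5, 2, 1);
        System.out.println("pegs: " + start.cardinality());
        System.out.println("first moves: " + start.nextPositions().size());
    }
}
```

With the hole in the middle of the third row, only two first moves exist in this sketch: both jump from the bottom row into the empty hole, and one of them produces the position shown above.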

A solution is found when there is only one peg left, wherever it is located on the board. You get different results for different starting positions; see Dan O'Brien's Puzzle Solution Page for more information on the topic.

Given a Java class that represents a position and can compute the list of all positions reachable with one move, the solver is a simple recursive function (source code as zip):

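  long countSolutions(Board start) {
      if (start.isSolution()) {
          // Only one peg left: count this position as one solution.
          return 1;
      } else {
          // Sum the solutions of all positions reachable with one move.
          long count = 0;
          for (Board board : start.nextPositions()) {
              count += countSolutions(board);
          }
          return count;
      }
  }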

When you feed it the start board with edge length five, it takes about a tenth of a second, and you can see there are 1,550 solutions for n = 5. A tenth of a second is a short time, so why optimize? Let's try bigger values, e.g. n = 6. It takes a little bit longer. Much longer. Not as long as computing 42, but about 30 hours, resulting in 29,235,690,234 solutions (now it should be obvious why countSolutions() returns a long and not an int).

Why is there such a huge difference for a slightly larger board? Because the number of positions for a board of size n is 2^(n * (n+1)/2). The exponent is the number of holes/pegs on the board, which increases quadratically.
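
To make the growth concrete, here is a small sketch that evaluates the formula for a few board sizes. The class and method names are made up for this illustration. Note that the formula counts every way to fill or empty the holes, so it is an upper bound on the positions a solver can actually reach.

```java
/** Hypothetical helper that evaluates 2^(n * (n+1)/2) for small boards. */
final class PositionCount {
    /** Number of holes on a triangular board with edge length n. */
    static int holes(int n) {
        return n * (n + 1) / 2;
    }

    /** Upper bound on distinct positions: each hole is either empty or occupied. */
    static long positions(int n) {
        // A long holds the result only while holes(n) < 63, i.e. for n <= 10.
        return 1L << holes(n);
    }

    public static void main(String[] args) {
        for (int n = 4; n <= 7; n++) {
            System.out.printf("n=%d holes=%d positions=%,d%n", n, holes(n), positions(n));
        }
    }
}
```

Going from n = 5 to n = 6 adds six holes (15 to 21), which multiplies the number of possible positions by 64.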


If you know the Java Fork/Join framework (otherwise read the fork/join tutorial), you should see the perfect match: at each recursion level, you can fork a task for every entry in the list of next positions. Here is the code, first the initialization of the pool and the start of the computation:

  ForkJoinPool pool = new ForkJoinPool(numThreads);
  RecursiveSolver root = new RecursiveSolver(startBoard, sequential);
  solutions = pool.invoke(root);

Then the implementing class:

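import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RecursiveTask;

class RecursiveSolver extends RecursiveTask<Long> {
  private final Board start;
  private final int sequential;

  public RecursiveSolver(Board start, int sequential) {
    this.start = start;
    this.sequential = sequential;
  }

  @Override
  protected Long compute() {
    int card = start.cardinality();
    if (card == 1) {
      // Exactly one peg left: this position is a solution.
      return Long.valueOf(1);
    } else if (card < sequential) {
      // Few pegs left: the sequential countSolutions() shown earlier is cheaper than new tasks.
      return Long.valueOf(countSolutions(start));
    } else {
      // Create one subtask per successor position.
      List<Board> nextPositions = start.nextPositions();
      List<RecursiveSolver> tasks = new ArrayList<>(nextPositions.size());
      for (Board b : nextPositions) {
        tasks.add(new RecursiveSolver(b, sequential));
      }
      // Fork all subtasks and wait until they have completed.
      invokeAll(tasks);
      long count = 0;
      for (RecursiveSolver rs : tasks) {
        count += rs.join();
      }
      return Long.valueOf(count);
    }
  }
}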

The recursion of the sequential algorithm has been replaced by the creation of new instances of RecursiveTask. I introduced another optimization (as proposed in the fork/join tutorial): the parallel algorithm switches back to the sequential one when there are fewer than sequential pegs left. This avoids the overhead of task creation for small problems. After some experiments, I used eight as the threshold in my test runs.
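
The threshold pattern is independent of the peg game. As a minimal, self-contained sketch, the following example applies it to a deliberately different problem (summing an array), so that it runs without the Board class. ThresholdSum, parallelSum and the split-in-half strategy are assumptions of this sketch, not code from the article.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Sums a slice of an array; splits in half until the slice is small enough. */
final class ThresholdSum extends RecursiveTask<Long> {
    private final long[] data;
    private final int from;       // inclusive
    private final int to;         // exclusive
    private final int threshold;  // slices up to this length are summed sequentially

    ThresholdSum(long[] data, int from, int to, int threshold) {
        this.data = data;
        this.from = from;
        this.to = to;
        this.threshold = Math.max(1, threshold);  // guard against endless splitting
    }

    @Override
    protected Long compute() {
        if (to - from <= threshold) {
            // Small problem: a plain loop is cheaper than creating more tasks.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ThresholdSum left = new ThresholdSum(data, from, mid, threshold);
        ThresholdSum right = new ThresholdSum(data, mid, to, threshold);
        left.fork();                      // run the left half asynchronously
        long rightSum = right.compute();  // compute the right half in this thread
        return left.join() + rightSum;
    }

    /** Runs the task in a pool with one worker per available processor. */
    static long parallelSum(long[] data, int threshold) {
        ForkJoinPool pool = new ForkJoinPool(Runtime.getRuntime().availableProcessors());
        try {
            return pool.invoke(new ThresholdSum(data, 0, data.length, threshold));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 10_000));
    }
}
```

A good threshold depends on how expensive a single unit of work is, so it is worth measuring a few values, as I did for the solver.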


  • Ilias Tsagklis

    Hey nice article!

    Fork/join is quite a powerful framework for achieving parallelism in your code. You may also find a similar example (with benchmarks) here:

    Java Fork/Join for Parallel Programming

    • roger.butenuth

      November 6, 2012 by roger.butenuth

      Yes, the article contains more Fork/Join details. But Fibonacci numbers are not the best example. You can compute them much faster without recursion and without a cache. This is simple enough to implement on a 25-year-old programmable calculator, and it will be faster than the parallel/recursive implementation on current hardware.

  • Java dev

    One of the best examples for learning the fork/join framework. Thanks a lot.

