Kofax Capture – Document Separation and Barcodes

A well-known approach to separating documents at scan time is to put a barcode label on the first page of each document. The barcode may also be placed on a dedicated separator sheet. When a batch of documents is scanned with Kofax Capture, each recognized barcode marks the start of a new document.

This approach has been used for several years, but it has two weak points:

1. If the recognition of a barcode fails, the pages of this document are attached to the previous document, so the document is lost as a separate unit. Recognition may fail because the barcode is damaged or blackened.

2. Up to Kofax Capture 9, the barcode value itself could not be used as a criterion for the separation.

The only criteria were the barcode type (Code 39, Interleaved 2 of 5, Code 128, …) and the minimum length of the barcode (number of characters). For this reason, ‘zombie’ documents could be created: if one of the following pages of a document also carried a barcode label that met the criteria (barcode type, length), this barcode was treated as a separator barcode too, and the document was split into two documents. This happens from time to time, because the creators of a document may put their own barcode labels on it. The only ways to avoid these ‘zombie’ documents were to fine-tune the barcode engine (width of the bars, height of the bars) or to write custom code, both of which can be cumbersome.
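To illustrate how a ‘zombie’ document comes about, here is a minimal Python simulation of separation by barcode type and minimum length only. It is a sketch, not Kofax Capture code: the `Page` class, the `separate` function and the sample values are hypothetical and only model the behaviour described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for a scanned page with at most one barcode."""
    name: str
    barcode_type: Optional[str] = None
    barcode_value: Optional[str] = None


def is_separator(page: Page, wanted_type: str, min_length: int) -> bool:
    # Only type and minimum length are checked; the value itself is ignored.
    return (
        page.barcode_type == wanted_type
        and page.barcode_value is not None
        and len(page.barcode_value) >= min_length
    )


def separate(pages: List[Page], wanted_type: str, min_length: int) -> List[List[str]]:
    documents: List[List[str]] = []
    for page in pages:
        # Start a new document on a separator page (or on the very first page).
        if is_separator(page, wanted_type, min_length) or not documents:
            documents.append([])
        documents[-1].append(page.name)
    return documents


batch = [
    Page("A1", "Code 128", "931234567"),         # real separator label
    Page("A2"),
    Page("A3", "Code 128", "4700000931234567"),  # label added by the document's creator
    Page("B1", "Code 128", "921234567"),         # real separator label
]

docs = separate(batch, wanted_type="Code 128", min_length=9)
print(docs)
```

Page A3 belongs to document A, but because its label has the right type and is long enough, it opens a document of its own.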

For weak point 1 (barcode recognition fails), you have to check the document separation visually after scanning. In Kofax Capture, documents and pages can be adjusted directly within the scanning application.

For weak point 2 (‘zombie’ documents caused by external barcodes), Kofax Capture 10 offers a way to use the barcode value for document separation. You can now define regular expressions that describe the allowed barcode values.

For example, the regular expression \b(93|92)\d{7}\b describes values that start with 93 or 92, followed by seven digits. The \b at the start and the end of the expression is a word boundary: it makes sure that the value is not directly preceded or followed by another digit, letter or underscore, so it has to be delimited by, for example, a space character, a tab character or the beginning/end of the line. So \b makes sure that the expression (93|92)\d{7} is not taken out of a longer character combination, for example out of a product number.

‘ 931234567 ’ would be accepted, but ‘4700000931234567’ would not be considered.
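The behaviour of this expression can be checked quickly outside of Kofax Capture. The following sketch uses Python’s `re` module; it assumes that the regular expression engine in Kofax Capture treats \b and \d in the same way, which should be verified against the product itself. The helper name `is_separator_value` and the sample values are made up for illustration.

```python
import re

# The separator pattern from the text: 93 or 92, then seven digits,
# delimited by word boundaries on both sides.
SEPARATOR_PATTERN = re.compile(r"\b(93|92)\d{7}\b")


def is_separator_value(text: str) -> bool:
    """Return True if the text contains a value matching the separator pattern."""
    return SEPARATOR_PATTERN.search(text) is not None


samples = [
    " 931234567 ",       # delimited by spaces
    "921234567",         # delimited by start/end of the string
    "4700000931234567",  # embedded in a longer number
    "9312345678",        # one digit too many
    "941234567",         # wrong prefix
]

results = {sample: is_separator_value(sample) for sample in samples}
for sample, accepted in results.items():
    print(repr(sample), "->", accepted)
```

Only the first two samples are accepted; the value embedded in the longer product number is rejected because no word boundary precedes the 93.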

In Kofax Capture 10, the required regular expression is defined within the properties of the batch class. Within the settings of ‘Separation and Form Identification’ you have to create a user-defined separation profile:

[Image: KC10-Trenn85E]

This profile looks like this:

[Image: BC10-BC-RegExE]

‘BC-Typ’ defines the type of barcode used (Code 39, Interleaved 2 of 5, Code 128, …).

In the field ‘Search Text:’ you define the regular expression. You also have to check ‘Treat search text as regular expression’. With this approach you get rid of weak point 2, the ‘zombie’ documents created by external barcodes.

But there is another scenario in which the regular expression doesn’t help directly. In some applications, for example late scanning with barcodes in an SAP capturing system, the barcode value is not only used for document separation but also has to be exported to the business application.

Unfortunately, Kofax Capture doesn’t directly provide the value of the separation barcode for further processing. For this purpose, Kofax Capture offers the ‘Page Level Barcode’ construct, which provides ALL barcode values that were recognised on the document page. If only one barcode exists, this is of course the separation barcode.

If there are several barcodes on the page, all recognised values are made available for processing in the Kofax Capture workflow. A script in Kofax Capture (for example a validation script) can then check the barcode values against the regular expression, identify the separation barcode and pass it on to the next processing step.
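The selection logic such a script needs is small. The sketch below shows it in Python for illustration only; it is not a Kofax Capture validation script and uses no Kofax API. The function name `find_separation_barcode` is hypothetical, and the list of strings stands in for the page-level barcode values.

```python
import re
from typing import Iterable, Optional

# Same value rule as the separation profile: 93 or 92 followed by seven digits.
# fullmatch() is used because each barcode value is checked as a whole.
SEPARATOR_PATTERN = re.compile(r"(93|92)\d{7}")


def find_separation_barcode(values: Iterable[str]) -> Optional[str]:
    """Return the single value that matches the separator rule, or None.

    None is returned when no value matches or when the result is ambiguous,
    so that the document can be routed to a manual check.
    """
    candidates = [v.strip() for v in values if SEPARATOR_PATTERN.fullmatch(v.strip())]
    if len(candidates) == 1:
        return candidates[0]
    return None


page_barcodes = ["4700000931234567", "931234567"]
print(find_separation_barcode(page_barcodes))
```

Returning nothing for an ambiguous page, rather than picking the first match, is a design choice made here to keep wrong values from being exported to the business application.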

A scripting example may follow in a future blog article, if desired.


Other blog articles about Kofax Capture and Kofax Transformation Modules:

Kofax Capture – Customisation beyond standard features

Kofax Capture Advanced Scan Api: A first approach

Kofax Transformation Modules – format locators and dynamic regular expressions

Kofax Transformation Modules – format locators and dynamic regular expressions – Part 2

Document classification with Kofax Transformation Modules (KTM)

KTM and insurance companies: Document Process Automation

IBM Content Collector for SAP (formerly known as IBM CommonStore for SAP), Kofax Capture 10 and the IBM CommonStore Release Script
