Many of our customers use systems for automatic document classification and data extraction. ‘Kofax Transformation Modules’ (KTM) is one of these systems. Such data capture systems extract metadata from electronic images (the scanned pages of documents, faxes or emails) and release the data, together with the document, to business applications.
In this article I will explain the different ways of document classification within KTM.
So far, two other articles about KTM have been published on the codecentric blog:
Kofax Transformation Modules – format locators and dynamic regular expressions – Part 1
Kofax Transformation Modules – format locators and dynamic regular expressions – Part 2
Before data can be extracted from a document, KTM needs to know the document's type. Invoices have to be treated differently from, for example, insurance contracts: from an invoice you want to extract the invoice number, invoice date and amounts, whereas from an insurance contract you want the insurance number and the insurance class.
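To make the dependency concrete, here is a minimal Python sketch, not KTM code: the document class acts as a key that selects which fields are extracted. The class names and field names (`Invoice`, `InvoiceNumber` and so on) are hypothetical and only mirror the example above; in KTM this relationship is presumably maintained in the project's class definitions rather than in code like this.

```python
# Hypothetical mapping from document class to the fields to extract.
# Class and field names are illustrative, not real KTM identifiers.
FIELDS_BY_CLASS: dict[str, list[str]] = {
    "Invoice": ["InvoiceNumber", "InvoiceDate", "Amount"],
    "InsuranceContract": ["InsuranceNumber", "InsuranceClass"],
}


def fields_to_extract(document_class: str) -> list[str]:
    """Return the extraction fields for an already classified document.

    Raises KeyError for an unknown class: without a known document
    type there is no field set to extract.
    """
    return FIELDS_BY_CLASS[document_class]


if __name__ == "__main__":
    print(fields_to_extract("Invoice"))
```

The point of the sketch is the order of operations: classification has to produce a class before the extraction step can even decide what to look for.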
Kofax Capture Advanced Scan API: A first approach
The following article shows a possible use case for the new Scan API that ships with SP1 of Kofax Capture 10.
A sample application can be found in the Source directory of the Kofax Capture installation under …\Source\Sample Projects\StdCust\ScanApiSamplePanel.
The business case we want to solve is described as follows:
We want to scan in duplex mode and use patch codes for document separation. For a downstream custom module we need to flag every page with a CustomStorageString: its value should be “0” if the TIFF was captured by the front camera and “1” if it was captured by the rear camera.
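The flagging rule itself is simple. The following Python sketch models it under stated assumptions: `Page`, `Camera`, `flag_pages` and the storage-string name `CameraSide` are hypothetical and are not part of the Kofax Capture Scan API, which would supply the real camera information and store the value. The sketch only shows the mapping from camera side to flag value.

```python
from dataclasses import dataclass, field
from enum import Enum


class Camera(Enum):
    FRONT = "front"
    REAR = "rear"


@dataclass
class Page:
    # Hypothetical page model: in Kofax Capture the camera information
    # would come from the Scan API, not from this class.
    camera: Camera
    custom_storage: dict[str, str] = field(default_factory=dict)


# Hypothetical name; the article only says "a CustomStorageString".
FLAG_NAME = "CameraSide"


def flag_pages(pages: list[Page]) -> None:
    """Set the flag to "0" for front-camera pages and "1" for rear-camera pages."""
    for page in pages:
        page.custom_storage[FLAG_NAME] = "0" if page.camera is Camera.FRONT else "1"


if __name__ == "__main__":
    # A duplex sheet yields one front and one rear image.
    sheet = [Page(Camera.FRONT), Page(Camera.REAR)]
    flag_pages(sheet)
    print([p.custom_storage[FLAG_NAME] for p in sheet])
```

Storing the flag as a string rather than a boolean follows the business case, which asks for the values “0” and “1” in a storage string.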
Part 2: Dynamic regular expressions in KTM
In the first part of this blog article I explained the use of KTM format locators and regular expressions. In this part I explain how KTM's internal scripting language can be used to design flexible KTM projects. You should already be familiar with KTM's scripting language and the KTM object model.
Once they have been defined in the KTM Project Builder, KTM format locators (see part 1) are static expressions: they are used with their defined values within the Kofax Capture workflow at runtime.
There may, admittedly very rarely, be cases where you need to change the regular expression of a format locator at runtime because of external circumstances. Unfortunately this doesn't work ‘out of the box’, but KTM's rich toolset includes a library that enables this functionality.
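The idea of changing a locator's expression at runtime can be illustrated outside KTM. The Python sketch below is not KTM's scripting language and does not use the KTM library mentioned above; `FormatLocator`, `update_pattern` and the `INV-`/`RE-` invoice-number formats are hypothetical. A locator starts with a statically defined regex, and that regex is then rebuilt from a value that is only known at runtime.

```python
import re
from dataclasses import dataclass


@dataclass
class FormatLocator:
    """Hypothetical stand-in for a KTM format locator: a named regex."""
    name: str
    pattern: re.Pattern

    def locate(self, text: str) -> list[str]:
        # Return every substring of the text matched by the current pattern.
        return [m.group(0) for m in self.pattern.finditer(text)]


# Static definition, analogous to one made at design time in the Project Builder.
invoice_no = FormatLocator("InvoiceNumber", re.compile(r"\bINV-\d{6}\b"))


def update_pattern(locator: FormatLocator, prefix: str) -> None:
    """Rebuild the locator's regex at runtime from a runtime value.

    re.escape prevents characters in the prefix from being read as
    regex syntax; {{6}} in the f-string produces the quantifier {6}.
    """
    locator.pattern = re.compile(rf"\b{re.escape(prefix)}-\d{{6}}\b")


if __name__ == "__main__":
    text = "Invoice INV-123456 and RE-654321"
    print(invoice_no.locate(text))
    update_pattern(invoice_no, "RE")
    print(invoice_no.locate(text))
```

Escaping the runtime value is the main design choice here: a dynamically built expression should not change meaning just because the inserted value happens to contain characters such as `.` or `+`.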