Kofax Transformation Modules (KTM): ‘free-form recognition’ for handwritten numbers


In contrast to form-based recognition, free-form recognition tries to find certain values (such as an insurance number) anywhere on a document. It helps if the value has a structure that can be described with a regular expression. In addition, key words located ‘near’ the value (for example ‘insurance number’, ‘ins nbr’, …) are often used to narrow the search.
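
As a rough illustration of a key-word-based search, here is a minimal Python sketch that runs outside of KTM. The key word variants, the maximum distance of 20 characters and the function name are assumptions made for this example, not KTM settings; the value pattern follows the insurance number structure used later in this article.

```python
import re

# Assumed key word variants; a real project would list the ones seen on its documents.
KEYWORDS = r"(?:insurance\s+number|ins\.?\s*nbr)"
# Value structure 1x-xxxxxx-xx, where x is a digit.
VALUE = r"1[0-9]-[0-9]{6}-[0-9]{2}"
# The value must follow a key word, separated by at most 20 non-digit characters.
NEAR = re.compile(KEYWORDS + r"\D{0,20}?(" + VALUE + r")", re.IGNORECASE)

def find_near_keyword(text):
    """Return the first value that follows a key word, or None."""
    match = NEAR.search(text)
    return match.group(1) if match else None
```

A number that is not preceded by one of the key words, such as a phone number with the same structure, is ignored by this search.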

Most established classification/extraction products offer this kind of tool. With machine-printed text, all of them deliver sufficient results.

At our customers' sites we use the Kofax product Kofax Transformation Modules (KTM) for document classification and data extraction. The KTM tools for free-form recognition of machine-printed text are the so-called ‘format locators’. You can read about them in earlier KTM blog articles (1).

This article describes how to find handwritten numbers that have a certain structure somewhere on a document.

In this example we are searching for handwritten insurance numbers on a document. These numbers have the following structure: 1x-xxxxxx-xx. Each x represents a digit between 0 and 9, for example 14-386723-89.
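
This structure translates directly into a regular expression. The following Python sketch (run outside of KTM; the names are chosen for this example only) checks whether a string has exactly this structure:

```python
import re

# 1x-xxxxxx-xx: a literal 1, one digit, a hyphen, six digits, a hyphen, two digits.
INSURANCE_NUMBER = re.compile(r"1[0-9]-[0-9]{6}-[0-9]{2}")

def is_insurance_number(value):
    """True if the whole string matches the structure 1x-xxxxxx-xx."""
    return INSURANCE_NUMBER.fullmatch(value) is not None
```

Here fullmatch is used because the whole string has to match; to search inside a longer text, the same pattern would be used with search or finditer instead.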

This is the example document, which will be used in our KTM project:


Within the KTM project, you first have to classify the example document into the appropriate document class (in our example, the class ‘InsuranceDocs’). This can be done with any of the available KTM classification methods (see also: Document classification with KTM).

A field ‘InsuranceNumber’ and a locator ‘Numbers’ (Advanced Zone Locator) should be added to the document class ‘InsuranceDocs’:


This is the basic idea behind ‘free-form recognition’ for handwritten numbers:

  1. The Advanced Zone Locator reads the text of the page by sizing its zone large enough to cover the entire page (or at least the region where the handwritten numbers may occur).
  2. In our experience, the RecoStar engine reads digits better than the FineReader engine. Therefore RecoStar is used within the Advanced Zone Locator, with a numerical recognition profile restricted to the characters [0-9-].
  3. The result of the Advanced Zone Locator will be a string consisting of digits and hyphens.
  4. Within the script of the document class ‘InsuranceDocs’ the result string will be examined for insurance numbers using regular expressions.
  5. If possible the found insurance number should be checked against an inventory database and finally put into the extraction field ‘InsuranceNumber’.
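
Steps 3 to 5 can be sketched outside of KTM as follows. This is a Python illustration under assumptions: the OCR string is made up, the inventory is a plain set standing in for a database, and, unlike a script that takes only the first match, the sketch tries every candidate until one is found in the inventory.

```python
import re

# Candidate pattern: 1x-xxxxxx-xx, tolerating one whitespace character around the hyphens.
CANDIDATE = re.compile(r"1[0-9]\s?-\s?[0-9]{6}\s?-\s?[0-9]{2}")

def extract_insurance_number(ocr_text, inventory):
    """Return the first candidate that exists in the inventory, or None."""
    for match in CANDIDATE.finditer(ocr_text):
        number = re.sub(r"\s", "", match.group(0))  # normalise: drop all whitespace
        if number in inventory:
            return number
    return None

# Made-up recognition result consisting of digits, hyphens, spaces and line breaks.
ocr_text = "12-000000-00\n7-31 99\n14 - 386723-89\n0815"
inventory = {"14-386723-89"}  # stand-in for the inventory database
result = extract_insurance_number(ocr_text, inventory)
```

The first candidate, 12-000000-00, has the right structure but is not in the inventory, so it is skipped. This is why the database check is worth having when the recognition result contains several number-like strings.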


Setup of the Advanced Zone Locator

Draw the zone over the region of the example page where the handwritten numbers may occur:


Set the zone recognition profile to a RecoStar zone engine with these settings:


Clear the checkbox ‘Registration failure makes zone invalid’, because registration will always fail on unstructured documents and we want to keep the result in any case:


Testing the Advanced Zone Locator shows this result:


At first this looks somewhat messy, but the desired insurance number shows up in the fourth line from the bottom. This number still has to be extracted from the result string.


Extraction of the insurance number by scripting

In this example we use the event ‘Document_AfterProcess’ in the script of the document class ‘InsuranceDocs’ to extract the insurance number from the result string of the Advanced Zone Locator with a regular expression.

First, the library ‘Microsoft VBScript Regular Expressions 5.5’ has to be added as a reference to the script:


This Microsoft library enables your script to search string variables with regular expressions (Microsoft VBScript Regular Expressions 5.5 Description).

The actual KTM script finally looks like this:

Option Explicit

' Class script: InsuranceDocs
' Requires the reference 'Microsoft VBScript Regular Expressions 5.5'

Private Sub Document_AfterProcess(ByVal pXDoc As CASCADELib.CscXDocument)
   Dim String_RecoStar As String
   Dim myRegExp As RegExp
   Dim myMatches As MatchCollection
   Dim InsNbr_Recostar As String

   'get the first alternative from the advanced zone locator
   '(assumption: the zone result is the first subfield of the locator 'Numbers')
   If pXDoc.Locators.ItemByName("Numbers").Alternatives.Count = 0 Then Exit Sub
   String_RecoStar = pXDoc.Locators.ItemByName("Numbers").Alternatives(0).SubFields(0).Text

   Set myRegExp = New RegExp
   myRegExp.IgnoreCase = True
   myRegExp.Global = True
   'define the regular expression for the insurance numbers (1x-xxxxxx-xx, x = 0..9)
   myRegExp.Pattern = "1\d\s?-\s?\d{6}\s?-\s?\d{2}"

   Set myMatches = myRegExp.Execute(String_RecoStar)
   If myMatches.Count > 0 Then 'if something was found:
      'we just take the first result in this example...
      InsNbr_Recostar = myMatches.Item(0).Value
      '\s? also matches tabs and line breaks, so remove all whitespace, not only spaces
      myRegExp.Pattern = "\s"
      InsNbr_Recostar = myRegExp.Replace(InsNbr_Recostar, "")
      If DB_Check(InsNbr_Recostar) Then 'if possible validate the number against a database
         'put the value into the InsuranceNumber field
         pXDoc.Fields.ItemByName("InsuranceNumber").Text = InsNbr_Recostar
      End If
   End If
End Sub

Function DB_Check(Number As String) As Boolean
   DB_Check = True 'just return True in this example
   'Implement the database validation of the extracted insurance number
End Function
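
The database validation is left open in the script. As one possible shape of such a check, here is a minimal Python sketch using an in-memory SQLite database. The table name insurance_numbers, its column and the function name db_check are hypothetical and only serve this example; a real inventory database will look different.

```python
import sqlite3

def db_check(conn, number):
    """True if the number exists in the (hypothetical) inventory table."""
    row = conn.execute(
        "SELECT 1 FROM insurance_numbers WHERE number = ?", (number,)
    ).fetchone()
    return row is not None

# Build a tiny in-memory inventory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE insurance_numbers (number TEXT PRIMARY KEY)")
conn.execute("INSERT INTO insurance_numbers VALUES (?)", ("14-386723-89",))
```

The parameterised query (the ? placeholder) keeps the recognised text out of the SQL statement itself, which matters because OCR results are untrusted input.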

Processing the example document in KTM Project Builder finally produces a result like this:



(1) More codecentric blog articles about KTM:

KTM and insurance companies: Document Process Automation

Document classification with Kofax Transformation Modules (KTM)

Kofax Transformation Modules – format locators and dynamic regular expressions – Part 2

Kofax Transformation Modules – format locators and dynamic regular expressions



  • Jürgen Voss

    August 14, 2015 by Jürgen Voss

    Addition for KTM versions >= 5.5.0:

    With these KTM versions, the mixed print recognition profile makes life easier.

    Just set the full-page OCR to mixed print recognition, and the OCR result will contain both machine-printed and hand-printed data. You can then use standard format locators and their regular expressions to extract the insurance number.

