
Kofax Transformation Modules (KTM): ‘free-form recognition’ for handwritten numbers

19.7.2015 | 4 minutes of reading time

In contrast to form-based recognition, free-form recognition tries to find certain values (such as an insurance number) anywhere on a document. This works best if the value has a structure that can be described by a regular expression. In addition, keywords located ‘near’ the value (for example ‘insurance number’, ‘ins nbr’, …) are often used to guide the search.
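
To make the keyword idea concrete, here is a minimal sketch in Python. It runs outside KTM and is not KTM script. It assumes a hypothetical keyword list and a value pattern for the insurance numbers used later in this article, and picks the value that starts closest after one of the keywords. The names `KEYWORDS`, `VALUE_PATTERN` and `find_value_near_keyword` are invented for this illustration, and the 40-character window is an arbitrary assumption.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical keywords and value pattern, chosen only for this illustration.
KEYWORDS = ["insurance number", "ins nbr"]
VALUE_PATTERN = re.compile(r"1[0-9]-[0-9]{6}-[0-9]{2}")


def find_value_near_keyword(text: str, max_distance: int = 40) -> Optional[str]:
    """Return the value that starts closest after a keyword (at most
    max_distance characters after the end of the keyword), or None."""
    candidates: List[Tuple[int, str]] = []
    for value in VALUE_PATTERN.finditer(text):
        for keyword in KEYWORDS:
            # Keywords are matched case-insensitively ("Ins nbr", "INS NBR", ...).
            for kw in re.finditer(re.escape(keyword), text, re.IGNORECASE):
                gap = value.start() - kw.end()
                if 0 <= gap <= max_distance:
                    candidates.append((gap, value.group()))
    if not candidates:
        return None
    # The smallest gap wins; ties are broken by the value string itself.
    return min(candidates)[1]


if __name__ == "__main__":
    sample = "Ref 12-345678-90\nIns nbr: 14-386723-89"
    print(find_value_near_keyword(sample))  # 14-386723-89
```

The first number in the sample is ignored because it does not follow a keyword; only the number directly after ‘Ins nbr’ is returned.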

Most established classification and extraction products offer tools of this kind, and with machine-printed text all of them deliver sufficient results.

We use Kofax Transformation Modules (KTM) at our customers for document classification and data extraction. In KTM, the tools for free-form recognition of machine-printed text are the so-called ‘format locators’, which are described in earlier KTM blog articles (1).

This article describes how to find handwritten numbers that have a known structure anywhere on a document.

In this example we search for handwritten insurance numbers on a document. These numbers have the structure 1x-xxxxxx-xx, where each x stands for a digit from 0 to 9, for example 14-386723-89.
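
Written as a regular expression, this structure becomes `1[0-9]-[0-9]{6}-[0-9]{2}`. The Python sketch below (an illustration outside KTM; `is_insurance_number` is a hypothetical helper) checks a complete string against that pattern with `re.fullmatch`, so extra leading or trailing characters are rejected. Note that the KTM script later in this article uses a slightly different pattern: it tolerates whitespace around the hyphens and only accepts 1 to 9 as the second digit.

```python
import re

# 1x-xxxxxx-xx with x = any digit 0-9, as described in the text.
INSURANCE_NUMBER = re.compile(r"1[0-9]-[0-9]{6}-[0-9]{2}")


def is_insurance_number(value: str) -> bool:
    """True if the whole string has exactly the structure 1x-xxxxxx-xx."""
    return INSURANCE_NUMBER.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["14-386723-89", "24-386723-89", "14-38672-89"]:
        print(candidate, is_insurance_number(candidate))
```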

This is the example document used in our KTM project:

Within the KTM project, the example document first has to be classified into the appropriate document class (in our example, the class ‘InsuranceDocs’). Any of the available KTM classification methods can be used for this (see also: Document classification with KTM).

A field ‘InsuranceNumber’ and a locator ‘Numbers’ (Advanced Zone Locator) should be added to the document class ‘InsuranceDocs’:

This is the basic idea behind ‘free-form recognition’ for handwritten numbers:

The Advanced Zone Locator reads the text of the page; its zone is drawn large enough to cover the entire page (or at least the region where the handwritten numbers may occur).
In our experience, the RecoStar engine recognizes numerical characters better than the FineReader engine. Therefore RecoStar is used within the Advanced Zone Locator, with a numerical recognition profile that only allows the characters [0-9-].
The result of the Advanced Zone Locator is a string consisting of digits and hyphens, separated by spaces and line breaks.
In the script of the document class ‘InsuranceDocs’, this result string is searched for insurance numbers using a regular expression.
If possible, the insurance number found is checked against an inventory database before it is finally put into the extraction field ‘InsuranceNumber’.
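
The inventory check in the last step could look like the following Python sketch. It assumes a hypothetical SQL table `insurance_numbers` with a single column `number`; an in-memory SQLite database stands in for the real inventory system, which this article does not describe. The query is parameterized, so the recognized string is never pasted into the SQL text.

```python
import sqlite3
from typing import Iterable


def make_demo_inventory(numbers: Iterable[str]) -> sqlite3.Connection:
    """Create an in-memory stand-in for the (hypothetical) inventory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE insurance_numbers (number TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO insurance_numbers (number) VALUES (?)",
                     [(n,) for n in numbers])
    return conn


def db_check(conn: sqlite3.Connection, number: str) -> bool:
    """Return True if the extracted number exists in the inventory."""
    row = conn.execute(
        "SELECT 1 FROM insurance_numbers WHERE number = ? LIMIT 1", (number,)
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    inventory = make_demo_inventory(["14-386723-89"])
    print(db_check(inventory, "14-386723-89"))  # True
    print(db_check(inventory, "14-386723-88"))  # False
```

A check like this also catches recognition errors that still fit the pattern, for example a single misread digit.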

Setup of the Advanced Zone Locator

Draw the zone over the region of the example page where the handwritten numbers may occur:

Set the zone recognition profile to a RecoStar zone engine with these settings:

Clear the checkbox ‘Registration failure makes zone invalid’, because registration always fails with unstructured documents and we want to keep the result in any case:

Testing the Advanced Zone Locator shows this result:

At first glance this looks rather messy, but the desired insurance number shows up in the fourth line from the bottom. This number still has to be extracted from the result string.

Extraction of the insurance number by scripting

As an example, we use the event ‘Document_AfterProcess’ in the script of the document class ‘InsuranceDocs’ to extract the insurance number from the result string of the Advanced Zone Locator with a regular expression.

First, the library ‘Microsoft VBScript Regular Expressions 5.5’ has to be added as a reference to the script:

This Microsoft library enables the script to search string variables with regular expressions (Microsoft VBScript Regular Expressions 5.5 Description).
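
For readers who want to try the patterns outside KTM, the following Python sketch mimics how the `RegExp` object of this library behaves: with `Global = True`, `Execute` returns all non-overlapping matches, otherwise at most the first one, and `IgnoreCase` corresponds to `re.IGNORECASE`. The function `execute` is a hypothetical helper. The two regex dialects are not identical, but for the constructs used in this article (`\d`, `\s`, `?`, `{n}`) they behave the same on plain ASCII input.

```python
import re
from typing import List


def execute(pattern: str, text: str, global_: bool = True,
            ignore_case: bool = False) -> List[str]:
    """Rough analogue of VBScript RegExp.Execute, returning the matched texts."""
    flags = re.IGNORECASE if ignore_case else 0
    if global_:
        # Global = True: every non-overlapping match, from left to right.
        return [m.group(0) for m in re.finditer(pattern, text, flags)]
    # Global = False: only the first match, if there is one.
    first = re.search(pattern, text, flags)
    return [first.group(0)] if first else []


if __name__ == "__main__":
    text = "14-386723-89 and 15-123456-01"
    print(execute(r"1\d-\d{6}-\d{2}", text))                 # both numbers
    print(execute(r"1\d-\d{6}-\d{2}", text, global_=False))  # first number only
```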

The complete KTM script looks like this:

Option Explicit

' Class script: InsuranceDocs

Private Sub Document_AfterProcess(ByVal pXDoc As CASCADELib.CscXDocument)
   Dim String_RecoStar As String
   Dim myRegExp As RegExp
   Dim myMatches As MatchCollection
   Dim InsNbr_Recostar As String

   'stop if the advanced zone locator did not return any alternative
   If pXDoc.Locators.ItemByName("Numbers").Alternatives.Count = 0 Then Exit Sub

   'get the text of the first alternative from the advanced zone locator
   String_RecoStar = Trim(pXDoc.Locators.ItemByName("Numbers").Alternatives(0).SubFields.ItemByName("UF_Zone0").Text)

   Set myRegExp = New RegExp
   myRegExp.IgnoreCase = True
   myRegExp.Global = True 'Execute returns all matches, not only the first one
   'regular expression for the insurance numbers: "1", a digit 1-9, six digits
   'and two digits, separated by hyphens; one optional whitespace character is
   'tolerated on each side of a hyphen (use \d instead of [1-9] if the second
   'digit may also be 0)
   myRegExp.Pattern = "1[1-9]\s?-\s?\d{6}\s?-\s?\d{2}"

   Set myMatches = myRegExp.Execute(String_RecoStar)
   If myMatches.Count > 0 Then 'if something was found:
      'we just take the first result in this example...
      InsNbr_Recostar = myMatches.Item(0).Value
      'get rid of whitespace (\s may also have matched a tab or line break)
      myRegExp.Pattern = "\s"
      InsNbr_Recostar = myRegExp.Replace(InsNbr_Recostar, "")
      If DB_Check(InsNbr_Recostar) Then 'if possible, validate the number against a database
         'put the value into the InsuranceNumber field
         pXDoc.Fields.ItemByName("InsuranceNumber").Text = InsNbr_Recostar
         pXDoc.Fields.ItemByName("InsuranceNumber").Valid = True
      End If
   End If
End Sub

Function DB_Check(ByVal Number As String) As Boolean
   'Implement the database validation of the extracted insurance number here;
   'this example simply accepts every number
   DB_Check = True
End Function
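
To experiment with the pattern and the control flow without a KTM installation, the logic of `Document_AfterProcess` can be mirrored in Python. This is a sketch, not KTM code: the `Field` class is a hypothetical stand-in for the KTM extraction field, and the default check accepts every number, just like `DB_Check` in the script. The pattern matches the same strings as the one in the script (second digit 1 to 9, optional whitespace around the hyphens), and all whitespace is removed from the match before it is validated.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Same strings as the KTM pattern: "1", a digit 1-9, six digits, two digits,
# hyphen-separated, with one optional whitespace character around each hyphen.
INSURANCE_PATTERN = re.compile(r"1[1-9]\s?-\s?[0-9]{6}\s?-\s?[0-9]{2}")
WHITESPACE = re.compile(r"\s")


@dataclass
class Field:
    """Hypothetical stand-in for the KTM field 'InsuranceNumber'."""
    text: str = ""
    valid: bool = False


def accept_all(number: str) -> bool:
    """Placeholder for the database check; accepts every number."""
    return True


def after_process(zone_text: str, field: Field,
                  db_check: Callable[[str], bool] = accept_all) -> Optional[str]:
    """Take the first match in the zone text, remove whitespace, validate it
    and, if the check passes, store it in the field. Returns the stored value."""
    match = INSURANCE_PATTERN.search(zone_text.strip())
    if match is None:
        return None
    number = WHITESPACE.sub("", match.group(0))
    if not db_check(number):
        return None
    field.text = number
    field.valid = True
    return number


if __name__ == "__main__":
    field = Field()
    print(after_process("0221 55\n14 - 386723-89\n7", field))  # 14-386723-89
```

Passing a different `db_check` function (for example one that queries the inventory database) changes only the validation step, which mirrors how `DB_Check` is kept separate in the script.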

Processing the example document in the KTM Project Builder finally produces a result like this:

(1) More codecentric blog articles about KTM:

KTM and insurance companies: Document Process Automation

Document classification with Kofax Transformation Modules (KTM)

Kofax Transformation Modules – format locators and dynamic regular expressions – Part 2

Kofax Transformation Modules – format locators and dynamic regular expressions

Blog author

Jürgen Voss
