
Kofax Transformation Modules (KTM) – Dictionaries: Search by script

6.7.2017 | 2 minutes of reading time

In addition to fuzzy databases, KTM also offers so-called dictionaries to improve recognition. For example, a dictionary can be used in the regular expression of a format locator to find dates of the form “01. December 2015”. In that case the dictionary would consist of all the month names (January, February, March, …).
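Outside KTM, the idea of combining a word list with a regular expression can be sketched in a few lines of Python. This is only an illustration, not KTM syntax: the `MONTHS` list stands in for the dictionary, and the names `DATE_PATTERN` and `find_dates` are made up for this example.

```python
import re

# Stand-in for the KTM dictionary: one entry per month name (assumption: English names).
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Day, optional dot, whitespace, a month name from the list, whitespace, four-digit year.
DATE_PATTERN = re.compile(
    r"\b(\d{1,2})\.?\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b"
)

def find_dates(text):
    """Return (day, month, year) tuples for every date found in text."""
    return [m.groups() for m in DATE_PATTERN.finditer(text)]
```

A word that is not in the list, such as a misspelled month name, simply produces no match, which is the filtering effect a dictionary is meant to provide.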

KTM fuzzy databases can be searched with the KTM script language, and Kofax provides sample programs for this (for example in the KTM scripting help or in “Best Practices”). To my knowledge there are no such examples for searching a dictionary. In a recognition project for German license plates I had to search a dictionary by script. I would like to briefly explain why and present a sample script.

A regular expression of the following form is usually enough to recognize a German license plate number (one to three letters, a delimiter, one or two letters, an optional delimiter, and one to four digits; example: SG CC 876):

[A-ZÄÖÜ]{1,3}\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}
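To see what this expression accepts, it can be tried outside KTM. The following is a minimal sketch in Python, assuming that Python's `re` module treats the pattern the same way KTM's engine does for these inputs; `find_plates` is a name invented for this example.

```python
import re

# The expression from the article, copied verbatim.
# Note: inside a character class "|" is a literal character in Python
# (and in most regex flavours), so "[\.|\x20|-]" also accepts a pipe as delimiter.
PLATE_PATTERN = re.compile(
    r"[A-ZÄÖÜ]{1,3}\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}"
)

def find_plates(text):
    """Return every substring of text that matches the plate pattern."""
    return [m.group(0) for m in PLATE_PATTERN.finditer(text)]
```

Calling `find_plates` on "SG CC 876", "COE.EW.247" and "COE - EW 247" returns each string unchanged, while a lowercase string such as "sg cc 876" yields no match.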

The first characters are a code for the city or the administrative district where the car is registered.

However, this regular expression may also find strings that are not valid plates, because the first part accepts any one to three letters. It would be better to use a list of valid city codes instead of [A-ZÄÖÜ]{1,3}, which would make the list of recognized but invalid number plates much smaller. This is where KTM dictionaries come in handy. Lists of valid city codes are available on the Internet, and a suitable dictionary file (KFZ-Staedte) with the valid codes is quickly created:

AIC
AK
AM
AN
ANA

AP
AS
ASL
ASZ
AUR
:
:
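As a rough illustration of what replacing [A-ZÄÖÜ]{1,3} with a code list means, here is a Python sketch that builds the pattern from the list. It is not how KTM embeds a dictionary in a format locator; `CITY_CODES` contains only the excerpt shown above, `build_plate_pattern` is a hypothetical helper, and the delimiter class is simplified to `[.\x20-]`.

```python
import re

# Excerpt of the code list shown above; a real list would be much longer.
CITY_CODES = ["AIC", "AK", "AM", "AN", "ANA", "AP", "AS", "ASL", "ASZ", "AUR"]

def build_plate_pattern(codes):
    """Compile a plate pattern that only accepts the given city codes."""
    # Longest codes first, so "ANA" is tried before "AN".
    ordered = sorted(codes, key=len, reverse=True)
    alternation = "|".join(re.escape(code) for code in ordered)
    return re.compile(
        # The lookbehind stops a code from matching in the middle of a longer word.
        r"(?<![A-ZÄÖÜ])(?:" + alternation + r")"
        r"\x20?[.\x20-]\x20?[A-Z]{1,2}[.\x20-]?[0-9]{1,4}"
    )

PLATE_PATTERN = build_plate_pattern(CITY_CODES)
```

With this pattern, "AIC.EW.247" and "ANA - EW 247" match, while a string starting with a code that is not in the list no longer does.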

This format locator

delivers the correct result:

However, the test of another document failed – no number plate was detected:

This reveals a weakness in the integration of dictionaries into regular expressions: it only works if the dictionary string is separated from the rest of the regular expression by whitespace (a space, a tab, …).

This is the case in the first example, “COE – EW 247”. The second example, “COE.EW.247”, uses dots as separators between the individual parts of the number plate, and the dictionary integration does not work as desired.
Still, I did not want to do without the improved recognition of the city codes, so I went back to the ‘original’ regular expression:

[A-ZÄÖÜ]{1,3}\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}

This time, however, I took the recognized city code and checked it against the dictionary by script. If the code is found, the license plate is accepted; otherwise it is discarded.
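The two-step idea, matching with the permissive expression first and validating the city code afterwards, can be sketched independently of KTM. In the Python example below, the set `VALID_CITY_CODES` is a hypothetical stand-in for the KFZ-Staedte dictionary (the excerpt from above plus COE), and `accepted_plates` is an invented name.

```python
import re

# Hypothetical stand-in for the KFZ-Staedte dictionary (excerpt plus COE).
VALID_CITY_CODES = {"AIC", "AK", "AM", "AN", "ANA", "AP", "AS",
                    "ASL", "ASZ", "AUR", "COE"}

# The original expression, with a capture group added around the city code.
PLATE_PATTERN = re.compile(
    r"([A-ZÄÖÜ]{1,3})\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}"
)

def accepted_plates(text):
    """Return the regex matches whose city code is in the list of valid codes."""
    return [m.group(0) for m in PLATE_PATTERN.finditer(text)
            if m.group(1) in VALID_CITY_CODES]
```

Because the city code comes from the capture group, the check works regardless of which delimiter separates the parts of the plate.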

Here is a sample script showing how to search for strings in a KTM dictionary:

Function ExistiertStadtAusAMKZ(kennzeichen As String) As Boolean

  'The following format for the number plate (kennzeichen) is expected: COE.EW.247

  Dim DictResItems As CscDictionaryResItems
  Dim Dict As CscDictionary
  Dim strData As String
  Dim strReplaceVal As String
  Dim QueryText As String
  Dim pos As Integer

  'Default result: the city code was not found
  ExistiertStadtAusAMKZ=False
  pos=InStr(kennzeichen,".")

  If pos>0 Then
    QueryText=Left(kennzeichen,pos-1) 'city code = everything before the first dot
    Set Dict = Project.Dictionaries.ItemByName("KFZ-Staedte")
    Set DictResItems=Dict.Search(QueryText,CscEvalMatchQuery,5)
    If DictResItems.Count>0 Then
      'strData holds the code
      'strReplaceVal holds the optional replacement value from the dictionary
      Dict.GetRecordData(DictResItems(0).RecID,strData,strReplaceVal)
      ExistiertStadtAusAMKZ=True 'something was found
    Else
      'nothing was found
      ExistiertStadtAusAMKZ=False
    End If
  End If

End Function
