Kofax Transformation Modules (KTM) – Dictionaries: Search by script

In addition to fuzzy databases, KTM also offers so-called dictionaries to improve recognition. For example, a dictionary can be used in the regular expressions of a format locator to find dates of the form “01. December 2015”. In that case the dictionary should contain all the month names (January, February, March, …).

The KTM fuzzy databases can be searched using the KTM script language, and Kofax offers sample programs for this (for example in the KTM scripting help or in the “Best Practices”). To my knowledge, there are no such examples for searching a dictionary. In a recognition project for German license plates, I had to search a dictionary by script. I would like to briefly explain why and present a sample script.

A regular expression of the following form is usually enough to recognize a German license plate number (one to three letters, delimiter, one to two letters, delimiter, one to four digits; example: SG CC 876):

[A-ZÄÖÜ]{1,3}\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}
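
To try the expression outside KTM, it can be run with Python's re module. This is only a sketch: Python's regex flavor is assumed to behave like KTM's for this pattern, the function name find_plates is made up for illustration, and “QQQ ZZ 1” is an invented plate standing in for one with an invalid city code.

```python
import re

# The pattern from the text, unchanged. Note that in Python a "|" inside a
# character class is a literal, so each delimiter class also accepts a pipe.
PLATE_PATTERN = re.compile(
    r"[A-ZÄÖÜ]{1,3}\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}"
)


def find_plates(text):
    """Return every substring of text that matches the plate pattern."""
    return PLATE_PATTERN.findall(text)


if __name__ == "__main__":
    # The last sample uses an invented prefix; the pattern accepts it anyway.
    for sample in ["SG CC 876", "COE - EW 247", "COE.EW.247", "QQQ ZZ 1"]:
        print(sample, "->", find_plates(sample))
```

The pattern checks only the shape of the string, so the invented plate is found just like the real ones.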

The first characters are a code for the city or the administrative district where the car is registered.

However, this regular expression may also find strings that are not valid plates, because it accepts any one to three letters at the front. It would be better to use a list of valid city codes instead of [A-ZÄÖÜ]{1,3}, so that far fewer invalid number plates are recognized. This is where KTM dictionaries come in handy. Lists of valid city codes are available on the Internet, and a suitable dictionary file (KFZ-Staedte) with the valid codes is quickly created:

AIC
AK
AM
AN
ANA

AP
AS
ASL
ASZ
AUR
:
:
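
The idea of replacing the generic prefix with a list of valid codes can be sketched in plain Python by turning the list into a regex alternation. This is not how KTM integrates dictionaries and it may behave differently from a KTM format locator. The code list contains only the entries shown in the excerpt above, the leading \b is an addition that is not part of the original expression, and build_plate_pattern is a hypothetical helper name.

```python
import re

# Only the codes from the excerpt above; a real list would be complete.
VALID_CODES = ["AIC", "AK", "AM", "AN", "ANA", "AP", "AS", "ASL", "ASZ", "AUR"]


def build_plate_pattern(codes):
    """Build the plate pattern with the generic prefix replaced by known codes."""
    # Longest codes first, so that "ANA" is tried before "AN".
    ordered = sorted(codes, key=len, reverse=True)
    alternatives = "|".join(re.escape(code) for code in ordered)
    # \b (an addition, not in the original expression) keeps a code from
    # matching in the middle of a longer run of letters.
    return re.compile(
        rf"\b(?:{alternatives})\x20?[\.|\x20|-]\x20?[A-Z]{{1,2}}[\.|\x20|-]?[0-9]{{1,4}}"
    )


if __name__ == "__main__":
    pattern = build_plate_pattern(VALID_CODES)
    for sample in ["AN CC 876", "ANA.CC.876", "QQQ CC 876"]:
        print(sample, "->", bool(pattern.search(sample)))
```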

A format locator that uses the dictionary in place of [A-ZÄÖÜ]{1,3} delivers the correct result for a first test document.

However, the test with another document failed: no number plate was detected.

This reveals a weakness in the integration of dictionaries into regular expressions: it only works if the dictionary string is separated from the rest of the regular expression by a space, a tab, or a similar separator.

This is the case in the first example, “COE - EW 247”. The second example, “COE.EW.247”, uses periods as separators between the parts of the number plate, so the dictionary integration does not work as desired.
I did not want to give up the optimized recognition of the city codes, however, so I went back to the ‘original’ regular expression:

[A-ZÄÖÜ]{1,3}\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}

This time, however, I took the recognized city code and checked it against the dictionary by script. If the code is found, the license plate is accepted; otherwise it is discarded.

Here is a sample script showing how to search for strings in a KTM dictionary:

Function ExistiertStadtAusAMKZ(kennzeichen) As Boolean

'The following format for the number plate (kennzeichen) is expected: COE.EW.247

Dim DictResItems As CscDictionaryResItems
Dim Dict As CscDictionary
Dim strData As String
Dim strReplaceVal As String
Dim QueryText As String
Dim pos As Integer

ExistiertStadtAusAMKZ=False
pos=InStr(kennzeichen,".")

If pos>0 Then
  QueryText=Left(kennzeichen,pos-1) 'city code
  Set Dict = Project.Dictionaries.ItemByName("KFZ-Staedte")
  Set DictResItems=Dict.Search(QueryText,CscEvalMatchQuery,5)
  If DictResItems.Count>0 Then
    'strData holds the code
    'strReplaceVal holds the optional replacement value from the dictionary
    Dict.GetRecordData(DictResItems(0).RecID,strData,strReplaceVal)
    ExistiertStadtAusAMKZ=True 'something was found
  Else
    'nothing was found
    ExistiertStadtAusAMKZ=False
  End If
End If

End Function
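
The accept-or-discard logic of this two-step approach can also be tried outside KTM. The following Python sketch is an analogue under stated assumptions, not the KTM API: a plain set stands in for the KFZ-Staedte dictionary and holds only a few codes (COE is included because the examples in the text use it), the lookup is an exact match rather than a KTM dictionary search, and accepted_plates is a hypothetical function name.

```python
import re

# Stand-in for the KFZ-Staedte dictionary; only a few codes are listed here.
VALID_CODES = {"AIC", "AK", "AM", "AN", "ANA", "COE"}

# Same pattern as in the text, with a capture group around the city code.
PLATE_PATTERN = re.compile(
    r"([A-ZÄÖÜ]{1,3})\x20?[\.|\x20|-]\x20?[A-Z]{1,2}[\.|\x20|-]?[0-9]{1,4}"
)


def accepted_plates(text, valid_codes=VALID_CODES):
    """Return the pattern matches in text whose city code is in valid_codes."""
    return [
        match.group(0)
        for match in PLATE_PATTERN.finditer(text)
        if match.group(1) in valid_codes
    ]


if __name__ == "__main__":
    for sample in ["COE - EW 247", "COE.EW.247", "QQQ.EW.247"]:
        print(sample, "->", accepted_plates(sample))
```

Because the city code is checked after matching, the separator between the parts of the plate no longer matters.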
