Wildcards
When you extract information using processors, you can use wildcards in the 'defined text' fields that define your conditions.
Supported wildcards are:
1. Simple
---------
? - exactly 1 character (mandatory)
_ - 0 or 1 character
* - 0 or more characters
% - 1 or more characters
# - 1 or more digits ('0'...'9')
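The matching engine itself is not described here, so the following is only an illustration. This minimal Python sketch assumes a mask must match the whole field and translates each simple wildcard into an equivalent `re` fragment. The names `SIMPLE_WILDCARDS` and `simple_match` are hypothetical, and escapes and the other structures are not handled.

```python
import re

# Hypothetical mapping from each simple wildcard to a Python regex fragment.
SIMPLE_WILDCARDS = {
    "?": ".",        # exactly 1 character
    "_": ".?",       # 0 or 1 character
    "*": ".*",       # 0 or more characters
    "%": ".+",       # 1 or more characters
    "#": "[0-9]+",   # 1 or more digits
}


def simple_match(mask: str, text: str) -> bool:
    """Return True if the whole of text matches a mask made of simple wildcards."""
    # Any character that is not a wildcard is matched literally.
    pattern = "".join(SIMPLE_WILDCARDS.get(ch, re.escape(ch)) for ch in mask)
    return re.fullmatch(pattern, text, re.DOTALL) is not None


print(simple_match("#-%", "42-abc"))  # True
print(simple_match("a?c", "ac"))      # False
```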
2. Sets []
----------
matches a limited set of characters.
syntax: [[I|E]<occurrences>:][?|!]<range>[;[?|!]<range>]...
(enclosed in brackets)
occurrences can be:
a number - number of occurrences
an interval n-m - n to m occurrences
empty - 0 or 1 occurrence
? - 1 occurrence
_ - 0 or 1 occurrence
* - 0 or more occurrences
% - 1 or more occurrences
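To make the prefix rules concrete, here is a small Python sketch that splits the text between the brackets into the mode, the minimum and maximum occurrence counts, and the remaining ranges. It assumes the first ':' always ends the prefix, and it treats a set without ':' as one mandatory occurrence, as in the `[a;b]` example. `parse_set_header` is a hypothetical name, not part of the tool.

```python
import math

# Shorthand occurrence codes and the (minimum, maximum) counts they stand for.
SHORTHANDS = {"": (0, 1), "?": (1, 1), "_": (0, 1), "*": (0, math.inf), "%": (1, math.inf)}


def parse_set_header(body):
    """Split the text between '[' and ']' into (mode, minimum, maximum, ranges)."""
    if ":" not in body:
        return "E", 1, 1, body              # no prefix: one mandatory occurrence
    prefix, ranges = body.split(":", 1)     # assumption: the first ':' ends the prefix
    mode = "E"                              # exclusive is the default mode
    if prefix[:1] in ("I", "E"):
        mode, prefix = prefix[0], prefix[1:]
    if prefix in SHORTHANDS:
        minimum, maximum = SHORTHANDS[prefix]
    elif "-" in prefix:
        low, high = prefix.split("-", 1)    # interval n-m
        minimum, maximum = int(low), int(high)
    else:
        minimum = maximum = int(prefix)     # plain number
    return mode, minimum, maximum, ranges
```

For example, `parse_set_header("I1-2:\\c;\\l")` returns `("I", 1, 2, "\\c;\\l")`.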
range can be:
1 character - that character is part of the set
2 characters - every character between them in the ASCII table, inclusive,
is part of the set (only when the second is greater than
the first)
3 or more - each listed character is part of the set
parameters:
E - (exclusive) matches until the next mask succeeds (default)
I - (inclusive) matches as many characters as possible
? - the character(s) must occur
! - the character(s) must not occur
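The range and ?/! rules can be read as a membership test. The Python sketch below checks whether a whole candidate string is a valid run for a set. It assumes the ranges contain no escapes and that a `?` item is satisfied when at least one of its characters occurs; the I/E mode is not modeled. `expand_range` and `set_accepts` are hypothetical helpers.

```python
def expand_range(rng):
    """Characters denoted by one <range> item."""
    if len(rng) == 2 and rng[0] < rng[1]:
        # Two characters in ascending ASCII order: the inclusive interval.
        return {chr(code) for code in range(ord(rng[0]), ord(rng[1]) + 1)}
    # One character, two in descending order, or three or more: each one listed.
    return set(rng)


def set_accepts(ranges, text, low=1, high=1):
    """Check whether the whole of text is a valid run for the given set ranges."""
    allowed, excluded, required = set(), set(), []
    for item in ranges.split(";"):
        if item.startswith("!"):
            excluded |= expand_range(item[1:])
        elif item.startswith("?"):
            chars = expand_range(item[1:])
            allowed |= chars
            required.append(chars)      # assumption: one of these must occur
        else:
            allowed |= expand_range(item)
    if not low <= len(text) <= high:
        return False
    for ch in text:
        # With only '!' items, anything not excluded is allowed (as in [!ac]).
        if ch in excluded or (allowed and ch not in allowed):
            return False
    return all(any(ch in chars for ch in text) for chars in required)
```

For instance, `set_accepts("az;!mo", "n")` is false and `set_accepts("!ac", "d")` is true, in line with the `[az;!mo]` and `[!ac]` examples.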
examples:
[a;b] - one mandatory occurrence of 'a' or 'b'
[az] - characters 'a','b','c'...'z'
[za] - character 'z' or 'a'
[xyz] - character 'x' or 'y' or 'z'
[az;09] - characters 'a'...'z' or '0'...'9'
[az;!mo] - characters 'a'...'z' except 'm','n','o'
[!ac] - any character except 'a','b','c'
[6:az;09] - 6 occurrences (e.g. '9c3ax7')
[2-5:az;09] - 2 to 5 occurrences
[:az] - 0 or 1 occurrence
[*:az;09] - 0 or more occurrences
[%:az;09] - 1 or more occurrences
[%:az;?x] - any lowercase sequence containing an 'x'
[E*:az]abc - any lowercase sequence ending with 'abc', e.g. 'ashufnmnjuabc'
[I*:az]abc - never matches, because the set consumes all 'a'...'z'
characters and nothing is left to match 'abc'
warning: masks are not supported inside a set, and reserved characters
need not be escaped there
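The difference between E and I in the `[E*:az]abc` and `[I*:az]abc` examples resembles a lazy quantifier versus a possessive, non-backtracking one (Python 3.11+ writes the latter as `*+`). The sketch below reproduces both examples only for the narrow case of a `*` set followed by a plain-text suffix, assuming the mask must match the whole string. `match_set_then_suffix` is a hypothetical name.

```python
import string

LOWERCASE = set(string.ascii_lowercase)


def match_set_then_suffix(mode, allowed, suffix, text):
    """Whole-string match of [<mode>*:<set>]<suffix>, where suffix is plain text."""
    if mode == "I":
        # Inclusive: take every allowed character and never give any back.
        i = 0
        while i < len(text) and text[i] in allowed:
            i += 1
        return text[i:] == suffix
    # Exclusive (default): stop as soon as the rest of the text matches the suffix.
    for i in range(len(text) + 1):
        if text[i:] == suffix:
            return True
        if i == len(text) or text[i] not in allowed:
            return False
    return False
```

With these definitions, `match_set_then_suffix("E", LOWERCASE, "abc", "ashufnmnjuabc")` is true, while the same call with `"I"` is false because the set has already consumed the trailing 'abc'.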
3. OR sequences {}
------------------
matches any of the specified submasks.
syntax: <mask_string>;<mask_string>[;<mask_string>]...
(enclosed in braces)
examples:
{xyz;abcd;123} - word 'xyz' or 'abcd' or '123'
{[3:09];-[2:09]} - 3 digits, or '-' followed by 2 digits
{a*;*0} - something starting with 'a' or ending with '0'
{a{bc;de};fgh} - 'abc' or 'ade' or 'fgh'
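For OR sequences whose submasks are plain text, the nesting rule in the last example can be shown by expanding every alternative. The Python sketch below does that; it assumes submasks contain no wildcards, sets or escapes, and `expand_or` is a hypothetical helper, not something the tool provides.

```python
def expand_or(mask):
    """List every string accepted by a literal-only mask with {..;..} sequences."""
    alternatives, _ = _sequence(mask, 0, inside_braces=False)
    return alternatives


def _sequence(mask, i, inside_braces):
    # Build all strings for one submask, stopping at ';' or '}' inside braces.
    results = [""]
    while i < len(mask):
        ch = mask[i]
        if ch == "{":
            options, i = _or_sequence(mask, i + 1)
            results = [r + o for r in results for o in options]
        elif inside_braces and ch in ";}":
            break
        else:
            results = [r + ch for r in results]
            i += 1
    return results, i


def _or_sequence(mask, i):
    # Collect the alternatives of one {..;..} group; i points just past '{'.
    options = []
    while True:
        submask, i = _sequence(mask, i, inside_braces=True)
        options.extend(submask)
        if i >= len(mask):
            raise ValueError("unterminated '{'")
        i += 1  # skip the ';' or '}'
        if mask[i - 1] == "}":
            return options, i
```

Here `expand_or("{a{bc;de};fgh}")` yields `["abc", "ade", "fgh"]`.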
4. Negation <>
--------------
matches when the specified submask does not match.
syntax: <mask_string>
(enclosed in angle brackets)
examples:
<{xyz;abcd;123}> - string is not 'xyz', 'abcd' or '123'
<#>* - string doesn't begin with digit(s)
warning: behavior is unpredictable when the negated mask is complex.
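For the two negation examples, a negative lookahead in Python's `re` module gives comparable behavior. This is an analogy under the assumption that the mask must match the whole string, not a description of how the tool evaluates negation; the pattern and function names are hypothetical.

```python
import re

# <{xyz;abcd;123}>: the whole string must not be one of the three words.
NOT_ONE_OF_WORDS = re.compile(r"(?!(?:xyz|abcd|123)\Z).*", re.DOTALL)

# <#>*: the string must not start with a digit; the rest can be anything.
NO_LEADING_DIGITS = re.compile(r"(?![0-9]).*", re.DOTALL)


def matches(pattern, text):
    """Return True if the whole of text satisfies the compiled pattern."""
    return pattern.fullmatch(text) is not None
```

The `\Z` anchor inside the lookahead is what makes 'abcde' acceptable while 'abcd' is rejected.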
5. Numeric fields ()
--------------------
matches a variable-length numeric string whose value lies within a range.
syntax: <minimum>;<maximum>[;<decimal-symbol>[;<other-punctuation-chars>]]
(enclosed in parentheses)
parameters:
- <minimum> and <maximum> are any Real-compatible values (negatives
allowed) specifying the range the numeric string must fall within.
- <decimal-symbol> is a single character used as the decimal separator
(default is dot, $2E).
- <other-punctuation-chars> are characters that may appear within the
number and are ignored (e.g. thousands separators, currency symbols)
examples:
(0;100) - matches any number from 0 to 100, e.g. '0',
'25.3', '000100'
(-15;15;,) - matches any number from -15 to 15, using
comma as the decimal separator, e.g. '-14,99'
US$[:\s](0.01;100000;.;,) - matches a US dollar amount up to 100000,
with comma as the thousands separator,
e.g. 'US$ 53,982.33' or 'US$150'
warning: masks are not supported in these parameters.
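A minimal Python sketch of the numeric-field check, assuming the field's text has already been isolated (how the tool decides where the number ends is not described here). Ignored punctuation is dropped first; the remainder must be an optionally signed number written with the chosen decimal symbol, and its value must fall within the inclusive range. `numeric_field_matches` is a hypothetical name.

```python
import re


def numeric_field_matches(text, minimum, maximum, decimal=".", ignored=""):
    """Sketch of (<minimum>;<maximum>[;<decimal-symbol>[;<other-punctuation-chars>]])."""
    # Drop the ignorable punctuation, e.g. thousands separators.
    cleaned = "".join(ch for ch in text if ch not in ignored)
    # Optional sign, digits, then an optional fractional part after the decimal symbol.
    pattern = "[-+]?[0-9]+(?:" + re.escape(decimal) + "[0-9]+)?"
    if re.fullmatch(pattern, cleaned) is None:
        return False
    value = float(cleaned.replace(decimal, "."))
    return minimum <= value <= maximum  # both bounds inclusive
```

Mirroring the examples, `numeric_field_matches("-14,99", -15, 15, ",")` and `numeric_field_matches("53,982.33", 0.01, 100000, ".", ",")` both return True.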
6. Mask repetition ||
---------------------
matches a mask repeatedly, to handle contiguous instances of a pattern
(zero or more instances).
syntax: <mask_string>
(enclosed between two pipes)
examples:
a.|[2-3:09].|b - matches 'a.123.b' or 'a.05.332.41.191.b' or 'a.b'
warning: nested repetitions are not supported.
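The repetition example corresponds to a regular-expression group repeated zero or more times. The Python sketch below hand-translates `a.|[2-3:09].|b` under that reading; it is an analogy only (the I/E matching modes are not modeled), and `REPEATED_MASK` and `matches` are hypothetical names.

```python
import re

# a.|[2-3:09].|b as a Python regex: the part between the pipes, [2-3:09].
# (2 to 3 digits, then a dot), becomes a group repeated zero or more times.
REPEATED_MASK = re.compile(r"a\.(?:[0-9]{2,3}\.)*b")


def matches(text):
    """Return True if the whole of text matches the translated mask."""
    return REPEATED_MASK.fullmatch(text) is not None


for sample in ("a.123.b", "a.05.332.41.191.b", "a.b", "a.1.b"):
    print(sample, matches(sample))
```

The last sample is rejected because '1' has fewer than the 2 digits the set requires.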
7. Escapes \
------------
escapes the next character, so it is not interpreted as a wildcard or as
any control or reserved character.
syntax: \<character|predefined_character|$two_digits_hex_code>
predefined character can be:
- s (space)
- t (tab)
- c (CR)
- l (LF)
examples:
\* - character '*'
\\ - character '\'
\s - character space (#$20)
\t - character tab (#$9)
\c - character CR (#$0D)
\l - character LF (#$0A)
\$41 - character 'A' (#$41) - hex must have 2 digits
[I1-2:\c;\l] - matches any EOL (CR, LF, CR/LF or LF/CR)
[\[;\]] - matches char '[' or ']'
the following characters are reserved and must be escaped when they are
to be interpreted as themselves (some are reserved only inside structures):
? _ * % # [ ] { } < > ( ) \ | ; !
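To show how the escape rules combine with the reserved-character list, here is a minimal Python tokenizer sketch that turns a mask into (is_literal, character) pairs. It assumes `\$` always introduces a two-digit hex code and treats every listed reserved character as special wherever it appears, ignoring the "only inside structures" nuance; `tokenize`, `PREDEFINED` and `RESERVED` are hypothetical names.

```python
import string

PREDEFINED = {"s": " ", "t": "\t", "c": "\r", "l": "\n"}
RESERVED = set("?_*%#[]{}<>()\\|;!")


def tokenize(mask):
    """Split a mask into (is_literal, character) pairs, resolving escapes."""
    tokens = []
    i = 0
    while i < len(mask):
        ch = mask[i]
        if ch != "\\":
            # Unescaped reserved characters keep their special meaning.
            tokens.append((ch not in RESERVED, ch))
            i += 1
            continue
        if i + 1 == len(mask):
            raise ValueError("mask ends with a lone backslash")
        nxt = mask[i + 1]
        if nxt == "$":
            # \$hh: exactly two hexadecimal digits give the character code.
            code = mask[i + 2:i + 4]
            if len(code) != 2 or not all(c in string.hexdigits for c in code):
                raise ValueError("\\$ must be followed by two hex digits")
            tokens.append((True, chr(int(code, 16))))
            i += 4
        elif nxt in PREDEFINED:
            tokens.append((True, PREDEFINED[nxt]))
            i += 2
        else:
            # Any other escaped character stands for itself.
            tokens.append((True, nxt))
            i += 2
    return tokens
```

For example, `tokenize(r"a\s*")` gives `[(True, "a"), (True, " "), (False, "*")]`: the space is literal, while the unescaped `*` stays a wildcard.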