match-regex
Purpose: Find, or find and replace patterns in strings using regex (regular expressions).
match-regex <pattern> in <target> \
[ \
( replace-with <replace pattern> \
result [ define ] <result> \
[ status [ define ] <status> ] ) \
| \
( status [ define ] <status> \
[ case-insensitive [ <case-insensitive> ] ] [ single-match [ <single-match> ] ] ) \
] \
[ cache ] \
[ clear-cache <clear cache> )
match-regex searches <target> string for regex <pattern>. If "replace-with" is specified, then instance(s) of <pattern> in <target> are replaced with <replace pattern> string, and the result is stored in <result> string, which must be different from <target>. <result> is
allocated memory.
Both the search and replacement use
extended regex syntax (ERE) from the built-in Linux regex library. The number of found or found/replaced patterns can be obtained in number <status> variable.
If "replace-with" is not specified, then the number of matched <pattern>s within <target> is given in <status> number variable, which in this case must be specified.
If "case-insensitive" is used without optional boolean expression <case-insensitive>, or if <case-insensitive> evaluates to true, then searching for pattern is case insensitive. By default, it is case sensitive.
If "single-match" is specified without optional boolean expression <single-match>, or if <single-match> evaluates to true, then only the very first occurrence of <pattern> in <target> is processed. Otherwise, all occurrences are processed.
<result> and <status> variables can be created within the statement with "define" clause.
If the pattern is bad (i.e. <pattern> is not a correct regular expression), Vely will error out with a message.
Caching and performance
If "cache" clause is used, then regex compilation of <pattern> will be done only once and saved for future use. There is a significant performance benefit when match-regex executes repeatedly with "cache" (such as in case of web applications or in any kind of loop). If <pattern> changes and you need to recompile it once in a while, use "clear-cache" clause. <clear cache> is a "bool" variable; the regex cache is cleared if it is true, and stays if it is false. For example:
bool cl_c;
if (q == 0) cl_c = true; else cl_c = false;
match-regex ps in look_in replace-with "Yes it is \\1!" result res cache clear-cache cl_c
In this case, when "q" is 0, cache will be cleared, and the pattern in variable "ps" will be recompiled. In all other cases, the last computed regex stays the same.
While every pattern is different, when using cache even a relatively small one was shown in tests to speed up the match-regex by about 500%, or 5x faster. Use cache whenever possible as it brings parsing performance close to its theoretical limits.
Subexpressions and back-referencing
Subexpressions are referenced via compatible ERE extension, which is a backslash followed by a number. Because C treats backslash followed by a number as an octal number, you must use double backslash (\\). For example:
match-regex "(good).*(day)" \
in "Hello, good day!" \
replace-with "\\2 \\1" \
result res
will produce string "Hello, day good!" as a result in "res" variable. Each subexpression is within () parenthesis, so for instance "day" in the above pattern is the 2nd subexpression, and is back-referenced as \\2 in replacement.
There can be a maximum of 23 subexpressions.
Note that backreferences to non-existing subexpressions are ignored - for example \\4 when there are only 3 subexpressions. Vely is "smart" about using two digits and differentiating between \\1 and \\10 for instance - it takes into account the actual number of subexpressions and their validity, and selects a proper subexpression even when two digits are used in a backreference.
Examples
Use match-regex statement to find out if a string matches a pattern, for example:
match-regex "SOME (.*) IS (.*)" in "WOW SOME THING IS GOOD HERE" status define st
In this case, the first parameter ("SOME (.*) IS (.*)") is a pattern and you're matching it with string ("WOW SOME THING IS GOOD HERE"). Since there is a match, status variable (defined on the fly as integer) "st" will be "1" (meaning one match was found) - in general it will contain the number of patterns matched.
Search for patterns and replace them by using replace-with clause, for example:
match-regex "SOME (.*) IS ([^ ]+)" in "WOW SOME THING IS GOOD HERE FOR SURE" replace-with "THINGS ARE \\2 YES!" result define res status define st
In this case, the result from replacement will be in a new string variable "res" specified with the result clause, and it will be
WOW THINGS ARE GOOD YES! HERE FOR SURE
The above demonstrates a typical use of subexpressions in regex (meaning "()" statements) and their referencing with "\\1", "\\2" etc. in the order in which they appear. Consult regex documentation for more information. Status variable specified with status clause ("st" in this example) will contain the number of patterns matched and replaced. As is the case with all Vely statements, you could use "define" clause to create a result/output variable within the statement, or omit it if it's created elsewhere.
Matching is by default case sensitive. Use "case-insensitive" clause to change it to case insensitive, for instance:
match-regex "SOME (.*) IS (.*)" in "WOW some THING IS GOOD HERE" status define st case-insensitive
In the above case, the pattern would not be found without "case-insensitive" clause because "SOME" and "some" would not match. This clause works the same in matching-only as well as replacing strings.
If you want to match only the first occurrence of a pattern, use "single-match" option:
match-regex "SOME ([^ ]+) IS ([^ ]+)" in "WOW SOME THING IS GOOD HERE AND SOME STUFF IS GOOD TOO" status define st single-match
In this case there would be two matches by default ("SOME THING IS GOOD" and "SOME STUFF IS GOOD") but only the first would be found. This clause works the same for replacing as well - only the first occurrence would be replaced.
See also
Regex (
match-regex )
SEE ALL (
documentation)