Unlike in commonly used regexp libraries, regular expressions in Felix are not strings; they are defined with first-class syntax.
Felix allows you to name regular expressions with the syntax:

    regexp <name> = <regexp> ;

The name is an identifier. A string used in a regexp matches each character of the string in sequence. The following symbols are special, and are listed from weakest to strongest binding:

| symbol | syntax | meaning |
|---|---|---|
| `\|` | infix | alternatives |
| `*` | postfix | 0 or more occurrences |
| `+` | postfix | 1 or more occurrences |
| `?` | postfix | 0 or 1 occurrences |
| `<juxtaposition>` | infix | concatenation |
| `<name>` | atomic | the regular expression denoted by the name in a regexp definition |
| `<string>` | atomic | the sequence of chars of the string |
| `[<charset>]` | atomic | any char in the charset |
| `[^<charset>]` | atomic | any char not in the charset |
| `.` | atomic | any char other than end of line |
| `_` | atomic | any char |
| `eof` | atomic | end marker |
| `(<regexp>)` | atomic | grouping with brackets |

A charset is built from the following elements:

| symbol | meaning |
|---|---|
| `<string>` | any character in the string |
| `<char>-<char>` | any character between the two chars, inclusive |

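
The binding order is easiest to see on a small pattern. The sketch below is an analogy only: it uses C++ `std::regex` (whose default ECMAScript grammar ranks alternation, concatenation, and postfix repetition in the same relative order) rather than Felix itself, and `whole_match` is a hypothetical helper introduced for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole input matches the pattern.
bool whole_match(const std::string& pattern, const std::string& input) {
    return std::regex_match(input, std::regex(pattern));
}

// The pattern "ab|cd*" groups as (ab)|(c(d*)):
//   '*' binds tighter than juxtaposition, which binds tighter than '|'.
//
//   whole_match("ab|cd*", "ab")   is true   (left alternative)
//   whole_match("ab|cd*", "cddd") is true   (right alternative, 'd' repeated)
//   whole_match("ab|cd*", "acd")  is false  (it is not a(b|c)d*)
//   whole_match("ab|cd*", "abd")  is false  ('*' applies to 'd' alone, in the right branch)
```

Brackets override the default grouping, which is why the `id` definition writes `alpha (alpha | digit) *` rather than `alpha alpha | digit *`.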

    #import <flx.flxh>
    regexp lower = ["abcdefghijklmnopqrstuvwxyz"];
    regexp upper = ["ABCDEFGHIJKLMNOPQRSTUVWXYZ"];
    regexp digit = ["0123456789"];
    regexp alpha = lower | upper | "_";
    regexp id = alpha (alpha | digit) *;

    print
      regmatch "identifier" with
      | digit+ => "Number"
      | id => "Identifier"
      endmatch
    ;
    endl;

    print
      regmatch "9999" with
      | digit+ => "Number"
      | id => "Identifier"
      endmatch
    ;
    endl;

    print
      regmatch "999xxx" with
      | digit+ => "Number"
      | id => "Identifier"
      | _* => "Neither"
      endmatch
    ;
    endl;

Output:

    Identifier
    Number
    Neither

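
For contrast, here is roughly how the same classification looks in a string-based library. This is a sketch in C++ with `std::regex`, not Felix output. The function name `classify` is hypothetical, and the sketch assumes that a branch is taken only when the whole input matches, as the `"999xxx"` case suggests.

```cpp
#include <regex>
#include <string>

// Hypothetical helper mirroring the regmatch example: patterns are plain
// strings compiled at run time, and the first one that matches the whole
// input decides the result.
std::string classify(const std::string& input) {
    static const std::regex number("[0-9]+");                    // digit+
    static const std::regex identifier("[A-Za-z_][A-Za-z_0-9]*"); // id
    if (std::regex_match(input, number)) return "Number";
    if (std::regex_match(input, identifier)) return "Identifier";
    return "Neither";  // plays the role of the `_*` catch-all branch
}
```

Here `classify("identifier")`, `classify("9999")`, and `classify("999xxx")` return `"Identifier"`, `"Number"`, and `"Neither"` respectively. A mistake inside a pattern string only shows up when the `std::regex` is constructed at run time, whereas Felix's regexp definitions are part of the language syntax.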
Note: the generated code is *extremely* fast, within one or two memory fetches of the fastest possible code. Here is the generated code for the inner loop of a regmatch:

    while(state && start != end) state = matrix[*start++][state];
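
To make the loop concrete, the sketch below wraps it in a tiny hand-built table for `digit+`. Everything around the loop is an assumption made for illustration: the state numbering (0 as the dead state that stops the loop, 1 as the start state, 2 as the accepting state), the table layout, and the names `build_digit_plus` and `matches_digit_plus`. It is not the table Felix actually generates.

```cpp
#include <string>

// Hypothetical DFA for digit+.
//   state 0: dead state (makes the loop condition false)
//   state 1: start state
//   state 2: accepting state (at least one digit seen)
constexpr int kStates = 3;
static unsigned char matrix[256][kStates];  // matrix[input byte][state]

// Fill the transition table: every transition goes to the dead state
// except digits, which move the start and accepting states to state 2.
void build_digit_plus() {
    for (int c = 0; c < 256; ++c)
        for (int s = 0; s < kStates; ++s)
            matrix[c][s] = 0;
    for (int c = '0'; c <= '9'; ++c) {
        matrix[c][1] = 2;
        matrix[c][2] = 2;
    }
}

bool matches_digit_plus(const std::string& input) {
    build_digit_plus();  // rebuilt on each call to keep the sketch short
    // Unsigned bytes keep the table index non-negative.
    const unsigned char* start =
        reinterpret_cast<const unsigned char*>(input.data());
    const unsigned char* end = start + input.size();
    int state = 1;
    // Same shape as the inner loop above: one table lookup per input byte.
    while (state && start != end) state = matrix[*start++][state];
    return state == 2;  // accept only if we stopped in the accepting state
}
```

With this table, `matches_digit_plus("9999")` is true, while `"999xxx"`, `"identifier"`, and the empty string are rejected. Each input byte costs one table lookup plus the loop test.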