Fix UTF-32 note

git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@343406 c90b9560-bf6c-de11-be94-00142212c4b1
This commit is contained in:
Anatol Belski 2017-11-12 09:59:54 +00:00
parent cbd24b3772
commit 82330cfda6

View file

@ -8,7 +8,7 @@
Parle supports regex matching similar to flex. Also supported are the following POSIX character sets: <literal>[:alnum:]</literal>, <literal>[:alpha:]</literal>, <literal>[:blank:]</literal>, <literal>[:cntrl:]</literal>, <literal>[:digit:]</literal>, <literal>[:graph:]</literal>, <literal>[:lower:]</literal>, <literal>[:print:]</literal>, <literal>[:punct:]</literal>, <literal>[:space:]</literal>, <literal>[:upper:]</literal> and <literal>[:xdigit:]</literal>.
</para>
<para>
The Unicode character classes are currently not supported. A particular encoding can be mapped with a correctly constructed regex. For example, to match the EURO symbol encoded in UTF-8, the regular expression <literal>[\xe2][\x82][\xac]</literal> can be used. The pattern for an UTF-8 encoded string could be <literal>[ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+</literal>.
The Unicode character classes are currently not enabled by default, pass --enable-parle-utf32 to make them available. A particular encoding can be mapped with a correctly constructed regex. For example, to match the EURO symbol encoded in UTF-8, the regular expression <literal>[\xe2][\x82][\xac]</literal> can be used. The pattern for an UTF-8 encoded string could be <literal>[ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+</literal>.
</para>
<section xml:id="parle.regex.chars">
<title>Character representations</title>