mirror of
https://github.com/sigmasternchen/php-doc-en
synced 2025-03-15 08:28:54 +00:00
395 lines
12 KiB
XML
<?xml version="1.0" encoding="utf-8"?>
<!-- $Revision$ -->
<chapter xml:id="parle.pattern.matching" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
|
|
<title>Parle pattern matching</title>
|
|
<titleabbrev>Pattern matching</titleabbrev>
|
|
<para>
|
|
Parle supports regex matching similar to flex. Also supported are the following POSIX character sets: <literal>[:alnum:]</literal>, <literal>[:alpha:]</literal>, <literal>[:blank:]</literal>, <literal>[:cntrl:]</literal>, <literal>[:digit:]</literal>, <literal>[:graph:]</literal>, <literal>[:lower:]</literal>, <literal>[:print:]</literal>, <literal>[:punct:]</literal>, <literal>[:space:]</literal>, <literal>[:upper:]</literal> and <literal>[:xdigit:]</literal>.
|
|
</para>
|
|
<para>
|
|
The Unicode character classes are not enabled by default; pass <option>--enable-parle-utf32</option> at compile time to make them available. A particular encoding can be mapped with a correctly constructed regex. For example, to match the EURO symbol encoded in UTF-8, the regular expression <literal>[\xe2][\x82][\xac]</literal> can be used. The pattern for a UTF-8 encoded string could be <literal>[ -\x7f]{+}[\x80-\xbf]{+}[\xc2-\xdf]{+}[\xe0-\xef]{+}[\xf0-\xff]+</literal>.
|
|
</para>
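<para>
The byte-wise mapping described above can be sketched as follows. This is a minimal illustration that assumes the Parle extension is loaded; the token ids (<literal>TOK_EURO</literal>, <literal>TOK_AMOUNT</literal>, <literal>TOK_SPACE</literal>) and the input string are hypothetical names chosen for this example only.
</para>
<example>
<title>Matching a UTF-8 encoded EURO symbol byte by byte (sketch)</title>
<programlisting role="php">
<![CDATA[
<?php
use Parle\{Lexer, Token};

// Hypothetical token ids for this sketch; 0 is avoided because it is Token::EOI.
const TOK_EURO   = 1;
const TOK_AMOUNT = 2;
const TOK_SPACE  = 3;

$lex = new Lexer;
// Single quotes keep the \x escapes intact, so Parle itself interprets them.
$lex->push('[\xe2][\x82][\xac]', TOK_EURO);
$lex->push('[0-9]+', TOK_AMOUNT);
$lex->push('[ ]+', TOK_SPACE);
$lex->build();

// Double quotes here: PHP produces the three raw UTF-8 bytes of the EURO symbol.
$lex->consume("\xe2\x82\xac 42");

$lex->advance();
$tok = $lex->getToken();
// Stop at the end of input or at the first unrecognized byte.
while (Token::EOI != $tok->id && Token::UNKNOWN != $tok->id) {
    // bin2hex() makes the matched bytes visible regardless of terminal encoding.
    echo $tok->id, " ", bin2hex($tok->value), PHP_EOL;
    $lex->advance();
    $tok = $lex->getToken();
}
]]>
</programlisting>
</example>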
|
|
<section xml:id="parle.regex.chars">
|
|
<title>Character representations</title>
|
|
<para>
|
|
<table>
|
|
<title>Character representations</title>
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Sequence</entry><entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>\a</entry><entry>Alert (bell).</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\b</entry><entry>Backspace.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\e</entry><entry>ESC character, \x1b.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\n</entry><entry>Newline.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\r</entry><entry>Carriage return.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\f</entry><entry>Form feed, \x0c.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\t</entry><entry>Horizontal tab, \x09.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\v</entry><entry>Vertical tab, \x0b.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\oct</entry><entry>Character specified by a three-digit octal code.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\xhex</entry><entry>Character specified by a hex code.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\cchar</entry><entry>Named control character.</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
</para>
|
|
</section>
|
|
<section xml:id="parle.regex.charclass">
|
|
<title>Character classes</title>
|
|
<para>
|
|
<table>
|
|
<title>Character classes</title>
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Sequence</entry><entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>[...]</entry><entry>A single character listed or contained within a listed range. Ranges can be combined with the <literal>{+}</literal> and <literal>{-}</literal> operators. For example <literal>[a-z]{+}[0-9]</literal> is the same as <literal>[0-9a-z]</literal> and <literal>[a-z]{-}[aeiou]</literal> is the same as <literal>[b-df-hj-np-tv-z]</literal>.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>[^...]</entry><entry>A single character not listed and not contained within a listed range.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>.</entry><entry>Any character, default <literal>[^\n]</literal>.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\d</entry><entry>Digit character, <literal>[0-9]</literal>.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\D</entry><entry>Non-digit character, <literal>[^0-9]</literal>.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\s</entry><entry>White space character, <literal>[ \t\n\r\f\v]</literal>.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\S</entry><entry>Non-white space character, <literal>[^ \t\n\r\f\v]</literal>.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\w</entry><entry>Word character, <literal>[a-zA-Z0-9_]</literal>.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\W</entry><entry>Non-word character, <literal>[^a-zA-Z0-9_]</literal>.</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
</para>
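<para>
The <literal>{-}</literal> operator from the table above can be tried out with a small lexer. This is a sketch only, assuming the Parle extension is loaded; the token ids <literal>TOK_CONSONANT</literal> and <literal>TOK_VOWEL</literal> and the input word are made up for illustration.
</para>
<example>
<title>Subtracting one character range from another (sketch)</title>
<programlisting role="php">
<![CDATA[
<?php
use Parle\{Lexer, Token};

// Hypothetical token ids for this sketch.
const TOK_CONSONANT = 1;
const TOK_VOWEL     = 2;

$lex = new Lexer;
// [a-z] minus the vowels, documented as equivalent to [b-df-hj-np-tv-z].
$lex->push('[a-z]{-}[aeiou]', TOK_CONSONANT);
$lex->push('[aeiou]', TOK_VOWEL);
$lex->build();

$lex->consume("parle");

$lex->advance();
$tok = $lex->getToken();
// Print the id and the matched character for every token until input ends.
while (Token::EOI != $tok->id && Token::UNKNOWN != $tok->id) {
    echo $tok->id, " ", $tok->value, PHP_EOL;
    $lex->advance();
    $tok = $lex->getToken();
}
]]>
</programlisting>
</example>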
|
|
</section>
|
|
<section xml:id="parle.regex.unicodecharclass">
|
|
<title>Unicode character classes</title>
|
|
<para>
|
|
<table>
|
|
<title>Unicode character classes</title>
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Sequence</entry><entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>\p{C}</entry><entry>Other.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Cc}</entry><entry>Other, control.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Cf}</entry><entry>Other, format.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Co}</entry><entry>Other, private use.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Cs}</entry><entry>Other, surrogate.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{L}</entry><entry>Letter.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{LC}</entry><entry>Letter, cased.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Ll}</entry><entry>Letter, lowercase.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Lm}</entry><entry>Letter, modifier.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Lo}</entry><entry>Letter, other.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Lt}</entry><entry>Letter, titlecase.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Lu}</entry><entry>Letter, uppercase.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{M}</entry><entry>Mark.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Mc}</entry><entry>Mark, spacing combining.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Me}</entry><entry>Mark, enclosing.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Mn}</entry><entry>Mark, nonspacing.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{N}</entry><entry>Number.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Nd}</entry><entry>Number, decimal digit.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Nl}</entry><entry>Number, letter.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{No}</entry><entry>Number, other.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{P}</entry><entry>Punctuation.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Pc}</entry><entry>Punctuation, connector.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Pd}</entry><entry>Punctuation, dash.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Pe}</entry><entry>Punctuation, close.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Pf}</entry><entry>Punctuation, final quote.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Pi}</entry><entry>Punctuation, initial quote.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Po}</entry><entry>Punctuation, other.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Ps}</entry><entry>Punctuation, open.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{S}</entry><entry>Symbol.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Sc}</entry><entry>Symbol, currency.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Sk}</entry><entry>Symbol, modifier.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Sm}</entry><entry>Symbol, math.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{So}</entry><entry>Symbol, other.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Z}</entry><entry>Separator.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Zl}</entry><entry>Separator, line.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Zp}</entry><entry>Separator, paragraph.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>\p{Zs}</entry><entry>Separator, space.</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
</para>
|
|
<para>
|
|
These character classes are only available if the option <option>--enable-parle-utf32</option> was passed at compile time.
|
|
</para>
|
|
</section>
|
|
<section xml:id="parle.regex.alternation">
|
|
<title>Alternation and repetition</title>
|
|
<para>
|
|
<table>
|
|
<title>Alternation and repetition</title>
|
|
<tgroup cols="3">
|
|
<thead>
|
|
<row>
|
|
<entry>Sequence</entry><entry>Greedy</entry><entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>...|...</entry><entry>-</entry><entry>Try subpatterns in alternation.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>*</entry><entry>yes</entry><entry>Match 0 or more times.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>+</entry><entry>yes</entry><entry>Match 1 or more times.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>?</entry><entry>yes</entry><entry>Match 0 or 1 times.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>{n}</entry><entry>no</entry><entry>Match exactly n times.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>{n,}</entry><entry>yes</entry><entry>Match at least n times.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>{n,m}</entry><entry>yes</entry><entry>Match at least n times but no more than m times.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>*?</entry><entry>no</entry><entry>Match 0 or more times.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>+?</entry><entry>no</entry><entry>Match 1 or more times.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>??</entry><entry>no</entry><entry>Match 0 or 1 times.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>{n,}?</entry><entry>no</entry><entry>Match at least n times.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>{n,m}?</entry><entry>no</entry><entry>Match at least n times but no more than m times.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>{MACRO}</entry><entry>-</entry><entry>Include the regex MACRO in the current regex.</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
</para>
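<para>
Repetition and <literal>{MACRO}</literal> inclusion can be combined as in the following sketch. It assumes the Parle extension is loaded; the macro name <literal>DIGIT</literal> and the token ids are hypothetical and exist only for this example.
</para>
<example>
<title>Reusing a macro inside several rules (sketch)</title>
<programlisting role="php">
<![CDATA[
<?php
use Parle\{Lexer, Token};

// Hypothetical token ids for this sketch.
const TOK_INT   = 1;
const TOK_FLOAT = 2;
const TOK_SPACE = 3;

$lex = new Lexer;
// A macro must be defined before any rule that refers to it.
$lex->insertMacro('DIGIT', '[0-9]');
$lex->push('{DIGIT}+', TOK_INT);             // one or more digits
$lex->push('{DIGIT}+\.{DIGIT}+', TOK_FLOAT); // digits, a literal dot, digits
$lex->push('[ ]+', TOK_SPACE);
$lex->build();

$lex->consume("7 3.14 42");

$lex->advance();
$tok = $lex->getToken();
// Print the id and the matched text for every token until input ends.
while (Token::EOI != $tok->id && Token::UNKNOWN != $tok->id) {
    echo $tok->id, " ", $tok->value, PHP_EOL;
    $lex->advance();
    $tok = $lex->getToken();
}
]]>
</programlisting>
</example>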
|
|
</section>
|
|
<section xml:id="parle.regex.anchors">
|
|
<title>Anchors</title>
|
|
<para>
|
|
<table>
|
|
<title>Anchors</title>
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Sequence</entry><entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>^</entry><entry>Start of string or after a newline.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>$</entry><entry>End of string or before a newline.</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
</para>
|
|
</section>
|
|
<section xml:id="parle.regex.grouping">
|
|
<title>Grouping</title>
|
|
<para>
|
|
<table>
|
|
<title>Grouping</title>
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Sequence</entry><entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>(...)</entry><entry>Group a regular expression to override default operator precedence.</entry>
|
|
</row>
|
|
<row>
|
|
<entry valign="top">(?r-s:pattern)</entry>
|
|
<entry>
|
|
Apply option r and omit option s while interpreting pattern. Options may be zero or more of the characters i, s, or x.
|
|
<table>
|
|
<title>Options</title>
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Option</entry><entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry>i</entry><entry>Case insensitive.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>-i</entry><entry>Case sensitive.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>s</entry><entry>Alters the meaning of '.' to match any character whatsoever.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>-s</entry><entry>Alters the meaning of '.' to match any character except '\n'.</entry>
|
|
</row>
|
|
<row>
|
|
<entry>x</entry><entry>Ignores comments and whitespace in patterns. Whitespace is ignored unless it is backslash-escaped, enclosed in double quotes, or appears inside a character range.</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
These options can be applied globally at the rules level by passing a combination of the bit flags to the lexer.
|
|
</entry>
|
|
</row>
|
|
<row>
|
|
<entry>(?# comment )</entry><entry>Omit everything within (). The first ) character encountered ends the pattern. It is not possible for the comment to contain a ) character. The comment may span lines.</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
</para>
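<para>
An inline option group limits an option to a single subpattern. The following sketch applies <literal>i</literal> to one keyword only; it assumes the Parle extension is loaded, and the token ids and input are hypothetical.
</para>
<example>
<title>Case-insensitive matching for one subpattern (sketch)</title>
<programlisting role="php">
<![CDATA[
<?php
use Parle\{Lexer, Token};

// Hypothetical token ids for this sketch.
const TOK_SELECT = 1;
const TOK_SPACE  = 2;

$lex = new Lexer;
// (?i:...) enables case-insensitive matching for the enclosed pattern only.
$lex->push('(?i:select)', TOK_SELECT);
$lex->push('[ ]+', TOK_SPACE);
$lex->build();

$lex->consume("SELECT Select select");

$lex->advance();
$tok = $lex->getToken();
// Print the id and the matched text for every token until input ends.
while (Token::EOI != $tok->id && Token::UNKNOWN != $tok->id) {
    echo $tok->id, " ", $tok->value, PHP_EOL;
    $lex->advance();
    $tok = $lex->getToken();
}
]]>
</programlisting>
</example>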
|
|
</section>
|
|
</chapter>
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode: sgml
|
|
sgml-omittag:t
|
|
sgml-shorttag:t
|
|
sgml-minimize-attributes:nil
|
|
sgml-always-quote-attributes:t
|
|
sgml-indent-step:1
|
|
sgml-indent-data:t
|
|
indent-tabs-mode:nil
|
|
sgml-parent-document:nil
|
|
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
|
|
sgml-exposed-tags:nil
|
|
sgml-local-catalogs:nil
|
|
sgml-local-ecat-files:nil
|
|
End:
|
|
vim600: syn=xml fen fdm=syntax fdl=2 si
|
|
vim: et tw=78 syn=sgml
|
|
vi: ts=1 sw=1
|
|
-->