Character classes

git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@295259 c90b9560-bf6c-de11-be94-00142212c4b1
Jakub Vrana 2010-02-19 17:17:25 +00:00
parent 34fe6dd1e0
commit f5c36b9d90


@@ -1040,6 +1040,48 @@
terminator is always special and must be escaped when used
within an expression.
</para>
<para>
Perl supports the POSIX notation for character classes, and PCRE supports
it as well. A class name is enclosed by <literal>[:</literal> and <literal>:]</literal>, and the whole
construct is valid only within an enclosing pair of square brackets. For example, <literal>[01[:alpha:]%]</literal>
matches "0", "1", any alphabetic character, or "%". The supported class
names are:
<table>
<title>Character classes</title>
<tgroup cols="2">
<tbody>
<row><entry><literal>alnum</literal></entry><entry>letters and digits</entry></row>
<row><entry><literal>alpha</literal></entry><entry>letters</entry></row>
<row><entry><literal>ascii</literal></entry><entry>character codes 0 - 127</entry></row>
<row><entry><literal>blank</literal></entry><entry>space or tab only</entry></row>
<row><entry><literal>cntrl</literal></entry><entry>control characters</entry></row>
<row><entry><literal>digit</literal></entry><entry>decimal digits (same as <literal>\d</literal>)</entry></row>
<row><entry><literal>graph</literal></entry><entry>printing characters, excluding space</entry></row>
<row><entry><literal>lower</literal></entry><entry>lower case letters</entry></row>
<row><entry><literal>print</literal></entry><entry>printing characters, including space</entry></row>
<row><entry><literal>punct</literal></entry><entry>printing characters, excluding letters and digits</entry></row>
<row><entry><literal>space</literal></entry><entry>white space (not quite the same as <literal>\s</literal>)</entry></row>
<row><entry><literal>upper</literal></entry><entry>upper case letters</entry></row>
<row><entry><literal>word</literal></entry><entry>"word" characters (same as <literal>\w</literal>)</entry></row>
<row><entry><literal>xdigit</literal></entry><entry>hexadecimal digits</entry></row>
</tbody>
</tgroup>
</table>
The <literal>space</literal> characters are HT (9), LF (10), VT (11), FF (12), CR (13),
and space (32). Note that this list includes the VT character (code
11), which makes <literal>space</literal> different from <literal>\s</literal>: for Perl
compatibility, <literal>\s</literal> does not include VT.
</para>
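<para>
As a minimal sketch of the notation described above, the following example
checks single characters against <literal>[01[:alpha:]%]</literal> and tests the VT
character against <literal>[[:space:]]</literal>. The helper name
<literal>matchesPattern</literal> is hypothetical and exists only for illustration.
Whether <literal>\s</literal> matches VT depends on the PCRE version in use, so that
result is printed without an assumed value.
<example>
<title>POSIX character classes inside a bracket expression (illustrative sketch)</title>
<programlisting role="php">
<![CDATA[
<?php
// Hypothetical helper: returns true when $subject matches $pattern.
function matchesPattern(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// The POSIX class name appears inside the enclosing square brackets.
$pattern = '/^[01[:alpha:]%]$/';
foreach (['0', '1', 'a', 'Z', '%', '2', ' '] as $char) {
    echo var_export($char, true), ' => ',
        matchesPattern($pattern, $char) ? 'match' : 'no match', "\n";
}

// VT (code 11) belongs to the "space" class.
$vt = "\x0B";
var_dump(matchesPattern('/^[[:space:]]$/', $vt)); // bool(true)

// The result for \s is version dependent, so it is only printed.
var_dump(matchesPattern('/^\s$/', $vt));
?>
]]>
</programlisting>
</example>
</para>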
<para>
The name <literal>word</literal> is a Perl extension, and <literal>blank</literal> is a GNU extension
that Perl supports from version 5.8. Another Perl extension is negation, which is indicated
by a <literal>^</literal> character after the opening colon. For example,
<literal>[12[:^digit:]]</literal> matches "1", "2", or any non-digit.
</para>
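<para>
The negated form and the two extension names can be tried out with a short
script. This is only a sketch: the sample characters are arbitrary, and the
patterns are anchored so that each test looks at exactly one character
(or, for <literal>word</literal>, one run of characters).
<example>
<title>Negated and extension class names (illustrative sketch)</title>
<programlisting role="php">
<![CDATA[
<?php
// [12[:^digit:]] matches "1", "2", or any character that is not a digit.
$pattern = '/^[12[:^digit:]]$/';
foreach (['1', '2', '3', 'a', '%'] as $char) {
    echo $char, ' => ',
        preg_match($pattern, $char) === 1 ? 'match' : 'no match', "\n";
}

// "blank" covers space and tab only, so a newline is rejected.
var_dump(preg_match('/^[[:blank:]]$/', "\t")); // int(1)
var_dump(preg_match('/^[[:blank:]]$/', "\n")); // int(0)

// "word" covers the same characters as \w.
var_dump(preg_match('/^[[:word:]]+$/', 'foo_42')); // int(1)
?>
]]>
</programlisting>
</example>
</para>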
<para>
In UTF-8 mode, characters with values greater than 128 do not match any
of the POSIX character classes.
</para>
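<para>
One way to check this behaviour on a particular installation is sketched
below. The sample character (U+00E9, written as its UTF-8 byte sequence) is
an arbitrary choice. Depending on the PHP and PCRE versions and on how the
<literal>u</literal> modifier is handled, the result for the non-ASCII
character may differ from the statement above, so it is printed rather than
assumed.
<example>
<title>Probing POSIX classes in UTF-8 mode (illustrative sketch)</title>
<programlisting role="php">
<![CDATA[
<?php
// An ASCII letter matches [:alpha:] in UTF-8 mode.
var_dump(preg_match('/^[[:alpha:]]$/u', 'a')); // int(1)

// U+00E9 encoded as UTF-8; its code point is greater than 128.
$nonAscii = "\xC3\xA9";

// Version dependent: inspect the output instead of relying on a fixed value.
var_dump(preg_match('/^[[:alpha:]]$/u', $nonAscii));
?>
]]>
</programlisting>
</example>
</para>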
</section>
<section xml:id="regexp.reference.verticalbar">