mirror of
https://github.com/sigmasternchen/php-doc-en
synced 2025-03-15 16:38:54 +00:00
Character classes
git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@295259 c90b9560-bf6c-de11-be94-00142212c4b1
parent
34fe6dd1e0
commit
f5c36b9d90
1 changed files with 42 additions and 0 deletions
@@ -1040,6 +1040,48 @@
terminator is always special and must be escaped when used
within an expression.
</para>
<para>
Perl supports the POSIX notation for character classes. This uses names
enclosed by <literal>[:</literal> and <literal>:]</literal> within the enclosing square brackets. PCRE also
supports this notation. For example, <literal>[01[:alpha:]%]</literal>
matches "0", "1", any alphabetic character, or "%". The supported class
names are:
<table>
<title>Character classes</title>
<tgroup cols="2">
<tbody>
<row><entry><literal>alnum</literal></entry><entry>letters and digits</entry></row>
<row><entry><literal>alpha</literal></entry><entry>letters</entry></row>
<row><entry><literal>ascii</literal></entry><entry>character codes 0 - 127</entry></row>
<row><entry><literal>blank</literal></entry><entry>space or tab only</entry></row>
<row><entry><literal>cntrl</literal></entry><entry>control characters</entry></row>
<row><entry><literal>digit</literal></entry><entry>decimal digits (same as \d)</entry></row>
<row><entry><literal>graph</literal></entry><entry>printing characters, excluding space</entry></row>
<row><entry><literal>lower</literal></entry><entry>lower case letters</entry></row>
<row><entry><literal>print</literal></entry><entry>printing characters, including space</entry></row>
<row><entry><literal>punct</literal></entry><entry>printing characters, excluding letters and digits</entry></row>
<row><entry><literal>space</literal></entry><entry>white space (not quite the same as \s)</entry></row>
<row><entry><literal>upper</literal></entry><entry>upper case letters</entry></row>
<row><entry><literal>word</literal></entry><entry>"word" characters (same as \w)</entry></row>
<row><entry><literal>xdigit</literal></entry><entry>hexadecimal digits</entry></row>
</tbody>
</tgroup>
</table>
The <literal>space</literal> characters are HT (9), LF (10), VT (11), FF (12), CR (13),
and space (32). Notice that this list includes the VT character (code
11). This makes "space" different to <literal>\s</literal>, which does not include VT (for
Perl compatibility).
</para>
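<para>
The bracket-expression example above can be checked with a short script. This is only a sketch: <literal>class_accepts()</literal> is a hypothetical helper written for this illustration, not a PHP or PCRE function, and it assumes the subject is a single-byte character.
</para>

```php
<?php
// Hypothetical helper (not part of PHP): true when $subject consists of
// exactly one character accepted by the bracket expression $class.
function class_accepts(string $class, string $subject): bool
{
    // \A and \z anchor the match at the very start and end of the subject.
    return preg_match('/\A' . $class . '\z/', $subject) === 1;
}

// The example from the text: "0", "1", any alphabetic character, or "%".
var_dump(class_accepts('[01[:alpha:]%]', 'q'));   // a letter
var_dump(class_accepts('[01[:alpha:]%]', '%'));   // listed literally
var_dump(class_accepts('[01[:alpha:]%]', '2'));   // a digit that is not listed

// VT (code 11) is one of the characters in the space class.
var_dump(class_accepts('[[:space:]]', "\x0B"));
?>
```

<para>
Note that the class name sits inside its own <literal>[:</literal> and <literal>:]</literal> delimiters, which in turn sit inside the enclosing square brackets.
</para>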
<para>
The name <literal>word</literal> is a Perl extension, and <literal>blank</literal> is a GNU extension
from Perl 5.8. Another Perl extension is negation, which is indicated
by a <literal>^</literal> character after the colon. For example,
<literal>[12[:^digit:]]</literal> matches "1", "2", or any non-digit.
</para>
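<para>
A minimal sketch of the negated form follows. The function name <literal>negation_demo()</literal> is hypothetical and exists only for this illustration; the pattern is the one given in the text, anchored so that it must match the whole subject.
</para>

```php
<?php
// Hypothetical helper for this sketch (not a PHP built-in).
function negation_demo(string $subject): bool
{
    // [:^digit:] is the negated form of [:digit:]. "1" and "2" are listed
    // literally, so they are accepted even though they are digits.
    return preg_match('/\A[12[:^digit:]]\z/', $subject) === 1;
}

foreach (['1', '2', 'x', '3'] as $subject) {
    printf("%s: %s\n", $subject, negation_demo($subject) ? 'match' : 'no match');
}
?>
```

<para>
Only "3" is rejected here: it is a digit and is not one of the literally listed characters.
</para>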
<para>
In UTF-8 mode, characters with values greater than 128 do not match any
of the POSIX character classes.
</para>
</section>

<section xml:id="regexp.reference.verticalbar">