Character classes

git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@295259 c90b9560-bf6c-de11-be94-00142212c4b1
2025-03-15 16:38:54 +00:00 · 2010-02-19 17:17:25 +00:00 · 2010-02-19 17:17:25 +00:00 · f5c36b9d90
commit f5c36b9d90
parent 34fe6dd1e0
1 changed files with 42 additions and 0 deletions
--- a/reference/pcre/pattern.syntax.xml
+++ b/reference/pcre/pattern.syntax.xml
@@ -1040,6 +1040,48 @@
      terminator is always special and must be escaped when used
      within an expression.
     </para>
+     <para>
+      Perl supports the POSIX notation for character classes. This uses names
+      enclosed by <literal>[:</literal> and <literal>:]</literal> within the enclosing square brackets. PCRE also
+      supports this notation. For example, <literal>[01[:alpha:]%]</literal>
+      matches "0", "1", any alphabetic character, or "%". The supported class
+      names are:
+      <table>
+       <title>Character classes</title>
+       <tgroup cols="2">
+        <tbody>
+         <row><entry><literal>alnum</literal></entry><entry>letters and digits</entry></row>
+         <row><entry><literal>alpha</literal></entry><entry>letters</entry></row>
+         <row><entry><literal>ascii</literal></entry><entry>character codes 0 - 127</entry></row>
+         <row><entry><literal>blank</literal></entry><entry>space or tab only</entry></row>
+         <row><entry><literal>cntrl</literal></entry><entry>control characters</entry></row>
+         <row><entry><literal>digit</literal></entry><entry>decimal digits (same as <literal>\d</literal>)</entry></row>
+         <row><entry><literal>graph</literal></entry><entry>printing characters, excluding space</entry></row>
+         <row><entry><literal>lower</literal></entry><entry>lower case letters</entry></row>
+         <row><entry><literal>print</literal></entry><entry>printing characters, including space</entry></row>
+         <row><entry><literal>punct</literal></entry><entry>printing characters, excluding letters and digits</entry></row>
+         <row><entry><literal>space</literal></entry><entry>white space (not quite the same as <literal>\s</literal>)</entry></row>
+         <row><entry><literal>upper</literal></entry><entry>upper case letters</entry></row>
+         <row><entry><literal>word</literal></entry><entry>"word" characters (same as <literal>\w</literal>)</entry></row>
+         <row><entry><literal>xdigit</literal></entry><entry>hexadecimal digits</entry></row>
+        </tbody>
+       </tgroup>
+      </table>
+      The <literal>space</literal> characters are HT (9), LF (10), VT (11), FF (12), CR (13),
+      and space (32). Notice that this list includes the VT character (code
+      11). This makes <literal>space</literal> different from <literal>\s</literal>, which does not include VT (for
+      Perl compatibility).
+     </para>
+     <para>
+      The name <literal>word</literal> is a Perl extension, and <literal>blank</literal> is a GNU extension
+      supported by Perl from version 5.8. Another Perl extension is negation, which is indicated
+      by a <literal>^</literal> character after the colon. For example,
+      <literal>[12[:^digit:]]</literal> matches "1", "2", or any non-digit.
+     </para>
+     <para>
+      In UTF-8 mode, characters with values greater than 127 do not match any
+      of the POSIX character classes.
+     </para>
    </section>

    <section xml:id="regexp.reference.verticalbar">