Mirror of https://github.com/sigmasternchen/php-doc-en (synced 2025-03-15 16:38:54 +00:00)

Added information, fixed large lines.

git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@247722 c90b9560-bf6c-de11-be94-00142212c4b1

parent 6e0001af5e
commit 77d57327f0
1 changed file with 62 additions and 43 deletions
@@ -1,5 +1,5 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.18 $ -->
+<!-- $Revision: 1.19 $ -->
 <!-- splitted from ./en/functions/pcre.xml, last change in rev 1.2 -->
 <refentry xml:id="reference.pcre.pattern.syntax" xmlns="http://docbook.org/ns/docbook">
 <refnamediv>
@@ -80,8 +80,8 @@
 </listitem>
 <listitem>
 <simpara>
-Fairly obviously, PCRE does not support the (?{code})
-construction.
+Fairly obviously, PCRE does not support the (?{code}) and (??{code})
+construction. However, there is support for recursive patterns.
 </simpara>
 </listitem>
 <listitem>
@@ -658,18 +658,18 @@
 </varlistentry>
 </variablelist>
 <para>
-The property names represented by <literal>xx</literal> above are limited to the Unicode
-general category properties. Each character has exactly one such
-property, specified by a two-letter abbreviation. For compatibility with
+The property names represented by <literal>xx</literal> above are limited
+to the Unicode general category properties. Each character has exactly one
+such property, specified by a two-letter abbreviation. For compatibility with
 Perl, negation can be specified by including a circumflex between the
-opening brace and the property name. For example, <literal>\p{^Lu}</literal> is the same
-as <literal>\P{Lu}</literal>.
+opening brace and the property name. For example, <literal>\p{^Lu}</literal>
+is the same as <literal>\P{Lu}</literal>.
 </para>
 <para>
-If only one letter is specified with <literal>\p</literal> or <literal>\P</literal>, it includes all the
-properties that start with that letter. In this case, in the absence of
-negation, the curly brackets in the escape sequence are optional; these
-two examples have the same effect:
+If only one letter is specified with <literal>\p</literal> or
+<literal>\P</literal>, it includes all the properties that start with that
+letter. In this case, in the absence of negation, the curly brackets in the
+escape sequence are optional; these two examples have the same effect:
 </para>
 <literallayout>
 \p{L}
@@ -728,9 +728,9 @@
 For example, <literal>\p{Lu}</literal> always matches only upper case letters.
 </para>
 <para>
-The <literal>\X</literal> escape matches any number of Unicode characters that form an
-extended Unicode sequence. <literal>\X</literal> is equivalent to
-<literal>(?>\PM\pM*)</literal>.
+The <literal>\X</literal> escape matches any number of Unicode characters
+that form an extended Unicode sequence. <literal>\X</literal> is equivalent
+to <literal>(?>\PM\pM*)</literal>.
 </para>
 <para>
 That is, it matches a character without the "mark" property, followed
@@ -741,8 +741,9 @@
 <para>
 Matching characters by Unicode property is not fast, because PCRE has
 to search a structure that contains data for over fifteen thousand
-characters. That is why the traditional escape sequences such as <literal>\d</literal> and
-<literal>\w</literal> do not use Unicode properties in PCRE.
+characters. That is why the traditional escape sequences such as
+<literal>\d</literal> and <literal>\w</literal> do not use Unicode properties
+in PCRE.
 </para>
 </refsect2>
 
@@ -801,7 +802,8 @@
 Note that the sequences \A, \Z, and \z can be used to match
 the start and end of the subject in both modes, and if all
 branches of a pattern start with \A is it always anchored,
-whether <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is set or not.
+whether <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link>
+is set or not.
 </para>
 </refsect2>
 
@@ -940,8 +942,8 @@
 <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>,
 <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link>,
 <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link>,
-and <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>
-can be changed from within the pattern by
+<link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>
+and PCRE_DUPNAMES can be changed from within the pattern by
 a sequence of Perl option letters enclosed between "(?" and
 ")". The option letters are:
 
@@ -973,6 +975,10 @@
 <entry><literal>X</literal></entry>
 <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link></entry>
 </row>
+<row>
+<entry><literal>J</literal></entry>
+<entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_INFO_JCHANGED</link></entry>
+</row>
 </tbody>
 </tgroup>
 </table>
@@ -1021,7 +1027,8 @@
 compile time. There would be some very weird behaviour otherwise.
 </para>
 <para>
-The PCRE-specific options <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> and
+The PCRE-specific options <link
+linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> and
 <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link> can
 be changed in the same way as the Perl-compatible options by
 using the characters U and X respectively. The (?X) flag
@@ -1106,9 +1113,9 @@
 
 <para>
 It is possible to name the subpattern with
-<literal>(?P<name>pattern)</literal> since PHP 4.3.3. Array with matches will
-contain the match indexed by the string alongside the match indexed by
-a number, then.
+<literal>(?P<name>pattern)</literal> since PHP 4.3.3. Array with
+matches will contain the match indexed by the string alongside the match
+indexed by a number, then.
 </para>
 </refsect2>
 
@@ -1237,7 +1244,8 @@
 that is the only way the rest of the pattern matches.
 </para>
 <para>
-If the <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> option is set (an option which is not
+If the <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link>
+option is set (an option which is not
 available in Perl) then the quantifiers are not greedy by
 default, but individual ones can be made greedy by following
 them with a question mark. In other words, it inverts the
@@ -1248,7 +1256,8 @@
 as many characters as possible and don't return to match the rest of the
 pattern. Thus <literal>.*abc</literal> matches "aabc" but
 <literal>.*+abc</literal> doesn't because <literal>.*+</literal> eats the
-whole string. Possessive quantifiers can be used to speed up processing since PHP 4.3.3.
+whole string. Possessive quantifiers can be used to speed up processing
+since PHP 4.3.3.
 </para>
 <para>
 When a parenthesized subpattern is quantified with a minimum
@@ -1257,7 +1266,8 @@
 proportion to the size of the minimum or maximum.
 </para>
 <para>
-If a pattern starts with .* or .{0,} and the <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
+If a pattern starts with .* or .{0,} and the <link
+linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
 option (equivalent to Perl's /s) is set, thus allowing the .
 to match newlines, then the pattern is implicitly anchored,
 because whatever follows will be tried against every character
@@ -1265,7 +1275,9 @@
 retrying the overall match at any position after the first.
 PCRE treats such a pattern as though it were preceded by \A.
 In cases where it is known that the subject string contains
-no newlines, it is worth setting <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> when the pattern begins with .* in order to
+no newlines, it is worth setting <link
+linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> when the
+pattern begins with .* in order to
 obtain this optimization, or
 alternatively using ^ to indicate anchoring explicitly.
 </para>
@@ -1337,8 +1349,9 @@
 following the backslash are taken as part of a potential
 back reference number. If the pattern continues with a digit
 character, then some delimiter must be used to terminate the
-back reference. If the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option is set, this can
-be whitespace. Otherwise an empty comment can be used.
+back reference. If the <link
+linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option
+is set, this can be whitespace. Otherwise an empty comment can be used.
 </para>
 <para>
 A back reference that occurs inside the parentheses to which
@@ -1360,8 +1373,8 @@
 <para>
 Back references to the named subpatterns can be achieved by
 <literal>(?P=name)</literal> or, since PHP 5.2.4, also by
-<literal>\k<name></literal>, <literal>\k'name'</literal> or
-<literal>\k{name}</literal>.
+<literal>\k<name></literal>, <literal>\k'name'</literal>,
+<literal>\k{name}</literal> or <literal>\g{name}</literal>.
 </para>
 </refsect2>
 
@@ -1636,8 +1649,9 @@
 condition is satisfied if the capturing subpattern of that
 number has previously matched. Consider the following pattern,
 which contains non-significant white space to make it
-more readable (assume the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option) and to
-divide it into three parts for ease of discussion:
+more readable (assume the <link
+linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>
+option) and to divide it into three parts for ease of discussion:
 
 <literal>( \( )? [^()]+ (?(1) \) )</literal>
 </para>
@@ -1693,9 +1707,10 @@
 comment play no part in the pattern matching at all.
 </para>
 <para>
-If the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option is set, an unescaped # character
-outside a character class introduces a comment that
-continues up to the next newline character in the pattern.
+If the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>
+option is set, an unescaped # character outside a character class
+introduces a comment that continues up to the next newline character
+in the pattern.
 </para>
 </refsect2>
 
@@ -1763,9 +1778,10 @@
 </para>
 
 <para>
-Since PHP 4.3.3, <literal>(?1)</literal>, <literal>(?2)</literal> and so on can be used
-for recursive subpatterns too. It is also possible to use named
-subpatterns: <literal>(?P<name>foo)</literal>.
+Since PHP 4.3.3, <literal>(?1)</literal>, <literal>(?2)</literal> and so on
+can be used for recursive subpatterns too. It is also possible to use named
+subpatterns: <literal>(?P>name)</literal> or
+<literal>(?P&name)</literal>.
 </para>
 <para>
 If the syntax for a recursive subpattern reference (either by number or
@@ -1803,10 +1819,12 @@
 regular expressions for efficient performance.
 </para>
 <para>
-When a pattern begins with .* and the <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> option is
+When a pattern begins with .* and the <link
+linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> option is
 set, the pattern is implicitly anchored by PCRE, since it
 can match only at the start of a subject string. However, if
-<link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> is not set, PCRE cannot make this optimization,
+<link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
+is not set, PCRE cannot make this optimization,
 because the . metacharacter does not then match a newline,
 and if the subject string contains newlines, the pattern may
 match from the character immediately following one of them
@@ -1822,7 +1840,8 @@
 <para>
 If you are using such a pattern with subject strings that do
 not contain newlines, the best performance is obtained by
-setting <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>, or starting the pattern with ^.* to
+setting <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>,
+or starting the pattern with ^.* to
 indicate explicit anchoring. That saves PCRE from having to
 scan along the subject looking for a newline to restart at.
 </para>