Added information, fixed large lines.

git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@247722 c90b9560-bf6c-de11-be94-00142212c4b1
Felipe Pena 2007-12-06 22:54:23 +00:00
parent 6e0001af5e
commit 77d57327f0

@ -1,5 +1,5 @@
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- $Revision: 1.18 $ -->
<!-- $Revision: 1.19 $ -->
<!-- splitted from ./en/functions/pcre.xml, last change in rev 1.2 -->
<refentry xml:id="reference.pcre.pattern.syntax" xmlns="http://docbook.org/ns/docbook">
<refnamediv>
@ -80,8 +80,8 @@
</listitem>
<listitem>
<simpara>
Fairly obviously, PCRE does not support the (?{code})
construction.
Fairly obviously, PCRE does not support the (?{code}) and (??{code})
constructions. However, there is support for recursive patterns.
</simpara>
</listitem>
<listitem>
@ -658,18 +658,18 @@
</varlistentry>
</variablelist>
<para>
The property names represented by <literal>xx</literal> above are limited to the Unicode
general category properties. Each character has exactly one such
property, specified by a two-letter abbreviation. For compatibility with
The property names represented by <literal>xx</literal> above are limited
to the Unicode general category properties. Each character has exactly one
such property, specified by a two-letter abbreviation. For compatibility with
Perl, negation can be specified by including a circumflex between the
opening brace and the property name. For example, <literal>\p{^Lu}</literal> is the same
as <literal>\P{Lu}</literal>.
opening brace and the property name. For example, <literal>\p{^Lu}</literal>
is the same as <literal>\P{Lu}</literal>.
</para>
<para>
If only one letter is specified with <literal>\p</literal> or <literal>\P</literal>, it includes all the
properties that start with that letter. In this case, in the absence of
negation, the curly brackets in the escape sequence are optional; these
two examples have the same effect:
If only one letter is specified with <literal>\p</literal> or
<literal>\P</literal>, it includes all the properties that start with that
letter. In this case, in the absence of negation, the curly brackets in the
escape sequence are optional; these two examples have the same effect:
</para>
<literallayout>
\p{L}
@ -728,9 +728,9 @@
For example, <literal>\p{Lu}</literal> always matches only upper case letters.
</para>
<para>
The <literal>\X</literal> escape matches any number of Unicode characters that form an
extended Unicode sequence. <literal>\X</literal> is equivalent to
<literal>(?>\PM\pM*)</literal>.
The <literal>\X</literal> escape matches any number of Unicode characters
that form an extended Unicode sequence. <literal>\X</literal> is equivalent
to <literal>(?>\PM\pM*)</literal>.
</para>
<para>
That is, it matches a character without the "mark" property, followed
@ -741,8 +741,9 @@
<para>
Matching characters by Unicode property is not fast, because PCRE has
to search a structure that contains data for over fifteen thousand
characters. That is why the traditional escape sequences such as <literal>\d</literal> and
<literal>\w</literal> do not use Unicode properties in PCRE.
characters. That is why the traditional escape sequences such as
<literal>\d</literal> and <literal>\w</literal> do not use Unicode properties
in PCRE.
</para>
</refsect2>
@ -801,7 +802,8 @@
Note that the sequences \A, \Z, and \z can be used to match
the start and end of the subject in both modes, and if all
branches of a pattern start with \A it is always anchored,
whether <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is set or not.
whether <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link>
is set or not.
</para>
</refsect2>
@ -940,8 +942,8 @@
<link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>,
<link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link>,
<link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link>,
and <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>
can be changed from within the pattern by
<link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>
and PCRE_DUPNAMES can be changed from within the pattern by
a sequence of Perl option letters enclosed between "(?" and
")". The option letters are:
@ -973,6 +975,10 @@
<entry><literal>X</literal></entry>
<entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link></entry>
</row>
<row>
<entry><literal>J</literal></entry>
<entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_DUPNAMES</link></entry>
</row>
</tbody>
</tgroup>
</table>
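As a rough illustration of the option letters listed above, the following sketch changes options from inside the pattern. The subject strings are made up for the example, and only the <literal>i</literal> and <literal>U</literal> letters are shown.

```php
<?php
// (?i) switches on caseless matching for the rest of the pattern.
$caseless = preg_match('/(?i)hello/', 'HeLLo world');

// Without the inline option the same pattern is case-sensitive.
$strict = preg_match('/hello/', 'HeLLo world');

// (?U) inverts greediness, so .* stops at the first possible ">".
preg_match('/(?U)<.*>/', '<a><b>', $m);

var_dump($caseless, $strict, $m[0]); // int(1), int(0), string(3) "<a>"
```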
@ -1021,7 +1027,8 @@
compile time. There would be some very weird behaviour otherwise.
</para>
<para>
The PCRE-specific options <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> and
The PCRE-specific options <link
linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> and
<link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link> can
be changed in the same way as the Perl-compatible options by
using the characters U and X respectively. The (?X) flag
@ -1106,9 +1113,9 @@
<para>
It is possible to name the subpattern with
<literal>(?P&lt;name&gt;pattern)</literal> since PHP 4.3.3. Array with matches will
contain the match indexed by the string alongside the match indexed by
a number, then.
<literal>(?P&lt;name&gt;pattern)</literal> since PHP 4.3.3. The array of
matches will then contain the match indexed by its name alongside the
match indexed by its number.
</para>
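A minimal sketch of how a named subpattern shows up in the matches array. The group names <literal>year</literal> and <literal>month</literal> and the subject string are illustrative only.

```php
<?php
preg_match('/(?P<year>\d{4})-(?P<month>\d{2})/', 'Released 2007-12', $matches);

// Each named group is available under its name and under its number.
var_dump($matches['year'] === $matches[1]);   // bool(true)
var_dump($matches['month'] === $matches[2]);  // bool(true)
```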
</refsect2>
@ -1237,7 +1244,8 @@
that is the only way the rest of the pattern matches.
</para>
<para>
If the <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> option is set (an option which is not
If the <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link>
option is set (an option which is not
available in Perl) then the quantifiers are not greedy by
default, but individual ones can be made greedy by following
them with a question mark. In other words, it inverts the
@ -1248,7 +1256,8 @@
as many characters as possible and don't return to match the rest of the
pattern. Thus <literal>.*abc</literal> matches "aabc" but
<literal>.*+abc</literal> doesn't because <literal>.*+</literal> eats the
whole string. Possessive quantifiers can be used to speed up processing since PHP 4.3.3.
whole string. Possessive quantifiers can be used to speed up processing
since PHP 4.3.3.
</para>
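The "aabc" case described above can be checked directly. This is only a sketch using <function>preg_match</function>; the variable names are arbitrary.

```php
<?php
// Greedy: .* first takes the whole string, then backtracks so "abc" can match.
$greedy = preg_match('/.*abc/', 'aabc');

// Possessive: .*+ takes the whole string and never gives anything back,
// so there is nothing left for "abc" and the match fails.
$possessive = preg_match('/.*+abc/', 'aabc');

var_dump($greedy, $possessive); // int(1), int(0)
```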
<para>
When a parenthesized subpattern is quantified with a minimum
@ -1257,7 +1266,8 @@
proportion to the size of the minimum or maximum.
</para>
<para>
If a pattern starts with .* or .{0,} and the <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
If a pattern starts with .* or .{0,} and the <link
linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
option (equivalent to Perl's /s) is set, thus allowing the .
to match newlines, then the pattern is implicitly anchored,
because whatever follows will be tried against every character
@ -1265,7 +1275,9 @@
retrying the overall match at any position after the first.
PCRE treats such a pattern as though it were preceded by \A.
In cases where it is known that the subject string contains
no newlines, it is worth setting <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> when the pattern begins with .* in order to
no newlines, it is worth setting <link
linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> when the
pattern begins with .* in order to
obtain this optimization, or
alternatively using ^ to indicate anchoring explicitly.
</para>
@ -1337,8 +1349,9 @@
following the backslash are taken as part of a potential
back reference number. If the pattern continues with a digit
character, then some delimiter must be used to terminate the
back reference. If the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option is set, this can
be whitespace. Otherwise an empty comment can be used.
back reference. If the <link
linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option
is set, this can be whitespace. Otherwise an empty comment can be used.
</para>
<para>
A back reference that occurs inside the parentheses to which
@ -1360,8 +1373,8 @@
<para>
Back references to the named subpatterns can be achieved by
<literal>(?P=name)</literal> or, since PHP 5.2.4, also by
<literal>\k&lt;name&gt;</literal>, <literal>\k'name'</literal> or
<literal>\k{name}</literal>.
<literal>\k&lt;name&gt;</literal>, <literal>\k'name'</literal>,
<literal>\k{name}</literal> or <literal>\g{name}</literal>.
</para>
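The following sketch matches a repeated word with several of the named back reference spellings. The group name <literal>word</literal> and the subject are assumptions made for the example, and the alternative spellings require a PCRE version recent enough to support them.

```php
<?php
$subject = 'hello hello world';

// Python-style reference to the group named "word".
$python = preg_match('/(?P<word>\w+) (?P=word)/', $subject, $m1);

// \k<name> and \k{name} spellings.
$angle = preg_match('/(?P<word>\w+) \k<word>/', $subject, $m2);
$brace = preg_match('/(?P<word>\w+) \k{word}/', $subject, $m3);

// \g{name} spelling.
$g = preg_match('/(?P<word>\w+) \g{word}/', $subject, $m4);

var_dump($m1[0]); // string(11) "hello hello"
```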
</refsect2>
@ -1636,8 +1649,9 @@
condition is satisfied if the capturing subpattern of that
number has previously matched. Consider the following pattern,
which contains non-significant white space to make it
more readable (assume the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option) and to
divide it into three parts for ease of discussion:
more readable (assume the <link
linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>
option) and to divide it into three parts for ease of discussion:
<literal>( \( )? [^()]+ (?(1) \) )</literal>
</para>
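A small sketch of the conditional pattern above in use. The <literal>^</literal> and <literal>$</literal> anchors are an addition for this example, so that the whole subject has to match; the <literal>x</literal> modifier stands in for PCRE_EXTENDED.

```php
<?php
// Group 1 is the optional opening parenthesis; (?(1) \) ) demands a closing
// parenthesis only if group 1 took part in the match.
$pattern = '/^ ( \( )? [^()]+ (?(1) \) ) $/x';

$balanced = preg_match($pattern, '(abc)');
$bare     = preg_match($pattern, 'abc');
$unclosed = preg_match($pattern, '(abc');

var_dump($balanced, $bare, $unclosed); // int(1), int(1), int(0)
```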
@ -1693,9 +1707,10 @@
comment play no part in the pattern matching at all.
</para>
<para>
If the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option is set, an unescaped # character
outside a character class introduces a comment that
continues up to the next newline character in the pattern.
If the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>
option is set, an unescaped # character outside a character class
introduces a comment that continues up to the next newline character
in the pattern.
</para>
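As an illustration, the pattern below is spread over several lines with a comment on each. It assumes the <literal>x</literal> modifier is used to set PCRE_EXTENDED; the subject string is made up.

```php
<?php
// With the x modifier, whitespace in the pattern is ignored and an unescaped #
// starts a comment that runs to the end of the line.
$pattern = '/
    \d{4}   # four-digit year
    -       # literal separator
    \d{2}   # two-digit month
/x';

$found = preg_match($pattern, 'Released 2007-12', $m);

var_dump($found, $m[0]); // int(1), string(7) "2007-12"
```

The comments must not contain the pattern delimiter, since an unescaped delimiter would end the pattern early.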
</refsect2>
@ -1763,9 +1778,10 @@
</para>
<para>
Since PHP 4.3.3, <literal>(?1)</literal>, <literal>(?2)</literal> and so on can be used
for recursive subpatterns too. It is also possible to use named
subpatterns: <literal>(?P&lt;name&gt;foo)</literal>.
Since PHP 4.3.3, <literal>(?1)</literal>, <literal>(?2)</literal> and so on
can be used for recursive subpatterns too. It is also possible to use named
subpatterns: <literal>(?P&gt;name)</literal> or
<literal>(?&amp;name)</literal>.
</para>
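A sketch of recursion through a named subpattern, matching nested parentheses. The group name <literal>parens</literal> and the anchors are assumptions for this example. Both the Python-style <literal>(?P&gt;name)</literal> call and the Perl-style <literal>(?&amp;name)</literal> call are shown; they need a PCRE version that supports them.

```php
<?php
// The group "parens" matches "(", then any mix of non-parenthesis runs
// and nested calls to itself, then ")".
$python = '/^(?P<parens>\((?:[^()]++|(?P>parens))*\))$/';
$perl   = '/^(?P<parens>\((?:[^()]++|(?&parens))*\))$/';

$nested    = preg_match($python, '(a(b)c)');
$nestedAlt = preg_match($perl, '(a(b)c)');
$unclosed  = preg_match($python, '(a(bc)');

var_dump($nested, $nestedAlt, $unclosed); // int(1), int(1), int(0)
```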
<para>
If the syntax for a recursive subpattern reference (either by number or
@ -1803,10 +1819,12 @@
regular expressions for efficient performance.
</para>
<para>
When a pattern begins with .* and the <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> option is
When a pattern begins with .* and the <link
linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> option is
set, the pattern is implicitly anchored by PCRE, since it
can match only at the start of a subject string. However, if
<link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> is not set, PCRE cannot make this optimization,
<link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
is not set, PCRE cannot make this optimization,
because the . metacharacter does not then match a newline,
and if the subject string contains newlines, the pattern may
match from the character immediately following one of them
@ -1822,7 +1840,8 @@
<para>
If you are using such a pattern with subject strings that do
not contain newlines, the best performance is obtained by
setting <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>, or starting the pattern with ^.* to
setting <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>,
or starting the pattern with ^.* to
indicate explicit anchoring. That saves PCRE from having to
scan along the subject looking for a newline to restart at.
</para>