mirror of
https://github.com/sigmasternchen/php-doc-en
synced 2025-03-15 08:28:54 +00:00

git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@297028 c90b9560-bf6c-de11-be94-00142212c4b1
156 lines
5.6 KiB
XML
156 lines
5.6 KiB
XML
<?xml version="1.0" encoding="utf-8"?>
|
|
<!-- $Revision$ -->
|
|
<!-- splitted from ./en/functions/pcre.xml, last change in rev 1.2 -->
|
|
<article xml:id="reference.pcre.pattern.differences" xmlns="http://docbook.org/ns/docbook">
|
|
<title>Perl Differences</title>
|
|
<titleabbrev>Differences From Perl</titleabbrev>
|
|
<para>
|
|
The differences described here are with respect to Perl 5.005.
|
|
<orderedlist>
|
|
<listitem>
|
|
<simpara>
|
|
By default, a whitespace character is any character that
|
|
the C library function isspace() recognizes, though it is
|
|
possible to compile PCRE with alternative character type
|
|
tables. Normally isspace() matches space, formfeed, newline,
|
|
carriage return, horizontal tab, and vertical tab. Perl 5 no
|
|
longer includes vertical tab in its set of whitespace characters.
|
|
The \v escape that was in the Perl documentation for
|
|
a long time was never in fact recognized. However, the character
|
|
itself was treated as whitespace at least up to 5.002.
|
|
In 5.004 and 5.005 it does not match \s.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>
|
|
PCRE does not allow repeat quantifiers on lookahead
|
|
assertions. Perl permits them, but they do not mean what you
|
|
might think. For example, (?!a){3} does not assert that the
|
|
next three characters are not "a". It just asserts that the
|
|
next character is not "a" three times.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>
|
|
Capturing subpatterns that occur inside negative
|
|
lookahead assertions are counted, but their entries in the
|
|
offsets vector are never set. Perl sets its numerical
|
|
variables from any such patterns that are matched before the
|
|
assertion fails to match something (thereby succeeding), but
|
|
only if the negative lookahead assertion contains just one
|
|
branch.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>
|
|
Though binary zero characters are supported in the subject string,
|
|
they are not allowed in a pattern string because it is passed as a
|
|
normal C string, terminated by zero. The escape sequence "\x00" can
|
|
be used in the pattern to represent a binary zero.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>
|
|
The following Perl escape sequences are not supported:
|
|
\l, \u, \L, \U. In fact these are implemented by
|
|
Perl's general string-handling and are not part of its
|
|
pattern matching engine.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>
|
|
The Perl \G assertion is not supported as it is not
|
|
relevant to single pattern matches.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>
|
|
Fairly obviously, PCRE does not support the (?{code}) and (??{code})
|
|
construction. However, there is support for recursive patterns.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>
|
|
There are at the time of writing some oddities in Perl
|
|
5.005_02 concerned with the settings of captured strings
|
|
when part of a pattern is repeated. For example, matching
|
|
"aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
|
|
"b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2
|
|
unset. However, if the pattern is changed to
|
|
/^(aa(b(b))?)+$/ then $2 (and $3) get set.
|
|
In Perl 5.004 $2 is set in both cases, and that is also &true;
|
|
of PCRE. If in the future Perl changes to a consistent state
|
|
that is different, PCRE may change to follow.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>
|
|
Another as yet unresolved discrepancy is that in Perl
|
|
5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string
|
|
"a", whereas in PCRE it does not. However, in both Perl and
|
|
PCRE /^(a)?a/ matched against "a" leaves $1 unset.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
PCRE provides some extensions to the Perl regular
|
|
expression facilities:
|
|
<orderedlist>
|
|
<listitem>
|
|
<simpara>
|
|
Although lookbehind assertions must match fixed length
|
|
strings, each alternative branch of a lookbehind assertion
|
|
can match a different length of string. Perl 5.005 requires
|
|
them all to have the same length.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>
|
|
If <link linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
|
|
is set and <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is
|
|
not set, the $ meta-character matches only at the very end of the
|
|
string.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>
|
|
If <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link> is
|
|
set, a backslash followed by a letter with no special meaning is
|
|
faulted.
|
|
</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>
|
|
If <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> is
|
|
set, the greediness of the repetition quantifiers is inverted,
|
|
that is, by default they are not greedy, but if followed by a
|
|
question mark they are.
|
|
</simpara>
|
|
</listitem>
|
|
</orderedlist>
|
|
</para>
|
|
</listitem>
|
|
</orderedlist>
|
|
</para>
|
|
</article>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode: sgml
|
|
sgml-omittag:t
|
|
sgml-shorttag:t
|
|
sgml-minimize-attributes:nil
|
|
sgml-always-quote-attributes:t
|
|
sgml-indent-step:1
|
|
sgml-indent-data:t
|
|
indent-tabs-mode:nil
|
|
sgml-parent-document:nil
|
|
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
|
|
sgml-exposed-tags:nil
|
|
sgml-local-catalogs:nil
|
|
sgml-local-ecat-files:nil
|
|
End:
|
|
vim600: syn=xml fen fdm=syntax fdl=2 si
|
|
vim: et tw=78 syn=sgml
|
|
vi: ts=1 sw=1
|
|
-->
|