mirror of
https://github.com/sigmasternchen/php-doc-en
synced 2025-03-27 06:18:56 +00:00

What's done: the migration guide itself, updates to existing functions and functionality, and new language features. What's not done: stub pages for new functions and methods. (I'd rather get this in now, though, than have it languish on my hard drive any longer so that other people can contribute.) What could use help: the generator chapter is pretty rough and ready. Better examples would be awesome. I certainly won't argue if somebody wants to add the function stubs, either. git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@328430 c90b9560-bf6c-de11-be94-00142212c4b1
285 lines
11 KiB
XML
285 lines
11 KiB
XML
<?xml version="1.0" encoding="utf-8"?>
|
|
<!-- $Revision$ -->
|
|
<!-- splitted from ./en/functions/pcre.xml, last change in rev 1.2 -->
|
|
<article xml:id="reference.pcre.pattern.modifiers" xmlns="http://docbook.org/ns/docbook">
|
|
<title>Pattern Modifiers</title>
|
|
<titleabbrev>Possible modifiers in regex patterns</titleabbrev>
|
|
<para>
|
|
The current possible PCRE modifiers are listed below. The names
|
|
in parentheses refer to internal PCRE names for these modifiers.
|
|
Spaces and newlines are ignored in modifiers, other characters cause error.
|
|
</para>
|
|
<para>
|
|
<blockquote>
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><emphasis>i</emphasis> (<literal>PCRE_CASELESS</literal>)</term>
|
|
<listitem>
|
|
<simpara>
|
|
If this modifier is set, letters in the pattern match both
|
|
upper and lower case letters.
|
|
</simpara>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><emphasis>m</emphasis> (<literal>PCRE_MULTILINE</literal>)</term>
|
|
<listitem>
|
|
<simpara>
|
|
By default, PCRE treats the subject string as consisting of a
|
|
single "line" of characters (even if it actually contains
|
|
several newlines). The "start of line" metacharacter (^)
|
|
matches only at the start of the string, while the "end of
|
|
line" metacharacter ($) matches only at the end of the
|
|
string, or before a terminating newline (unless
|
|
<emphasis>D</emphasis> modifier is set). This is the same as
|
|
Perl.
|
|
</simpara>
|
|
<simpara>
|
|
When this modifier is set, the "start of line" and "end of
|
|
line" constructs match immediately following or immediately
|
|
before any newline in the subject string, respectively, as
|
|
well as at the very start and end. This is equivalent to
|
|
Perl's /m modifier. If there are no "\n" characters in a
|
|
subject string, or no occurrences of ^ or $ in a pattern,
|
|
setting this modifier has no effect.
|
|
</simpara>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><emphasis>s</emphasis> (<literal>PCRE_DOTALL</literal>)</term>
|
|
<listitem>
|
|
<simpara>
|
|
If this modifier is set, a dot metacharacter in the pattern
|
|
matches all characters, including newlines. Without it,
|
|
newlines are excluded. This modifier is equivalent to Perl's
|
|
/s modifier. A negative class such as [^a] always matches a
|
|
newline character, independent of the setting of this
|
|
modifier.
|
|
</simpara>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><emphasis>x</emphasis> (<literal>PCRE_EXTENDED</literal>)</term>
|
|
<listitem>
|
|
<simpara>
|
|
If this modifier is set, whitespace data characters in the
|
|
pattern are totally ignored except when escaped or inside a
|
|
character class, and characters between an unescaped #
|
|
outside a character class and the next newline character,
|
|
inclusive, are also ignored. This is equivalent to Perl's /x
|
|
modifier, and makes it possible to include commentary inside
|
|
complicated patterns. Note, however, that this applies only
|
|
to data characters. Whitespace characters may never appear
|
|
within special character sequences in a pattern, for example
|
|
within the sequence (?( which introduces a conditional
|
|
subpattern.
|
|
</simpara>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><emphasis>e</emphasis> (<literal>PREG_REPLACE_EVAL</literal>)</term>
|
|
<listitem>
|
|
&warn.deprecated.feature-5-5-0;
|
|
<simpara>
|
|
If this modifier is set, <function>preg_replace</function>
|
|
does normal substitution of backreferences in the
|
|
replacement string, evaluates it as PHP code, and uses the
|
|
result for replacing the search string.
|
|
Single quotes, double quotes, backslashes (<literal>\</literal>) and NULL chars will
|
|
be escaped by backslashes in substituted backreferences.
|
|
</simpara>
|
|
<caution>
|
|
<para>
|
|
The <function>addslashes</function> function is run on each matched backreference before
|
|
the substitution takes place. As such, when the backreference
|
|
is used as a quoted string, escaped characters will be converted
|
|
to literals. However, characters which are escaped, which would
|
|
normally not be converted, will retain their slashes. This makes
|
|
use of this modifier very complicated.
|
|
</para>
|
|
</caution>
|
|
<caution>
|
|
<para>
|
|
Make sure that <parameter>replacement</parameter> constitutes a valid PHP code string,
|
|
otherwise PHP will complain about a parse error at the line containing
|
|
<function>preg_replace</function>.
|
|
</para>
|
|
</caution>
|
|
<caution>
|
|
<para>
|
|
Use of this modifier is <emphasis>discouraged</emphasis>, as it can easily introduce
|
|
security vulnerabilites:
|
|
</para>
|
|
<informalexample>
|
|
<programlisting role="php">
|
|
<![CDATA[
|
|
<?php
|
|
$html = $_POST['html'];
|
|
|
|
// uppercase headings
|
|
$html = preg_replace(
|
|
'(<h([1-6])>(.*?)</h\1>)e',
|
|
'"<h$1>" . strtoupper("$2") . "</h$1>"',
|
|
$html
|
|
);
|
|
]]>
|
|
</programlisting>
|
|
</informalexample>
|
|
<para>
|
|
The above example code can be easily exploited by passing in a string such as
|
|
<literal><h1>{${eval($_GET[php_code])}}</h1></literal>. This gives
|
|
the attacker the ability to execute arbitrary PHP code and as such gives him
|
|
nearly complete access to your server.
|
|
</para>
|
|
<para>
|
|
To prevent this kind of remote code execution vulnerability the
|
|
<function>preg_replace_callback</function> function should be used instead:
|
|
</para>
|
|
<informalexample>
|
|
<programlisting role="php">
|
|
<![CDATA[
|
|
<?php
|
|
$html = $_POST['html'];
|
|
|
|
// uppercase headings
|
|
$html = preg_replace_callback(
|
|
'(<h([1-6])>(.*?)</h\1>)',
|
|
function ($m) {
|
|
return "<h$m[1]>" . strtoupper($m[2]) . "</h$m[1]>";
|
|
},
|
|
$html
|
|
);
|
|
]]>
|
|
</programlisting>
|
|
</informalexample>
|
|
</caution>
|
|
<note>
|
|
<para>
|
|
Only <function>preg_replace</function> uses this modifier;
|
|
it is ignored by other PCRE functions.
|
|
</para>
|
|
</note>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><emphasis>A</emphasis> (<literal>PCRE_ANCHORED</literal>)</term>
|
|
<listitem>
|
|
<simpara>
|
|
If this modifier is set, the pattern is forced to be
|
|
"anchored", that is, it is constrained to match only at the
|
|
start of the string which is being searched (the "subject
|
|
string"). This effect can also be achieved by appropriate
|
|
constructs in the pattern itself, which is the only way to
|
|
do it in Perl.
|
|
</simpara>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><emphasis>D</emphasis> (<literal>PCRE_DOLLAR_ENDONLY</literal>)</term>
|
|
<listitem>
|
|
<simpara>
|
|
If this modifier is set, a dollar metacharacter in the pattern
|
|
matches only at the end of the subject string. Without this
|
|
modifier, a dollar also matches immediately before the final
|
|
character if it is a newline (but not before any other
|
|
newlines). This modifier is ignored if <emphasis>m</emphasis>
|
|
modifier is set. There is no equivalent to this modifier in
|
|
Perl.
|
|
</simpara>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><emphasis>S</emphasis></term>
|
|
<listitem>
|
|
<simpara>
|
|
When a pattern is going to be used several times, it is
|
|
worth spending more time analyzing it in order to speed up
|
|
the time taken for matching. If this modifier is set, then
|
|
this extra analysis is performed. At present, studying a
|
|
pattern is useful only for non-anchored patterns that do not
|
|
have a single fixed starting character.
|
|
</simpara>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><emphasis>U</emphasis> (<literal>PCRE_UNGREEDY</literal>)</term>
|
|
<listitem>
|
|
<simpara>
|
|
This modifier inverts the "greediness" of the quantifiers so
|
|
that they are not greedy by default, but become greedy if
|
|
followed by <literal>?</literal>. It is not compatible with Perl. It can also
|
|
be set by a (<literal>?U</literal>)
|
|
<link linkend="regexp.reference.internal-options">modifier setting within
|
|
the pattern</link> or by a question mark behind a quantifier (e.g.
|
|
<literal>.*?</literal>).
|
|
</simpara>
|
|
<note>
|
|
<para>
|
|
It is usually not possible to match more than <link
|
|
linkend="ini.pcre.backtrack-limit">pcre.backtrack_limit</link>
|
|
characters in ungreedy mode.
|
|
</para>
|
|
</note>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><emphasis>X</emphasis> (<literal>PCRE_EXTRA</literal>)</term>
|
|
<listitem>
|
|
<simpara>
|
|
This modifier turns on additional functionality of PCRE that
|
|
is incompatible with Perl. Any backslash in a pattern that
|
|
is followed by a letter that has no special meaning causes
|
|
an error, thus reserving these combinations for future
|
|
expansion. By default, as in Perl, a backslash followed by a
|
|
letter with no special meaning is treated as a literal.
|
|
There are at present no other features controlled by this
|
|
modifier.
|
|
</simpara>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><emphasis>J</emphasis> (<literal>PCRE_INFO_JCHANGED</literal>)</term>
|
|
<listitem>
|
|
<simpara>
|
|
The (?J) internal option setting changes the local <literal>PCRE_DUPNAMES</literal>
|
|
option. Allow duplicate names for subpatterns.
|
|
</simpara>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><emphasis>u</emphasis> (<literal>PCRE_UTF8</literal>)</term>
|
|
<listitem>
|
|
<simpara>
|
|
This modifier turns on additional functionality of PCRE that
|
|
is incompatible with Perl. Pattern strings are treated as
|
|
UTF-8. This modifier is available from PHP 4.1.0 or greater
|
|
on Unix and from PHP 4.2.3 on win32.
|
|
UTF-8 validity of the pattern is checked since PHP 4.3.5.
|
|
</simpara>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</blockquote>
|
|
</para>
|
|
</article>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode: sgml
|
|
sgml-omittag:t
|
|
sgml-shorttag:t
|
|
sgml-minimize-attributes:nil
|
|
sgml-always-quote-attributes:t
|
|
sgml-indent-step:1
|
|
sgml-indent-data:t
|
|
indent-tabs-mode:nil
|
|
sgml-parent-document:nil
|
|
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
|
|
sgml-exposed-tags:nil
|
|
sgml-local-catalogs:nil
|
|
sgml-local-ecat-files:nil
|
|
End:
|
|
vim600: syn=xml fen fdm=syntax fdl=2 si
|
|
vim: et tw=78 syn=sgml
|
|
vi: ts=1 sw=1
|
|
-->
|