<?xml version="1.0" encoding="utf-8"?>
<!-- $Revision$ -->
<article xml:id="reference.pcre.pattern.modifiers" xmlns="http://docbook.org/ns/docbook">
 <title>Pattern Modifiers</title>
 <titleabbrev>Possible modifiers in regex patterns</titleabbrev>
 <para>
  The PCRE modifiers currently available are listed below. The names
  in parentheses refer to internal PCRE names for these modifiers.
  Spaces and newlines in the modifier list are ignored; any other
  character that is not a valid modifier causes an error.
 </para>
 <para>
  <blockquote>
   <variablelist>
    <varlistentry>
     <term><emphasis>i</emphasis> (<literal>PCRE_CASELESS</literal>)</term>
     <listitem>
      <simpara>
       If this modifier is set, letters in the pattern match both
       upper and lower case letters.
      </simpara>
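      <simpara>
       As an illustrative sketch, the following compares the same pattern
       with and without the <emphasis>i</emphasis> modifier using
       <function>preg_match</function>. The subject string and variable
       names are hypothetical and chosen only for this example.
      </simpara>
      <example>
       <title>Sketch: case-insensitive matching with <literal>i</literal></title>
       <programlisting role="php">
<![CDATA[
<?php
// Hypothetical subject string, chosen only for illustration.
$subject = 'Hello WORLD';

// Without i, the lowercase pattern does not match the uppercase text.
$caseSensitive = preg_match('/world/', $subject);    // int(0)

// With i, letters in the pattern match both cases.
$caseInsensitive = preg_match('/world/i', $subject); // int(1)

var_dump($caseSensitive, $caseInsensitive);
]]>
       </programlisting>
      </example>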
     </listitem>
    </varlistentry>
    <varlistentry>
     <term><emphasis>m</emphasis> (<literal>PCRE_MULTILINE</literal>)</term>
     <listitem>
      <simpara>
       By default, PCRE treats the subject string as consisting of a
       single "line" of characters (even if it actually contains
       several newlines). The "start of line" metacharacter (^)
       matches only at the start of the string, while the "end of
       line" metacharacter ($) matches only at the end of the
       string, or before a terminating newline (unless the
       <emphasis>D</emphasis> modifier is set). This is the same as
       in Perl.
      </simpara>
      <simpara>
       When this modifier is set, the "start of line" and "end of
       line" constructs match immediately following or immediately
       before any newline in the subject string, respectively, as
       well as at the very start and end. This is equivalent to
       Perl's /m modifier. If there are no "\n" characters in a
       subject string, or no occurrences of ^ or $ in a pattern,
       setting this modifier has no effect.
      </simpara>
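      <simpara>
       The difference can be sketched by counting matches of an anchored
       pattern in a subject that contains newlines. The subject string
       and variable names below are hypothetical.
      </simpara>
      <example>
       <title>Sketch: <literal>^</literal> with and without <literal>m</literal></title>
       <programlisting role="php">
<![CDATA[
<?php
// Hypothetical three-line subject.
$subject = "first\nsecond\nthird";

// Without m, ^ matches only at the very start of the subject.
$withoutM = preg_match_all('/^\w+/', $subject, $single);  // int(1)

// With m, ^ also matches immediately after every newline.
$withM = preg_match_all('/^\w+/m', $subject, $perLine);   // int(3)

var_dump($withoutM, $withM);
print_r($perLine[0]); // first, second, third
]]>
       </programlisting>
      </example>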
     </listitem>
    </varlistentry>
    <varlistentry>
     <term><emphasis>s</emphasis> (<literal>PCRE_DOTALL</literal>)</term>
     <listitem>
      <simpara>
       If this modifier is set, a dot metacharacter in the pattern
       matches all characters, including newlines. Without it,
       newlines are excluded. This modifier is equivalent to Perl's
       /s modifier. A negated character class such as [^a] always
       matches a newline character, independent of the setting of
       this modifier.
      </simpara>
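      <simpara>
       A minimal sketch of this behaviour, using a hypothetical two-line
       subject: the dot only crosses the newline when
       <emphasis>s</emphasis> is set, while a negated class crosses it
       either way.
      </simpara>
      <example>
       <title>Sketch: dot and newlines with <literal>s</literal></title>
       <programlisting role="php">
<![CDATA[
<?php
// Hypothetical subject with a newline between "one" and "line".
$subject = "line one\nline two";

// Without s, the dot does not match the newline.
$withoutS = preg_match('/one.line/', $subject);     // int(0)

// With s, the dot matches any character, including the newline.
$withS = preg_match('/one.line/s', $subject);       // int(1)

// A negated class matches the newline regardless of s.
$negated = preg_match('/one[^a]line/', $subject);   // int(1)

var_dump($withoutS, $withS, $negated);
]]>
       </programlisting>
      </example>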
     </listitem>
    </varlistentry>
    <varlistentry>
     <term><emphasis>x</emphasis> (<literal>PCRE_EXTENDED</literal>)</term>
     <listitem>
      <simpara>
       If this modifier is set, whitespace data characters in the
       pattern are totally ignored except when escaped or inside a
       character class, and characters between an unescaped #
       outside a character class and the next newline character,
       inclusive, are also ignored. This is equivalent to Perl's /x
       modifier, and makes it possible to include commentary inside
       complicated patterns. Note, however, that this applies only
       to data characters. Whitespace characters may never appear
       within special character sequences in a pattern, for example
       within the sequence (?( which introduces a conditional
       subpattern.
      </simpara>
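      <simpara>
       As a sketch of how this might be used, the hypothetical pattern
       below is spread over several lines and annotated with
       <literal>#</literal> comments. The pattern, subject and variable
       names are assumptions made for illustration.
      </simpara>
      <example>
       <title>Sketch: a commented pattern with <literal>x</literal></title>
       <programlisting role="php">
<![CDATA[
<?php
// Whitespace and "# ..." comments inside the pattern are ignored under x.
$pattern = '/
    (\d{4})   # four-digit year
    -         # literal separator
    (\d{2})   # two-digit month
/x';

$result = preg_match($pattern, '2024-05', $matches);

var_dump($result);      // int(1)
var_dump($matches[1]);  // string(4) "2024"
var_dump($matches[2]);  // string(2) "05"
]]>
       </programlisting>
      </example>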
     </listitem>
    </varlistentry>
    <varlistentry>
     <term><emphasis>A</emphasis> (<literal>PCRE_ANCHORED</literal>)</term>
     <listitem>
      <simpara>
       If this modifier is set, the pattern is forced to be
       "anchored", that is, it is constrained to match only at the
       start of the string which is being searched (the "subject
       string"). This effect can also be achieved by appropriate
       constructs in the pattern itself, which is the only way to
       do it in Perl.
      </simpara>
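      <simpara>
       The following sketch, with a hypothetical subject, shows a pattern
       that matches in the middle of the string only when it is not
       anchored.
      </simpara>
      <example>
       <title>Sketch: anchoring with <literal>A</literal></title>
       <programlisting role="php">
<![CDATA[
<?php
// Hypothetical subject: letters first, digits afterwards.
$subject = 'abc123';

// Unanchored: the digits are found further along in the subject.
$unanchored = preg_match('/\d+/', $subject);        // int(1)

// Anchored: the match must begin at the start of the subject.
$anchored = preg_match('/\d+/A', $subject);         // int(0)

// A pattern that does match at the start still succeeds with A.
$atStart = preg_match('/[a-z]+/A', $subject);       // int(1)

var_dump($unanchored, $anchored, $atStart);
]]>
       </programlisting>
      </example>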
     </listitem>
    </varlistentry>
    <varlistentry>
     <term><emphasis>D</emphasis> (<literal>PCRE_DOLLAR_ENDONLY</literal>)</term>
     <listitem>
      <simpara>
       If this modifier is set, a dollar metacharacter in the pattern
       matches only at the end of the subject string. Without this
       modifier, a dollar also matches immediately before the final
       character if it is a newline (but not before any other
       newlines). This modifier is ignored if the
       <emphasis>m</emphasis> modifier is set. There is no equivalent
       to this modifier in Perl.
      </simpara>
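      <simpara>
       This matters, for instance, when validating input that may carry
       a trailing newline. A minimal sketch with a hypothetical subject:
      </simpara>
      <example>
       <title>Sketch: trailing newline with and without <literal>D</literal></title>
       <programlisting role="php">
<![CDATA[
<?php
// Hypothetical subject that ends with a newline.
$subject = "abc\n";

// Without D, $ also matches just before the final newline.
$withoutD = preg_match('/^abc$/', $subject);    // int(1)

// With D, $ matches only at the very end of the subject.
$withD = preg_match('/^abc$/D', $subject);      // int(0)

var_dump($withoutD, $withD);
]]>
       </programlisting>
      </example>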
     </listitem>
    </varlistentry>
    <varlistentry>
     <term><emphasis>S</emphasis></term>
     <listitem>
      <simpara>
       When a pattern is going to be used several times, it is
       worth spending more time analyzing it in order to speed up
       the time taken for matching. If this modifier is set, then
       this extra analysis is performed. At present, studying a
       pattern is useful only for non-anchored patterns that do not
       have a single fixed starting character.
      </simpara>
     </listitem>
    </varlistentry>
    <varlistentry>
     <term><emphasis>U</emphasis> (<literal>PCRE_UNGREEDY</literal>)</term>
     <listitem>
      <simpara>
       This modifier inverts the "greediness" of the quantifiers so
       that they are not greedy by default, but become greedy if
       followed by <literal>?</literal>. It is not compatible with Perl.
       Ungreedy matching can also be enabled by a (<literal>?U</literal>)
       <link linkend="regexp.reference.internal-options">modifier setting within
       the pattern</link>, or for a single quantifier by placing a question
       mark after it (e.g. <literal>.*?</literal>).
      </simpara>
      <note>
       <para>
        It is usually not possible to match more than <link
        linkend="ini.pcre.backtrack-limit">pcre.backtrack_limit</link>
        characters in ungreedy mode.
       </para>
      </note>
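      <simpara>
       The inversion can be sketched with a hypothetical subject that
       contains two tagged fragments. The <literal>#</literal> delimiter
       is used here only to avoid escaping the slash.
      </simpara>
      <example>
       <title>Sketch: greediness inverted by <literal>U</literal></title>
       <programlisting role="php">
<![CDATA[
<?php
// Hypothetical subject with two tagged fragments.
$subject = '<b>one</b><b>two</b>';

// Default: .* is greedy and runs to the last closing tag.
preg_match('#<b>.*</b>#', $subject, $greedy);
// $greedy[0] is '<b>one</b><b>two</b>'

// With U: .* is ungreedy and stops at the first closing tag.
preg_match('#<b>.*</b>#U', $subject, $lazy);
// $lazy[0] is '<b>one</b>'

// With U, a trailing ? makes the quantifier greedy again.
preg_match('#<b>.*?</b>#U', $subject, $inverted);
// $inverted[0] is '<b>one</b><b>two</b>'

var_dump($greedy[0], $lazy[0], $inverted[0]);
]]>
       </programlisting>
      </example>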
     </listitem>
    </varlistentry>
    <varlistentry>
     <term><emphasis>X</emphasis> (<literal>PCRE_EXTRA</literal>)</term>
     <listitem>
      <simpara>
       This modifier turns on additional functionality of PCRE that
       is incompatible with Perl. Any backslash in a pattern that
       is followed by a letter that has no special meaning causes
       an error, thus reserving these combinations for future
       expansion. By default, as in Perl, a backslash followed by a
       letter with no special meaning is treated as a literal.
       There are at present no other features controlled by this
       modifier.
      </simpara>
     </listitem>
    </varlistentry>
    <varlistentry>
     <term><emphasis>J</emphasis> (<literal>PCRE_INFO_JCHANGED</literal>)</term>
     <listitem>
      <simpara>
       The (?J) internal option setting changes the local <literal>PCRE_DUPNAMES</literal>
       option, which allows duplicate names for subpatterns.
       As of PHP 7.2.0, <literal>J</literal> is supported as a modifier as well.
      </simpara>
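      <simpara>
       A minimal sketch, assuming PHP 7.2.0 or later for the modifier
       form: the hypothetical pattern below uses the name
       <literal>year</literal> in both alternatives, which is only
       accepted when duplicate names are allowed.
      </simpara>
      <example>
       <title>Sketch: duplicate subpattern names with <literal>J</literal></title>
       <programlisting role="php">
<![CDATA[
<?php
// Hypothetical pattern: "year" names a group in each alternative.
$alternatives = '(?<year>\d{4})-\d{2}|\d{2}\/(?<year>\d{4})';

// Modifier form (assumes PHP 7.2.0 or later).
$withModifier = preg_match('/' . $alternatives . '/J', '2024-05');

// Internal option form, set inside the pattern itself.
$withInternalOption = preg_match('/(?J)' . $alternatives . '/', '05/2024');

var_dump($withModifier, $withInternalOption); // int(1) int(1)
]]>
       </programlisting>
      </example>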
     </listitem>
    </varlistentry>
    <varlistentry>
     <term><emphasis>u</emphasis> (<literal>PCRE_UTF8</literal>)</term>
     <listitem>
      <simpara>
       This modifier turns on additional functionality of PCRE that
       is incompatible with Perl. Pattern and subject strings are
       treated as UTF-8. An invalid subject will cause the preg_* function to
       match nothing; an invalid pattern will trigger an error of
       level E_WARNING. Five and six octet UTF-8 sequences are
       regarded as invalid.
      </simpara>
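      <simpara>
       The effect can be sketched with a two-byte UTF-8 character and
       with a byte that is not valid UTF-8. The subjects are hypothetical;
       in this sketch <function>preg_match</function> returns
       &false; for the invalid subject, and
       <function>preg_last_error</function> is used to inspect the reason.
      </simpara>
      <example>
       <title>Sketch: UTF-8 handling with <literal>u</literal></title>
       <programlisting role="php">
<![CDATA[
<?php
// "é" encoded as UTF-8 occupies two bytes.
$subject = "\xC3\xA9";

// Without u, the dot matches a single byte, so ^.$ fails on two bytes.
$asBytes = preg_match('/^.$/', $subject);    // int(0)

// With u, the dot matches one UTF-8 character.
$asUtf8 = preg_match('/^.$/u', $subject);    // int(1)

// A lone 0xFF byte is not valid UTF-8, so nothing is matched.
$invalid = preg_match('/./u', "\xFF");       // bool(false)
$isUtf8Error = preg_last_error() === PREG_BAD_UTF8_ERROR; // bool(true)

var_dump($asBytes, $asUtf8, $invalid, $isUtf8Error);
]]>
       </programlisting>
      </example>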
     </listitem>
    </varlistentry>
   </variablelist>
  </blockquote>
 </para>
</article>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
indent-tabs-mode:nil
sgml-parent-document:nil
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
vim600: syn=xml fen fdm=syntax fdl=2 si
vim: et tw=78 syn=sgml
vi: ts=1 sw=1
-->