There were forgotten division-marks inside words (Jakub Vrana)

git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@145358 c90b9560-bf6c-de11-be94-00142212c4b1
Mehdi Achour 2003-12-01 23:44:09 +00:00
parent 001059fd7f
commit 73dbf9a98f

@ -1,5 +1,5 @@
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- $Revision: 1.5 $ -->
<!-- $Revision: 1.6 $ -->
<!-- splitted from ./en/functions/pcre.xml, last change in rev 1.2 -->
<refentry id="pcre.pattern.syntax">
<refnamediv>
@ -47,10 +47,10 @@
</listitem>
<listitem>
<simpara>
Capturing subpatterns that occur inside negative looka-
head assertions are counted, but their entries in the
offsets vector are never set. Perl sets its numerical vari-
ables from any such patterns that are matched before the
Capturing subpatterns that occur inside negative
lookahead assertions are counted, but their entries in the
offsets vector are never set. Perl sets its numerical
variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but
only if the negative lookahead assertion contains just one
branch.
@ -68,8 +68,8 @@
<simpara>
The following Perl escape sequences are not supported:
\l, \u, \L, \U, \E, \Q. In fact these are implemented by
Perl's general string-handling and are not part of its pat-
tern matching engine.
Perl's general string-handling and are not part of its
pattern matching engine.
</simpara>
</listitem>
<listitem>
@ -123,7 +123,7 @@
<simpara>
If <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link> is set and
<link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> is not
set, the $ meta- character matches only at the very end of
set, the $ meta-character matches only at the very end of
the string.
</simpara>
</listitem>
@ -135,8 +135,8 @@
</listitem>
<listitem>
<simpara>
If <link linkend="pcre.pattern.modifiers">PCRE_UNGREEDY</link> is set, the greediness of the repeti-
tion quantifiers is inverted, that is, by default they are
If <link linkend="pcre.pattern.modifiers">PCRE_UNGREEDY</link> is set, the greediness of the
repetition quantifiers is inverted, that is, by default they are
not greedy, but if followed by a question mark they are.
</simpara>
</listitem>
@ -152,8 +152,8 @@
<refsect2 id="regexp.introduction">
<title>Introduction</title>
<para>
The syntax and semantics of the regular expressions sup-
ported by PCRE are described below. Regular expressions are
The syntax and semantics of the regular expressions
supported by PCRE are described below. Regular expressions are
also described in the Perl documentation and in a number of
other books, some of which have copious examples. Jeffrey
Friedl's "Mastering Regular Expressions", published by
@ -162,8 +162,8 @@
A regular expression is a pattern that is matched against a
subject string from left to right. Most characters stand for
themselves in a pattern, and match the corresponding charac-
ters in the subject. As a trivial example, the pattern
themselves in a pattern, and match the corresponding
characters in the subject. As a trivial example, the pattern
<literal>The quick brown fox</literal>
matches a portion of a subject string that is identical to
itself.
@ -173,9 +173,9 @@
<title>Meta-characters</title>
<para>
The power of regular expressions comes from the
ability to include alternatives and repetitions in the pat-
tern. These are encoded in the pattern by the use of <emphasis>meta</emphasis>-
<emphasis>characters</emphasis>, which do not stand for themselves but instead
ability to include alternatives and repetitions in the
pattern. These are encoded in the pattern by the use of
<emphasis>meta-characters</emphasis>, which do not stand for themselves but instead
are interpreted in some special way.
</para>
<para>
@ -299,8 +299,8 @@
</variablelist>
Part of a pattern that is in square brackets is called a
"character class". In a character class the only meta-
characters are:
"character class". In a character class the only
meta-characters are:
<variablelist>
<varlistentry>
<term><emphasis>\</emphasis></term>
@ -350,23 +350,23 @@
</para>
<para>
For example, if you want to match a "*" character, you write
"\*" in the pattern. This applies whether or not the follow-
ing character would otherwise be interpreted as a meta-
character, so it is always safe to precede a non-alphanumeric
with "\" to specify that it stands for itself. In particu-
lar, if you want to match a backslash, you write "\\".
"\*" in the pattern. This applies whether or not the
following character would otherwise be interpreted as a
meta-character, so it is always safe to precede a non-alphanumeric
with "\" to specify that it stands for itself. In
particular, if you want to match a backslash, you write "\\".
</para>
<para>
If a pattern is compiled with the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link> option, whi-
tespace in the pattern (other than in a character class) and
If a pattern is compiled with the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link> option,
whitespace in the pattern (other than in a character class) and
characters between a "#" outside a character class and the
next newline character are ignored. An escaping backslash
can be used to include a whitespace or "#" character as part
of the pattern.
</para>
<para>
A second use of backslash provides a way of encoding non-
printing characters in patterns in a visible manner. There
A second use of backslash provides a way of encoding
non-printing characters in patterns in a visible manner. There
is no restriction on the appearance of non-printing characters,
apart from the binary zero that terminates a pattern,
but when a pattern is being prepared by text editing, it is
@ -569,8 +569,8 @@
</variablelist>
</para>
<para>
Note that octal values of 100 or greater must not be intro-
duced by a leading zero, because no more than three octal
Note that octal values of 100 or greater must not be
introduced by a leading zero, because no more than three octal
digits are ever read.
</para>
<para>
@ -581,8 +581,8 @@
class it has a different meaning (see below).
</para>
<para>
The third use of backslash is for specifying generic charac-
ter types:
The third use of backslash is for specifying generic
character types:
</para>
<para>
<variablelist>
@ -647,8 +647,8 @@
Perl "<literal>word</literal>". The definition of letters and digits is
controlled by PCRE's character tables, and may vary if locale-specific
matching is taking place (see "Locale support"
above). For example, in the "fr" (French) locale, some char-
acter codes greater than 128 are used for accented letters,
above). For example, in the "fr" (French) locale, some
character codes greater than 128 are used for accented letters,
and these are matched by <literal>\w</literal>.
</para>
<para>
@ -659,8 +659,8 @@
is no character to match.
</para>
<para>
The fourth use of backslash is for certain simple asser-
tions. An assertion specifies a condition that has to be met
The fourth use of backslash is for certain simple
assertions. An assertion specifies a condition that has to be met
at a particular point in a match, without consuming any
characters from the subject string. The use of subpatterns
for more complicated assertions is described below. The
@ -752,11 +752,11 @@
Circumflex need not be the first character of the pattern if
a number of alternatives are involved, but it should be the
first thing in each alternative in which it appears if the
pattern is ever to match that branch. If all possible alter-
natives start with a circumflex, that is, if the pattern is
pattern is ever to match that branch. If all possible
alternatives start with a circumflex, that is, if the pattern is
constrained to match only at the start of the subject, it is
said to be an "anchored" pattern. (There are also other con-
structs that can cause a pattern to be anchored.)
said to be an "anchored" pattern. (There are also other
constructs that can cause a pattern to be anchored.)
A dollar character is an assertion which is &true; only if the
current matching point is at the end of the subject string,
@ -779,10 +779,10 @@
before an internal "\n" character, respectively, in addition
to matching at the start and end of the subject string. For
example, the pattern /^abc$/ matches the subject string
"def\nabc" in multiline mode, but not otherwise. Conse-
quently, patterns that are anchored in single line mode
because all branches start with "^" are not anchored in mul-
tiline mode. The <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link> option is ignored if
"def\nabc" in multiline mode, but not otherwise.
Consequently, patterns that are anchored in single line mode
because all branches start with "^" are not anchored in
multiline mode. The <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link> option is ignored if
<link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> is set.
Note that the sequences \A, \Z, and \z can be used to match
@ -798,9 +798,9 @@
Outside a character class, a dot in the pattern matches any
one character in the subject, including a non-printing
character, but not (by default) newline. If the <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link>
option is set, then dots match newlines as well. The han-
dling of dot is entirely independent of the handling of cir-
cumflex and dollar, the only relationship being that they
option is set, then dots match newlines as well. The
handling of dot is entirely independent of the handling of
circumflex and dollar, the only relationship being that they
both involve newline characters. Dot has no special meaning
in a character class.
</literallayout>
@ -809,25 +809,25 @@
<refsect2 id="regexp.reference.squarebrackets">
<title>Square brackets</title>
<literallayout>
An opening square bracket introduces a character class, ter-
minated by a closing square bracket. A closing square
An opening square bracket introduces a character class,
terminated by a closing square bracket. A closing square
bracket on its own is not special. If a closing square
bracket is required as a member of the class, it should be
the first data character in the class (after an initial cir-
cumflex, if present) or escaped with a backslash.
the first data character in the class (after an initial
circumflex, if present) or escaped with a backslash.
A character class matches a single character in the subject;
the character must be in the set of characters defined by
the class, unless the first character in the class is a cir-
cumflex, in which case the subject character must not be in
the class, unless the first character in the class is a
circumflex, in which case the subject character must not be in
the set defined by the class. If a circumflex is actually
required as a member of the class, ensure it is not the
first character, or escape it with a backslash.
For example, the character class [aeiou] matches any lower
case vowel, while [^aeiou] matches any character that is not
a lower case vowel. Note that a circumflex is just a con-
venient notation for specifying the characters which are in
a lower case vowel. Note that a circumflex is just a
convenient notation for specifying the characters which are in
the class by enumerating those that are not. It is not an
assertion: it still consumes a character from the subject
string, and fails if the current pointer is at the end of
@ -836,8 +836,8 @@
When caseless matching is set, any letters in a class
represent both their upper case and lower case versions, so
for example, a caseless [aeiou] matches "A" as well as "a",
and a caseless [^aeiou] does not match "A", whereas a case-
ful version would.
and a caseless [^aeiou] does not match "A", whereas a
caseful version would.
The newline character is never treated in any special way in
character classes, whatever the setting of the <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link>
@ -848,17 +848,17 @@
of characters in a character class. For example, [d-m]
matches any letter between d and m, inclusive. If a minus
character is required in a class, it must be escaped with a
backslash or appear in a position where it cannot be inter-
preted as indicating a range, typically as the first or last
backslash or appear in a position where it cannot be
interpreted as indicating a range, typically as the first or last
character in the class.
It is not possible to have the literal character "]" as the
end character of a range. A pattern such as [W-]46] is
interpreted as a class of two characters ("W" and "-") fol-
lowed by a literal string "46]", so it would match "W46]" or
interpreted as a class of two characters ("W" and "-")
followed by a literal string "46]", so it would match "W46]" or
"-46]". However, if the "]" is escaped with a backslash it
is interpreted as the end of range, so [W-\]46] is inter-
preted as a single class containing a range followed by two
is interpreted as the end of range, so [W-\]46] is
interpreted as a single class containing a range followed by two
separate characters. The octal or hexadecimal representation
of "]" can also be used to end a range.
@ -875,8 +875,8 @@
appear in a character class, and add the characters that
they match to the class. For example, [\dABCDEF] matches any
hexadecimal digit. A circumflex can conveniently be used
with the upper case character types to specify a more res-
tricted set of characters than the matching lower case type.
with the upper case character types to specify a more
restricted set of characters than the matching lower case type.
For example, the class [^\W_] matches any letter or digit,
but not underscore.
@ -984,8 +984,8 @@
which can be nested. Marking part of a pattern as a subpattern
does two things:
1. It localizes a set of alternatives. For example, the pat-
tern
1. It localizes a set of alternatives. For example, the
pattern
cat(aract|erpillar|)
@ -1131,8 +1131,8 @@
does the right thing with the C comments. The meaning of the
various quantifiers is not otherwise changed, just the preferred
number of matches. Do not confuse this use of ques-
tion mark with its use as a quantifier in its own right.
number of matches. Do not confuse this use of
question mark with its use as a quantifier in its own right.
Because it has two uses, it can sometimes appear doubled, as
in
@ -1374,8 +1374,8 @@
<title>Once-only subpatterns</title>
<literallayout>
With both maximizing and minimizing repetition, failure of
what follows normally causes the repeated item to be re-
evaluated to see if a different number of repeats allows the
what follows normally causes the repeated item to be
re-evaluated to see if a different number of repeats allows the
rest of the pattern to match. Sometimes it is useful to
prevent this, either to change the nature of the match, or
to cause it fail earlier than it otherwise might, when the
@ -1401,8 +1401,8 @@
This kind of parenthesis "locks up" the part of the pattern
it contains once it has matched, and a failure further into
the pattern is prevented from backtracking into it. Back-
tracking past it to previous items, however, works as normal.
the pattern is prevented from backtracking into it.
Backtracking past it to previous items, however, works as normal.
An alternative description is that a subpattern of this type
matches the string of characters that an identical standalone
@ -1419,8 +1419,8 @@
This construction can of course contain arbitrarily complicated
subpatterns, and it can be nested.
Once-only subpatterns can be used in conjunction with look-
behind assertions to specify efficient matching at the end
Once-only subpatterns can be used in conjunction with
look-behind assertions to specify efficient matching at the end
of the subject string. Consider a simple pattern such as
abcd$
@ -1547,8 +1547,8 @@
comment play no part in the pattern matching at all.
If the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link> option is set, an unescaped # character
outside a character class introduces a comment that contin-
ues up to the next newline character in the pattern.
outside a character class introduces a comment that
continues up to the next newline character in the pattern.
</literallayout>
</refsect2>
@ -1571,8 +1571,8 @@
\( ( (?>[^()]+) | (?R) )* \)
First it matches an opening parenthesis. Then it matches any
number of substrings which can either be a sequence of non-
parentheses, or a recursive match of the pattern itself
number of substrings which can either be a sequence of
non-parentheses, or a recursive match of the pattern itself
(i.e. a correctly parenthesized substring). Finally there is
a closing parenthesis.
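
The recursive pattern quoted in this last hunk can be illustrated outside PCRE. The following is only a minimal sketch in Python, not part of the commit or of the PHP manual: Python's standard `re` module has no `(?R)`, so the recursion is emulated with a hand-written function, and the name `match_parens` is hypothetical.

```python
def match_parens(s: str, i: int = 0) -> int:
    """Return the index just past a balanced "(...)" group that starts
    at s[i], or -1 if there is no such group at that position."""
    # \(  : the group must open with a literal parenthesis
    if i >= len(s) or s[i] != "(":
        return -1
    i += 1
    # ( ... | ... )*  : repeat until the closing parenthesis is reached
    while i < len(s) and s[i] != ")":
        if s[i] == "(":
            # Plays the role of (?R): match a nested group recursively
            i = match_parens(s, i)
            if i == -1:
                return -1
        else:
            # Plays the role of (?>[^()]+): consume a run of non-parentheses
            while i < len(s) and s[i] not in "()":
                i += 1
    # \)  : succeed only if a closing parenthesis is actually present
    return i + 1 if i < len(s) else -1
```

Like the pattern, the function matches from a given starting position and reports where the match ends, so trailing text after the closing parenthesis is not consumed.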