diff --git a/reference/pcre/book.xml b/reference/pcre/book.xml
index 76fc2cd111..b63fbf5818 100644
--- a/reference/pcre/book.xml
+++ b/reference/pcre/book.xml
@@ -1,5 +1,5 @@
-
+
@@ -42,6 +42,12 @@
xlink:href="&url.pcre.man;">&url.pcre.man; for more info.
+
+ The PCRE library is a set of functions that implement regular
+ expression pattern matching using the same syntax and semantics
+ as Perl 5, with just a few differences (see below). The current
+ implementation corresponds to Perl 5.005.
+
&reference.pcre.setup;
diff --git a/reference/pcre/pattern.differences.xml b/reference/pcre/pattern.differences.xml
new file mode 100644
index 0000000000..8aeff73678
--- /dev/null
+++ b/reference/pcre/pattern.differences.xml
@@ -0,0 +1,156 @@
+
+
+
+
+ Perl Differences
+ Differences From Perl
+
+ The differences described here are with respect to Perl 5.005.
+
+
+
+ By default, a whitespace character is any character that
+ the C library function isspace() recognizes, though it is
+ possible to compile PCRE with alternative character type
+ tables. Normally isspace() matches space, formfeed, newline,
+ carriage return, horizontal tab, and vertical tab. Perl 5 no
+ longer includes vertical tab in its set of whitespace characters.
+ The \v escape that was in the Perl documentation for
+ a long time was never in fact recognized. However, the character
+ itself was treated as whitespace at least up to 5.002.
+ In 5.004 and 5.005 it does not match \s.
+
+
+
+
+ PCRE does not allow repeat quantifiers on lookahead
+ assertions. Perl permits them, but they do not mean what you
+ might think. For example, (?!a){3} does not assert that the
+ next three characters are not "a". It just asserts that the
+ next character is not "a" three times.
+
+
+
+
+ Capturing subpatterns that occur inside negative
+ lookahead assertions are counted, but their entries in the
+ offsets vector are never set. Perl sets its numerical
+ variables from any such patterns that are matched before the
+ assertion fails to match something (thereby succeeding), but
+ only if the negative lookahead assertion contains just one
+ branch.
+
+
+
+
+ Though binary zero characters are supported in the subject string,
+ they are not allowed in a pattern string because it is passed as a
+ normal C string, terminated by zero. The escape sequence "\x00" can
+ be used in the pattern to represent a binary zero.
+
+
+
+
+ The following Perl escape sequences are not supported:
+ \l, \u, \L, \U. In fact, these are implemented by
+ Perl's general string-handling and are not part of its
+ pattern matching engine.
+
+
+
+
+ The Perl \G assertion is not supported as it is not
+ relevant to single pattern matches.
+
+
+
+
+ Fairly obviously, PCRE does not support the (?{code}) and (??{code})
+ constructions. However, there is support for recursive patterns.
+
+
+
+
+ There are at the time of writing some oddities in Perl
+ 5.005_02 concerned with the settings of captured strings
+ when part of a pattern is repeated. For example, matching
+ "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
+ "b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2
+ unset. However, if the pattern is changed to
+ /^(aa(b(b))?)+$/ then $2 (and $3) get set.
+ In Perl 5.004 $2 is set in both cases, and that is also true
+ of PCRE. If in the future Perl changes to a consistent state
+ that is different, PCRE may change to follow.
+
+
+
+
+ Another as yet unresolved discrepancy is that in Perl
+ 5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string
+ "a", whereas in PCRE it does not. However, in both Perl and
+ PCRE /^(a)?a/ matched against "a" leaves $1 unset.
+
+
+
+
+ PCRE provides some extensions to the Perl regular
+ expression facilities:
+
+
+
+ Although lookbehind assertions must match fixed-length
+ strings, each alternative branch of a lookbehind assertion
+ can match a different length of string. Perl 5.005 requires
+ them all to have the same length.
+
+
+
+
+ If PCRE_DOLLAR_ENDONLY
+ is set and PCRE_MULTILINE is
+ not set, the $ meta-character matches only at the very end of the
+ string.
+
+
+
+
+ If PCRE_EXTRA is
+ set, a backslash followed by a letter with no special meaning is
+ faulted.
+
+
+
+
+ If PCRE_UNGREEDY is
+ set, the greediness of the repetition quantifiers is inverted:
+ they are not greedy by default, but become greedy if followed
+ by a question mark.
+
+
+
+
+
+
+
+
+
+
diff --git a/reference/pcre/pattern.syntax.xml b/reference/pcre/pattern.syntax.xml
index cfacc77bc1..824d203176 100644
--- a/reference/pcre/pattern.syntax.xml
+++ b/reference/pcre/pattern.syntax.xml
@@ -1,152 +1,10 @@
-
+
 Pattern Syntax: Describes PCRE regex syntax
-
- Description
-
- The PCRE library is a set of functions that implement regular
- expression pattern matching using the same syntax and semantics
- as Perl 5, with just a few differences (see below). The current
- implementation corresponds to Perl 5.005.
-
-
-
-
- Differences From Perl
-
- The differences described here are with respect to Perl 5.005.
-
-
-
- By default, a whitespace character is any character that
- the C library function isspace() recognizes, though it is
- possible to compile PCRE with alternative character type
- tables. Normally isspace() matches space, formfeed, newline,
- carriage return, horizontal tab, and vertical tab. Perl 5 no
- longer includes vertical tab in its set of whitespace characters.
- The \v escape that was in the Perl documentation for
- a long time was never in fact recognized. However, the character
- itself was treated as whitespace at least up to 5.002.
- In 5.004 and 5.005 it does not match \s.
-
-
-
-
- PCRE does not allow repeat quantifiers on lookahead
- assertions. Perl permits them, but they do not mean what you
- might think. For example, (?!a){3} does not assert that the
- next three characters are not "a". It just asserts that the
- next character is not "a" three times.
-
-
-
-
- Capturing subpatterns that occur inside negative
- lookahead assertions are counted, but their entries in the
- offsets vector are never set. Perl sets its numerical
- variables from any such patterns that are matched before the
- assertion fails to match something (thereby succeeding), but
- only if the negative lookahead assertion contains just one
- branch.
-
-
-
-
- Though binary zero characters are supported in the subject string,
- they are not allowed in a pattern string because it is passed as a
- normal C string, terminated by zero. The escape sequence "\x00" can
- be used in the pattern to represent a binary zero.
-
-
-
-
- The following Perl escape sequences are not supported:
- \l, \u, \L, \U. In fact these are implemented by
- Perl's general string-handling and are not part of its
- pattern matching engine.
-
-
-
-
- The Perl \G assertion is not supported as it is not
- relevant to single pattern matches.
-
-
-
-
- Fairly obviously, PCRE does not support the (?{code}) and (??{code})
- construction. However, there is support for recursive patterns.
-
-
-
-
- There are at the time of writing some oddities in Perl
- 5.005_02 concerned with the settings of captured strings
- when part of a pattern is repeated. For example, matching
- "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
- "b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2
- unset. However, if the pattern is changed to
- /^(aa(b(b))?)+$/ then $2 (and $3) get set.
- In Perl 5.004 $2 is set in both cases, and that is also &true;
- of PCRE. If in the future Perl changes to a consistent state
- that is different, PCRE may change to follow.
-
-
-
-
- Another as yet unresolved discrepancy is that in Perl
- 5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string
- "a", whereas in PCRE it does not. However, in both Perl and
- PCRE /^(a)?a/ matched against "a" leaves $1 unset.
-
-
-
-
- PCRE provides some extensions to the Perl regular
- expression facilities:
-
-
-
- Although lookbehind assertions must match fixed length
- strings, each alternative branch of a lookbehind assertion
- can match a different length of string. Perl 5.005 requires
- them all to have the same length.
-
-
-
-
- If PCRE_DOLLAR_ENDONLY
- is set and PCRE_MULTILINE is
- not set, the $ meta-character matches only at the very end of the
- string.
-
-
-
-
- If PCRE_EXTRA is
- set, a backslash followed by a letter with no special meaning is
- faulted.
-
-
-
-
- If PCRE_UNGREEDY is
- set, the greediness of the repetition quantifiers is inverted,
- that is, by default they are not greedy, but if followed by a
- question mark they are.
-
-
-
-
-
-
-
-
-
Regular Expression Details
diff --git a/reference/pcre/pattern.xml b/reference/pcre/pattern.xml
index 4243b65d03..8d3fe1b282 100644
--- a/reference/pcre/pattern.xml
+++ b/reference/pcre/pattern.xml
@@ -1,10 +1,11 @@
-
+
PCRE Patterns
&reference.pcre.pattern.modifiers;
+ &reference.pcre.pattern.differences;
&reference.pcre.pattern.syntax;