white space and some spelling

git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@91117 c90b9560-bf6c-de11-be94-00142212c4b1
2025-03-17 01:18:55 +00:00 · 2002-08-06 20:04:34 +00:00 · 2002-08-06 20:04:34 +00:00 · 931ba4788e
commit 931ba4788e
parent 505aacbeef
1 changed files with 172 additions and 170 deletions
--- a/reference/pcre/functions/pcre.pattern.syntax.xml
+++ b/reference/pcre/functions/pcre.pattern.syntax.xml
@ -1,5 +1,5 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.4 $ -->
+<!-- $Revision: 1.5 $ -->
 <!-- splitted from ./en/functions/pcre.xml, last change in rev 1.2 -->
  <refentry id="pcre.pattern.syntax">
   <refnamediv>
@ -25,8 +25,8 @@
     <listitem>
      <simpara>
       By default, a whitespace character is any character  that
-       the  C  library  function isspace() recognizes, though it is
-       possible to compile PCRE  with  alternative  character  type
+       the C library function isspace() recognizes, though it is
+       possible to compile PCRE  with alternative character type
       tables. Normally isspace() matches space, formfeed, newline,
       carriage return, horizontal tab, and vertical tab. Perl 5 no
       longer  includes vertical tab in its set of whitespace characters.
@ -38,19 +38,19 @@
     </listitem>
     <listitem>
      <simpara>
-     PCRE does  not  allow  repeat  quantifiers  on  lookahead
+     PCRE does not allow repeat quantifiers on lookahead
     assertions. Perl permits them, but they do not mean what you
     might think. For example, (?!a){3} does not assert that  the
-     next  three characters are not "a". It just asserts that the
+     next three characters are not "a". It just asserts that the
     next character is not "a" three times.
      </simpara>
     </listitem>
     <listitem>
      <simpara>
-     Capturing subpatterns that occur inside  negative  looka-
-     head  assertions  are  counted,  but  their  entries  in the
-     offsets vector are never set. Perl sets its numerical  vari-
-     ables  from  any  such  patterns that are matched before the
+     Capturing subpatterns that occur inside negative lookahead
+     assertions are counted, but their entries in the
+     offsets vector are never set. Perl sets its numerical variables
+     from any such patterns that are matched before the
     assertion fails to match something (thereby succeeding), but
     only  if  the negative lookahead assertion contains just one
     branch.
@ -59,8 +59,8 @@
     <listitem>
      <simpara>
     Though binary zero characters are supported in  the  subject  string,
-     they  are  not  allowed  in  a pattern string because it is passed as a
-     normal  C  string,  terminated  by zero. The escape sequence "\\x00" can
+     they are not allowed in a pattern string because it is passed as a
+     normal C string, terminated  by zero. The escape sequence "\\x00" can
     be used in the pattern to represent a binary zero.
      </simpara>
      </listitem>
@ -80,7 +80,7 @@
      </listitem>
      <listitem>
      <simpara>
-     Fairly obviously, PCRE does  not  support  the  (?{code})
+     Fairly obviously, PCRE does not support the (?{code})
     construction.
      </simpara>
      </listitem>
@ -181,7 +181,7 @@
    <para>
     There are two different sets of meta-characters: those  that
     are  recognized anywhere in the pattern except within square
-     brackets, and those that are recognized in square  brackets.
+     brackets, and those that are recognized in square brackets.
     Outside square brackets, the meta-characters are as follows:
      <variablelist>
       <varlistentry>
@ -196,7 +196,7 @@
        <term><emphasis>^</emphasis></term>
           <listitem>
         <simpara>
-          assert start of  subject  (or  line,  in  multiline mode)
+          assert start of subject (or line, in multiline mode)
         </simpara>
        </listitem>
       </varlistentry>
@ -298,8 +298,8 @@
       </varlistentry>
      </variablelist>

-     Part of a pattern that is in square  brackets  is  called  a
-     "character  class".  In  a  character  class  the only meta-
+     Part of a pattern that is in square  brackets is called a
+     "character  class". In a character class the only meta-
     characters are:
      <variablelist>
       <varlistentry>
@ -335,7 +335,7 @@
        </listitem>
       </varlistentry>
      </variablelist>
-     The following sections describe  the  use  of  each  of  the
+     The following sections describe the use of each of the
     meta-characters.
    </para>
    </refsect2>
@ -343,16 +343,16 @@
    <title>backslash</title>
     <para>
     The backslash character has several uses. Firstly, if it  is
-     followed  by  a  non-alphameric character, it takes away any
-     special  meaning  that  character  may  have.  This  use  of
-     backslash  as  an  escape  character applies both inside and
+     followed by a non-alphanumeric character, it takes away any
+     special  meaning that character may have. This use of
+     backslash as an escape character applies both inside and
     outside character classes.
    </para>
    <para>
     For example, if you want to match a "*" character, you write
     "\*" in the pattern. This applies whether or not the follow-
-     ing character would otherwise  be  interpreted  as  a  meta-
-     character,  so it is always safe to precede a non-alphameric
+     ing character would otherwise be interpreted as a meta-
+     character, so it is always safe to precede a non-alphanumeric character
     with "\" to specify that it stands for itself.  In  particu-
     lar, if you want to match a backslash, you write "\\".
    </para>
@ -365,11 +365,11 @@
     of the pattern.
    </para>
    <para>
-     A second use of backslash provides a way  of  encoding  non-
-     printing  characters  in patterns in a visible manner. There
-     is no restriction on the appearance of non-printing  charac-
-     ters,  apart from the binary zero that terminates a pattern,
-     but when a pattern is being prepared by text editing, it  is
+     A second use of backslash provides a way of encoding non-
+     printing characters in patterns in a visible manner. There
+     is no restriction on the appearance of non-printing  characters,
+     apart from the binary zero that terminates a pattern,
+     but when a pattern is being prepared by text editing, it is
     usually  easier to use one of the following escape sequences
     than the binary character it represents:
    </para>
@ -450,38 +450,41 @@
      </variablelist>
    </para>
    <para>
-     The precise effect of "<literal>\cx</literal>" is as follows: if "<literal>x</literal>" is a lower
-     case  letter,  it  is converted to upper case. Then bit 6 of
-     the character (hex 40) is inverted.  Thus "<literal>\cz</literal>" becomes  hex
-     1A, but "<literal>\c{</literal>" becomes hex 3B, while "<literal>\c;</literal>" becomes hex 7B.
+     The precise effect of "<literal>\cx</literal>" is as follows: 
+     if "<literal>x</literal>" is a lower case  letter, it is converted
+     to upper case. Then bit 6 of the character (hex 40) is inverted. 
+     Thus "<literal>\cz</literal>" becomes  hex 1A, but
+     "<literal>\c{</literal>" becomes hex 3B, while "<literal>\c;</literal>"
+     becomes hex 7B.
    </para>
    <para>
-     After "<literal>\x</literal>", up to two hexadecimal digits are  read  (letters
-     can be in upper or lower case).
+     After "<literal>\x</literal>", up to two hexadecimal digits are
+     read (letters can be in upper or lower case).
    </para>
    <para>
-     After "<literal>\0</literal>" up to two further octal digits are read. In  both
-     cases,  if  there are fewer than two digits, just those that
-     are present are used. Thus the sequence "<literal>\0\x\07</literal>"  specifies
-     two binary zeros followed by a BEL character.  Make sure you
-     supply two digits after the initial zero  if  the  character
+     After "<literal>\0</literal>" up to two further octal digits are read.
+     In  both cases,  if  there are fewer than two digits, just those that
+     are present are used. Thus the sequence "<literal>\0\x\07</literal>" 
+     specifies two binary zeros followed by a BEL character. Make sure you
+     supply two digits after the initial zero if the character
     that follows is itself an octal digit.
    </para>
    <para>
     The handling of a backslash followed by a digit other than 0
-     is  complicated.   Outside  a character class, PCRE reads it
+     is complicated. Outside a character class, PCRE reads it
     and any following digits as a decimal number. If the  number
     is  less  than  10, or if there have been at least that many
     previous capturing left parentheses in the  expression,  the
-     entire  sequence is taken as a <emphasis>back</emphasis> <emphasis>reference</emphasis>. A description
+     entire  sequence is taken as a <emphasis>back</emphasis> 
+     <emphasis>reference</emphasis>. A description
     of how this works is given later, following  the  discussion
     of parenthesized subpatterns.
    </para>
    <para>
     Inside a character  class,  or  if  the  decimal  number  is
-     greater  than  9 and there have not been that many capturing
-     subpatterns, PCRE re-reads up to three octal digits  follow-
-     ing  the  backslash,  and  generates  a single byte from the
+     greater than 9 and there have not been that many capturing
+     subpatterns, PCRE re-reads up to three octal digits following 
+     the backslash, and generates a single byte from the
     least significant 8 bits of the value. Any subsequent digits
     stand for themselves.  For example:
    </para>
@ -566,15 +569,15 @@
     </variablelist>
    </para>
    <para>
-     Note that octal values of 100 or greater must not be  intro-
-     duced  by  a  leading zero, because no more than three octal
+     Note that octal values of 100 or greater must not be introduced
+     by a leading zero, because no more than three octal
     digits are ever read.
    </para>
    <para>
-     All the sequences that define a single  byte  value  can  be
+     All the sequences that define a single byte value can  be
     used both inside and outside character classes. In addition,
-     inside a character class, the sequence "<literal>\b</literal>" is  interpreted
-     as  the  backspace  character  (hex 08). Outside a character
+     inside a character class, the sequence "<literal>\b</literal>"
+     is interpreted as the backspace character (hex 08). Outside a character
     class it has a different meaning (see below).
    </para>
    <para>
@ -635,32 +638,32 @@
    </para>
    <para>
     Each pair of escape sequences partitions the complete set of
-     characters  into  two  disjoint  sets.  Any  given character
+     characters into two disjoint sets. Any given character
     matches one, and only one, of each pair.
    </para>
    <para>
-     A "word" character is any letter or digit or the  underscore
+     A "word" character is any letter or digit or the underscore
     character,  that  is,  any  character which can be part of a
     Perl "<literal>word</literal>". The definition of letters and digits is  
-     controlled  by PCRE's character tables, and may vary if locale-specific
-     matching is  taking  place  (see  "Locale  support"
+     controlled by PCRE's character tables, and may vary if locale-specific
+     matching is taking place (see  "Locale  support"
     above). For example, in the "fr" (French) locale, some char-
-     acter codes greater than 128 are used for accented  letters,
+     acter codes greater than 128 are used for accented letters,
     and these are matched by <literal>\w</literal>.
    </para>
    <para>
-     These character type sequences can appear  both  inside  and
+     These character type sequences can appear both inside and
     outside  character classes. They each match one character of
-     the appropriate type. If the current matching  point  is  at
+     the appropriate type. If the current matching  point is at
     the end of the subject string, all of them fail, since there
     is no character to match.
    </para>
    <para>
     The fourth use of backslash is  for  certain  simple  asser-
     tions. An assertion specifies a condition that has to be met
-     at a particular point in  a  match,  without  consuming  any
-     characters  from  the subject string. The use of subpatterns
-     for more complicated  assertions  is  described  below.  The
+     at a particular point in  a match, without consuming any
+     characters from the subject string. The use of subpatterns
+     for more complicated assertions is described below. The
     backslashed assertions are
    </para>
    <para>
@ -693,7 +696,7 @@
       <term><emphasis>\Z</emphasis></term>
          <listitem>
        <simpara>
-        end of subject or newline at  end  (independent  of
+        end of subject or newline at end (independent of
        multiline mode)
        </simpara>
       </listitem>
@ -702,7 +705,7 @@
       <term><emphasis>\z</emphasis></term>
          <listitem>
        <simpara>
-         end of subject (independent of multiline mode)
+         end of subject (independent of multiline mode)
        </simpara>
       </listitem>
      </varlistentry>
@ -714,20 +717,23 @@
     character, inside a character class).
    </para>
    <para>
-     A word boundary is a position in the  subject  string  where
+     A word boundary is a position in the subject string where
     the current character and the previous character do not both
     match <literal>\w</literal> or <literal>\W</literal> (i.e. one matches 
     <literal>\w</literal> and  the  other  matches
-     <literal>\W</literal>),  or the start or end of the string if the first or last
-     character matches \w, respectively.
+     <literal>\W</literal>), or the start or end of the string if the first
+     or last character matches \w, respectively.
    </para>
    <para>
-     The <literal>\A</literal>, <literal>\Z</literal>, and <literal>\z</literal> assertions differ  from  the  traditional
+     The <literal>\A</literal>, <literal>\Z</literal>, and
+     <literal>\z</literal> assertions differ  from  the  traditional
     circumflex  and  dollar  (described below) in that they only
     ever match at the very start and end of the subject  string,
     whatever  options  are  set.  They  are  not affected by the
-     <link linkend="pcre.pattern.modifiers">PCRE_NOTBOL</link>  or <link linkend="pcre.pattern.modifiers">PCRE_NOTEOL</link>  options. The  difference  between
-     <literal>\Z</literal>  and  <literal>\z</literal>  is that <literal>\Z</literal>
+     <link linkend="pcre.pattern.modifiers">PCRE_NOTBOL</link> or
+     <link linkend="pcre.pattern.modifiers">PCRE_NOTEOL</link> options.
+     The  difference  between <literal>\Z</literal> and
+     <literal>\z</literal>  is that <literal>\Z</literal>
     matches before a newline that is the
     last character of the string as well as at the  end  of  the
     string, whereas <literal>\z</literal> matches only at the end.
@ -744,7 +750,7 @@
     different meaning (see below).

     Circumflex need not be the first character of the pattern if
-     a  number of alternatives are involved, but it should be the
+     a number of alternatives are involved, but it should be the
     first thing in each alternative in which it appears  if  the
     pattern is ever to match that branch. If all possible alter-
     natives start with a circumflex, that is, if the pattern  is
@ -763,7 +769,8 @@

     The meaning of dollar can be changed so that it matches only
     at   the   very   end   of   the   string,  by  setting  the
-     <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>  option at compile or matching time. This
+     <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
+     option at compile or matching time. This
     does not affect the \Z assertion.

     The meanings of the circumflex  and  dollar  characters  are
@ -873,7 +880,7 @@
     For example, the class [^\W_] matches any letter  or  digit,
     but not underscore.

-     All non-alphameric characters other than \,  -,  ^  (at  the
+     All non-alphanumeric characters other than \,  -,  ^  (at  the
     start)  and  the  terminating ] are non-special in character
     classes, but it does no harm if they are escaped.
     </literallayout>
@ -887,8 +894,8 @@

       gilbert|sullivan

-     matches either "gilbert" or "sullivan". Any number of alter-
-     natives  may  appear,  and an empty alternative is permitted
+     matches either "gilbert" or "sullivan". Any number of alternatives
+     may  appear,  and an empty alternative is permitted
     (matching the empty string).   The  matching  process  tries
     each  alternative in turn, from left to right, and the first
     one that succeeds is used. If the alternatives are within  a
@ -933,11 +940,11 @@
       abc(?i)

     which in turn is the same as compiling the pattern abc  with
-     <link linkend="pcre.pattern.modifiers">PCRE_CASELESS</link>   set.   In  other words, such "top level" set-
-     tings apply to the whole pattern  (unless  there  are  other
-     changes  inside subpatterns). If there is more than one set-
-     ting of the same option at top level, the rightmost  setting
-     is used.
+     <link linkend="pcre.pattern.modifiers">PCRE_CASELESS</link> set.
+     In  other words, such "top level" settings apply to the whole
+     pattern  (unless  there  are  other changes  inside subpatterns).
+     If there is more than one setting of the same option at top level,
+     the rightmost  setting is used.

     If an option change occurs inside a subpattern,  the  effect
     is  different.  This is a change of behaviour in Perl 5.005.
@ -958,8 +965,7 @@
     matches "ab", "aB", "c", and "C", even though when  matching
     "C" the first branch is abandoned before the option setting.
     This is because the effects of  option  settings  happen  at
-     compile  time. There would be some very weird behaviour oth-
-     erwise.
+     compile  time. There would be some very weird behaviour otherwise.

     The PCRE-specific options <link linkend="pcre.pattern.modifiers">PCRE_UNGREEDY</link>  and  
     <link linkend="pcre.pattern.modifiers">PCRE_EXTRA</link>   can
@ -975,25 +981,26 @@
     <title>subpatterns</title>
     <literallayout>
     Subpatterns are delimited by parentheses  (round  brackets),
-     which can be nested.  Marking part of a pattern as a subpat-
-     tern does two things:
+     which can be nested.  Marking part of a pattern as a subpattern
+     does two things:

     1. It localizes a set of alternatives. For example, the pat-
     tern

       cat(aract|erpillar|)

-     matches one of the words "cat",  "cataract",  or  "caterpil-
-     lar".  Without  the  parentheses, it would match "cataract",
+     matches one of the words "cat",  "cataract",  or  "caterpillar".
+     Without  the  parentheses, it would match "cataract",
     "erpillar" or the empty string.

     2. It sets up the subpattern as a capturing  subpattern  (as
-     defined  above).   When the whole pattern matches, that por-
-     tion of the subject string that matched  the  subpattern  is
-     passed  back  to  the  caller  via  the  <emphasis>ovector</emphasis> argument of
-     <function>pcre_exec</function>. Opening parentheses are counted  from  left  to
-     right (starting from 1) to obtain the numbers of the captur-
-     ing subpatterns.
+     defined  above).   When the whole pattern matches, that portion
+     of the subject string that matched  the  subpattern  is
+     passed  back  to  the  caller  via  the  <emphasis>ovector</emphasis>
+     argument of
+     <function>pcre_exec</function>. Opening parentheses are counted
+     from  left  to right (starting from 1) to obtain the numbers of the
+     capturing subpatterns.

     For example, if the string "the red king" is matched against
     the pattern
@ -1004,8 +1011,8 @@
     and are numbered 1, 2, and 3.

     The fact that plain parentheses fulfil two functions is  not
-     always  helpful.  There are often times when a grouping sub-
-     pattern is required without a capturing requirement.  If  an
+     always  helpful.  There are often times when a grouping subpattern
+     is required without a capturing requirement.  If  an
     opening parenthesis is followed by "?:", the subpattern does
     not do any capturing, and is not counted when computing  the
     number of any subsequent capturing subpatterns. For example,
@ -1015,8 +1022,8 @@
       the ((?:red|white) (king|queen))

     the captured substrings are "white queen" and  "queen",  and
-     are  numbered  1  and 2. The maximum number of captured sub-
-     strings is 99, and the maximum number  of  all  subpatterns,
+     are  numbered  1  and 2. The maximum number of captured substrings
+     is 99, and the maximum number  of  all  subpatterns,
     both capturing and non-capturing, is 200.

     As a  convenient  shorthand,  if  any  option  settings  are
@ -1072,8 +1079,8 @@
     matches exactly 8 digits.  An  opening  curly  bracket  that
     appears  in a position where a quantifier is not allowed, or
     one that does not match the syntax of a quantifier, is taken
-     as  a literal character. For example, {,6} is not a quantif-
-     ier, but a literal string of four characters.
+     as  a literal character. For example, {,6} is not a quantifier,
+     but a literal string of four characters.

     The quantifier {0} is permitted, causing the  expression  to
     behave  as  if the previous item and the quantifier were not
@ -1099,13 +1106,13 @@
     fact match no characters, the loop is forcibly broken.

     By default, the quantifiers  are  "greedy",  that  is,  they
-     match  as much as possible (up to the maximum number of per-
-     mitted times), without causing the rest of  the  pattern  to
+     match  as much as possible (up to the maximum number of permitted
+     times), without causing the rest of  the  pattern  to
     fail. The classic example of where this gives problems is in
     trying to match comments in C programs. These appear between
     the  sequences /* and */ and within the sequence, individual
-     * and / characters may appear. An attempt to  match  C  com-
-     ments by applying the pattern
+     * and / characters may appear. An attempt to  match  C  comments
+     by applying the pattern

       /\*.*\*/

@ -1123,8 +1130,8 @@
       /\*.*?\*/

     does the right thing with the C comments. The meaning of the
-     various  quantifiers is not otherwise changed, just the pre-
-     ferred number of matches.  Do not confuse this use of  ques-
+     various  quantifiers is not otherwise changed, just the preferred
+     number of matches.  Do not confuse this use of  ques-
     tion  mark  with  its  use as a quantifier in its own right.
     Because it has two uses, it can sometimes appear doubled, as
     in
@ -1141,33 +1148,32 @@
     default behaviour.

     When a parenthesized subpattern is quantified with a minimum
-     repeat  count  that is greater than 1 or with a limited max-
-     imum, more store is required for the  compiled  pattern,  in
+     repeat  count  that is greater than 1 or with a limited maximum,
+     more store is required for the  compiled  pattern,  in
     proportion to the size of the minimum or maximum.

     If a pattern starts with .* or  .{0,}  and  the  <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link> 
     option (equivalent to Perl's /s) is set, thus allowing the .
     to match newlines, then the pattern is implicitly  anchored,
-     because whatever follows will be tried against every charac-
-     ter position in the subject string, so there is no point  in
+     because whatever follows will be tried against every character
+     position in the subject string, so there is no point  in
     retrying  the overall match at any position after the first.
     PCRE treats such a pattern as though it were preceded by \A.
     In  cases where it is known that the subject string contains
-     no newlines, it is worth setting <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link>  when  the  pat-
-     tern begins with .* in order to obtain this optimization, or
+     no newlines, it is worth setting <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link>  when  the  pattern begins with .* in order to
+     obtain this optimization, or
     alternatively using ^ to indicate anchoring explicitly.

     When a capturing subpattern is repeated, the value  captured
-     is the substring that matched the final iteration. For exam-
-     ple, after
+     is the substring that matched the final iteration. For example, after

       (tweedle[dume]{3}\s*)+

-     has matched "tweedledum tweedledee" the value  of  the  cap-
-     tured  substring  is  "tweedledee".  However,  if  there are
+     has matched "tweedledum tweedledee" the value  of  the  captured
+     substring  is  "tweedledee".  However,  if  there are
     nested capturing  subpatterns,  the  corresponding  captured
-     values  may  have been set in previous iterations. For exam-
-     ple, after
+     values  may  have been set in previous iterations. For example,
+     after
     
       /(a|(b))+/

@ -1191,28 +1197,27 @@
     left  parentheses in the entire pattern. In other words, the
     parentheses that are referenced need not be to the  left  of
     the  reference  for  numbers  less  than 10. See the section
-     entitled "Backslash" above for further details of  the  han-
-     dling of digits following a backslash.
+     entitled "Backslash" above for further details of  the  handling
+     of digits following a backslash.

-     A back reference matches whatever actually matched the  cap-
-     turing subpattern in the current subject string, rather than
+     A back reference matches whatever actually matched the  capturing
+     subpattern in the current subject string, rather than
     anything matching the subpattern itself. So the pattern

       (sens|respons)e and \1ibility

-     matches "sense and sensibility" and "response and  responsi-
-     bility",  but  not  "sense  and  responsibility". If caseful
+     matches "sense and sensibility" and "response and  responsibility",
+     but  not  "sense  and  responsibility". If caseful
     matching is in force at the time of the back reference, then
     the case of letters is relevant. For example,

       ((?i)rah)\s+\1

     matches "rah rah" and "RAH RAH", but  not  "RAH  rah",  even
-     though  the  original  capturing subpattern is matched case-
-     lessly.
+     though  the  original  capturing subpattern is matched caselessly.

-     There may be more than one back reference to the  same  sub-
-     pattern.  If  a  subpattern  has not actually been used in a
+     There may be more than one back reference to the  same  subpattern.
+     If  a  subpattern  has not actually been used in a
     particular match, then any  back  references  to  it  always
     fail. For example, the pattern

@ -1229,15 +1234,14 @@
     A back reference that occurs inside the parentheses to which
     it  refers  fails when the subpattern is first used, so, for
     example, (a\1) never matches.  However, such references  can
-     be useful inside repeated subpatterns. For example, the pat-
-     tern
+     be useful inside repeated subpatterns. For example, the pattern

       (a|b\1)+

     matches any number of "a"s and also "aba", "ababaa" etc.  At
     each iteration of the subpattern, the back reference matches
-     the character string corresponding to  the  previous  itera-
-     tion.  In  order  for this to work, the pattern must be such
+     the character string corresponding to  the  previous  iteration.
+     In order for this to work, the pattern must be such
     that the first iteration does not need  to  match  the  back
     reference.  This  can  be  done using alternation, as in the
     example above, or by a quantifier with a minimum of zero.
@ -1250,8 +1254,8 @@
     An assertion is  a  test  on  the  characters  following  or
     preceding  the current matching point that does not actually
     consume any characters. The simple assertions coded  as  \b,
-     \B,  \A,  \Z,  \z, ^ and $ are described above. More compli-
-     cated assertions are coded as  subpatterns.  There  are  two
+     \B,  \A,  \Z,  \z, ^ and $ are described above. More complicated
+     assertions are coded as  subpatterns.  There  are  two
     kinds:  those that look ahead of the current position in the
     subject string, and those that look behind it.

@ -1278,8 +1282,8 @@
     when  the  next  three  characters  are  "bar". A lookbehind
     assertion is needed to achieve this effect.
     
-     Lookbehind assertions start with (?&lt;=  for  positive  asser-
-     tions and (?&lt;! for negative assertions. For example,
+     Lookbehind assertions start with (?&lt;=  for  positive  assertions
+     and (?&lt;! for negative assertions. For example,

       (?&lt;!foo)bar

@ -1295,8 +1299,8 @@

       (?&lt;!dogs?|cats?)

-     causes an error at compile time. Branches  that  match  dif-
-     ferent length strings are permitted only at the top level of
+     causes an error at compile time. Branches  that  match  different
+     length strings are permitted only at the top level of
     a lookbehind assertion. This is an extension  compared  with
     Perl  5.005,  which  requires all branches to match the same
     length of string. An assertion such as
@ -1304,8 +1308,8 @@
       (?&lt;=ab(c|de))

     is not permitted, because its single  top-level  branch  can
-     match two different lengths, but it is acceptable if rewrit-
-     ten to use two top-level branches:
+     match two different lengths, but it is acceptable if rewritten
+     to use two top-level branches:

       (?&lt;=abc|abde)

@ -1314,8 +1318,8 @@
     by the fixed width and then  try  to  match.  If  there  are
     insufficient  characters  before  the  current position, the
     match is deemed to fail.  Lookbehinds  in  conjunction  with
-     once-only  subpatterns can be particularly useful for match-
-     ing at the ends of strings; an example is given at  the  end
+     once-only  subpatterns can be particularly useful for matching
+     at the ends of strings; an example is given at  the  end
     of the section on once-only subpatterns.

     Several assertions (of any sort) may  occur  in  succession.
@ -1398,23 +1402,22 @@
     This kind of parenthesis "locks up" the  part of the pattern
     it  contains once it has matched, and a failure further into
     the pattern is prevented from backtracking  into  it.  Back-
-     tracking  past  it to previous items, however, works as nor-
-     mal.
+     tracking  past  it to previous items, however, works as normal.

     An alternative description is that a subpattern of this type
-     matches  the  string  of  characters that an identical stan-
-     dalone pattern would match, if anchored at the current point
+     matches  the  string  of  characters that an identical standalone
+     pattern would match, if anchored at the current point
     in the subject string.

     Once-only subpatterns are not capturing subpatterns.  Simple
-     cases  such as the above example can be thought of as a max-
-     imizing repeat that must  swallow  everything  it  can.  So,
+     cases  such as the above example can be thought of as a maximizing
+     repeat that must  swallow  everything  it  can.  So,
     while both \d+ and \d+? are prepared to adjust the number of
     digits they match in order to make the rest of  the  pattern
     match, (?&gt;\d+) can only match an entire sequence of digits.

-     This construction can of course contain arbitrarily  compli-
-     cated subpatterns, and it can be nested.
+     This construction can of course contain arbitrarily  complicated
+     subpatterns, and it can be nested.

     Once-only subpatterns can be used in conjunction with  look-
     behind  assertions  to specify efficient matching at the end
@ -1442,19 +1445,18 @@
     match  only  the  entire  string.  The subsequent lookbehind
     assertion does a single test on the last four characters. If
     it  fails,  the  match  fails immediately. For long strings,
-     this approach makes a significant difference to the process-
-     ing time.
+     this approach makes a significant difference to the processing time.

-     When a pattern contains an unlimited repeat inside a subpat-
-     tern  that  can  itself  be  repeated an unlimited number of
+     When a pattern contains an unlimited repeat inside a subpattern
+     that can itself be repeated an unlimited number of
     times, the use of a once-only subpattern is the only way  to
     avoid  some  failing matches taking a very long time indeed.
     The pattern

       (\D+|&lt;\d+>)*[!?]

-     matches an unlimited number of substrings that  either  con-
-     sist  of  non-digits,  or digits enclosed in &lt;>, followed by
+     matches an unlimited number of substrings that  either  consist
+     of  non-digits,  or digits enclosed in &lt;>, followed by
     either ! or ?. When it matches, it runs quickly. However, if
     it is applied to

@ -1462,8 +1464,8 @@

     it takes a long  time  before  reporting  failure.  This  is
     because the string can be divided between the two repeats in
-     a large number of ways, and all have to be tried. (The exam-
-     ple  used  [!?]  rather  than a single character at the end,
+     a large number of ways, and all have to be tried. (The example
+     used  [!?]  rather  than a single character at the end,
     because both PCRE and Perl have an optimization that  allows
     for  fast  failure  when  a  single  character is used. They
     remember the last single character that is  required  for  a
@ -1472,16 +1474,15 @@

       ((?>\D+)|&lt;\d+>)*[!?]

-     sequences of non-digits cannot be broken, and  failure  hap-
-     pens quickly.
+     sequences of non-digits cannot be broken, and  failure  happens quickly.
     </literallayout>
    </refsect2>

    <refsect2 id="regexp.reference.conditional">
     <title>Conditional subpatterns</title>
     <literallayout>
-     It is possible to cause the matching process to obey a  sub-
-     pattern  conditionally  or to choose between two alternative
+     It is possible to cause the matching process to obey a  subpattern 
+     conditionally  or to choose between two alternative
     subpatterns, depending on the result  of  an  assertion,  or
     whether  a previous capturing subpattern matched or not. The
     two possible forms of conditional subpattern are
@ -1489,16 +1490,16 @@
       (?(condition)yes-pattern)
       (?(condition)yes-pattern|no-pattern)

-     If the condition is satisfied, the yes-pattern is used; oth-
-     erwise  the  no-pattern  (if  present) is used. If there are
+     If the condition is satisfied, the yes-pattern is used; otherwise
+     the  no-pattern  (if  present) is used. If there are
     more than two alternatives in the subpattern, a compile-time
     error occurs.

     There are two kinds of condition. If the  text  between  the
     parentheses  consists  of  a  sequence  of  digits, then the
     condition is satisfied if the capturing subpattern  of  that
-     number  has  previously matched. Consider the following pat-
-     tern, which contains non-significant white space to make  it
+     number  has  previously matched. Consider the following pattern,
+     which contains non-significant white space to make  it
     more  readable  (assume  the  <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link>   option)  and to
     divide it into three parts for ease of discussion:

@ -1519,9 +1520,9 @@

     If the condition is not a sequence of digits, it must be  an
     assertion.  This  may be a positive or negative lookahead or
-     lookbehind assertion. Consider this pattern, again  contain-
-     ing  non-significant  white space, and with the two alterna-
-     tives on the second line:
+     lookbehind assertion. Consider this pattern, again  containing
+     non-significant  white space, and with the two alternatives on
+     the second line:

       (?(?=[^a-z]*[a-z])
       \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
@ -1563,7 +1564,8 @@
     expressions to recurse (amongst other things).  The  special 
     item (?R) is  provided for  the specific  case of recursion. 
     This PCRE  pattern  solves the  parentheses  problem (assume 
-     the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link>  option is set so that white space is 
+     the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link>
+     option is set so that white space is 
     ignored):

       \( ( (?>[^()]+) | (?R) )* \)
@ -1575,15 +1577,15 @@
     a closing parenthesis.

     This particular example pattern  contains  nested  unlimited
-     repeats, and so the use of a once-only subpattern for match-
-     ing strings of non-parentheses is  important  when  applying
+     repeats, and so the use of a once-only subpattern for matching
+     strings of non-parentheses is  important  when  applying
     the  pattern to strings that do not match. For example, when
     it is applied to

       (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()

-     it yields "no match" quickly. However, if a  once-only  sub-
-     pattern  is  not  used,  the match runs for a very long time
+     it yields "no match" quickly. However, if a  once-only  subpattern
+     is  not  used,  the match runs for a very long time
     indeed because there are so many different ways the + and  *
     repeats  can carve up the subject, and all have to be tested
     before failure can be reported.
@ -1656,8 +1658,8 @@
     repeat can match 0, 1, 2, 3, or 4 times,  and  for  each  of
     those  cases other than 0, the + repeats can match different
     numbers of times.) When the remainder of the pattern is such
-     that  the entire match is going to fail, PCRE has in princi-
-     ple to try every possible variation, and this  can  take  an
+     that  the entire match is going to fail, PCRE has in principle
+     to try every possible variation, and this  can  take  an
     extremely long time.

     An optimization catches some of the more simple  cases  such