diff --git a/reference/pcre/functions/pcre.pattern.syntax.xml b/reference/pcre/functions/pcre.pattern.syntax.xml index 72fbe41614..73cffdbf51 100644 --- a/reference/pcre/functions/pcre.pattern.syntax.xml +++ b/reference/pcre/functions/pcre.pattern.syntax.xml @@ -1,5 +1,5 @@ - + @@ -25,8 +25,8 @@ By default, a whitespace character is any character that - the C library function isspace() recognizes, though it is - possible to compile PCRE with alternative character type + the C library function isspace() recognizes, though it is + possible to compile PCRE with alternative character type tables. Normally isspace() matches space, formfeed, newline, carriage return, horizontal tab, and vertical tab. Perl 5 no longer includes vertical tab in its set of whitespace characters. @@ -38,19 +38,19 @@ - PCRE does not allow repeat quantifiers on lookahead + PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits them, but they do not mean what you might think. For example, (?!a){3} does not assert that the - next three characters are not "a". It just asserts that the + next three characters are not "a". It just asserts that the next character is not "a" three times. - Capturing subpatterns that occur inside negative looka- - head assertions are counted, but their entries in the - offsets vector are never set. Perl sets its numerical vari- - ables from any such patterns that are matched before the + Capturing subpatterns that occur inside negative looka- + head assertions are counted, but their entries in the + offsets vector are never set. Perl sets its numerical vari- + ables from any such patterns that are matched before the assertion fails to match something (thereby succeeding), but only if the negative lookahead assertion contains just one branch. @@ -59,8 +59,8 @@ Though binary zero characters are supported in the subject string, - they are not allowed in a pattern string because it is passed as a - normal C string, terminated by zero. 
The escape sequence "\\x00" can + they are not allowed in a pattern string because it is passed as a + normal C string, terminated by zero. The escape sequence "\\x00" can be used in the pattern to represent a binary zero. @@ -80,7 +80,7 @@ - Fairly obviously, PCRE does not support the (?{code}) + Fairly obviously, PCRE does not support the (?{code}) construction. @@ -181,7 +181,7 @@ There are two different sets of meta-characters: those that are recognized anywhere in the pattern except within square - brackets, and those that are recognized in square brackets. + brackets, and those that are recognized in square brackets. Outside square brackets, the meta-characters are as follows: @@ -196,7 +196,7 @@ ^ - assert start of subject (or line, in multiline mode) + assert start of subject (or line, in multiline mode) @@ -298,8 +298,8 @@ - Part of a pattern that is in square brackets is called a - "character class". In a character class the only meta- + Part of a pattern that is in square brackets is called a + "character class". In a character class the only meta- characters are: @@ -335,7 +335,7 @@ - The following sections describe the use of each of the + The following sections describe the use of each of the meta-characters. @@ -343,16 +343,16 @@ backslash The backslash character has several uses. Firstly, if it is - followed by a non-alphameric character, it takes away any - special meaning that character may have. This use of - backslash as an escape character applies both inside and + followed by a non-alphanumeric character, it takes away any + special meaning that character may have. This use of + backslash as an escape character applies both inside and outside character classes. For example, if you want to match a "*" character, you write "\*" in the pattern. 
This applies whether or not the follow- - ing character would otherwise be interpreted as a meta- - character, so it is always safe to precede a non-alphameric + ing character would otherwise be interpreted as a meta- + character, so it is always safe to precede a non-alphanumeric with "\" to specify that it stands for itself. In particu- lar, if you want to match a backslash, you write "\\". @@ -365,11 +365,11 @@ of the pattern. - A second use of backslash provides a way of encoding non- - printing characters in patterns in a visible manner. There - is no restriction on the appearance of non-printing charac- - ters, apart from the binary zero that terminates a pattern, - but when a pattern is being prepared by text editing, it is + A second use of backslash provides a way of encoding non- + printing characters in patterns in a visible manner. There + is no restriction on the appearance of non-printing characters, + apart from the binary zero that terminates a pattern, + but when a pattern is being prepared by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents: @@ -450,38 +450,41 @@ - The precise effect of "\cx" is as follows: if "x" is a lower - case letter, it is converted to upper case. Then bit 6 of - the character (hex 40) is inverted. Thus "\cz" becomes hex - 1A, but "\c{" becomes hex 3B, while "\c;" becomes hex 7B. + The precise effect of "\cx" is as follows: + if "x" is a lower case letter, it is converted + to upper case. Then bit 6 of the character (hex 40) is inverted. + Thus "\cz" becomes hex 1A, but + "\c{" becomes hex 3B, while "\c;" + becomes hex 7B. - After "\x", up to two hexadecimal digits are read (letters - can be in upper or lower case). + After "\x", up to two hexadecimal digits are + read (letters can be in upper or lower case). - After "\0" up to two further octal digits are read. In both - cases, if there are fewer than two digits, just those that - are present are used. 
Thus the sequence "\0\x\07" specifies - two binary zeros followed by a BEL character. Make sure you - supply two digits after the initial zero if the character + After "\0" up to two further octal digits are read. + In both cases, if there are fewer than two digits, just those that + are present are used. Thus the sequence "\0\x\07" + specifies two binary zeros followed by a BEL character. Make sure you + supply two digits after the initial zero if the character that follows is itself an octal digit. The handling of a backslash followed by a digit other than 0 - is complicated. Outside a character class, PCRE reads it + is complicated. Outside a character class, PCRE reads it and any following digits as a decimal number. If the number is less than 10, or if there have been at least that many previous capturing left parentheses in the expression, the - entire sequence is taken as a back reference. A description + entire sequence is taken as a back + reference. A description of how this works is given later, following the discussion of parenthesized subpatterns. Inside a character class, or if the decimal number is - greater than 9 and there have not been that many capturing - subpatterns, PCRE re-reads up to three octal digits follow- - ing the backslash, and generates a single byte from the + greater than 9 and there have not been that many capturing + subpatterns, PCRE re-reads up to three octal digits following + the backslash, and generates a single byte from the least significant 8 bits of the value. Any subsequent digits stand for themselves. For example: @@ -566,15 +569,15 @@ - Note that octal values of 100 or greater must not be intro- - duced by a leading zero, because no more than three octal + Note that octal values of 100 or greater must not be intro- + duced by a leading zero, because no more than three octal digits are ever read. 
- All the sequences that define a single byte value can be + All the sequences that define a single byte value can be used both inside and outside character classes. In addition, - inside a character class, the sequence "\b" is interpreted - as the backspace character (hex 08). Outside a character + inside a character class, the sequence "\b" + is interpreted as the backspace character (hex 08). Outside a character class it has a different meaning (see below). @@ -635,32 +638,32 @@ Each pair of escape sequences partitions the complete set of - characters into two disjoint sets. Any given character + characters into two disjoint sets. Any given character matches one, and only one, of each pair. - A "word" character is any letter or digit or the underscore + A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is - controlled by PCRE's character tables, and may vary if locale-specific - matching is taking place (see "Locale support" + controlled by PCRE's character tables, and may vary if locale-specific + matching is taking place (see "Locale support" above). For example, in the "fr" (French) locale, some char- - acter codes greater than 128 are used for accented letters, + acter codes greater than 128 are used for accented letters, and these are matched by \w. - These character type sequences can appear both inside and + These character type sequences can appear both inside and outside character classes. They each match one character of - the appropriate type. If the current matching point is at + the appropriate type. If the current matching point is at the end of the subject string, all of them fail, since there is no character to match. The fourth use of backslash is for certain simple asser- tions. An assertion specifies a condition that has to be met - at a particular point in a match, without consuming any - characters from the subject string. 
The use of subpatterns - for more complicated assertions is described below. The + at a particular point in a match, without consuming any + characters from the subject string. The use of subpatterns + for more complicated assertions is described below. The backslashed assertions are @@ -693,7 +696,7 @@ \Z - end of subject or newline at end (independent of + end of subject or newline at end (independent of multiline mode) @@ -702,7 +705,7 @@ \z - end of subject (independent of multiline mode) + end of subject(independent of multiline mode) @@ -714,20 +717,23 @@ character, inside a character class). - A word boundary is a position in the subject string where + A word boundary is a position in the subject string where the current character and the previous character do not both match \w or \W (i.e. one matches \w and the other matches - \W), or the start or end of the string if the first or last - character matches \w, respectively. + \W), or the start or end of the string if the first + or last character matches \w, respectively. - The \A, \Z, and \z assertions differ from the traditional + The \A, \Z, and + \z assertions differ from the traditional circumflex and dollar (described below) in that they only ever match at the very start and end of the subject string, whatever options are set. They are not affected by the - PCRE_NOTBOL or PCRE_NOTEOL options. The difference between - \Z and \z is that \Z + PCRE_NOTBOL or + PCRE_NOTEOL options. + The difference between \Z and + \z is that \Z matches before a newline that is the last character of the string as well as at the end of the string, whereas \z matches only at the end. @@ -744,7 +750,7 @@ different meaning (see below). Circumflex need not be the first character of the pattern if - a number of alternatives are involved, but it should be the + a number of alternatives are involved, but it should be the first thing in each alternative in which it appears if the pattern is ever to match that branch. 
If all possible alter- natives start with a circumflex, that is, if the pattern is @@ -763,7 +769,8 @@ The meaning of dollar can be changed so that it matches only at the very end of the string, by setting the - PCRE_DOLLAR_ENDONLY option at compile or matching time. This + PCRE_DOLLAR_ENDONLY + option at compile or matching time. This does not affect the \Z assertion. The meanings of the circumflex and dollar characters are @@ -873,7 +880,7 @@ For example, the class [^\W_] matches any letter or digit, but not underscore. - All non-alphameric characters other than \, -, ^ (at the + All non-alphanumeric characters other than \, -, ^ (at the start) and the terminating ] are non-special in character classes, but it does no harm if they are escaped. @@ -887,8 +894,8 @@ gilbert|sullivan - matches either "gilbert" or "sullivan". Any number of alter- - natives may appear, and an empty alternative is permitted + matches either "gilbert" or "sullivan". Any number of alternatives + may appear, and an empty alternative is permitted (matching the empty string). The matching process tries each alternative in turn, from left to right, and the first one that succeeds is used. If the alternatives are within a @@ -933,11 +940,11 @@ abc(?i) which in turn is the same as compiling the pattern abc with - PCRE_CASELESS set. In other words, such "top level" set- - tings apply to the whole pattern (unless there are other - changes inside subpatterns). If there is more than one set- - ting of the same option at top level, the rightmost setting - is used. + PCRE_CASELESS set. + In other words, such "top level" settings apply to the whole + pattern (unless there are other changes inside subpatterns). + If there is more than one setting of the same option at top level, + the rightmost setting is used. If an option change occurs inside a subpattern, the effect is different. This is a change of behaviour in Perl 5.005. 
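Reviewer note on the character-class and alternation hunks above: the behaviour those paragraphs describe can be checked with a short sketch. It uses Python's `re` module as a stand-in, on the assumption that it agrees with PCRE for these two constructs; it is not the PHP `preg_*` API. The pattern `a|ab` and the sample strings are hypothetical and do not come from the manual text.

```python
import re

# Assumption: Python's re behaves like PCRE for these simple constructs.

# [^\W_] means "not a non-word character and not an underscore",
# i.e. a letter or digit.
letter_or_digit = re.compile(r"[^\W_]")
class_hits = [ch for ch in "a7_-" if letter_or_digit.fullmatch(ch)]

# Alternatives are tried left to right and the first that succeeds is used,
# so the shorter left-hand branch wins even though "ab" would also match.
first_alt = re.match(r"a|ab", "ab").group(0)

# An empty alternative is permitted and matches the empty string.
empty_ok = re.fullmatch(r"gilbert|", "") is not None

print(class_hits, first_alt, empty_ok)
```

Running the same checks through PHP's `preg_match` would be the real confirmation for this manual page; the sketch only illustrates the semantics the reworded paragraphs keep unchanged.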
@@ -958,8 +965,7 @@ matches "ab", "aB", "c", and "C", even though when matching "C" the first branch is abandoned before the option setting. This is because the effects of option settings happen at - compile time. There would be some very weird behaviour oth- - erwise. + compile time. There would be some very weird behaviour otherwise. The PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA can @@ -975,25 +981,26 @@ subpatterns Subpatterns are delimited by parentheses (round brackets), - which can be nested. Marking part of a pattern as a subpat- - tern does two things: + which can be nested. Marking part of a pattern as a subpattern + does two things: 1. It localizes a set of alternatives. For example, the pat- tern cat(aract|erpillar|) - matches one of the words "cat", "cataract", or "caterpil- - lar". Without the parentheses, it would match "cataract", + matches one of the words "cat", "cataract", or "caterpillar". + Without the parentheses, it would match "cataract", "erpillar" or the empty string. 2. It sets up the subpattern as a capturing subpattern (as - defined above). When the whole pattern matches, that por- - tion of the subject string that matched the subpattern is - passed back to the caller via the ovector argument of - pcre_exec. Opening parentheses are counted from left to - right (starting from 1) to obtain the numbers of the captur- - ing subpatterns. + defined above). When the whole pattern matches, that portion + of the subject string that matched the subpattern is + passed back to the caller via the ovector + argument of + pcre_exec. Opening parentheses are counted + from left to right (starting from 1) to obtain the numbers of the + capturing subpatterns. For example, if the string "the red king" is matched against the pattern @@ -1004,8 +1011,8 @@ and are numbered 1, 2, and 3. The fact that plain parentheses fulfil two functions is not - always helpful. 
There are often times when a grouping sub- - pattern is required without a capturing requirement. If an + always helpful. There are often times when a grouping subpattern + is required without a capturing requirement. If an opening parenthesis is followed by "?:", the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, @@ -1015,8 +1022,8 @@ the ((?:red|white) (king|queen)) the captured substrings are "white queen" and "queen", and - are numbered 1 and 2. The maximum number of captured sub- - strings is 99, and the maximum number of all subpatterns, + are numbered 1 and 2. The maximum number of captured substrings + is 99, and the maximum number of all subpatterns, both capturing and non-capturing, is 200. As a convenient shorthand, if any option settings are @@ -1072,8 +1079,8 @@ matches exactly 8 digits. An opening curly bracket that appears in a position where a quantifier is not allowed, or one that does not match the syntax of a quantifier, is taken - as a literal character. For example, {,6} is not a quantif- - ier, but a literal string of four characters. + as a literal character. For example, {,6} is not a quantifier, + but a literal string of four characters. The quantifier {0} is permitted, causing the expression to behave as if the previous item and the quantifier were not @@ -1099,13 +1106,13 @@ fact match no characters, the loop is forcibly broken. By default, the quantifiers are "greedy", that is, they - match as much as possible (up to the maximum number of per- - mitted times), without causing the rest of the pattern to + match as much as possible (up to the maximum number of permitted + times), without causing the rest of the pattern to fail. The classic example of where this gives problems is in trying to match comments in C programs. These appear between the sequences /* and */ and within the sequence, individual - * and / characters may appear. 
An attempt to match C com- - ments by applying the pattern + * and / characters may appear. An attempt to match C comments + by applying the pattern /\*.*\*/ @@ -1123,8 +1130,8 @@ /\*.*?\*/ does the right thing with the C comments. The meaning of the - various quantifiers is not otherwise changed, just the pre- - ferred number of matches. Do not confuse this use of ques- + various quantifiers is not otherwise changed, just the preferred + number of matches. Do not confuse this use of ques- tion mark with its use as a quantifier in its own right. Because it has two uses, it can sometimes appear doubled, as in @@ -1141,33 +1148,32 @@ default behaviour. When a parenthesized subpattern is quantified with a minimum - repeat count that is greater than 1 or with a limited max- - imum, more store is required for the compiled pattern, in + repeat count that is greater than 1 or with a limited maximum, + more store is required for the compiled pattern, in proportion to the size of the minimum or maximum. If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent to Perl's /s) is set, thus allowing the . to match newlines, then the pattern is implicitly anchored, - because whatever follows will be tried against every charac- - ter position in the subject string, so there is no point in + because whatever follows will be tried against every character + position in the subject string, so there is no point in retrying the overall match at any position after the first. PCRE treats such a pattern as though it were preceded by \A. In cases where it is known that the subject string contains - no newlines, it is worth setting PCRE_DOTALL when the pat- - tern begins with .* in order to obtain this optimization, or + no newlines, it is worth setting PCRE_DOTALL when the pattern begins with .* in order to + obtain this optimization, or alternatively using ^ to indicate anchoring explicitly. 
When a capturing subpattern is repeated, the value captured - is the substring that matched the final iteration. For exam- - ple, after + is the substring that matched the final iteration. For example, after (tweedle[dume]{3}\s*)+ - has matched "tweedledum tweedledee" the value of the cap- - tured substring is "tweedledee". However, if there are + has matched "tweedledum tweedledee" the value of the captured + substring is "tweedledee". However, if there are nested capturing subpatterns, the corresponding captured - values may have been set in previous iterations. For exam- - ple, after + values may have been set in previous iterations. For example, + after /(a|(b))+/ @@ -1191,28 +1197,27 @@ left parentheses in the entire pattern. In other words, the parentheses that are referenced need not be to the left of the reference for numbers less than 10. See the section - entitled "Backslash" above for further details of the han- - dling of digits following a backslash. + entitled "Backslash" above for further details of the handling + of digits following a backslash. - A back reference matches whatever actually matched the cap- - turing subpattern in the current subject string, rather than + A back reference matches whatever actually matched the capturing + subpattern in the current subject string, rather than anything matching the subpattern itself. So the pattern (sens|respons)e and \1ibility - matches "sense and sensibility" and "response and responsi- - bility", but not "sense and responsibility". If caseful + matches "sense and sensibility" and "response and responsibility", + but not "sense and responsibility". If caseful matching is in force at the time of the back reference, then the case of letters is relevant. For example, ((?i)rah)\s+\1 matches "rah rah" and "RAH RAH", but not "RAH rah", even - though the original capturing subpattern is matched case- - lessly. + though the original capturing subpattern is matched caselessly. 
- There may be more than one back reference to the same sub- - pattern. If a subpattern has not actually been used in a + There may be more than one back reference to the same subpattern. + If a subpattern has not actually been used in a particular match, then any back references to it always fail. For example, the pattern @@ -1229,15 +1234,14 @@ A back reference that occurs inside the parentheses to which it refers fails when the subpattern is first used, so, for example, (a\1) never matches. However, such references can - be useful inside repeated subpatterns. For example, the pat- - tern + be useful inside repeated subpatterns. For example, the pattern (a|b\1)+ matches any number of "a"s and also "aba", "ababaa" etc. At each iteration of the subpattern, the back reference matches - the character string corresponding to the previous itera- - tion. In order for this to work, the pattern must be such + the character string corresponding to the previous iteration. + In order for this to work, the pattern must be such that the first iteration does not need to match the back reference. This can be done using alternation, as in the example above, or by a quantifier with a minimum of zero. @@ -1250,8 +1254,8 @@ An assertion is a test on the characters following or preceding the current matching point that does not actually consume any characters. The simple assertions coded as \b, - \B, \A, \Z, \z, ^ and $ are described above. More compli- - cated assertions are coded as subpatterns. There are two + \B, \A, \Z, \z, ^ and $ are described above. More complicated + assertions are coded as subpatterns. There are two kinds: those that look ahead of the current position in the subject string, and those that look behind it. @@ -1278,8 +1282,8 @@ when the next three characters are "bar". A lookbehind assertion is needed to achieve this effect. - Lookbehind assertions start with (?<= for positive asser- - tions and (?<! for negative assertions. 
For example, + Lookbehind assertions start with (?<= for positive assertions + and (?<! for negative assertions. For example, (?<!foo)bar @@ -1295,8 +1299,8 @@ (?<!dogs?|cats?) - causes an error at compile time. Branches that match dif- - ferent length strings are permitted only at the top level of + causes an error at compile time. Branches that match different + length strings are permitted only at the top level of a lookbehind assertion. This is an extension compared with Perl 5.005, which requires all branches to match the same length of string. An assertion such as @@ -1304,8 +1308,8 @@ (?<=ab(c|de)) is not permitted, because its single top-level branch can - match two different lengths, but it is acceptable if rewrit- - ten to use two top-level branches: + match two different lengths, but it is acceptable if rewritten + to use two top-level branches: (?<=abc|abde) @@ -1314,8 +1318,8 @@ by the fixed width and then try to match. If there are insufficient characters before the current position, the match is deemed to fail. Lookbehinds in conjunction with - once-only subpatterns can be particularly useful for match- - ing at the ends of strings; an example is given at the end + once-only subpatterns can be particularly useful for matching + at the ends of strings; an example is given at the end of the section on once-only subpatterns. Several assertions (of any sort) may occur in succession. @@ -1398,23 +1402,22 @@ This kind of parenthesis "locks up" the part of the pattern it contains once it has matched, and a failure further into the pattern is prevented from backtracking into it. Back- - tracking past it to previous items, however, works as nor- - mal. + tracking past it to previous items, however, works as normal. 
An alternative description is that a subpattern of this type - matches the string of characters that an identical stan- - dalone pattern would match, if anchored at the current point + matches the string of characters that an identical standalone + pattern would match, if anchored at the current point in the subject string. Once-only subpatterns are not capturing subpatterns. Simple - cases such as the above example can be thought of as a max- - imizing repeat that must swallow everything it can. So, + cases such as the above example can be thought of as a maximizing + repeat that must swallow everything it can. So, while both \d+ and \d+? are prepared to adjust the number of digits they match in order to make the rest of the pattern match, (?>\d+) can only match an entire sequence of digits. - This construction can of course contain arbitrarily compli- - cated subpatterns, and it can be nested. + This construction can of course contain arbitrarily complicated + subpatterns, and it can be nested. Once-only subpatterns can be used in conjunction with look- behind assertions to specify efficient matching at the end @@ -1442,19 +1445,18 @@ match only the entire string. The subsequent lookbehind assertion does a single test on the last four characters. If it fails, the match fails immediately. For long strings, - this approach makes a significant difference to the process- - ing time. + this approach makes a significant difference to the processing time. - When a pattern contains an unlimited repeat inside a subpat- - tern that can itself be repeated an unlimited number of + When a pattern contains an unlimited repeat inside a subpattern + that can itself be repeated an unlimited number of times, the use of a once-only subpattern is the only way to avoid some failing matches taking a very long time indeed. The pattern (\D+|<\d+>)*[!?] 
- matches an unlimited number of substrings that either con- - sist of non-digits, or digits enclosed in <>, followed by + matches an unlimited number of substrings that either consist + of non-digits, or digits enclosed in <>, followed by either ! or ?. When it matches, it runs quickly. However, if it is applied to @@ -1462,8 +1464,8 @@ it takes a long time before reporting failure. This is because the string can be divided between the two repeats in - a large number of ways, and all have to be tried. (The exam- - ple used [!?] rather than a single character at the end, + a large number of ways, and all have to be tried. (The example + used [!?] rather than a single character at the end, because both PCRE and Perl have an optimization that allows for fast failure when a single character is used. They remember the last single character that is required for a @@ -1472,16 +1474,15 @@ ((?>\D+)|<\d+>)*[!?] - sequences of non-digits cannot be broken, and failure hap- - pens quickly. + sequences of non-digits cannot be broken, and failure happens quickly. Conditional subpatterns - It is possible to cause the matching process to obey a sub- - pattern conditionally or to choose between two alternative + It is possible to cause the matching process to obey a subpattern + conditionally or to choose between two alternative subpatterns, depending on the result of an assertion, or whether a previous capturing subpattern matched or not. The two possible forms of conditional subpattern are @@ -1489,16 +1490,16 @@ (?(condition)yes-pattern) (?(condition)yes-pattern|no-pattern) - If the condition is satisfied, the yes-pattern is used; oth- - erwise the no-pattern (if present) is used. If there are + If the condition is satisfied, the yes-pattern is used; otherwise + the no-pattern (if present) is used. If there are more than two alternatives in the subpattern, a compile-time error occurs. There are two kinds of condition. 
If the text between the parentheses consists of a sequence of digits, then the condition is satisfied if the capturing subpattern of that - number has previously matched. Consider the following pat- - tern, which contains non-significant white space to make it + number has previously matched. Consider the following pattern, + which contains non-significant white space to make it more readable (assume the PCRE_EXTENDED option) and to divide it into three parts for ease of discussion: @@ -1519,9 +1520,9 @@ If the condition is not a sequence of digits, it must be an assertion. This may be a positive or negative lookahead or - lookbehind assertion. Consider this pattern, again contain- - ing non-significant white space, and with the two alterna- - tives on the second line: + lookbehind assertion. Consider this pattern, again containing + non-significant white space, and with the two alternatives on + the second line: (?(?=[^a-z]*[a-z]) \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) @@ -1563,7 +1564,8 @@ expressions to recurse (amongst other things). The special item (?R) is provided for the specific case of recursion. This PCRE pattern solves the parentheses problem (assume - the PCRE_EXTENDED option is set so that white space is + the PCRE_EXTENDED + option is set so that white space is ignored): \( ( (?>[^()]+) | (?R) )* \) @@ -1575,15 +1577,15 @@ a closing parenthesis. This particular example pattern contains nested unlimited - repeats, and so the use of a once-only subpattern for match- - ing strings of non-parentheses is important when applying + repeats, and so the use of a once-only subpattern for matching + strings of non-parentheses is important when applying the pattern to strings that do not match. For example, when it is applied to (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa() - it yields "no match" quickly. However, if a once-only sub- - pattern is not used, the match runs for a very long time + it yields "no match" quickly. 
However, if a once-only subpattern + is not used, the match runs for a very long time indeed because there are so many different ways the + and * repeats can carve up the subject, and all have to be tested before failure can be reported. @@ -1656,8 +1658,8 @@ repeat can match 0, 1, 2, 3, or 4 times, and for each of those cases other than 0, the + repeats can match different numbers of times.) When the remainder of the pattern is such - that the entire match is going to fail, PCRE has in princi- - ple to try every possible variation, and this can take an + that the entire match is going to fail, PCRE has in principle + to try every possible variation, and this can take an extremely long time. An optimization catches some of the more simple cases such