diff --git a/reference/pcre/pattern.syntax.xml b/reference/pcre/pattern.syntax.xml index b089853d0e..c8eb07f340 100644 --- a/reference/pcre/pattern.syntax.xml +++ b/reference/pcre/pattern.syntax.xml @@ -1,5 +1,5 @@ - + @@ -80,8 +80,8 @@ - Fairly obviously, PCRE does not support the (?{code}) - construction. + Fairly obviously, PCRE does not support the (?{code}) and (??{code}) + construction. However, there is support for recursive patterns. @@ -658,18 +658,18 @@ - The property names represented by xx above are limited to the Unicode - general category properties. Each character has exactly one such - property, specified by a two-letter abbreviation. For compatibility with + The property names represented by xx above are limited + to the Unicode general category properties. Each character has exactly one + such property, specified by a two-letter abbreviation. For compatibility with Perl, negation can be specified by including a circumflex between the - opening brace and the property name. For example, \p{^Lu} is the same - as \P{Lu}. + opening brace and the property name. For example, \p{^Lu} + is the same as \P{Lu}. - If only one letter is specified with \p or \P, it includes all the - properties that start with that letter. In this case, in the absence of - negation, the curly brackets in the escape sequence are optional; these - two examples have the same effect: + If only one letter is specified with \p or + \P, it includes all the properties that start with that + letter. In this case, in the absence of negation, the curly brackets in the + escape sequence are optional; these two examples have the same effect: \p{L} @@ -728,9 +728,9 @@ For example, \p{Lu} always matches only upper case letters. - The \X escape matches any number of Unicode characters that form an - extended Unicode sequence. \X is equivalent to - (?>\PM\pM*). + The \X escape matches any number of Unicode characters + that form an extended Unicode sequence. \X is equivalent + to (?>\PM\pM*). 
That is, it matches a character without the "mark" property, followed @@ -741,8 +741,9 @@ Matching characters by Unicode property is not fast, because PCRE has to search a structure that contains data for over fifteen thousand - characters. That is why the traditional escape sequences such as \d and - \w do not use Unicode properties in PCRE. + characters. That is why the traditional escape sequences such as + \d and \w do not use Unicode properties + in PCRE. @@ -801,7 +802,8 @@ Note that the sequences \A, \Z, and \z can be used to match the start and end of the subject in both modes, and if all branches of a pattern start with \A is it always anchored, - whether PCRE_MULTILINE is set or not. + whether PCRE_MULTILINE + is set or not. @@ -940,8 +942,8 @@ PCRE_DOTALL, PCRE_UNGREEDY, PCRE_EXTRA, - and PCRE_EXTENDED - can be changed from within the pattern by + PCRE_EXTENDED + and PCRE_DUPNAMES can be changed from within the pattern by a sequence of Perl option letters enclosed between "(?" and ")". The option letters are: @@ -973,6 +975,10 @@ X for PCRE_EXTRA + + J + for PCRE_INFO_JCHANGED + @@ -1021,7 +1027,8 @@ compile time. There would be some very weird behaviour otherwise. - The PCRE-specific options PCRE_UNGREEDY and + The PCRE-specific options PCRE_UNGREEDY and PCRE_EXTRA can be changed in the same way as the Perl-compatible options by using the characters U and X respectively. The (?X) flag @@ -1106,9 +1113,9 @@ It is possible to name the subpattern with - (?P<name>pattern) since PHP 4.3.3. Array with matches will - contain the match indexed by the string alongside the match indexed by - a number, then. + (?P<name>pattern) since PHP 4.3.3. Array with + matches will contain the match indexed by the string alongside the match + indexed by a number, then. @@ -1237,7 +1244,8 @@ that is the only way the rest of the pattern matches. 
- If the PCRE_UNGREEDY option is set (an option which is not + If the PCRE_UNGREEDY + option is set (an option which is not available in Perl) then the quantifiers are not greedy by default, but individual ones can be made greedy by following them with a question mark. In other words, it inverts the @@ -1248,7 +1256,8 @@ as many characters as possible and don't return to match the rest of the pattern. Thus .*abc matches "aabc" but .*+abc doesn't because .*+ eats the - whole string. Possessive quantifiers can be used to speed up processing since PHP 4.3.3. + whole string. Possessive quantifiers can be used to speed up processing + since PHP 4.3.3. When a parenthesized subpattern is quantified with a minimum @@ -1257,7 +1266,8 @@ proportion to the size of the minimum or maximum. - If a pattern starts with .* or .{0,} and the PCRE_DOTALL + If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent to Perl's /s) is set, thus allowing the . to match newlines, then the pattern is implicitly anchored, because whatever follows will be tried against every character @@ -1265,7 +1275,9 @@ retrying the overall match at any position after the first. PCRE treats such a pattern as though it were preceded by \A. In cases where it is known that the subject string contains - no newlines, it is worth setting PCRE_DOTALL when the pattern begins with .* in order to + no newlines, it is worth setting PCRE_DOTALL when the + pattern begins with .* in order to obtain this optimization, or alternatively using ^ to indicate anchoring explicitly. @@ -1337,8 +1349,9 @@ following the backslash are taken as part of a potential back reference number. If the pattern continues with a digit character, then some delimiter must be used to terminate the - back reference. If the PCRE_EXTENDED option is set, this can - be whitespace. Otherwise an empty comment can be used. + back reference. If the PCRE_EXTENDED option + is set, this can be whitespace. 
Otherwise an empty comment can be used. A back reference that occurs inside the parentheses to which @@ -1360,8 +1373,8 @@ Back references to the named subpatterns can be achieved by (?P=name) or, since PHP 5.2.4, also by - \k<name>, \k'name' or - \k{name}. + \k<name>, \k'name', + \k{name} or \g{name}. @@ -1636,8 +1649,9 @@ condition is satisfied if the capturing subpattern of that number has previously matched. Consider the following pattern, which contains non-significant white space to make it - more readable (assume the PCRE_EXTENDED option) and to - divide it into three parts for ease of discussion: + more readable (assume the PCRE_EXTENDED + option) and to divide it into three parts for ease of discussion: ( \( )? [^()]+ (?(1) \) ) @@ -1693,9 +1707,10 @@ comment play no part in the pattern matching at all. - If the PCRE_EXTENDED option is set, an unescaped # character - outside a character class introduces a comment that - continues up to the next newline character in the pattern. + If the PCRE_EXTENDED + option is set, an unescaped # character outside a character class + introduces a comment that continues up to the next newline character + in the pattern. @@ -1763,9 +1778,10 @@ - Since PHP 4.3.3, (?1), (?2) and so on can be used - for recursive subpatterns too. It is also possible to use named - subpatterns: (?P<name>foo). + Since PHP 4.3.3, (?1), (?2) and so on + can be used for recursive subpatterns too. It is also possible to use named + subpatterns: (?P>name) or + (?P&name). If the syntax for a recursive subpattern reference (either by number or @@ -1803,10 +1819,12 @@ regular expressions for efficient performance. - When a pattern begins with .* and the PCRE_DOTALL option is + When a pattern begins with .* and the PCRE_DOTALL option is set, the pattern is implicitly anchored by PCRE, since it can match only at the start of a subject string. 
However, if - PCRE_DOTALL is not set, PCRE cannot make this optimization, + PCRE_DOTALL + is not set, PCRE cannot make this optimization, because the . metacharacter does not then match a newline, and if the subject string contains newlines, the pattern may match from the character immediately following one of them @@ -1822,7 +1840,8 @@ If you are using such a pattern with subject strings that do not contain newlines, the best performance is obtained by - setting PCRE_DOTALL, or starting the pattern with ^.* to + setting PCRE_DOTALL, + or starting the pattern with ^.* to indicate explicit anchoring. That saves PCRE from having to scan along the subject looking for a newline to restart at.
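The matching behaviour described in the last hunk can be observed directly. The sketch below is illustrative only: it uses Python's `re` module rather than PCRE (an assumption made purely so the example is self-contained), and the helper name `match_start` is hypothetical. It shows only where a match begins with and without the dot-matches-newline option; it does not demonstrate PCRE's internal anchoring optimization or its performance effect.

```python
import re


def match_start(pattern, subject, flags=0):
    """Return the offset at which the first match begins, or None."""
    m = re.search(pattern, subject, flags)
    return m.start() if m else None


subject = "first line\nsecond abc"

# Without DOTALL, "." stops at a newline, so a pattern that begins with ".*"
# can match starting at the character immediately after a newline.
plain = match_start(r".*abc", subject)

# With DOTALL, ".*" can consume newlines too, so a match, if there is one,
# begins at the start of the subject; this is why such a pattern can be
# treated as implicitly anchored.
dotall = match_start(r".*abc", subject, re.DOTALL)

print(plain, dotall)
```

With this subject, the match without `re.DOTALL` starts on the second line, while the match with `re.DOTALL` starts at offset 0.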