From 88c1f8d6c9fecf352c5ce152a6f758ed013fc2c0 Mon Sep 17 00:00:00 2001
From: "Christoph M. Becker"
Date: Fri, 19 Feb 2021 15:12:33 +0100
Subject: [PATCH] Fix/clarify dirname/basename docs wrt. locales

For basename(), we declare the behavior regarding invalid characters in
the path as being undefined, since that depends on the availability of
mblen, and also on the position of the invalid characters prior to PHP
8.0.0[1].

dirname() is actually not locale-aware, but relies on an ASCII compatible
character encoding regarding the directory separator. On Windows, it
is, however, dependent on the currently set codepage (although a fallback
is still in place to use the Windows ANSI codepage of the operating
system[2], if the string is not valid for the current codepage). Again,
we declare failure to comply with these assumptions as resulting in
undefined behavior.

Users should make sure to pass valid strings.

[1]
[2]
---
 reference/filesystem/functions/basename.xml |  2 ++
 reference/filesystem/functions/dirname.xml  | 13 ++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/reference/filesystem/functions/basename.xml b/reference/filesystem/functions/basename.xml
index 0396d4e896..6ed4f3966a 100644
--- a/reference/filesystem/functions/basename.xml
+++ b/reference/filesystem/functions/basename.xml
@@ -29,6 +29,8 @@
 basename is locale aware, so for it to see the
 correct basename with multibyte character paths, the matching locale must
 be set using the setlocale function.
+ If path contains characters which are invalid for the
+ current locale, the behavior of basename is undefined.
diff --git a/reference/filesystem/functions/dirname.xml b/reference/filesystem/functions/dirname.xml
index d9bcfe001b..a059fce237 100644
--- a/reference/filesystem/functions/dirname.xml
+++ b/reference/filesystem/functions/dirname.xml
@@ -27,9 +27,16 @@
- dirname is locale aware, so for it to see the
- correct directory name with multibyte character paths, the matching locale must
- be set using the setlocale function.
+ On Windows, dirname assumes the currently set codepage, so for it to see the
+ correct directory name with multibyte character paths, the matching codepage must
+ be set.
+ If path contains characters which are invalid for the
+ current codepage, the behavior of dirname is undefined.
+
+
+ On other systems, dirname assumes path
+ to be encoded in an ASCII compatible encoding. Otherwise the behavior of the
+ function is undefined.
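
Note on the ASCII-compatibility requirement for dirname(): the sketch below is not PHP's implementation. It is a minimal Python illustration, assuming a hypothetical helper `naive_dirname` that cuts the path at the last 0x2F byte, of why a byte-oriented separator scan is safe for UTF-8 but not for an encoding such as UTF-16, where 0x2F can occur inside another character.

```python
def naive_dirname(path: bytes) -> bytes:
    """Hypothetical byte-oriented dirname: cut at the last 0x2F ('/') byte."""
    idx = path.rfind(b"/")
    if idx < 0:
        return b"."   # no separator at all
    if idx == 0:
        return b"/"   # the only separator is the leading one
    return path[:idx]


# UTF-8 is ASCII compatible: every byte of a multibyte sequence is >= 0x80,
# so a 0x2F byte is always a real '/' character.
utf8_path = "/tmp/日本語/file.txt".encode("utf-8")
utf8_dir = naive_dirname(utf8_path).decode("utf-8")

# UTF-16LE is not ASCII compatible: U+2F00 is encoded as the bytes 00 2F,
# so the scan finds a "separator" in the middle of that character.
utf16_path = "a/\u2f00b".encode("utf-16-le")
utf16_dir = naive_dirname(utf16_path)

print(utf8_dir)
print(utf16_dir)
```

With the UTF-16LE input, the scan stops inside the two-byte unit for U+2F00, so the result has an odd number of bytes and is no longer valid UTF-16. This is the kind of outcome the patch covers by declaring the behavior undefined.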