Fix/clarify dirname/basename docs wrt. locales

For basename(), we declare the behavior regarding invalid characters in
the path as being undefined, since that depends on the availability of
mblen, and also on the position of the invalid characters prior to PHP
8.0.0[1].

dirname() is actually not locale-aware, but relies on an ASCII-compatible
character encoding regarding the directory separator.  On Windows,
however, it depends on the currently set codepage (although a fallback
to the Windows ANSI codepage of the operating system[2] is still in
place if the string is not valid for the current codepage).
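
Why an ASCII-compatible encoding matters for the separator can be sketched as
follows. This is only an illustration, not code from php-src: the helper
`contains_separator_byte()` is a hypothetical name, the sample paths are made
up, and the snippet assumes the file is saved as UTF-8 and that PHP 7.2 or
later is used (for `PHP_OS_FAMILY`).

```php
<?php
// Hypothetical helper for illustration: does the raw byte string contain
// the '/' byte (0x2F) that is treated as a directory separator?
function contains_separator_byte(string $bytes): bool
{
    return strpos($bytes, '/') !== false;
}

// UTF-8 is ASCII compatible: multibyte sequences only use bytes >= 0x80.
$utf8Name = "日本語";
var_dump(contains_separator_byte($utf8Name));   // bool(false)

// UTF-16LE is not: U+012F is encoded as the bytes 0x2F 0x01.
$utf16Char = "\x2F\x01";
var_dump(contains_separator_byte($utf16Char));  // bool(true)

// On non-Windows systems a UTF-8 path is therefore split at real separators.
$dir = dirname("/tmp/日本語/ファイル.txt");
if (PHP_OS_FAMILY !== 'Windows') {
    var_dump($dir === "/tmp/日本語");
}
```

A path in an encoding like UTF-16 can contain the separator byte inside a
character, which is the case the documentation now declares undefined.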

Again, we declare failure to comply with these assumptions as resulting
in undefined behavior.  Users should make sure to pass valid strings.
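
One way a caller might ensure that only valid strings are passed is sketched
below. It is a minimal sketch under stated assumptions, not something this
commit adds: `safe_basename()` is a hypothetical helper, and it assumes that
paths are meant to be UTF-8 and that a matching UTF-8 locale has been set with
setlocale() (locale names are system dependent).

```php
<?php
/**
 * Hypothetical helper (not part of PHP): returns basename($path) only if
 * $path is valid UTF-8, and null otherwise.
 */
function safe_basename(string $path): ?string
{
    // With the /u modifier, preg_match() returns false for subjects that
    // are not valid UTF-8, and 1 here for valid ones (empty pattern matches).
    if (preg_match('//u', $path) !== 1) {
        return null;
    }
    return basename($path);
}

var_dump(safe_basename("/var/www/index.php"));  // string(9) "index.php"
var_dump(safe_basename("/tmp/\xFF.txt"));       // NULL (0xFF is never valid in UTF-8)
```

Rejecting the input up front keeps the call out of the undefined territory
described above; whether to return null, throw, or sanitize is a design choice
left to the caller.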

[1] <http://git.php.net/?p=php-src.git;a=commitdiff;h=90705d44e3da1d0aa7b8b4fd921ec597391eccb2>
[2] <5e01542526/win32/codepage.h (L95-L106)>
Christoph M. Becker 2021-02-19 15:12:33 +01:00
parent 871df69f47
commit 88c1f8d6c9
2 changed files with 12 additions and 3 deletions

@@ -29,6 +29,8 @@
 <function>basename</function> is locale aware, so for it to see the
 correct basename with multibyte character paths, the matching locale must
 be set using the <function>setlocale</function> function.
+If <parameter>path</parameter> contains characters which are invalid for the
+current locale, the behavior of <function>basename</function> is undefined.
 </para>
 </caution>
 </refsect1>

@@ -27,9 +27,16 @@
 </note>
 <caution>
 <para>
-<function>dirname</function> is locale aware, so for it to see the
-correct directory name with multibyte character paths, the matching locale must
-be set using the <function>setlocale</function> function.
+On Windows, <function>dirname</function> assumes the currently set codepage, so for it to see the
+correct directory name with multibyte character paths, the matching codepage must
+be set.
+If <parameter>path</parameter> contains characters which are invalid for the
+current codepage, the behavior of <function>dirname</function> is undefined.
+</para>
+<para>
+On other systems, <function>dirname</function> assumes <parameter>path</parameter>
+to be encoded in an ASCII compatible encoding. Otherwise the behavior of
+the function is undefined.
 </para>
 </caution>
 </refsect1>