<?xml version="1.0" encoding="utf-8"?>
<!-- $Revision$ -->
<article xml:id="xml.encoding" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
<title>Character Encoding</title>
 <para>
  PHP's XML extension supports the <link
  xlink:href="&url.unicode;">Unicode</link> character set through
  different <glossterm>character encoding</glossterm>s. It
  distinguishes two kinds of character encoding: <glossterm>source
  encoding</glossterm> and <glossterm>target encoding</glossterm>.
  PHP's internal representation of the document is always encoded
  with <literal>UTF-8</literal>.
 </para>
 <para>
  Source encoding applies when an XML document is <link
  linkend="function.xml-parse">parsed</link>. Upon <link
  linkend="function.xml-parser-create">creating an XML
  parser</link>, a source encoding can be specified (this encoding
  cannot be changed later in the XML parser's lifetime). The
  supported source encodings are <literal>ISO-8859-1</literal>,
  <literal>US-ASCII</literal> and <literal>UTF-8</literal>. The
  first two are single-byte encodings, which means that each
  character is represented by a single byte.
  <literal>UTF-8</literal> can encode characters composed of a
  variable number of bits (up to 21) in one to four bytes. The
  default source encoding used by PHP is
  <literal>ISO-8859-1</literal>.
 </para>
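 <para>
  As a minimal sketch of the paragraph above, the following example
  passes each of the three encoding names to
  <function>xml_parser_create</function> and parses a small
  ASCII-only document. The helper
  <literal>parses_with_encoding</literal> and the sample document are
  hypothetical and only serve as an illustration.
 </para>
 <example>
  <title>Creating a parser with an explicit encoding (illustrative sketch)</title>
  <programlisting role="php">
<![CDATA[
<?php
// Encoding names listed in the paragraph above.
$encodings = array('ISO-8859-1', 'US-ASCII', 'UTF-8');

// Hypothetical helper, not part of the XML extension: creates a parser
// for $encoding and reports whether a small ASCII-only document parses.
function parses_with_encoding($encoding)
{
    $parser = xml_parser_create($encoding);

    // xml_parse() returns 1 on success and 0 on failure.
    return xml_parse($parser, '<greeting>hello</greeting>', true) === 1;
}

foreach ($encodings as $encoding) {
    echo $encoding, ': ', parses_with_encoding($encoding) ? 'ok' : 'failed', "\n";
}
?>
]]>
  </programlisting>
 </example>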
 <para>
  Target encoding applies when PHP passes data to XML handler
  functions. When an XML parser is created, the target encoding
  is set to the same as the source encoding, but this may be
  changed at any point. The target encoding affects character
  data as well as tag names and processing instruction targets.
 </para>
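 <para>
  The following sketch illustrates how the target encoding could be
  changed with <function>xml_parser_set_option</function> and the
  <constant>XML_OPTION_TARGET_ENCODING</constant> option, and how
  that changes the bytes a character data handler receives. The
  helper <literal>collect_text</literal> and the sample document are
  hypothetical. The closure syntax assumes PHP 5.3 or later.
 </para>
 <example>
  <title>Changing the target encoding (illustrative sketch)</title>
  <programlisting role="php">
<![CDATA[
<?php
// Hypothetical helper, not part of the XML extension: parses $xml and
// returns the character data as delivered in $targetEncoding.
function collect_text($xml, $targetEncoding)
{
    $text = '';
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $targetEncoding);

    // Character data may arrive in several chunks, so append each one.
    xml_set_character_data_handler($parser, function ($parser, $data) use (&$text) {
        $text .= $data;
    });

    xml_parse($parser, $xml, true);
    return $text;
}

// "café" with the last character stored as the two UTF-8 bytes C3 A9.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><word>caf\xC3\xA9</word>";

echo bin2hex(collect_text($xml, 'UTF-8')), "\n";      // 636166c3a9
echo bin2hex(collect_text($xml, 'ISO-8859-1')), "\n"; // 636166e9
?>
]]>
  </programlisting>
 </example>
 <para>
  The output is shown as hexadecimal so that the difference between
  the two-byte and the single-byte form of the last character stays
  visible regardless of the terminal's own encoding.
 </para>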
 <para>
  If the XML parser encounters characters outside the range that
  its source encoding is capable of representing, it will return
  an error.
 </para>
 <para>
  If PHP encounters characters in the parsed XML document that
  cannot be represented in the chosen target encoding, the problem
  characters will be "demoted". Currently, this means that such
  characters are replaced by a question mark.
 </para>
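 <para>
  As a hedged illustration of this demotion, the sketch below parses
  a document containing the euro sign (U+20AC), which neither
  <literal>ISO-8859-1</literal> nor <literal>US-ASCII</literal> can
  represent. The helper <literal>text_in_encoding</literal> and the
  sample document are hypothetical. The closure syntax assumes PHP
  5.3 or later.
 </para>
 <example>
  <title>Demotion of unrepresentable characters (illustrative sketch)</title>
  <programlisting role="php">
<![CDATA[
<?php
// Hypothetical helper, not part of the XML extension: returns the
// character data of $xml as seen by a handler in $targetEncoding.
function text_in_encoding($xml, $targetEncoding)
{
    $text = '';
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $targetEncoding);
    xml_set_character_data_handler($parser, function ($parser, $data) use (&$text) {
        $text .= $data; // data may arrive in several chunks
    });
    xml_parse($parser, $xml, true);
    return $text;
}

// "5 " followed by the euro sign, stored as the UTF-8 bytes E2 82 AC.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><price>5 \xE2\x82\xAC</price>";

echo text_in_encoding($xml, 'ISO-8859-1'), "\n"; // 5 ?
echo text_in_encoding($xml, 'US-ASCII'), "\n";   // 5 ?
?>
]]>
  </programlisting>
 </example>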
</article>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
indent-tabs-mode:nil
sgml-parent-document:nil
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
vim600: syn=xml fen fdm=syntax fdl=2 si
vim: et tw=78 syn=sgml
vi: ts=1 sw=1
-->