mirror of
https://github.com/sigmasternchen/php-doc-en
synced 2025-03-19 02:18:56 +00:00
70 lines
2.5 KiB
XML
70 lines
2.5 KiB
XML
![]() |
<?xml version="1.0" encoding="utf-8"?>
|
||
|
<!-- $Revision: 1.2 $ -->
|
||
|
<article xml:id="xml.encoding" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
|
||
|
<title>Character Encoding</title>
|
||
|
<para>
|
||
|
PHP's XML extension supports the <link
|
||
|
xlink:href="&url.unicode;">Unicode</link> character set through
|
||
|
different <glossterm>character encoding</glossterm>s. There are
|
||
|
two types of character encodings, <glossterm>source
|
||
|
encoding</glossterm> and <glossterm>target encoding</glossterm>.
|
||
|
PHP's internal representation of the document is always encoded
|
||
|
with <literal>UTF-8</literal>.
|
||
|
</para>
|
||
|
<para>
|
||
|
Source encoding is done when an XML document is <link
|
||
|
linkend="function.xml-parse">parsed</link>. Upon <link
|
||
|
linkend="function.xml-parser-create">creating an XML
|
||
|
parser</link>, a source encoding can be specified (this encoding
|
||
|
can not be changed later in the XML parser's lifetime). The
|
||
|
supported source encodings are <literal>ISO-8859-1</literal>,
|
||
|
<literal>US-ASCII</literal> and <literal>UTF-8</literal>. The
|
||
|
former two are single-byte encodings, which means that each
|
||
|
character is represented by a single byte.
|
||
|
<literal>UTF-8</literal> can encode characters composed by a
|
||
|
variable number of bits (up to 21) in one to four bytes. The
|
||
|
default source encoding used by PHP is
|
||
|
<literal>ISO-8859-1</literal>.
|
||
|
</para>
|
||
|
<para>
|
||
|
Target encoding is done when PHP passes data to XML handler
|
||
|
functions. When an XML parser is created, the target encoding
|
||
|
is set to the same as the source encoding, but this may be
|
||
|
changed at any point. The target encoding will affect character
|
||
|
data as well as tag names and processing instruction targets.
|
||
|
</para>
|
||
|
<para>
|
||
|
If the XML parser encounters characters outside the range that
|
||
|
its source encoding is capable of representing, it will return
|
||
|
an error.
|
||
|
</para>
|
||
|
<para>
|
||
|
If PHP encounters characters in the parsed XML document that can
|
||
|
not be represented in the chosen target encoding, the problem
|
||
|
characters will be "demoted". Currently, this means that such
|
||
|
characters are replaced by a question mark.
|
||
|
</para>
|
||
|
</article>
|
||
|
|
||
|
<!-- Keep this comment at the end of the file
|
||
|
Local variables:
|
||
|
mode: sgml
|
||
|
sgml-omittag:t
|
||
|
sgml-shorttag:t
|
||
|
sgml-minimize-attributes:nil
|
||
|
sgml-always-quote-attributes:t
|
||
|
sgml-indent-step:1
|
||
|
sgml-indent-data:t
|
||
|
indent-tabs-mode:nil
|
||
|
sgml-parent-document:nil
|
||
|
sgml-default-dtd-file:"../../../manual.ced"
|
||
|
sgml-exposed-tags:nil
|
||
|
sgml-local-catalogs:nil
|
||
|
sgml-local-ecat-files:nil
|
||
|
End:
|
||
|
vim600: syn=xml fen fdm=syntax fdl=2 si
|
||
|
vim: et tw=78 syn=sgml
|
||
|
vi: ts=1 sw=1
|
||
|
-->
|
||
|
|