php-doc-en/reference/xml/encoding.xml
Torben Wilson af4410a7e1 Normalized the sgml-default-dtd-file local-variable line for those
still using this, after discussion on the phpdoc list.
From now on, manual.ced will need to be found at ~/.phpdoc/manual.ced.



git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@288721 c90b9560-bf6c-de11-be94-00142212c4b1
2009-09-25 07:04:39 +00:00

69 lines
2.5 KiB
XML

<?xml version="1.0" encoding="utf-8"?>
<!-- $Revision$ -->
<article xml:id="xml.encoding" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
<title>Character Encoding</title>
<para>
PHP's XML extension supports the <link
xlink:href="&url.unicode;">Unicode</link> character set through
different <glossterm>character encoding</glossterm>s. There are
two types of character encodings, <glossterm>source
encoding</glossterm> and <glossterm>target encoding</glossterm>.
PHP's internal representation of the document is always encoded
with <literal>UTF-8</literal>.
</para>
<para>
Source encoding is done when an XML document is <link
linkend="function.xml-parse">parsed</link>. Upon <link
linkend="function.xml-parser-create">creating an XML
parser</link>, a source encoding can be specified (this encoding
can not be changed later in the XML parser's lifetime). The
supported source encodings are <literal>ISO-8859-1</literal>,
<literal>US-ASCII</literal> and <literal>UTF-8</literal>. The
former two are single-byte encodings, which means that each
character is represented by a single byte.
<literal>UTF-8</literal> can encode characters composed by a
variable number of bits (up to 21) in one to four bytes. The
default source encoding used by PHP is
<literal>ISO-8859-1</literal>.
</para>
<para>
Target encoding is done when PHP passes data to XML handler
functions. When an XML parser is created, the target encoding
is set to the same as the source encoding, but this may be
changed at any point. The target encoding will affect character
data as well as tag names and processing instruction targets.
</para>
<para>
If the XML parser encounters characters outside the range that
its source encoding is capable of representing, it will return
an error.
</para>
<para>
If PHP encounters characters in the parsed XML document that can
not be represented in the chosen target encoding, the problem
characters will be "demoted". Currently, this means that such
characters are replaced by a question mark.
</para>
</article>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
indent-tabs-mode:nil
sgml-parent-document:nil
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
vim600: syn=xml fen fdm=syntax fdl=2 si
vim: et tw=78 syn=sgml
vi: ts=1 sw=1
-->