<?xml version="1.0" encoding="utf-8"?>
<!-- $Revision: 1.2 $ -->
<article xml:id="xml.encoding" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
<title>Character Encoding</title>
<para>
PHP's XML extension supports the <link
xlink:href="&url.unicode;">Unicode</link> character set through
different <glossterm>character encoding</glossterm>s. The extension
distinguishes two kinds of encoding: the <glossterm>source
encoding</glossterm> and the <glossterm>target encoding</glossterm>.
PHP's internal representation of the document is always encoded
in <literal>UTF-8</literal>.
</para>
<para>
The source encoding is applied when an XML document is <link
linkend="function.xml-parse">parsed</link>. Upon <link
linkend="function.xml-parser-create">creating an XML
parser</link>, a source encoding can be specified; it cannot be
changed later in the XML parser's lifetime. The
supported source encodings are <literal>ISO-8859-1</literal>,
<literal>US-ASCII</literal> and <literal>UTF-8</literal>. The
first two are single-byte encodings, which means that each
character is represented by a single byte.
<literal>UTF-8</literal> is a variable-width encoding: it represents
code points of up to 21 bits in one to four bytes. The
default source encoding used by PHP is
<literal>ISO-8859-1</literal>.
</para>
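<para>
 The following is a minimal sketch of the points above, not a definitive
 recipe. It creates a parser with an explicit source encoding and uses
 <function>strlen</function>, which counts bytes, to show how many bytes
 <literal>UTF-8</literal> needs for a few sample characters. The sample
 characters are chosen purely for illustration.
</para>
<example>
 <title>Specifying a source encoding and inspecting UTF-8 byte lengths</title>
 <programlisting role="php">
<![CDATA[
<?php
// The source encoding is fixed at creation time and cannot be changed later.
$parser = xml_parser_create('UTF-8');

// Illustrative sample characters, written as raw UTF-8 byte sequences.
$samples = array(
    'A'                => 'U+0041',
    "\xC3\xA9"         => 'U+00E9',
    "\xE2\x82\xAC"     => 'U+20AC',
    "\xF0\x9D\x84\x9E" => 'U+1D11E',
);

// strlen() counts bytes, not characters.
foreach ($samples as $char => $codePoint) {
    echo $codePoint, ': ', strlen($char), " byte(s)\n";
}
?>
]]>
 </programlisting>
 <para>The above example will output:</para>
 <screen>
<![CDATA[
U+0041: 1 byte(s)
U+00E9: 2 byte(s)
U+20AC: 3 byte(s)
U+1D11E: 4 byte(s)
]]>
 </screen>
</example>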
<para>
The target encoding is applied when PHP passes data to XML handler
functions. When an XML parser is created, the target encoding
is set to the same as the source encoding, but it may be
changed at any point with <function>xml_parser_set_option</function>.
The target encoding affects character
data as well as tag names and processing instruction targets.
</para>
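<para>
 As a hedged illustration, the sketch below switches the target encoding
 after the parser has been created, using
 <function>xml_parser_set_option</function> with
 <constant>XML_OPTION_TARGET_ENCODING</constant>. The handler name
 <literal>show_bytes</literal> and the sample document are hypothetical;
 the handler prints the bytes it receives in hexadecimal so the effect of
 the target encoding can be inspected.
</para>
<example>
 <title>Changing the target encoding of an existing parser</title>
 <programlisting role="php">
<![CDATA[
<?php
// Hypothetical handler: dumps the bytes it receives in hexadecimal.
function show_bytes($parser, $data)
{
    echo bin2hex($data), "\n";
}

$parser = xml_parser_create('UTF-8');
xml_set_character_data_handler($parser, 'show_bytes');

// Unlike the source encoding, the target encoding may be changed
// at any point after the parser has been created.
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');
echo xml_parser_get_option($parser, XML_OPTION_TARGET_ENCODING), "\n";

// U+00E9 is the byte pair C3 A9 in UTF-8 and the single byte E9
// in ISO-8859-1.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>\xC3\xA9</p>";
if (!xml_parse($parser, $xml, true)) {
    echo xml_error_string(xml_get_error_code($parser)), "\n";
}
?>
]]>
 </programlisting>
</example>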
<para>
If the XML parser encounters characters outside the range that
its source encoding is capable of representing, it will report
an error.
</para>
<para>
If PHP encounters characters in the parsed XML document that
cannot be represented in the chosen target encoding, those
characters will be "demoted". Currently, this means that such
characters are replaced by a question mark.
</para>
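<para>
 The demotion described above can be sketched as follows. This is an
 illustrative example under stated assumptions rather than a reference
 implementation: the element name <literal>price</literal> and the sample
 text are made up, and the euro sign (U+20AC) is used because it has no
 representation in <literal>ISO-8859-1</literal>.
</para>
<example>
 <title>A character that the target encoding cannot represent</title>
 <programlisting role="php">
<![CDATA[
<?php
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');

// Collect all character data passed to the handler.
$text = '';
xml_set_character_data_handler($parser, function ($parser, $data) use (&$text) {
    $text .= $data;
});

// "\xE2\x82\xAC" is the euro sign in UTF-8; ISO-8859-1 has no such character.
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><price>\xE2\x82\xAC 5</price>";
if (!xml_parse($parser, $xml, true)) {
    echo xml_error_string(xml_get_error_code($parser)), "\n";
}

// Per the description above, the euro sign should arrive as "?".
var_dump($text);
?>
]]>
 </programlisting>
</example>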
</article>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
indent-tabs-mode:nil
sgml-parent-document:nil
sgml-default-dtd-file:"../../../manual.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
vim600: syn=xml fen fdm=syntax fdl=2 si
vim: et tw=78 syn=sgml
vi: ts=1 sw=1
-->