Made a few grammatical fixes and added a more in-depth explanation of why mbstring is needed, given that a single byte can represent only 256 values.
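To illustrate the byte limitation this change explains, here is a minimal PHP sketch. It assumes the mbstring extension is enabled and the script file is saved as UTF-8. The UTF-8 encoding and the sample string "日本語" are illustrative choices and do not come from the change itself. The sketch shows that byte-based functions such as strlen() and substr() count and cut bytes, while their mbstring counterparts work on characters.

```php
<?php
// A byte has eight bits, each 0 or 1, so it can hold only 2 ** 8 distinct values.
$byteValues = 2 ** 8;
echo $byteValues, "\n"; // 256

// Example assumption: three Japanese characters encoded as UTF-8,
// where each of these characters takes three bytes.
$text = "日本語";

// strlen() counts bytes; mb_strlen() counts characters in the given encoding.
echo strlen($text), "\n";             // 9
echo mb_strlen($text, 'UTF-8'), "\n"; // 3

// Byte-based substr() can split a character: this keeps only the first
// two of the three bytes that encode the first character.
echo bin2hex(substr($text, 0, 2)), "\n"; // e697

// mb_substr() works on characters, so the result is still valid UTF-8.
echo mb_substr($text, 0, 2, 'UTF-8'), "\n"; // 日本
```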

git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@210636 c90b9560-bf6c-de11-be94-00142212c4b1
2025-03-16 08:58:56 +00:00 · 2006-04-03 21:39:59 +00:00 · 2006-04-03 21:39:59 +00:00 · 7adb369ad7
commit 7adb369ad7
parent 65201dc069
1 changed file with 52 additions and 59 deletions
--- a/reference/mbstring/reference.xml
+++ b/reference/mbstring/reference.xml
@@ -1,5 +1,5 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.22 $ -->
+<!-- $Revision: 1.23 $ -->
 <!-- Purpose: international -->
 <!-- Membership: bundled -->

@@ -12,12 +12,14 @@
    &reftitle.intro;
    <para>
     While there are many languages in which every necessary character can
-     be represented by a one-to-one mapping to a 8-bit value, there are also
+     be represented by a one-to-one mapping to an 8-bit value, there are also
     several languages which require so many characters for written
-     communication that cannot be contained within the range a mere byte can
-     code. Multibyte character encoding schemes were developed to express
-     that many (more than 256) characters in the regular bytewise coding
-     system.
+     communication that they cannot be contained within the range a single
+     byte can code. A byte is made up of eight bits, and each bit can hold
+     only one of two values, zero or one, so a byte can represent only
+     256 (two to the power of eight) distinct values. Multibyte character
+     encoding schemes were developed to express more than 256 characters
+     in the regular bytewise coding system.
    </para>
    <para>
     When you manipulate (trim, split, splice, etc.) strings encoded in a
@@ -29,17 +31,12 @@
     most likely loses its original meaning.
    </para>
    <para>
-     <literal>mbstring</literal> provides these multibyte specific
-     string functions that help you deal with multibyte encodings in PHP,
-     which is basically supposed to be used with single byte encodings.
-     In addition to that, <literal>mbstring</literal> handles character
-     encoding conversion between the possible encoding pairs.
-    </para>
-    <para>
-     <literal>mbstring</literal> is also designed to handle Unicode-based
-     encodings such as UTF-8 and UCS-2 and many single-byte encodings
-     for convenience (listed below), whereas <literal>mbstring</literal> was
-     originally developed for use in Japanese web pages.
+     <literal>mbstring</literal> provides multibyte-specific string functions
+     that help you deal with multibyte encodings in PHP. In addition,
+     <literal>mbstring</literal> handles character encoding conversion between
+     the possible encoding pairs. <literal>mbstring</literal> is designed to
+     handle Unicode-based encodings such as UTF-8 and UCS-2, as well as many
+     single-byte encodings for convenience (listed below).
    </para>

    <section id="mbstring.php4.req">
@@ -115,14 +112,14 @@ JIS, SJIS, ISO-2022-JP, BIG-5
     </note>
     <note>
      <para>
-       If you have some database connected with PHP, it is recommended that
-       you use the same character encoding for both database and the
+       If you are connecting to a database with PHP, it is recommended that
+       you use the same character encoding for both the database and the
       <literal>internal encoding</literal> for ease of use and better
       performance.
      </para>
      <para>
       If you are using PostgreSQL, the character encoding used in the
-       database and the one used in the PHP may differ as it supports
+       database and the one used in PHP may differ, as PostgreSQL supports
       automatic character set conversion between the backend and the frontend.
      </para>
     </note>
@@ -175,7 +172,7 @@ JIS, SJIS, ISO-2022-JP, BIG-5
        </simpara>
        <para> 
         There is no way to control HTTP input character
-         conversion from PHP script. To disable HTTP input character
+         conversion from a PHP script. To disable HTTP input character
         conversion, it has to be done in &php.ini;.
         <example>
          <title>
@@ -207,14 +204,14 @@ mbstring.encoding_translation = Off
         There are several ways to enable output character encoding
         conversion. One is using &php.ini;, another
         is using <function>ob_start</function> with
-         <function>mb_output_handler</function> as
+         <function>mb_output_handler</function> as the 
         <literal>ob_start</literal> callback function.
        </para>
        <note>
         <para>
          PHP3-i18n users should note that <literal>mbstring</literal>'s output
          conversion differs from PHP3-i18n. Character encoding is
-          converted using output buffer.
+          converted using an output buffer.
         </para>
        </note>
       </listitem>
@@ -268,7 +265,7 @@ ob_start('mb_output_handler');
     <literal>mbstring</literal> functions.
    </simpara>
    <para>
-     The following character encoding is supported in this PHP
+     The following character encodings are supported in this PHP
     extension: 
    </para>
    <itemizedlist>
@@ -330,11 +327,11 @@ ob_start('mb_output_handler');
     <listitem><simpara>KOI8-R</simpara></listitem>
    </itemizedlist>
    <para>
-     &php.ini; entry, which accepts encoding name,
-     accepts &quot;<literal>auto</literal>&quot; and
-     &quot;<literal>pass</literal>&quot; also.
-     <literal>mbstring</literal> functions, which accepts encoding
-     name, and accepts &quot;<literal>auto</literal>&quot;.
+     Any &php.ini; entry which accepts an encoding name
+     can also use the values &quot;<literal>auto</literal>&quot; and
+     &quot;<literal>pass</literal>&quot;.
+     <literal>mbstring</literal> functions which accept an encoding
+     name can also use the value &quot;<literal>auto</literal>&quot;.
    </para>
    <para>
     If &quot;<literal>pass</literal>&quot; is set, no character
@@ -358,13 +355,13 @@ ob_start('mb_output_handler');
    </title>
    <para>
     You might often find it difficult to get an existing PHP application
-     work in a given multibyte environment. That's mostly because lots of
-     PHP applications out there are written with the standard
-     string functions such as <function>substr</function>, which are
-     known to not properly handle multibyte-encoded strings.
+     to work in a given multibyte environment. This is mostly because many
+     PHP applications are written with standard string functions such as
+     <function>substr</function>, which operate on bytes rather than
+     characters and so do not handle multibyte-encoded strings properly.
    </para>
    <para>
-     mbstring supports 'function overloading' feature which enables
+     mbstring supports a 'function overloading' feature which enables
     you to add multibyte awareness to such an application without
     code modification by overloading multibyte counterparts on
     the standard string functions. For example,
@@ -374,13 +371,13 @@ ob_start('mb_output_handler');
     single-byte encodings to a multibyte environment in many cases.
    </para>
    <para>
-     To use the function overloading, set
+     To use function overloading, set
     <literal>mbstring.func_overload</literal> in &php.ini; to a
     positive value that represents a combination of bitmasks specifying
     the categories of functions to be overloaded. It should be set
     to 1 to overload the <function>mail</function> function. 2 for string
     functions, 4 for regular expression functions. For example,
-     if is set for 7, mail, strings and regular expression functions should
+     if it is set to 7, mail, string, and regular expression functions will
     be overloaded. The list of overloaded functions are shown below.
     <table>
      <title>Functions to be overloaded</title>
@@ -475,18 +472,13 @@ ob_start('mb_output_handler');
   <section id="mbstring.ja-basic">
    <title>Basics of Japanese multi-byte encodings</title>
    <para>
-     It is often said quite hard to figure out how Japanese texts are
-     handled in the computer. This is not only because Japanese characters
-     can only be represented by multibyte encodings, but because different
-     encoding standards are adopted for different purposes / platforms.
-     Moreover, not a few character set standards are used there, which
-     are slightly different from one another. Those facts have often led
-     developers to inevitable mess-up.
-    </para>
-    <para> 
-     To create a working web application that would be put in the Japanese
-     environment, it is important to use the proper character encoding and
-     character set for the task in hand.
+     Japanese characters can only be represented by multibyte encodings,
+     and different encoding standards are used depending on the platform
+     and the purpose of the text. To make matters worse, several character
+     set standards are in use, and they differ slightly from one another.
+     To create a web application that is usable in a Japanese environment,
+     a developer has to keep these complexities in mind and make sure that
+     the proper character encoding and character set are used.
    </para>
    <para>
     <itemizedlist>
@@ -495,18 +487,19 @@ ob_start('mb_output_handler');
      </listitem>
      <listitem>
       <simpara>
-        Most of multibyte characters often appear twice as wide as 
-        a single-byte character on display. Those characters are called
-        "zen-kaku" in Japanese which means "full width", and the other
-        (narrower) characters are called "han-kaku" - means half width.
-        However the graphical properties of the characters depend on
-        the glyphs of the type faces used to display them or print them out.
+        Most Japanese multibyte characters appear twice as wide as
+        single-byte characters. These characters are called
+        &quot;zen-kaku&quot; in Japanese, which means &quot;full width&quot;.
+        The narrower characters are called &quot;han-kaku&quot;,
+        which means &quot;half width&quot;. The graphical properties
+        of the characters, however, depend on the typefaces used
+        to display them.
       </simpara>
      </listitem>
      <listitem>
       <simpara>
        Some character encodings use shift(escape) sequences defined
-        in ISO2022 to switch the code map of the specific code area
+        in ISO-2022 to switch the code map of the specific code area
        (<literal>00h</literal> to <literal>7fh</literal>).
       </simpara>
      </listitem>
@@ -533,10 +526,10 @@ ob_start('mb_output_handler');
   <section id="mbstring.ref">
    <title>References</title>
    <para>
-     Multibyte character encoding schemes and the related issues are very
-     complicated. There should be too few space to cover in sufficient details.
-     Please refer to the following URLs and other resources for
-     further readings.
+     Multibyte character encoding schemes and their related issues are
+     fairly complicated, and are beyond the scope of this documentation.
+     Please refer to the following URLs and other resources for further
+     information regarding these topics.
     <itemizedlist>
      <listitem>
       <para>