php-doc-en/reference/strings/functions/levenshtein.xml
Torben Wilson af4410a7e1 Normalized the sgml-default-dtd-file local-variable line for those
still using this, after discussion on the phpdoc list.
From now on, manual.ced will need to be found at ~/.phpdoc/manual.ced.



git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@288721 c90b9560-bf6c-de11-be94-00142212c4b1
2009-09-25 07:04:39 +00:00


<?xml version="1.0" encoding="utf-8"?>
<!-- $Revision$ -->
<refentry xmlns="http://docbook.org/ns/docbook" xml:id="function.levenshtein" xmlns:xlink="http://www.w3.org/1999/xlink">
<refnamediv>
<refname>levenshtein</refname>
<refpurpose>Calculate Levenshtein distance between two strings</refpurpose>
</refnamediv>
<refsect1 role="description">
&reftitle.description;
<methodsynopsis>
<type>int</type><methodname>levenshtein</methodname>
<methodparam><type>string</type><parameter>str1</parameter></methodparam>
<methodparam><type>string</type><parameter>str2</parameter></methodparam>
</methodsynopsis>
<methodsynopsis>
<type>int</type><methodname>levenshtein</methodname>
<methodparam><type>string</type><parameter>str1</parameter></methodparam>
<methodparam><type>string</type><parameter>str2</parameter></methodparam>
<methodparam><type>int</type><parameter>cost_ins</parameter></methodparam>
<methodparam><type>int</type><parameter>cost_rep</parameter></methodparam>
<methodparam><type>int</type><parameter>cost_del</parameter></methodparam>
</methodsynopsis>
<para>
The Levenshtein distance is defined as the minimal number of
characters you have to replace, insert or delete to transform
<parameter>str1</parameter> into <parameter>str2</parameter>.
The complexity of the algorithm is <literal>O(m*n)</literal>,
where <literal>m</literal> and <literal>n</literal> are the
lengths of <parameter>str1</parameter> and
<parameter>str2</parameter>, respectively. This is rather good
compared to <function>similar_text</function>, which is
<literal>O(max(n,m)**3)</literal>, but still expensive.
</para>
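<para>
To make the <literal>O(m*n)</literal> cost concrete, here is an
illustrative sketch of the classic dynamic-programming algorithm in
plain PHP, using unit costs. It is not taken from PHP's own C source,
and the function name <literal>levenshtein_sketch</literal> is
hypothetical. It computes one row of <literal>n + 1</literal> cells for
each of the <literal>m</literal> bytes of the first string and, like
<function>levenshtein</function>, compares bytes rather than multibyte
characters.
</para>
<informalexample>
<programlisting role="php">
<![CDATA[
<?php
// Hypothetical helper, not part of PHP: unit-cost Levenshtein distance.
function levenshtein_sketch(string $a, string $b): int
{
    $m = strlen($a);
    $n = strlen($b);
    // Row 0: turning the empty prefix of $a into the first $j bytes
    // of $b takes $j insertions.
    $prev = range(0, $n);
    for ($i = 1; $i <= $m; $i++) {
        // Turning the first $i bytes of $a into '' takes $i deletions.
        $curr = [$i];
        for ($j = 1; $j <= $n; $j++) {
            $rep = $prev[$j - 1] + ($a[$i - 1] === $b[$j - 1] ? 0 : 1);
            $del = $prev[$j] + 1;
            $ins = $curr[$j - 1] + 1;
            $curr[$j] = min($rep, $del, $ins);
        }
        $prev = $curr;
    }
    // m * n cell updates in total, hence O(m*n) time.
    return $prev[$n];
}

echo levenshtein_sketch('carrrot', 'carrot'), "\n"; // 1
echo levenshtein('carrrot', 'carrot'), "\n";        // 1
?>
]]>
</programlisting>
</informalexample>
<para>
Only the previous row is kept, so memory grows with the length of the
second string rather than with <literal>m*n</literal>.
</para>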
<para>
In its simplest form the function takes only the two strings
as parameters and calculates just the number of insert,
replace and delete operations needed to transform
<parameter>str1</parameter> into <parameter>str2</parameter>.
</para>
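<para>
A minimal sketch of the two-argument form. The sample words are
arbitrary, and each comment gives the unit-cost result together with one
sequence of operations that achieves it.
</para>
<informalexample>
<programlisting role="php">
<![CDATA[
<?php
// Every insert, replace and delete operation costs 1.
echo levenshtein('kitten', 'sitting'), "\n"; // 3: replace k, replace e, insert g
echo levenshtein('flaw', 'lawn'), "\n";      // 2: delete f, insert n
echo levenshtein('carrot', 'carrot'), "\n";  // 0: the strings are identical
?>
]]>
</programlisting>
</informalexample>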
<para>
A second variant takes three additional parameters that
define the cost of insert, replace and delete operations. This
is more general and adaptive than the first variant, but not as
efficient.
</para>
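<para>
When insertion and deletion have different costs, the order of the
arguments matters, because the result is the cost of turning
<parameter>str1</parameter> into <parameter>str2</parameter>. The costs
in this sketch (insert 5, replace 1, delete 1) are arbitrary values
chosen only for illustration.
</para>
<informalexample>
<programlisting role="php">
<![CDATA[
<?php
// cost_ins = 5, cost_rep = 1, cost_del = 1 (arbitrary example costs)

// 'abc' -> 'abcd' needs one insertion, since neither replacing nor
// deleting can make the string longer:
echo levenshtein('abc', 'abcd', 5, 1, 1), "\n"; // 5

// 'abcd' -> 'abc' needs only one deletion:
echo levenshtein('abcd', 'abc', 5, 1, 1), "\n"; // 1
?>
]]>
</programlisting>
</informalexample>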
</refsect1>
<refsect1 role="parameters">
&reftitle.parameters;
<para>
<variablelist>
<varlistentry>
<term><parameter>str1</parameter></term>
<listitem>
<para>
One of the strings being evaluated for Levenshtein distance.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>str2</parameter></term>
<listitem>
<para>
One of the strings being evaluated for Levenshtein distance.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>cost_ins</parameter></term>
<listitem>
<para>
Defines the cost of insertion.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>cost_rep</parameter></term>
<listitem>
<para>
Defines the cost of replacement.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>cost_del</parameter></term>
<listitem>
<para>
Defines the cost of deletion.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
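<para>
The three costs interact: if <parameter>cost_rep</parameter> is larger
than <parameter>cost_ins</parameter> plus <parameter>cost_del</parameter>,
a replacement is never the cheapest option, because deleting the old
character and inserting the new one costs less. The sketch below uses
arbitrary example costs to show this.
</para>
<informalexample>
<programlisting role="php">
<![CDATA[
<?php
// Default costs: one replacement ('a' -> 'b') costs 1.
echo levenshtein('a', 'b'), "\n";          // 1

// cost_ins = 1, cost_rep = 3, cost_del = 1: deleting 'a' and
// inserting 'b' (1 + 1 = 2) beats a single replacement (3).
echo levenshtein('a', 'b', 1, 3, 1), "\n"; // 2
?>
]]>
</programlisting>
</informalexample>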
</refsect1>
<refsect1 role="returnvalues">
&reftitle.returnvalues;
<para>
This function returns the Levenshtein distance between the
two argument strings, or -1 if one of the argument strings
is longer than the limit of 255 characters.
</para>
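<para>
Because a real distance is never negative, a result of -1 is worth
checking for explicitly; otherwise code that looks for the smallest
distance could mistake it for a very close match. A minimal sketch,
using a hypothetical helper named <literal>levenshtein_or_null</literal>
that is not part of PHP:
</para>
<informalexample>
<programlisting role="php">
<![CDATA[
<?php
// Hypothetical helper, not part of PHP: returns null instead of -1
// when levenshtein() signals that an argument exceeds its length limit.
function levenshtein_or_null(string $str1, string $str2): ?int
{
    $distance = levenshtein($str1, $str2);
    return $distance === -1 ? null : $distance;
}

var_dump(levenshtein_or_null('flaw', 'lawn')); // int(2)
?>
]]>
</programlisting>
</informalexample>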
</refsect1>
<refsect1 role="examples">
&reftitle.examples;
<para>
<example>
<title><function>levenshtein</function> example</title>
<programlisting role="php">
<![CDATA[
<?php
// input misspelled word
$input = 'carrrot';

// array of words to check against
$words = array('apple', 'pineapple', 'banana', 'orange',
               'radish', 'carrot', 'pea', 'bean', 'potato');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {

    // calculate the distance between the input word
    // and the current word
    $lev = levenshtein($input, $word);

    // check for an exact match
    if ($lev == 0) {

        // closest word is this one (exact match)
        $closest = $word;
        $shortest = 0;

        // break out of the loop; we've found an exact match
        break;
    }

    // if this distance is less than or equal to the shortest
    // distance found so far, OR if no distance has been recorded yet
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match, and shortest distance
        $closest  = $word;
        $shortest = $lev;
    }
}

echo "Input word: $input\n";
if ($shortest == 0) {
    echo "Exact match found: $closest\n";
} else {
    echo "Did you mean: $closest?\n";
}
?>
]]>
</programlisting>
&example.outputs;
<screen>
<![CDATA[
Input word: carrrot
Did you mean: carrot?
]]>
</screen>
</example>
</para>
</refsect1>
<refsect1 role="seealso">
&reftitle.seealso;
<para>
<simplelist>
<member><function>soundex</function></member>
<member><function>similar_text</function></member>
<member><function>metaphone</function></member>
</simplelist>
</para>
</refsect1>
</refentry>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
indent-tabs-mode:nil
sgml-parent-document:nil
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
vim600: syn=xml fen fdm=syntax fdl=2 si
vim: et tw=78 syn=sgml
vi: ts=1 sw=1
-->