mirror of
https://github.com/sigmasternchen/php-doc-en
synced 2025-03-15 16:38:54 +00:00

git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@311951 c90b9560-bf6c-de11-be94-00142212c4b1
112 lines
3.2 KiB
XML
112 lines
3.2 KiB
XML
<?xml version="1.0" encoding="utf-8"?>
|
|
<!-- $Revision$ -->
|
|
|
|
<chapter xml:id="svm.examples" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
|
|
&reftitle.examples;
|
|
|
|
<para>
|
|
The basic process is to define parameters, supply training data to generate a
|
|
model on, then make predictions based on the model. There are a default set
|
|
of parameters that should get some results with most any input, so we'll start
|
|
by looking at the data.
|
|
</para>
|
|
<para>
|
|
Data is supplied in either a file, a stream, or as an array. If supplied in
|
|
a file or a stream, it must contain one line per training example, which must
|
|
be formatted as an integer class (usually 1 and -1) followed by a series of
|
|
feature/value pairs, in increasing feature order. The features are integers,
|
|
the values floats, usually scaled 0-1. For example:
|
|
</para>
|
|
<para>
|
|
-1 1:0.43 3:0.12 9284:0.2
|
|
</para>
|
|
<para>
|
|
In a document classification problem, say a spam checker, each line would
|
|
represent a document. There would be two classes, -1 for spam, 1 for ham.
|
|
Each feature would represent some word, and the value would represent that
|
|
importance of that word to the document (perhaps the frequency count, with
|
|
the total scaled to unit length). Features that were 0 (e.g. the word did
|
|
not appear in the document at all) would simply not be included.
|
|
</para>
|
|
<para>
|
|
In array mode, the data must be passed as an array of arrays. Each sub-array
|
|
must have the class as the first element, then key => value sets for the
|
|
feature values pairs.
|
|
</para>
|
|
<para>
|
|
This data is passed to the SVM class's train function, which will return an
|
|
SVM model is successful.
|
|
</para>
|
|
<para>
|
|
Once a model has been generated, it can be used to make predictions about
|
|
previously unseen data. This can be passed as an array to the model's
|
|
predict function, in the same format as before, but without the label.
|
|
The response will be the class.
|
|
</para>
|
|
<para>
|
|
Models can be saved and restored as required, using the save and load
|
|
functions, which both take a file location.
|
|
</para>
|
|
<para>
|
|
<example>
|
|
<title>Train from array</title>
|
|
<programlisting role="php">
|
|
<![CDATA[
|
|
<?php
|
|
$data = array(
|
|
array(-1, 1 => 0.43, 3 => 0.12, 9284 => 0.2),
|
|
array(1, 1 => 0.22, 5 => 0.01, 94 => 0.11),
|
|
);
|
|
|
|
$svm = new SVM();
|
|
$model = $svm->train($data);
|
|
|
|
$data = array(1 => 0.43, 3 => 0.12, 9284 => 0.2);
|
|
$result = $model->predict($data);
|
|
var_dump($result);
|
|
$model->save('model.svm');
|
|
?>
|
|
]]>
|
|
</programlisting>
|
|
&example.outputs.similar;
|
|
<screen>
|
|
<![CDATA[
|
|
int(-1)
|
|
]]>
|
|
</screen>
|
|
</example>
|
|
<example>
|
|
<title>Train from a file</title>
|
|
<programlisting role="php">
|
|
<![CDATA[
|
|
<?php
|
|
$svm = new SVM();
|
|
$model = $svm->train("traindata.txt");
|
|
?>
|
|
]]>
|
|
</programlisting>
|
|
</example>
|
|
</para>
|
|
</chapter>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode: sgml
|
|
sgml-omittag:t
|
|
sgml-shorttag:t
|
|
sgml-minimize-attributes:nil
|
|
sgml-always-quote-attributes:t
|
|
sgml-indent-step:1
|
|
sgml-indent-data:t
|
|
indent-tabs-mode:nil
|
|
sgml-parent-document:nil
|
|
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
|
|
sgml-exposed-tags:nil
|
|
sgml-local-catalogs:nil
|
|
sgml-local-ecat-files:nil
|
|
End:
|
|
vim600: syn=xml fen fdm=syntax fdl=2 si
|
|
vim: et tw=78 syn=sgml
|
|
vi: ts=1 sw=1
|
|
-->
|
|
|