php-doc-en/functions/mnogosearch.xml
Hartmut Holzgraefe 7839d91186 added DO NOT EDIT noctice to old english functions files,
removing the others


git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@78562 c90b9560-bf6c-de11-be94-00142212c4b1
2002-04-17 19:58:46 +00:00


<!-- D O N O T E D I T T H I S F I L E ! ! !
it is still here for historical reasons only
(as translators may need to check old revision diffs)
if you want to change things documented in this file
you should now edit the files found under en/reference
instead -->
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- $Revision: 1.37 $ -->
<reference id="ref.mnogo">
<title>mnoGoSearch Functions</title>
<titleabbrev>mnoGoSearch</titleabbrev>
<partintro>
<simpara>
These functions allow you to access the mnoGoSearch (formerly
UdmSearch) free search engine. To make these functions
available, you must compile PHP with mnoGoSearch
support by using the
<link linkend="install.configure.with-mnogosearch"><option role="configure">
--with-mnogosearch</option></link>
option. If you use this option without specifying the
path to mnoGoSearch, PHP looks for it under
/usr/local/mnogosearch by default. If you installed
mnoGoSearch in another location, specify it:
<link linkend="install.configure.with-mnogosearch"><option role="configure">
--with-mnogosearch=DIR</option></link>.
</simpara>
<para>
mnoGoSearch is full-featured search engine software for intranet and internet servers,
distributed under the GNU license. It has a number of unique features that make
it suitable for a wide range of applications, from searching within your own site to specialized
search systems such as cooking recipe or newspaper search, FTP archive search, news article search,
etc. It offers full-text indexing and searching for HTML, PDF, and text documents. mnoGoSearch
consists of two parts. The first is an indexing mechanism (indexer). The indexer
walks through HTTP, FTP, and NEWS servers or local files, recursively fetching all documents
and storing meta-data about them in an SQL database.
Every document is referenced by its URL, and the meta-data collected by the indexer is
used later in the search process. The second part is the search itself, which is performed
through a Web interface. C CGI, PHP, and Perl search front ends are included.
</para>
<note>
<para>
PHP contains a built-in MySQL access library, which can be used to
access MySQL. mnoGoSearch is known to be incompatible with
this built-in library and works only with the generic MySQL
libraries. Thus, if you use mnoGoSearch with MySQL, when configuring
PHP you have to indicate the directory of the MySQL
installation that was used when configuring mnoGoSearch, for example:
<link linkend="install.configure.with-mnogosearch"><option role="configure">
--with-mnogosearch --with-mysql=/usr</option></link>.
</para>
</note>
<simpara>
You need mnoGoSearch 3.1.10 or later installed to use
these functions.
</simpara>
<simpara>
More information about mnoGoSearch can be found at
<ulink url="&url.mnogo;">&url.mnogo;</ulink>.
</simpara>
</partintro>
<refentry id="function.udm-add-search-limit">
<refnamediv>
<refname>udm_add_search_limit</refname>
<refpurpose>Add various search limits</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_add_search_limit</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
<methodparam><type>int</type><parameter>var</parameter></methodparam>
<methodparam><type>string</type><parameter>val</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_add_search_limit</function> adds a search restriction.
It returns &true; on success, &false; on error.
</para>
<para>
<parameter>agent</parameter> - agent identifier, returned by a call to
<function>udm_alloc_agent</function>.
</para>
<para>
<parameter>var</parameter> - the kind of limit to add.
</para>
<para>
<parameter>val</parameter> - the value of that limit.
</para>
<para>
Possible <parameter>var</parameter> values:
</para>
<itemizedlist>
<listitem>
<simpara>
UDM_LIMIT_URL - restricts the search to documents whose URL matches a pattern,
which limits the search to a subsection of the database. It supports the SQL
LIKE wildcards % and _, where % matches any number of characters, including zero,
and _ matches exactly one character. E.g. http://my.domain.__/catalog
matches both http://my.domain.ru/catalog and http://my.domain.ua/catalog.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_LIMIT_TAG - restricts the search by site TAG. In indexer.conf you can
assign specific TAGs to various sites and parts of a site. Tags in
mnoGoSearch 3.1.x are strings that may contain the metasymbols % and _.
Metasymbols allow searching among groups of tags.
E.g. if there are links with the tags ABCD and ABCE and the search restriction
is ABC_, both tags are searched.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_LIMIT_LANG - defines document language limitations.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_LIMIT_CAT - restricts the search by document category. Categories are
similar to tags, but nested: one category can contain another, and so on.
Each level takes two characters, written either as a hex number
(digits 0-F) or as a base-36 number (digits 0-Z).
A top-level category such as 'Auto' would therefore be 01. Its first
subcategory, say 'Ford', is also numbered 01 within its level; prefixed
with the parent's id, this gives 0101. If 'Auto'
had another subcategory named 'VW', its id would start with 01 because it
belongs to the 'Auto' category, followed by 02 because it is the next subcategory,
giving 0102. If 'VW' had a subcategory called 'Engine', numbering would
start at 01 again; prefixed with the 'Auto' id 01 and the 'VW' id 02,
this gives 010201. To search for sites under that category,
pass cat=010201 in the URL.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_LIMIT_DATE - restricts the search by document modification date.
</simpara>
<simpara>
Format of the parameter value: a string whose first character is &lt; or &gt;,
followed immediately, with no space, by a date in unixtime format, for example:
</simpara>
<simpara>
udm_add_search_limit($udm,UDM_LIMIT_DATE,"&lt;908012006");
</simpara>
<simpara>
If the &gt; character is used, the search is restricted to documents
modified after the given date. If &lt; is used, to documents modified before it.
</simpara>
</listitem>
</itemizedlist>
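<para>
The category numbering and the date format described above can be sketched
as follows. This is only an illustration: the helper names
<literal>example_cat_code</literal> and <literal>example_date_limit</literal>
are hypothetical and not part of the mnoGoSearch API, hex encoding of each
category level is assumed, and <literal>$udm</literal> is assumed to be an
agent returned by <function>udm_alloc_agent</function>.
</para>
<informalexample>
<programlisting role="php">
<![CDATA[
<?php
// Hypothetical helper: build a category code from per-level numbers,
// using two hex digits per level (e.g. Auto=1, VW=2, Engine=1).
function example_cat_code($levels) {
    $code = '';
    foreach ($levels as $n) {
        $code .= sprintf('%02X', $n);
    }
    return $code;
}

// Hypothetical helper: build a UDM_LIMIT_DATE value, i.e. '<' or '>'
// immediately followed by a Unix timestamp, with no space in between.
function example_date_limit($op, $timestamp) {
    return $op . (int) $timestamp;
}

$cat  = example_cat_code(array(1, 2, 1));    // '010201'
$date = example_date_limit('>', 908012006);  // '>908012006'

// $udm is assumed to come from udm_alloc_agent().
if (isset($udm)) {
    udm_add_search_limit($udm, UDM_LIMIT_CAT, $cat);
    udm_add_search_limit($udm, UDM_LIMIT_DATE, $date);
}
?>
]]>
</programlisting>
</informalexample>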
</refsect1>
</refentry>
<refentry id="function.udm-alloc-agent">
<refnamediv>
<refname>udm_alloc_agent</refname>
<refpurpose>Allocate mnoGoSearch session</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_alloc_agent</methodname>
<methodparam><type>string</type><parameter>dbaddr</parameter></methodparam>
<methodparam choice="opt"><type>string</type><parameter>dbmode</parameter></methodparam>
</methodsynopsis>
<para><function>udm_alloc_agent</function> returns a mnoGoSearch agent
identifier on success, &false; on error. This function creates a
session with the given database parameters.
</para>
<para>
<parameter>dbaddr</parameter> - URL-style database description:
the options (type, host, database name, port, user and password) used to
connect to the SQL database. These options do not matter for built-in
text file support. Format:
DBAddr DBType:[//[DBUser[:DBPass]@]DBHost[:DBPort]]/DBName/ Currently
supported DBType values are: mysql, pgsql, msql, solid, mssql, oracle,
ibase. The value does not actually matter for native library support,
but ODBC users should specify one of the supported values. If your database
type is not supported, you may use "unknown" instead.
</para>
<para>
<parameter>dbmode</parameter> - selects the SQL database mode of
word storage. When "single" is specified, all words are stored in the same
table. If "multi" is selected, words are stored in different tables
depending on their lengths. "multi" mode is usually faster but requires
more tables in the database. If "crc" mode is selected, mnoGoSearch
stores 32-bit integer word IDs calculated by the CRC32 algorithm instead of
the words themselves. This mode requires less disk space and is faster than "single"
and "multi" modes. "crc-multi" uses the same storage structure as
"crc" mode, but also stores words in different tables depending on
their lengths, as "multi" mode does. Format: DBMode single/multi/crc/crc-multi</para>
<note>
<para>
<parameter>dbaddr</parameter> and <parameter>dbmode</parameter> must match
those used during indexing.
</para>
</note>
<note>
<para>
This function does not actually open a connection to the database and
thus does not check the given login and password. The actual connection to
the database and the login/password verification are done by <function>udm_find</function>.
</para>
</note>
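<para>
A minimal sketch of allocating and releasing an agent, under stated
assumptions: the user name, password, host and database name in
<literal>$dbaddr</literal> are placeholders, and "crc-multi" is only an
example mode. Both values have to match the settings used during indexing.
</para>
<informalexample>
<programlisting role="php">
<![CDATA[
<?php
// Placeholder connection settings; replace them with the values
// that were used when the database was indexed.
$dbaddr = 'mysql://user:password@localhost/search/';
$dbmode = 'crc-multi';

$udm = udm_alloc_agent($dbaddr, $dbmode);
if (!$udm) {
    exit("Could not allocate a mnoGoSearch agent\n");
}

// No database connection has been opened at this point;
// the login and password are only checked by udm_find().

udm_free_agent($udm);
?>
]]>
</programlisting>
</informalexample>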
</refsect1>
</refentry>
<refentry id="function.udm-api-version">
<refnamediv>
<refname>udm_api_version</refname>
<refpurpose>Get mnoGoSearch API version.</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_api_version</methodname>
<void/>
</methodsynopsis>
<para><function>udm_api_version</function> returns the mnoGoSearch API version number. E.g. if
the mnoGoSearch 3.1.10 API is used, this function returns <literal>30110</literal>.</para>
<para>This function lets you identify which API functions are available, e.g.
<function>udm_get_doc_count</function> is only available in mnoGoSearch 3.1.11 or later.</para>
<simpara>Example:</simpara>
<informalexample>
<programlisting role="php">
<![CDATA[
if (udm_api_version() >= 30111) {
    print "Total number of urls in database: ".udm_get_doc_count($udm)."<br>\n";
}
]]>
</programlisting>
</informalexample>
</refsect1>
</refentry>
<refentry id="function.udm-cat-path">
<refnamediv>
<refname>udm_cat_path</refname>
<refpurpose>Get the path to the current category.</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>array</type><methodname>udm_cat_path</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
<methodparam><type>string</type><parameter>category</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_cat_path</function> returns array describing path in the
categories tree from the tree root to the current category.
</para>
<para>
<parameter>agent</parameter> - agent link identifier.
</para>
<para>
<parameter>category</parameter> - current category - the one to get path to.
</para>
<para>
Returns array with the following format:
</para>
<para>
The array consists of pairs. Elements with even index numbers contain category
paths, odd elements contain corresponding category names.
</para>
<para>
For example, the call <literal>$array=udm_cat_path($agent, '02031D');</literal>
may return the following array:
</para>
<literallayout>
<![CDATA[
$array[0] will contain ''
$array[1] will contain 'Root'
$array[2] will contain '02'
$array[3] will contain 'Sport'
$array[4] will contain '0203'
$array[5] will contain 'Auto'
$array[6] will contain '02031D'
$array[7] will contain 'Ferrari'
]]>
</literallayout>
<example>
<title>
Specifying path to the current category in the following format:
'&gt; Root &gt; Sport &gt; Auto &gt; Ferrari'
</title>
<programlisting role="php">
<![CDATA[
<?php
$cat_path_arr = udm_cat_path($udm_agent,$cat);
$cat_path = '';
for ($i=0; $i<count($cat_path_arr); $i+=2) {
    $path = $cat_path_arr[$i];
    $name = $cat_path_arr[$i+1];
    $cat_path .= " > <a href=\"$PHP_SELF?cat=$path\">$name</a> ";
}
?>
]]>
</programlisting>
</example>
</refsect1>
</refentry>
<refentry id="function.udm-cat-list">
<refnamediv>
<refname>udm_cat_list</refname>
<refpurpose>Get all the categories on the same level with the current one.</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>array</type><methodname>udm_cat_list</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
<methodparam><type>string</type><parameter>category</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_cat_list</function> returns array listing all categories of the same
level as current category in the categories tree.
</para>
<para>
The function can be useful for developing categories tree browser.
</para>
<para>
Returns array with the following format:
</para>
<para>
The array consists of pairs. Elements with even index numbers contain category
paths, odd elements contain corresponding category names.
</para>
<literallayout>
<![CDATA[
$array[0] will contain '020300'
$array[1] will contain 'Audi'
$array[2] will contain '020301'
$array[3] will contain 'BMW'
$array[4] will contain '020302'
$array[5] will contain 'Opel'
...
etc.
]]>
</literallayout>
<literallayout>
Following is an example of displaying links of the current level in format:
Audi
BMW
Opel
...
</literallayout>
<informalexample>
<programlisting role="php">
<![CDATA[
<?php
$cat_list_arr = udm_cat_list($udm_agent,$cat);
$cat_list = '';
for ($i=0; $i<count($cat_list_arr); $i+=2) {
    $path = $cat_list_arr[$i];
    $name = $cat_list_arr[$i+1];
    $cat_list .= "<a href=\"$PHP_SELF?cat=$path\">$name</a><br>";
}
?>
]]>
</programlisting>
</informalexample>
</refsect1>
</refentry>
<refentry id="function.udm-clear-search-limits">
<refnamediv>
<refname>udm_clear_search_limits</refname>
<refpurpose>Clear all mnoGoSearch search restrictions</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_clear_search_limits</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_clear_search_limits</function> resets defined search
limitations and returns &true;.
</para>
</refsect1>
</refentry>
<refentry id="function.udm-errno">
<refnamediv>
<refname>udm_errno</refname>
<refpurpose>Get mnoGoSearch error number</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_errno</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_errno</function> returns the mnoGoSearch error number, or zero if there was no error.
</para>
<para>
<parameter>agent</parameter> - agent identifier, returned
by a call to <function>udm_alloc_agent</function>.
</para>
<para>
Use this function to retrieve the numeric error code of the agent.
</para>
</refsect1>
</refentry>
<refentry id="function.udm-error">
<refnamediv>
<refname>udm_error</refname>
<refpurpose>Get mnoGoSearch error message</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>string</type><methodname>udm_error</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_error</function> returns the mnoGoSearch error message,
or an empty string if there was no error.
</para>
<para>
<parameter>agent</parameter> - agent identifier, returned
by a call to <function>udm_alloc_agent</function>.
</para>
<para>
Use this function to retrieve the error message of the agent.
</para>
</refsect1>
</refentry>
<refentry id="function.udm-find">
<refnamediv>
<refname>udm_find</refname>
<refpurpose>Perform search</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_find</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
<methodparam><type>string</type><parameter>query</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_find</function> returns result link identifier on
success, &false; on error.
</para>
<para>
This function performs the search itself. The first argument is the
session, the second is the query. To find something, just type the
words you want to find and press the SUBMIT button. For example,
"mysql odbc". Do not put the quotes " in the query; they are written
here only to separate the query from the surrounding text.
mnoGoSearch will find all documents that
contain the word "mysql" and/or the word "odbc". The best documents,
those with greater weights, are displayed first. If you use search mode
ALL, the search returns documents that contain all of the
words you entered. If you use mode ANY, the search
returns documents that contain any of the words you
entered. For more advanced searches you may use the query
language; to do so, select the "bool" match mode in the search
form.
</para>
<simpara>
mnoGoSearch understands the following boolean operators:
</simpara>
<simpara>
&amp; - logical AND. For example, &quot;mysql &amp;
odbc&quot;. mnoGoSearch will find any URLs that contain both
&quot;mysql&quot; and &quot;odbc&quot;.
</simpara>
<simpara>
| - logical OR. For example &quot;mysql|odbc&quot;. mnoGoSearch
will find any URLs, that contain word &quot;mysql&quot; or word
&quot;odbc&quot;.
</simpara>
<simpara>
~ - logical NOT. For example &quot;mysql &amp; ~odbc&quot;.
mnoGoSearch will find URLs that contain word &quot;mysql&quot;
and do not contain word &quot;odbc&quot; at the same time. Note
that ~ just excludes given word from results. Query
&quot;~odbc&quot; will find nothing!
</simpara>
<simpara>
() - group command to compose more complex queries. For example
&quot;(mysql | msql) &amp; ~postgres&quot;. Query language is
simple and powerful at the same time. Just consider query as
usual boolean expression.
</simpara>
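<para>
As a hedged illustration of the boolean operators listed above, the following
sketch switches the agent to boolean mode with
<function>udm_set_agent_param</function> and runs a grouped query. The
connection string is a placeholder and has to be replaced by the settings
used during indexing.
</para>
<informalexample>
<programlisting role="php">
<![CDATA[
<?php
// Placeholder connection settings.
$udm = udm_alloc_agent('mysql://user:password@localhost/search/');

// Boolean operators are only interpreted in UDM_MODE_BOOL.
udm_set_agent_param($udm, UDM_PARAM_SEARCH_MODE, UDM_MODE_BOOL);

// Documents containing "mysql" or "msql", but not "postgres".
$res = udm_find($udm, '(mysql | msql) & ~postgres');
if (!$res) {
    printf("Error #%d: '%s'\n", udm_errno($udm), udm_error($udm));
    udm_free_agent($udm);
    exit;
}

udm_free_res($res);
udm_free_agent($udm);
?>
]]>
</programlisting>
</informalexample>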
</refsect1>
</refentry>
<refentry id="function.udm-free-agent">
<refnamediv>
<refname>udm_free_agent</refname>
<refpurpose>Free mnoGoSearch session</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_free_agent</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_free_agent</function> returns &true; on success, &false; on error.
</para>
<para>
<parameter>agent</parameter> - agent identifier, returned
by a call to <function>udm_alloc_agent</function>.
</para>
<para>
This function frees the memory allocated for the agent session.
</para>
</refsect1>
</refentry>
<refentry id="function.udm-free-ispell-data">
<refnamediv>
<refname>udm_free_ispell_data</refname>
<refpurpose>Free memory allocated for ispell data</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_free_ispell_data</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_free_ispell_data</function> always returns &true;.
</para>
<para>
<parameter>agent</parameter> - agent link identifier, received after
call to <function>udm_alloc_agent</function>.
</para>
<note>
<para>
This function is supported beginning from version 3.1.12 of
mnoGoSearch and it does not do anything in previous versions.
</para>
</note>
</refsect1>
</refentry>
<refentry id="function.udm-free-res">
<refnamediv>
<refname>udm_free_res</refname>
<refpurpose>Free mnoGoSearch result</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_free_res</methodname>
<methodparam><type>int</type><parameter>res</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_free_res</function> returns &true; on success, &false; on error.
</para>
<para>
<parameter>res</parameter> - result identifier,
returned by a call to <function>udm_find</function>.
</para>
<para>
This function frees the memory allocated for the results.
</para>
</refsect1>
</refentry>
<refentry id="function.udm-get-doc-count">
<refnamediv>
<refname>udm_get_doc_count</refname>
<refpurpose>Get total number of documents in database.</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_get_doc_count</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_get_doc_count</function> returns number of documents in database.
</para>
<para>
<parameter>agent</parameter> - link to agent identifier, received after
call to <function>udm_alloc_agent</function>.
</para>
<note>
<simpara>
This function is supported only in mnoGoSearch 3.1.11 or later.
</simpara>
</note>
</refsect1>
</refentry>
<refentry id="function.udm-get-res-field">
<refnamediv>
<refname>udm_get_res_field</refname>
<refpurpose>Fetch mnoGoSearch result field</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>string</type><methodname>udm_get_res_field</methodname>
<methodparam><type>int</type><parameter>res</parameter></methodparam>
<methodparam><type>int</type><parameter>row</parameter></methodparam>
<methodparam><type>int</type><parameter>field</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_get_res_field</function> returns result field value on
success, &false; on error.
</para>
<para>
<parameter>res</parameter> - a link to result identifier, received
after call to <function>udm_find</function>.
</para>
<para>
<parameter>row</parameter> - the number of the link on the current page.
May have values from 0 to
<parameter>UDM_PARAM_NUM_ROWS-1</parameter>.
</para>
<para>
<parameter>field</parameter> - field identifier, may have the following values:
</para>
<itemizedlist>
<listitem>
<simpara>
UDM_FIELD_URL - document URL field
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_CONTENT - document Content-type field (for example, text/html).
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_CATEGORY - document category field. Use
<function>udm_cat_path</function> to get full path to current category
from the categories tree root. (This parameter is available only in PHP
4.0.6 or later).
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_TITLE - document title field.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_KEYWORDS - document keywords field (from META KEYWORDS tag).
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_DESC - document description field (from META DESCRIPTION tag).
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_TEXT - document body text (the first couple of lines to give an
idea of what the document is about).
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_SIZE - document size.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_URLID - unique URL ID of the link.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_RATING - page rating (as calculated by mnoGoSearch).
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_MODIFIED - last-modified field in unixtime format.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_ORDER - the number of the current document in set of found documents.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_FIELD_CRC - document CRC.
</simpara>
</listitem>
</itemizedlist>
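<para>
The following sketch shows one way to walk through the rows of the current
result page using the field identifiers above. The connection string and the
query are placeholders, and only a few of the fields are printed.
</para>
<informalexample>
<programlisting role="php">
<![CDATA[
<?php
// Placeholder connection settings and query.
$udm = udm_alloc_agent('mysql://user:password@localhost/search/');
$res = udm_find($udm, 'mysql odbc');
if (!$res) {
    printf("Error #%d: '%s'\n", udm_errno($udm), udm_error($udm));
    udm_free_agent($udm);
    exit;
}

// Rows on the current page are numbered 0 .. UDM_PARAM_NUM_ROWS-1.
$rows = udm_get_res_param($res, UDM_PARAM_NUM_ROWS);
for ($i = 0; $i < $rows; $i++) {
    $url   = udm_get_res_field($res, $i, UDM_FIELD_URL);
    $title = udm_get_res_field($res, $i, UDM_FIELD_TITLE);
    $type  = udm_get_res_field($res, $i, UDM_FIELD_CONTENT);
    printf("<a href=\"%s\">%s</a> (%s)<br>\n",
           htmlspecialchars($url), htmlspecialchars($title), $type);
}

udm_free_res($res);
udm_free_agent($udm);
?>
]]>
</programlisting>
</informalexample>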
</refsect1>
</refentry>
<refentry id="function.udm-get-res-param">
<refnamediv>
<refname>udm_get_res_param</refname>
<refpurpose>Get mnoGoSearch result parameters</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>string</type><methodname>udm_get_res_param</methodname>
<methodparam><type>int</type><parameter>res</parameter></methodparam>
<methodparam><type>int</type><parameter>param</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_get_res_param</function> returns result parameter value on
success, &false; on error.
</para>
<para>
<parameter>res</parameter> - a link to result identifier, received after
call to <function>udm_find</function>.
</para>
<para>
<parameter>param</parameter> - parameter identifier, may have the following values:
</para>
<itemizedlist>
<listitem>
<simpara>
UDM_PARAM_NUM_ROWS - number of found links on the current page. It is equal to
UDM_PARAM_PAGE_SIZE on every page except the last, which holds the remaining links.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_FOUND - total number of results matching the query.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_WORDINFO - information on the words found. E.g. search for
"a good book" will return "a: stopword, good:5637, book: 120"
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_SEARCHTIME - search time in seconds.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_FIRST_DOC - the number of the first document displayed on current page.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_LAST_DOC - the number of the last document displayed on current page.
</simpara>
</listitem>
</itemizedlist>
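<para>
A short sketch of printing a result summary from these parameters. The
connection string and the query are placeholders, and the output wording is
only an example.
</para>
<informalexample>
<programlisting role="php">
<![CDATA[
<?php
// Placeholder connection settings and query.
$udm = udm_alloc_agent('mysql://user:password@localhost/search/');
$res = udm_find($udm, 'a good book');
if (!$res) {
    printf("Error #%d: '%s'\n", udm_errno($udm), udm_error($udm));
    udm_free_agent($udm);
    exit;
}

$found = udm_get_res_param($res, UDM_PARAM_FOUND);
$first = udm_get_res_param($res, UDM_PARAM_FIRST_DOC);
$last  = udm_get_res_param($res, UDM_PARAM_LAST_DOC);
$time  = udm_get_res_param($res, UDM_PARAM_SEARCHTIME);
$words = udm_get_res_param($res, UDM_PARAM_WORDINFO);

printf("Documents %s-%s of %s, found in %s seconds<br>\n",
       $first, $last, $found, $time);
printf("Words: %s<br>\n", htmlspecialchars($words));

udm_free_res($res);
udm_free_agent($udm);
?>
]]>
</programlisting>
</informalexample>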
</refsect1>
</refentry>
<refentry id="function.udm-load-ispell-data">
<refnamediv>
<refname>udm_load_ispell_data</refname>
<refpurpose>Load ispell data</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_load_ispell_data</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
<methodparam><type>int</type><parameter>var</parameter></methodparam>
<methodparam><type>string</type><parameter>val1</parameter></methodparam>
<methodparam><type>string</type><parameter>val2</parameter></methodparam>
<methodparam><type>int</type><parameter>flag</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_load_ispell_data</function> loads ispell data. Returns &true;
on success, &false; on error.
</para>
<para>
<parameter>agent</parameter> - agent link identifier, received after call
to <function>udm_alloc_agent</function>.
</para>
<para>
After using this function, call <function>udm_free_ispell_data</function>
to free the memory allocated for ispell data, even if you use UDM_ISPELL_TYPE_SERVER mode.
</para>
<para>
The fastest mode is UDM_ISPELL_TYPE_SERVER. UDM_ISPELL_TYPE_TEXT is slower
and UDM_ISPELL_TYPE_DB is the slowest. This ordering holds for
mnoGoSearch 3.1.10 - 3.1.11. It is planned to speed up DB mode in future
versions so that it becomes faster than TEXT mode.
</para>
<para>
<parameter>var</parameter> - parameter indicating the source of the ispell
data. May have the following values:
</para>
<itemizedlist>
<listitem>
<simpara>
UDM_ISPELL_TYPE_DB - indicates that ispell data should be loaded from SQL.
In this case, parameters <parameter>val1</parameter> and <parameter>val2</parameter>
are ignored and should be left blank. <parameter>flag</parameter>
should be equal to <literal>1</literal>.
</simpara>
<note>
<para>
<parameter>flag</parameter> indicates that after loading ispell data
from the given source it should be sorted (this is necessary for ispell
to function correctly). When ispell data is loaded from files,
there may be several calls to <function>udm_load_ispell_data</function>,
and it makes no sense to sort the data after every call, only after
the last one. Since in DB mode all the data is loaded by one call,
this parameter should have the value <literal>1</literal>. In this mode,
in case of an error, e.g. if the ispell tables are absent, the function
returns &false; and the error code and message are accessible through
<function>udm_errno</function> and <function>udm_error</function>.
</para>
</note>
<simpara>Example:</simpara>
<informalexample>
<programlisting role="php">
<![CDATA[
if (! udm_load_ispell_data($udm,UDM_ISPELL_TYPE_DB,'','',1)) {
    printf("Error #%d: '%s'\n", udm_errno($udm), udm_error($udm));
    exit;
}
]]>
</programlisting>
</informalexample>
</listitem>
<listitem>
<para>
UDM_ISPELL_TYPE_AFFIX - indicates that ispell data should be loaded from
a file and initiates loading of an affixes file. In this case <parameter>val1</parameter>
defines the two-letter language code for which affixes are loaded,
and <parameter>val2</parameter> the file path. Note that if
a relative path is given, the module looks for the file not in UDM_CONF_DIR,
but relative to the current path, i.e. the path where the script is executed.
In case of an error in this mode, e.g. if the file is absent, the function returns
&false; and an error message is displayed. The error message text cannot be
accessed through <function>udm_error</function> and <function>udm_errno</function>,
since those functions can only return messages associated with SQL.
See the description of the <parameter>flag</parameter> parameter under UDM_ISPELL_TYPE_DB.
</para>
<simpara>Example:</simpara>
<informalexample>
<programlisting role="php">
<![CDATA[
if ((! udm_load_ispell_data($udm,UDM_ISPELL_TYPE_AFFIX,'en','/opt/ispell/en.aff',0)) ||
    (! udm_load_ispell_data($udm,UDM_ISPELL_TYPE_AFFIX,'ru','/opt/ispell/ru.aff',0)) ||
    (! udm_load_ispell_data($udm,UDM_ISPELL_TYPE_SPELL,'en','/opt/ispell/en.dict',0)) ||
    (! udm_load_ispell_data($udm,UDM_ISPELL_TYPE_SPELL,'ru','/opt/ispell/ru.dict',1))) {
    exit;
}
]]>
</programlisting>
</informalexample>
<note>
<para>
<parameter>flag</parameter> is equal to <literal>1</literal> only in the last call.
</para>
</note>
</listitem>
<listitem>
<para>
UDM_ISPELL_TYPE_SPELL - indicates that ispell data should be loaded from
a file and initiates loading of an ispell dictionary file. In this case
<parameter>val1</parameter> defines the two-letter language code for which
the dictionary is loaded,
and <parameter>val2</parameter> the file path. Note that if a relative
path is given, the module looks for the file not in UDM_CONF_DIR, but
relative to the current path, i.e. the path where the script is executed.
In case of an error in this mode, e.g. if the file is absent, the function
returns &false; and an error message is displayed. The error message text
cannot be accessed through <function>udm_error</function> and
<function>udm_errno</function>, since those functions can only return messages
associated with SQL. See the description of the <parameter>flag</parameter>
parameter under UDM_ISPELL_TYPE_DB.
</para>
<informalexample>
<programlisting role="php">
<![CDATA[
if ((! udm_load_ispell_data($udm,UDM_ISPELL_TYPE_AFFIX,'en','/opt/ispell/en.aff',0)) ||
    (! udm_load_ispell_data($udm,UDM_ISPELL_TYPE_AFFIX,'ru','/opt/ispell/ru.aff',0)) ||
    (! udm_load_ispell_data($udm,UDM_ISPELL_TYPE_SPELL,'en','/opt/ispell/en.dict',0)) ||
    (! udm_load_ispell_data($udm,UDM_ISPELL_TYPE_SPELL,'ru','/opt/ispell/ru.dict',1))) {
    exit;
}
]]>
</programlisting>
</informalexample>
<note>
<para>
<parameter>flag</parameter> is equal to <literal>1</literal> only in the last call.
</para>
</note>
</listitem>
<listitem>
<para>
UDM_ISPELL_TYPE_SERVER - enables spell server support.
The <parameter>val1</parameter> parameter indicates the
address of the host running the spell server. <parameter>val2</parameter>
is not used yet, but in future releases it is going to indicate the
port number used by the spell server. The <parameter>flag</parameter> parameter
is not needed in this case, since ispell data is stored
on the spell server already sorted.
</para>
<para>
The spelld server reads spell data from a separate configuration file
(/usr/local/mnogosearch/etc/spelld.conf by default), sorts it and stores it in
memory. The server communicates with clients in two ways: to the indexer it
transfers all the data (so that the indexer starts faster), while from search.cgi
it receives a word to normalize and passes back to the client (search.cgi) a list
of normalized word forms. This makes processing of search queries faster than in
db and text modes, because loading and sorting all the spell data is skipped.
</para>
<para>
In UDM_ISPELL_TYPE_SERVER mode, <function>udm_load_ispell_data</function>
does not actually load ispell data, but only defines the server address.
The server is then used automatically by <function>udm_find</function>
when performing a search. In case of errors, e.g. if the spell server
is not running or an invalid host is given, no messages are returned
and ispell conversion does not work.
</para>
<note>
<para>
This function is available in mnoGoSearch 3.1.12 or later.
</para>
</note>
<simpara>Example:</simpara>
<informalexample>
<programlisting role="php">
<![CDATA[
if (!udm_load_ispell_data($udm,UDM_ISPELL_TYPE_SERVER,'','',1)) {
    printf("Error loading ispell data from server<br>\n");
    exit;
}
]]>
</programlisting>
</informalexample>
</listitem>
</itemizedlist>
</refsect1>
</refentry>
<refentry id="function.udm-set-agent-param">
<refnamediv>
<refname>udm_set_agent_param</refname>
<refpurpose>Set mnoGoSearch agent session parameters</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_set_agent_param</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
<methodparam><type>int</type><parameter>var</parameter></methodparam>
<methodparam><type>string</type><parameter>val</parameter></methodparam>
</methodsynopsis>
<para>
<function>udm_set_agent_param</function> defines mnoGoSearch session
parameters. Returns &true; on success, &false; on error.
</para>
<simpara>
The following parameters and their values are available:
</simpara>
<itemizedlist>
<listitem>
<simpara>
UDM_PARAM_PAGE_NUM - used to choose the search results page number (results
are returned in pages numbered from 0, with UDM_PARAM_PAGE_SIZE results per page).
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_PAGE_SIZE - number of search results displayed on one page.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_SEARCH_MODE - search mode. The following values are available: UDM_MODE_ALL -
search for all words; UDM_MODE_ANY - search for any word; UDM_MODE_PHRASE -
phrase search; UDM_MODE_BOOL - boolean search. See <function>udm_find</function>
for details on boolean search.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_CACHE_MODE - turns on or off search result cache mode.
When enabled, the search engine will store
search results to disk. In case a similar search is performed later,
the engine will take results from the cache for faster performance.
Available values: UDM_CACHE_ENABLED, UDM_CACHE_DISABLED.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_TRACK_MODE - turns trackquery mode on or off. Since
version 3.1.2 mnoGoSearch has query tracking support.
Note that tracking is implemented in the SQL version only and is not available
with the built-in database.
To use tracking, you have to create tables for tracking support.
For MySQL, use create/mysql/track.txt.
When doing a search, the front-end uses those tables to store the query words,
the number of found documents and the current UNIX timestamp in seconds.
Available values: UDM_TRACK_ENABLED, UDM_TRACK_DISABLED.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_PHRASE_MODE - defines whether the index database uses phrases
("phrase" parameter in indexer.conf).
Possible values: UDM_PHRASE_ENABLED and UDM_PHRASE_DISABLED.
Please note that if phrase search is enabled (UDM_PHRASE_ENABLED),
it is still possible to search in any mode (ANY, ALL, BOOL or PHRASE).
In version 3.1.10 of mnoGoSearch, phrase search is supported only in SQL
and built-in database modes,
while beginning with 3.1.11 phrases are supported in cachemode as well.
</simpara>
<simpara>
Examples of phrase search:
</simpara>
<simpara>
"Arizona desert" - This query returns all indexed documents that contain
"Arizona desert" as a phrase. Notice that you need to put double quotes
around the phrase.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_CHARSET - defines local charset. Available values: set of
charsets supported by mnoGoSearch, e.g. koi8-r, cp1251, ...
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_STOPFILE - defines the name and path
of the stopwords file. (This differs slightly from mnoGoSearch itself:
if a relative path or no path is given, mnoGoSearch
looks for the file relative to UDM_CONF_DIR, while the module looks for
the file relative to the current path, i.e. the path where the
PHP script is executed.)
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_STOPTABLE - Load stop words from the given SQL table. You may use
several StopwordTable commands.
This command has no effect when compiled without SQL database support.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_WEIGHT_FACTOR - represents weight factors for specific document parts.
Currently body, title, keywords, description and url are supported.
To activate this feature, use powers of 2 in the *Weight commands of
indexer.conf. Let's imagine that we have these weights:
</simpara>
<literallayout>
URLWeight 1
BodyWeight 2
TitleWeight 4
KeywordWeight 8
DescWeight 16
</literallayout>
<simpara>
Since the indexer uses a bitwise OR operation on word weights when a
word appears several times in the same document, it is possible at search
time to detect in which document parts the word appears. A word which
appears only in the body will have an aggregate weight of 00000010 (in binary notation).
A word used in all document parts will have an aggregate weight of 00011111.
</simpara>
<simpara>
This parameter's value is a string of hex digits ABCDE. Each digit is a
factor for the corresponding bit in the word weight. For the weights
configuration given above:
</simpara>
<literallayout>
E is a factor for weight 1 (URLWeight bit)
D is a factor for weight 2 (BodyWeight bit)
C is a factor for weight 4 (TitleWeight bit)
B is a factor for weight 8 (KeywordWeight bit)
A is a factor for weight 16 (DescWeight bit)
</literallayout>
<simpara>
Examples:
</simpara>
<simpara>
UDM_PARAM_WEIGHT_FACTOR=00001 will search through URLs only.
</simpara>
<simpara>
UDM_PARAM_WEIGHT_FACTOR=00100 will search through Titles only.
</simpara>
<simpara>
UDM_PARAM_WEIGHT_FACTOR=11100 will search through Title, Keywords and Description,
but not through URL and Body.
</simpara>
<simpara>
UDM_PARAM_WEIGHT_FACTOR=F9421 will search through:
</simpara>
<literallayout>
Description with factor 15 (F hex)
Keywords with factor 9
Title with factor 4
Body with factor 2
URL with factor 1
</literallayout>
<simpara>
If the UDM_PARAM_WEIGHT_FACTOR variable is omitted, the original weight value is
used to sort results. For the weight configuration given above, this means
that the document description has the biggest weight, 16.
</simpara>
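<simpara>
The mapping between hex digits and document parts can be sketched in plain PHP.
The helper <literal>describe_weight_factor</literal> below is hypothetical and
not part of the mnoGoSearch module; it assumes the five-digit ABCDE layout and
the example weights used in this description.
</simpara>
<informalexample>
<programlisting role="php">
<![CDATA[
// Hypothetical helper (not part of the module): splits a five-digit hex
// factor string into one numeric factor per document part.
// Returns false if the string is not exactly five hex digits.
function describe_weight_factor($factor)
{
    // Order follows the ABCDE layout: A = Description ... E = URL.
    $parts = array('Description', 'Keywords', 'Title', 'Body', 'URL');
    if (strlen($factor) != count($parts) || !ctype_xdigit($factor)) {
        return false;
    }
    $result = array();
    foreach ($parts as $i => $name) {
        $result[$name] = hexdec($factor[$i]);
    }
    return $result;
}

// Example weights from the indexer.conf fragment in this description.
$weights = array('URL' => 1, 'Body' => 2, 'Title' => 4,
                 'Keywords' => 8, 'Description' => 16);

// A word found in both body and title: the weights are combined with OR.
$word_weight = $weights['Body'] | $weights['Title'];
printf("Aggregate weight: %08b\n", $word_weight);

print_r(describe_weight_factor('F9421'));

// The same string would then be passed to the module, e.g.
// udm_set_agent_param($udm, UDM_PARAM_WEIGHT_FACTOR, 'F9421');
]]>
</programlisting>
</informalexample>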
</listitem>
<listitem>
<simpara>
UDM_PARAM_WORD_MATCH - word match. You may use this parameter to choose
the word match type. This feature works only in "single" and "multi" modes
with the SQL-based and built-in databases. It does not work in cachemode and other modes,
since they use word CRC and do not support substring search. Available values:
</simpara>
<simpara>UDM_MATCH_BEGIN - word beginning match;</simpara>
<simpara>UDM_MATCH_END - word ending match;</simpara>
<simpara>UDM_MATCH_WORD - whole word match;</simpara>
<simpara>UDM_MATCH_SUBSTR - word substring match.</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_MIN_WORD_LEN - defines the minimal word length.
Any word shorter than this limit is considered to be a stopword. Please note
that this parameter value is inclusive, i.e. if UDM_PARAM_MIN_WORD_LEN=3,
a word 3 characters long will not be considered a stopword, while
a word 2 characters long will be. The default value is 1.
</simpara>
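<simpara>
The inclusive boundary can be illustrated with a small sketch. The helper
<literal>is_shorter_than_min</literal> is hypothetical and only mirrors the rule
described here; it is not how the module itself is implemented, and it assumes
a single-byte charset because <function>strlen</function> counts bytes.
</simpara>
<informalexample>
<programlisting role="php">
<![CDATA[
// Hypothetical helper: a word counts as a stopword only when it is
// strictly shorter than the minimal word length.
function is_shorter_than_min($word, $min_word_len)
{
    return strlen($word) < $min_word_len;
}

var_dump(is_shorter_than_min('cat', 3)); // 3 characters: not a stopword
var_dump(is_shorter_than_min('to', 3));  // 2 characters: stopword
]]>
</programlisting>
</informalexample>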
</listitem>
<listitem>
<simpara>
UDM_PARAM_ISPELL_PREFIXES - Possible values: UDM_PREFIXES_ENABLED and
UDM_PREFIXES_DISABLED, which respectively enable or disable the use of prefixes.
E.g. if the word "tested" is in the search query, words like "test",
"testing", etc. are also found. Only suffixes are supported by default. Prefixes usually
change word meanings: for example, somebody searching for the word "tested"
hardly wants "untested" to be found. Prefix support may also be
useful for checking the spelling of a site. In order to enable
ispell, you have to load ispell data with <function>udm_load_ispell_data</function>.
</simpara>
</listitem>
<listitem>
<simpara>
UDM_PARAM_CROSS_WORDS - enables or disables crosswords support.
Possible values: UDM_CROSS_WORDS_ENABLED and UDM_CROSS_WORDS_DISABLED.
</simpara>
<simpara>
The crosswords feature allows words between &lt;a href="xxx"&gt; and &lt;/a&gt;
to also be assigned to the document this link leads to. It works in SQL database mode and
is not supported in built-in database mode and cachemode.
</simpara>
<note>
<simpara>
Crosswords are supported only in mnoGoSearch 3.1.11 or later.
</simpara>
</note>
</listitem>
<listitem>
<simpara>
UDM_PARAM_VARDIR - specifies a custom path to the directory where the indexer
stores data when using the built-in database and in cache mode.
By default the <literal>/var</literal> directory of the
mnoGoSearch installation is used. Can have
only string values. The parameter is available in PHP 4.1.0 or later.
</simpara>
</listitem>
</itemizedlist>
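<simpara>
A minimal sketch of setting several session parameters before a search. The
helper name <literal>configure_search</literal>, the database address and the
chosen values are assumptions made for illustration only;
<function>udm_alloc_agent</function> and <function>udm_free_agent</function>
are the module's functions for creating and releasing the agent.
</simpara>
<informalexample>
<programlisting role="php">
<![CDATA[
// Hypothetical helper: applies paging, search mode and cache settings.
// Returns false as soon as one udm_set_agent_param() call fails.
function configure_search($udm, $page, $page_size)
{
    $settings = array(
        array(UDM_PARAM_PAGE_SIZE, $page_size),          // results per page
        array(UDM_PARAM_PAGE_NUM, $page),                // pages start at 0
        array(UDM_PARAM_SEARCH_MODE, UDM_MODE_ALL),      // match all words
        array(UDM_PARAM_CACHE_MODE, UDM_CACHE_DISABLED), // no result cache
    );
    foreach ($settings as $setting) {
        if (!udm_set_agent_param($udm, $setting[0], $setting[1])) {
            return false;
        }
    }
    return true;
}

// Only attempt this when the mnoGoSearch module is compiled in.
if (function_exists('udm_alloc_agent')) {
    // Placeholder address, not a working database.
    $udm = udm_alloc_agent('mysql://user:password@localhost/search/');
    if ($udm) {
        if (!configure_search($udm, 0, 10)) {
            printf("Error setting agent parameters<br>\n");
        }
        udm_free_agent($udm);
    }
}
]]>
</programlisting>
</informalexample>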
</refsect1>
</refentry>
<refentry id='function.udm-check-charset'>
<refnamediv>
<refname>udm_check_charset</refname>
<refpurpose>
Check if the given charset is known to mnoGoSearch
</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_check_charset</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
<methodparam><type>string</type><parameter>charset</parameter></methodparam>
</methodsynopsis>
<para>
&warn.undocumented.func;
</para>
</refsect1>
</refentry>
<refentry id='function.udm-check-stored'>
<refnamediv>
<refname>udm_check_stored</refname>
<refpurpose>
Check connection to stored
</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_check_stored</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
<methodparam><type>int</type><parameter>link</parameter></methodparam>
<methodparam><type>string</type><parameter>doc_id</parameter></methodparam>
</methodsynopsis>
<para>
&warn.undocumented.func;
</para>
</refsect1>
</refentry>
<refentry id='function.udm-close-stored'>
<refnamediv>
<refname>udm_close_stored</refname>
<refpurpose>
Close connection to stored
</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_close_stored</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
<methodparam><type>int</type><parameter>link</parameter></methodparam>
</methodsynopsis>
<para>
&warn.undocumented.func;
</para>
</refsect1>
</refentry>
<refentry id='function.udm-crc32'>
<refnamediv>
<refname>udm_crc32</refname>
<refpurpose>
Return CRC32 checksum of the given string
</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_crc32</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
<methodparam><type>string</type><parameter>str</parameter></methodparam>
</methodsynopsis>
<para>
&warn.undocumented.func;
</para>
</refsect1>
</refentry>
<refentry id='function.udm-open-stored'>
<refnamediv>
<refname>udm_open_stored</refname>
<refpurpose>
Open connection to stored
</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<methodsynopsis>
<type>int</type><methodname>udm_open_stored</methodname>
<methodparam><type>int</type><parameter>agent</parameter></methodparam>
<methodparam><type>string</type><parameter>storedaddr</parameter></methodparam>
</methodsynopsis>
<para>
&warn.undocumented.func;
</para>
</refsect1>
</refentry>
</reference>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
indent-tabs-mode:nil
sgml-parent-document:nil
sgml-default-dtd-file:"../../manual.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
vim600: syn=xml fen fdm=syntax fdl=2 si
vim: et tw=78 syn=sgml
vi: ts=1 sw=1
-->