From 75afe0859318f702995092b30b73a0847b8de85c Mon Sep 17 00:00:00 2001 From: Dmitry Tkatchenko Date: Fri, 2 Mar 2001 10:02:46 +0000 Subject: [PATCH] Added descriptions of functions: udm_get_doc_count, udm_api_version, Crosswords support, cosmetic changes git-svn-id: https://svn.php.net/repository/phpdoc/en/trunk@42492 c90b9560-bf6c-de11-be94-00142212c4b1 --- functions/mnogosearch.xml | 880 +++++++++++++++++++++----------------- 1 file changed, 479 insertions(+), 401 deletions(-) diff --git a/functions/mnogosearch.xml b/functions/mnogosearch.xml index fb207133f6..8b136d96b3 100644 --- a/functions/mnogosearch.xml +++ b/functions/mnogosearch.xml @@ -1,4 +1,4 @@ - + mnoGoSearch Functions mnoGoSearch @@ -46,239 +46,7 @@ - - - udm_alloc_agent - Allocate mnoGoSearch session - - - Description - - - int udm_alloc_agent - string dbaddr - string - - dbmode - - - - - - udm_alloc_agent returns mnogosearch agent - identifier on success, FALSE on error. This function creates a - session with database parameters. - - - dbaddr - URL-style database description. Options (type, host, database name, port, user and password) to connect to SQL database. - Do not matter for built-in text files support. Format: DBAddr DBType:[//[DBUser[:DBPass]@]DBHost[:DBPort]]/DBName/ - Currently supported DBType values are: mysql, pgsql, msql, solid, mssql, oracle, ibase. Actually, it does not matter for native libraries support. - But ODBC users should specify one of supported values. If your database type is not supported, you may use "unknown" instead. - - - dbmode - You may select SQL database mode of words storage. When "single" is specified, all words are stored in the same - table. If "multi" is selected, words will be located in different tables depending of their lengths. "multi" mode is usually faster - but requires more tables in database. If "crc" mode is selected, mnoGoSearch will store 32 bit integer - word IDs calculated by CRC32 algorythm instead of words. 
This mode requres less disk space and it is faster comparing with "single" - and "multi" modes. "crc-multi" uses the same storage structure with the "crc" mode, but also stores words in different tables depending on - words lengths like "multi" mode. Format: DBMode single/multi/crc/crc-multi - - - - dbaddr and dbmode must match those used during indexing. - - - - - In fact this function does not open connection to database and - thus does not check entered login and password. Actual connection to database and login/password verification is done by udm_find. - - - - - - - - udm_set_agent_param - Set mnoGoSearch agent session parameters - - - Description - - - int udm_set_agent_param - int agent - int var - string val - - - - udm_set_agent_param returns TRUE on success, - FALSE on error. Defines mnoGoSearch session parameters. - - - The following parameters and their values are available: - - - - - UDM_PARAM_PAGE_NUM - used to choose search results page number (results are returned by pages beginning from - 0, with UDM_PARAM_PAGE_SIZE results per page). - - - - - UDM_PARAM_PAGE_SIZE - number of search results displayed on one page. - - - - - UDM_PARAM_SEARCH_MODE - search mode. The following values available: UDM_MODE_ALL - - search for all words; UDM_MODE_ANY - search for any word; UDM_MODE_PHRASE - phrase search; UDM_MODE_BOOL - boolean search. See udm_find for details on boolean search. - - - - - UDM_PARAM_CACHE_MODE - turns on or off search result cache mode. When enabled, the search engine will store - search results to disk. In case a similar search is performed later, the engine will take results from the cache for faster performance. - Available values: UDM_CACHE_ENABLED, UDM_CACHE_DISABLED. - - - - - UDM_PARAM_TRACK_MODE - turns on or off trackquery mode. Since version 3.1.2 mnoGoSearch has a query tracking support. - Note that tracking is implemented in SQL version only and not available in built-in database. 
- To use tracking, you have to create tables for tracking support. For MySQL, use create/mysql/track.txt. - When doing a search, front-end uses those tables to store query words, a number of found documents and current UNIX timestamp in seconds. - Available values: UDM_TRACK_ENABLED, UDM_TRACK_DISABLED. - - - - - UDM_PARAM_PHRASE_MODE - defines whether index database using phrases ("phrase" parameter in indexer.conf). - Possible values: UDM_PHRASE_ENABLED and UDM_PHRASE_DISABLED. - Please note, that if phrase search is enabled (UDM_PHRASE_ENABLED), - it is still possible to do search in any mode (ANY, ALL, BOOL or PHRASE). - In 3.1.10 version of mnoGoSearch phrase search is supported only in sql and buuilt-in database modes, - while beginning with 3.1.11 phrases are supported in cachemode as well. - - - Examples of phrase search: - - - "Arizona desert" - This query returns all indexed documents that contain "Arizona desert" as a phrase. Notice that you need to put double quotes around the phrase - - - - - UDM_PARAM_CHARSET - defines local charset. Available values: set of charsets supported by mnoGoSearch, - e.g. koi8-r, cp1251, ... - - - - - UDM_PARAM_STOPFILE - Defines name and path - to stopwords file. (There is a small difference with mnoGoSearch - - while in mnoGoSearch if relative path or no path entered, it - looks for this file in relation to UDM_CONF_DIR, the module looks for - the file in relation to current path, i.e. to the path where the - php script is executed.) - - - - - UDM_PARAM_STOPTABLE - Load stop words from the given SQL table. You may use several StopwordTable commands. - This command has no effect when compiled without SQL database support. - - - - - - UDM_PARAM_WEIGHT_FACTOR - represents weight factors for specific document parts. Currently body, title, keywords, description, url are supported. - To activate this feature please use degrees of 2 in *Weight commands of - the indexer.conf. 
Let's imagine that we have these weights: - - URLWeight 1 - BodyWeight 2 - TitleWeight 4 - KeywordWeight 8 - DescWeight 16 - - As far as indexer uses bit OR operation for word weights when some - word presents several time in the same document, it is possible at search - time to detect word appearance in different document parts. Word which - appears only in the body will have 00000010 argegate weight (in binary notation). - Word used in all document parts will have 00011111 aggregate weight. - - - This parameter's value is a string of hex digits ABCDE. Each digit is a factor for corresponding bit in word weight. For the given above weights - configuration: - - E is a factor for weight 1 (URL Weight bit) - D is a factor for weight 2 (BodyWeight bit) - C is a factor for weight 4 (TitleWeight bit) - B is a factor for weight 8 (KeywordWeight bit) - A is a factor for weight 16 (DescWeight bit) - - Examples: - - - UDM_PARAM_WEIGHT_FACTOR=00001 will search through URLs only. - - - UDM_PARAM_WEIGHT_FACTOR=00100 will search through Titles only. - - - UDM_PARAM_WEIGHT_FACTOR=11100 will search through Title,Keywords,Desctription but not through URL and Body. - - - UDM_PARAM_WEIGHT_FACTOR=F9421 will search through: - - Description with factor 15 (F hex) - Keywords with factor 9 - Title with factor 4 - Body with factor 2 - URL with factor 1 - - If UDM_PARAM_WEIGHT_FACTOR variable is ommited, original weight value is - taken to sort results. For a given above weight configuration it means - that document description has a most big weight 16. - - - - - UDM_PARAM_WORD_MATCH - word match. You may use this parameter to choose word match type. This feature works only - in "single" and "multi" modes using SQL based and built-in database. It does not work in cachemode and other modes - since they use word CRC and do not support substring search. 
Available values: - - UDM_MATCH_BEGIN - word beginning match; - UDM_MATCH_END - word ending match; - UDM_MATCH_WORD - whole word match; - UDM_MATCH_SUBSTR - word substring match. - - - - UDM_PARAM_MIN_WORD_LEN - defines minimal word length. - Any word shorter this limit is considered to be a stopword. Please note that this paraneter value is inclusive, - i.e. if UDM_PARAM_MIN_WORD_LEN=3, a word 3 characters long will not be considered a stopword, while - a word 2 characters long will be. Default value is 1. - - - - - UDM_PARAM_ISPELL_PREFIXES - Possible values: UDM_PREFIXES_ENABLED and UDM_PREFIXES_DISABLED, - that respectively enable or disable using prefixes. E.g. if a word "tested" is in search query, also words like "test", "testing", etc. - Only suffixes are supported by default. Prefixes usually change word meanings, for example if somebody is searching for the word "tested" - one hardly wants "untested" to be found. Prefixes support may also be found useful for site's - spelling checking purposes. In order to enable ispell, you have to load ispell data with udm_load_ispell_data. - - - - - - - + udm_add_search_limit Add various search limits @@ -364,8 +132,92 @@ - - + + + + udm_alloc_agent + Allocate mnoGoSearch session + + + Description + + + int udm_alloc_agent + string dbaddr + string + + dbmode + + + + + + udm_alloc_agent returns a mnoGoSearch agent + identifier on success, FALSE on error. This function creates a + session with the given database parameters. + + + dbaddr - URL-style database description, giving the options (type, host, database name, port, user and password) used to connect to the SQL database. + These options do not matter for built-in text file support. Format: DBAddr DBType:[//[DBUser[:DBPass]@]DBHost[:DBPort]]/DBName/ + Currently supported DBType values are: mysql, pgsql, msql, solid, mssql, oracle, ibase. The type does not actually matter for native library support, + but ODBC users should specify one of the supported values.
+ If your database type is not supported, you may use "unknown" instead. + + + dbmode - selects the SQL database mode of word storage. When "single" is specified, all words are stored in the same + table. If "multi" is selected, words are located in different tables depending on their lengths. "multi" mode is usually faster + but requires more tables in the database. If "crc" mode is selected, mnoGoSearch stores 32-bit integer + word IDs calculated by the CRC32 algorithm instead of the words themselves. This mode requires less disk space and is faster than the "single" + and "multi" modes. "crc-multi" uses the same storage structure as the "crc" mode, but also stores words in different tables depending on + word lengths, like the "multi" mode. Format: DBMode single/multi/crc/crc-multi + + + + dbaddr and dbmode must match those used during indexing. + + + + + In fact, this function does not open a connection to the database and + thus does not check the entered login and password. The actual connection to the database and login/password verification are done by udm_find. + + + + + + + + udm_api_version + Get mnoGoSearch API version. + + + Description + + + int udm_api_version + + + + + udm_api_version returns the mnoGoSearch API version number. E.g. if + the mnoGoSearch 3.1.10 API is used, this function will return 30110. + + + This function allows the user to determine which API functions are available, e.g. + the udm_get_doc_count function is only available in mnoGoSearch 3.1.11 or later. + + Example: + + + if (Udm_Api_Version() >= 30111) { + print "Total number of urls in database: ".Udm_Get_Doc_Count($udm)."<br>\n"; + } + + + + + + + udm_clear_search_limits Clear all mnoGoSearch search restrictions @@ -384,6 +236,60 @@ + + + udm_errno + Get mnoGoSearch error number + + + Description + + + int udm_errno + int agent + + + + udm_errno returns the mnoGoSearch error number, + zero if no error. + + + agent - link to agent identifier, received + after call to udm_alloc_agent.
+ + + Retrieves the numeric agent error code. + + + + + + + udm_error + Get mnoGoSearch error message + + + Description + + + string udm_error + int agent + + + + udm_error returns the mnoGoSearch error message, + or an empty string if there is no error. + + + agent - link to agent identifier, received + after call to udm_alloc_agent. + + + Retrieves the agent error message. + + + + udm_find @@ -446,68 +352,113 @@ - + - udm_get_res_param - Get mnoGoSearch result parameters + udm_free_agent + Free mnoGoSearch session Description - string udm_get_res_param - int res - int param + int udm_free_agent + int agent - udm_get_res_param returns result parameter - value on success, FALSE on error. + udm_free_agent returns TRUE on success, FALSE on error. + + + agent - link to agent identifier, received + after call to udm_alloc_agent. + + + Frees the memory allocated for the agent session. + + + + + + + udm_free_ispell_data + Free memory allocated for ispell data + + + Description + + + int udm_free_ispell_data + int agent + + + + udm_free_ispell_data always returns TRUE. + + + agent - agent link identifier, received after call to udm_alloc_agent. + + + + In mnoGoSearch 3.1.10 this function is not yet implemented; it is added for compatibility with future versions and does nothing yet. + + + + + + + + udm_free_res + Free mnoGoSearch result + + + Description + + + int udm_free_res + int res + + + + udm_free_res returns TRUE on success, FALSE on error. res - a link to result identifier, received after call to udm_find. - param - parameter identifier, may have the - following values: + Frees the memory allocated for results. - - - - UDM_PARAM_NUM_ROWS - number of received found links on the current page. It is equal to - UDM_PARAM_PAGE_SIZE for all search pages, on the last page - the rest of links. - - - - - UDM_PARAM_FOUND - total number of results matching the query. - - - - - UDM_PARAM_WORDINFO - information on the words found. E.g.
search for "a good book" will return "a: stopword, good:5637, book: 120" - - - - - UDM_PARAM_SEARCHTIME - search time in seconds. - - - - - UDM_PARAM_FIRST_DOC - the number of the first document displayed on current page. - - - - - UDM_PARAM_LAST_DOC - the number of the last document displayed on current page. - - - + + + udm_get_doc_count + Get total number of documents in database. + + + Description + + + int udm_get_doc_count + int agent + + + + udm_get_doc_count returns the number of documents in the database. + + + agent - link to agent identifier, received + after call to udm_alloc_agent. + + + + This function is supported only in mnoGoSearch 3.1.11 or later. + + + + + udm_get_res_field @@ -601,8 +552,71 @@ + + + + + udm_get_res_param + Get mnoGoSearch result parameters + + + Description + + + string udm_get_res_param + int res + int param + + + + udm_get_res_param returns the result parameter + value on success, FALSE on error. + + + res - a link to result identifier, + received after call to udm_find. + + + param - parameter identifier, may have the + following values: + + + + + UDM_PARAM_NUM_ROWS - number of found links received on the current page. It is equal to + UDM_PARAM_PAGE_SIZE on all search pages except the last, where it is the number of remaining links. + + + + + UDM_PARAM_FOUND - total number of results matching the query. + + + + + UDM_PARAM_WORDINFO - information on the words found. E.g. a search for "a good book" will return "a: stopword, good:5637, book: 120". + + + + + UDM_PARAM_SEARCHTIME - search time in seconds. + + + + + UDM_PARAM_FIRST_DOC - the number of the first document displayed on the current page. + + + + + UDM_PARAM_LAST_DOC - the number of the last document displayed on the current page. + + + + - + + udm_load_ispell_data @@ -715,140 +729,204 @@ - - - - udm_free_ispell_data - Free memory allocated for ispell data - - - Description - - - int udm_free_ispell_data - int agent - - - - udm_free_ispell_data always returns TRUE.
- - agent - agent link identifier, received after call to udm_alloc_agent. - - - - In mnoGoSearch 3.1.10 this function is not yet implemented, it is added for compatibility with future versions and does not perform anything yet. - - - - - - - - udm_free_res - Free mnoGoSearch result - - - Description - - - int udm_free_res - int res - - - - udm_free_res returns TRUE on success, FALSE on error. - - - res - a link to result identifier, - received after call to udm_find. - - - Freeing up memory allocated for results. - - - - + - udm_free_agent - Free mnoGoSearch session + udm_set_agent_param + Set mnoGoSearch agent session parameters Description - int udm_free_agent + int udm_set_agent_param int agent + int var + string val - udm_free_agent returns TRUE on success, FALSE on error. + udm_set_agent_param returns TRUE on success, + FALSE on error. Defines mnoGoSearch session parameters. - - agent - link to agent identifier, received - after call to udm_alloc_agent. - - - Freeing up memory allocated for agent session. - - + + The following parameters and their values are available: + + + + + UDM_PARAM_PAGE_NUM - selects the search results page number (results are returned in pages numbered from + 0, with UDM_PARAM_PAGE_SIZE results per page). + + + + + UDM_PARAM_PAGE_SIZE - number of search results displayed on one page. + + + + + UDM_PARAM_SEARCH_MODE - search mode. The following values are available: UDM_MODE_ALL - + search for all words; UDM_MODE_ANY - search for any word; UDM_MODE_PHRASE - phrase search; UDM_MODE_BOOL - boolean search. See udm_find for details on boolean search. + + + + + UDM_PARAM_CACHE_MODE - turns search result cache mode on or off. When enabled, the search engine stores + search results on disk. If a similar search is performed later, the engine takes results from the cache for faster performance. + Available values: UDM_CACHE_ENABLED, UDM_CACHE_DISABLED. + + + + + UDM_PARAM_TRACK_MODE - turns trackquery mode on or off.
Since version 3.1.2 mnoGoSearch has query tracking support. + Note that tracking is implemented only in the SQL version and is not available in the built-in database. + To use tracking, you have to create tables for tracking support. For MySQL, use create/mysql/track.txt. + When doing a search, the front-end uses these tables to store query words, the number of found documents and the current UNIX timestamp in seconds. + Available values: UDM_TRACK_ENABLED, UDM_TRACK_DISABLED. + + + + + UDM_PARAM_PHRASE_MODE - defines whether the index database uses phrases ("phrase" parameter in indexer.conf). + Possible values: UDM_PHRASE_ENABLED and UDM_PHRASE_DISABLED. + Please note that if phrase search is enabled (UDM_PHRASE_ENABLED), + it is still possible to search in any mode (ANY, ALL, BOOL or PHRASE). + In mnoGoSearch 3.1.10, phrase search is supported only in SQL and built-in database modes, + while beginning with 3.1.11 phrases are supported in cachemode as well. + + + Examples of phrase search: + + + "Arizona desert" - This query returns all indexed documents that contain "Arizona desert" as a phrase. Notice that you need to put double quotes around the phrase. + + + + + UDM_PARAM_CHARSET - defines the local charset. Available values: the set of charsets supported by mnoGoSearch, + e.g. koi8-r, cp1251, ... + + + + + UDM_PARAM_STOPFILE - Defines the name and path + of the stopwords file. (This differs slightly from mnoGoSearch itself: + when a relative path or no path is given, mnoGoSearch + looks for this file relative to UDM_CONF_DIR, while the module looks for + the file relative to the current path, i.e. the path where the + PHP script is executed.) + + + + + UDM_PARAM_STOPTABLE - Loads stop words from the given SQL table. Several StopwordTable commands may be used. + This command has no effect when compiled without SQL database support. + + + + + UDM_PARAM_WEIGHT_FACTOR - represents weight factors for specific document parts.
Currently body, title, keywords, description and URL are supported. + To activate this feature, use powers of 2 in the *Weight commands of + indexer.conf. Suppose we have these weights: + + URLWeight 1 + BodyWeight 2 + TitleWeight 4 + KeywordWeight 8 + DescWeight 16 + + Because the indexer combines word weights with a bitwise OR when a + word occurs several times in the same document, it is possible at search + time to detect in which document parts a word appears. A word which + appears only in the body will have the aggregate weight 00000010 (in binary notation). + A word used in all document parts will have the aggregate weight 00011111. + + + This parameter's value is a string of hex digits ABCDE. Each digit is a factor for the corresponding bit in the word weight. For the weights + configuration given above: + + E is a factor for weight 1 (URLWeight bit) + D is a factor for weight 2 (BodyWeight bit) + C is a factor for weight 4 (TitleWeight bit) + B is a factor for weight 8 (KeywordWeight bit) + A is a factor for weight 16 (DescWeight bit) + + Examples: + + + UDM_PARAM_WEIGHT_FACTOR=00001 will search through URLs only. + + + UDM_PARAM_WEIGHT_FACTOR=00100 will search through Titles only. + + + UDM_PARAM_WEIGHT_FACTOR=11100 will search through Title, Keywords and Description but not through URL and Body. + + + UDM_PARAM_WEIGHT_FACTOR=F9421 will search through: + + Description with factor 15 (F hex) + Keywords with factor 9 + Title with factor 4 + Body with factor 2 + URL with factor 1 + + If UDM_PARAM_WEIGHT_FACTOR is omitted, the original weight value is + used to sort results. For the weight configuration given above, this means + that the document description has the highest weight, 16. + + + + + UDM_PARAM_WORD_MATCH - word match. You may use this parameter to choose the word match type. This feature works only + in "single" and "multi" modes using SQL-based and built-in databases.
It does not work in cachemode and other modes + since they use word CRCs and do not support substring search. + Available values: + + UDM_MATCH_BEGIN - word beginning match; + UDM_MATCH_END - word ending match; + UDM_MATCH_WORD - whole word match; + UDM_MATCH_SUBSTR - word substring match. + + + + UDM_PARAM_MIN_WORD_LEN - defines the minimal word length. + Any word shorter than this limit is considered to be a stopword. Please note that this parameter value is inclusive, + i.e. if UDM_PARAM_MIN_WORD_LEN=3, a word 3 characters long will not be considered a stopword, while + a word 2 characters long will be. Default value is 1. + + + + + UDM_PARAM_ISPELL_PREFIXES - Possible values: UDM_PREFIXES_ENABLED and UDM_PREFIXES_DISABLED, + which respectively enable or disable the use of prefixes. E.g. if the word "tested" is in the search query, words like "test", "testing", etc. are also found. + Only suffixes are supported by default. Prefixes usually change word meanings; for example, if somebody is searching for the word "tested", + one hardly wants "untested" to be found. Prefix support may also be useful for + checking a site's spelling. In order to enable ispell, you have to load ispell data with udm_load_ispell_data. + + + + + UDM_PARAM_CROSS_WORDS - enables or disables crosswords support. + Possible values: UDM_CROSS_WORDS_ENABLED and UDM_CROSS_WORDS_DISABLED. + + + The crosswords feature allows words between <a href="xxx"> and </a> + to also be assigned to the document this link leads to. It works in SQL database mode and + is not supported in the built-in database and cachemode. + + + + Crosswords are supported only in mnoGoSearch 3.1.11 or later. + + + + + - - - - udm_errno - Get mnoGoSearch error number - - - Description - - - int udm_errno - int agent - - - - udm_errno returns mnoGoSearch error number, - zero if no error. - - - agent - link to agent identifier, received - after call to udm_alloc_agent. - - - Receiving numeric agent error code.
- - - - - - - udm_error - Get mnoGoSearch error message - - - Description - - - string udm_error - int agent - - - - udm_error returns mnoGoSearch error message, - empty string if no error. - - - agent - link to agent identifier, received - after call to udm_alloc_agent. - - - Receiving agent error message. - - - - + +
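The UDM_PARAM_WEIGHT_FACTOR description can be illustrated with a small PHP sketch that does not call the mnoGoSearch extension. It assumes the five-section weight configuration from the text (URLWeight 1 through DescWeight 16) and a factor string of exactly five hex digits. The constant SECTION_BITS and the helpers decode_weight_factor and sections_in_weight are hypothetical illustrations and are not part of the module's API. The first helper decodes a factor string such as F9421 into per-section factors. The second lists the document parts encoded in an aggregate (bitwise OR) word weight.

```php
<?php
// Hypothetical helpers illustrating UDM_PARAM_WEIGHT_FACTOR; they do not
// call the mnoGoSearch extension. The section bits are ASSUMED to follow
// the example indexer.conf configuration: URLWeight 1, BodyWeight 2,
// TitleWeight 4, KeywordWeight 8, DescWeight 16.
const SECTION_BITS = [
    'URL'         => 1,
    'Body'        => 2,
    'Title'       => 4,
    'Keywords'    => 8,
    'Description' => 16,
];

// Decode a factor string "ABCDE" into per-section factors.
// The rightmost digit (E) applies to weight bit 1 (URL), and the
// leftmost digit (A) applies to weight bit 16 (Description).
// Assumes exactly five hex digits, as described in the text.
function decode_weight_factor(string $factor): array
{
    if (!preg_match('/^[0-9A-Fa-f]{5}$/', $factor)) {
        throw new InvalidArgumentException("expected 5 hex digits, got '$factor'");
    }
    $factors = [];
    $offset = 4; // index of digit E in the string
    foreach (array_keys(SECTION_BITS) as $section) {
        $factors[$section] = hexdec($factor[$offset]);
        $offset--;
    }
    return $factors;
}

// List the document parts encoded in an aggregate word weight,
// i.e. the bitwise OR of the weights of the parts the word appeared in.
function sections_in_weight(int $weight): array
{
    $sections = [];
    foreach (SECTION_BITS as $section => $bit) {
        if (($weight & $bit) !== 0) {
            $sections[] = $section;
        }
    }
    return $sections;
}

// Usage: the F9421 example and a body-only aggregate weight.
print_r(decode_weight_factor('F9421'));
print_r(sections_in_weight(0b00000010));
```

For F9421, the decoded factors match the list in the text: Description 15, Keywords 9, Title 4, Body 2 and URL 1. An aggregate weight of 00000010 decodes to Body only. How mnoGoSearch combines these factors into a final ranking score is not described here, so the sketch stops at decoding.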