Status: effective
Progress: 100%
Version: 3.0.0+
Full-text search
From semantic-mediawiki.org
Full-text search | |
---|---|
Full-text search support for properties whose data types use strings of characters or text to store their data in the database tables, e.g. datatype "Page", "Text", "Code" or "URL". | |
Semantic MediaWiki 2.5.0 (released on 14 March 2017 and compatible with MW 1.23.0 - 1.29.x) adds experimental support for accessing the full-text capabilities of relational databases (SQL back-end) for properties whose data types use strings of characters or text to store their data in the database tables, e.g. datatype "Page" (holds names of wiki pages and displays them as links), datatype "Text" (holds text of arbitrary length), datatype "Code" (holds technical, pre-formatted texts, similar to datatype Text) or datatype "URL" (holds URIs, URNs and URLs).
Features
General notes
- The `FT_SEARCH` table aggregates search content for datatypes storing their data as `BLOB` and `URI` values, e.g. datatype "Page", "Text", "Code" or "URL".
- These datatypes use either `CHAR`, `VARCHAR`, or `TEXT` to store their data in the database tables.
- Supported operations rely on the relational backend database (MySQL, MariaDB and SQLite).
- For MySQL and MariaDB databases, `IN BOOLEAN MODE` is used as the default search mode. This allows the software to use a number of special operators.
- Relevance and scores are not used for any sorting purpose, e.g. to rank results by best match.
- `TextSanitizer` relies on the "onoi/tesa" library [1] to help with the sanitization of text or string elements, to provide some text manipulation support, and to allow language detection if enabled. This library is pre-installed for use by Semantic MediaWiki.
- Custom stopwords are only applied by the "onoi/tesa" library [1] if language detection is enabled, but MySQL/MariaDB provide their own standard list [2], which is enabled by default.
- Starting with Semantic MediaWiki 3.0.0 (released on 11 October 2018 and compatible with MW 1.27.0 - 1.31.x):
  - If the `SMW_FIELDT_CHAR_NOCASE` option of the configuration parameter `$smwgFieldTypeFeatures` (sets relational database specific field type features) is enabled, the full-text search only comes into effect for selections using the comparators `~` and `!~`. [3]
  - The API module "smwtask" (allows invoking and executing internal Semantic MediaWiki tasks) is used instead of a socket connection via a special page to invoke extra "work" after an update has been completed as part of an independent transaction. [4] See also the configuration parameter `$smwgPostEditUpdate` (sets how many jobs should be executed as part of a post-edit event).
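To make the boolean search mode mentioned in the notes above more concrete, the following minimal sketch builds a parameterized `MATCH ... AGAINST (... IN BOOLEAN MODE)` statement for MySQL/MariaDB. It is only an illustration and not how Semantic MediaWiki constructs its queries internally: the table name `smw_ft_search`, the column names `s_id` and `o_text`, and the helper `build_boolean_query` are assumptions made for this example. The `+` (must contain), `-` (must not contain) and trailing `*` (prefix match) operators are standard MySQL boolean-mode operators.

```python
# Hypothetical names: the table and column names are assumptions made for
# this illustration and may differ from the actual Semantic MediaWiki schema.
FT_TABLE = "smw_ft_search"
SUBJECT_COLUMN = "s_id"
TEXT_COLUMN = "o_text"


def build_boolean_query(required=(), excluded=(), optional=()):
    """Return (sql, params) for a MySQL/MariaDB boolean-mode full-text query.

    required -> terms prefixed with '+' (must be present)
    excluded -> terms prefixed with '-' (must not be present)
    optional -> terms passed through unchanged (may carry a trailing '*')
    """
    terms = [f"+{word}" for word in required]
    terms += [f"-{word}" for word in excluded]
    terms += list(optional)
    if not terms:
        raise ValueError("at least one search term is required")

    search_string = " ".join(terms)
    # No ORDER BY clause: relevance scores are not used for sorting.
    sql = (
        f"SELECT {SUBJECT_COLUMN} FROM {FT_TABLE} "
        f"WHERE MATCH({TEXT_COLUMN}) AGAINST (%s IN BOOLEAN MODE)"
    )
    # The search string is passed as a bound parameter, not interpolated.
    return sql, (search_string,)


sql, params = build_boolean_query(
    required=["semantic"], excluded=["draft"], optional=["wiki*"]
)
print(sql)
print(params)
```

The `%s` placeholder follows the parameter style used by common Python MySQL drivers. The statement deliberately omits any ordering by score, in line with the note that relevance is not used for sorting.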
Notes on language support for Chinese, Japanese, and Korean (CJK)
- General CJK support is challenging because text has to be broken into tokens that are not separated by spaces.
- The "onoi/tesa" library [1] provides some simple `Tokenizer` implementations that do not require language detection and try to provide rudimentary CJK search out of the box. This requires ICU 54+.
- Mroonga is a MySQL storage engine and is said to provide CJK-ready full-text search and a column store.
- MySQL comes with an optional ngram full-text parser and a MeCab full-text parser plugin.
- According to this issue, MariaDB is missing those parser plug-ins. Support was still missing as of 2023.
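The tokenization problem described above can be illustrated with a small, generic sketch. It shows why whitespace splitting fails for CJK text and how overlapping character n-grams yield searchable tokens without any language detection, which is similar in spirit to an ngram parser. The function `char_ngrams` is a hypothetical helper written for this example; it is not the "onoi/tesa" `Tokenizer` and not MySQL's parser implementation.

```python
def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams, ignoring whitespace.

    Generic illustration only; real tokenizers (ICU-based, MeCab, or the
    MySQL ngram parser) handle many more cases such as punctuation.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return []
    if len(chars) < n:
        # Too short for a full n-gram: keep the text as a single token.
        return ["".join(chars)]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


phrase = "全文検索"  # "full-text search" in Japanese, written without spaces

# Whitespace splitting yields one opaque token, so a search for "検索"
# alone would not match it as a word.
print(phrase.split())

# Bigrams expose the inner character sequences as separate tokens.
print(char_ngrams(phrase))
```

With bigrams, the substring "検索" becomes a token of its own, so a query for it can match the indexed text. The trade-off is a larger index and the possibility of matches across word boundaries.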
Instructions
- For users
  - Searching contains some examples and descriptions of the available search syntax
- For system administrators
  - How to enable and configure full-text search on your wiki
  - Indexing describes some methods to manually create and update the index table
- For developers
  - Technical notes provides some information on the technical implementation, fine-tuning, and performance
References
- [1] "onoi/tesa": a small library to help with the sanitization of text or string elements.
- [2] https://dev.mysql.com/doc/refman/5.6/en/fulltext-stopwords.html and https://mariadb.com/kb/en/mariadb/stopwords/
- [3] Semantic MediaWiki: GitHub issue comment gh:smw:2499:307624826
- [4] Semantic MediaWiki: GitHub pull request gh:smw:3318