Semantic search in documents (indexing)

From semantic-mediawiki.org

> Currently I'm wondering if it's possible to do a full-text search on documents such as docx and pdf on my wiki page.

If we are talking about Semantic MediaWiki itself, then the answer is no.

> Is there already an extension or some other way to search in documents? I hope someone can help me out with this, because I've been stuck for days trying to reinvent the wheel.

Maybe CirrusSearch [0], which uses Elasticsearch underneath, can help with file indexing.

[0] https://www.mediawiki.org/wiki/Extension:CirrusSearch

Cheers

12:51, 9 October 2015

Thank you for your reply. I was actually wondering about this extension:

https://www.mediawiki.org/wiki/Extension:SolrStore

13:42, 9 October 2015

Right, I forgot about this one. Maybe it will work (I have never used it).

13:48, 9 October 2015
 

I just created the Full-text search page; it would be a great help if you could extend it with your research and experience on this topic.

13:53, 9 October 2015

Of course, will do

14:12, 9 October 2015

Apparently https://www.elastic.co/guide/en/elasticsearch/guide/current/full-text-search.html does support documented full text search, since the engine is based on Lucene! SolrStore is outdated, btw.
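
For illustration, here is a minimal sketch of what such a full-text request body could look like, using the `match` query from the Elasticsearch Query DSL. It only builds and prints the JSON, it does not talk to a server. The field name `content` and the helper `build_match_query` are hypothetical, and the sketch assumes the text of the docx/pdf files has already been extracted into that field.

```python
import json


def build_match_query(field, text):
    """Build a request body for an Elasticsearch full-text `match` query.

    `field` is an assumption: it stands for whatever field holds the
    text that was extracted from the uploaded document.
    """
    return {"query": {"match": {field: text}}}


# Hypothetical field name "content"; adjust to the actual index mapping.
body = build_match_query("content", "semantic search")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent to the index's search endpoint; how the files get into the index in the first place is a separate question.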

11:22, 19 October 2015

> apparently https://www.elastic.co/guide/en/elasticsearch/guide/current/full-text-search.html does support documented full text search

In general, arbitrary text search (matching a string) and attributive search (matching an entity on certain attributive conditions) are two distinct approaches, and either one may suffice depending on the objective one is trying to achieve.

Combining full-text search (as a string match) with an attributive search is the challenge, as both result sets need to be merged into one.
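
To make the distinction concrete, here is a toy sketch in plain Python. It is not how Semantic MediaWiki or Elasticsearch work internally; the `Page` class, the function names, and the sample data are all made up for illustration, and the two result sets are merged by simple intersection, which is only one possible way to combine them.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    # Hypothetical stand-in for a wiki page with text and annotations.
    title: str
    text: str
    properties: dict = field(default_factory=dict)


def full_text_search(pages, needle):
    """String match: titles of pages whose text contains the needle (case-insensitive)."""
    needle = needle.lower()
    return {p.title for p in pages if needle in p.text.lower()}


def attribute_search(pages, conditions):
    """Attributive match: titles of pages whose properties meet every condition."""
    return {
        p.title
        for p in pages
        if all(p.properties.get(k) == v for k, v in conditions.items())
    }


def combined_search(pages, needle, conditions):
    """Merge both result sets into one by intersecting them."""
    return full_text_search(pages, needle) & attribute_search(pages, conditions)


pages = [
    Page("Report A", "Quarterly results for the Berlin office", {"Type": "Report", "Year": 2015}),
    Page("Report B", "Quarterly results for the Paris office", {"Type": "Report", "Year": 2014}),
    Page("Memo C", "Berlin office relocation", {"Type": "Memo", "Year": 2015}),
]

result = combined_search(pages, "berlin", {"Type": "Report"})
print(sorted(result))  # prints ['Report A']
```

In a real setup the two searches would run against different stores (a text index and the semantic store), so the hard part is agreeing on a shared identifier, such as the page title, to join on.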

In the past we (the core team) had some minor exchanges about Elasticsearch [2, 3, 4] integration, but since that requires commitment and manpower, it hasn't been on anyone's agenda.

[2] https://github.com/eea/eea.elasticsearch.river.rdf

[3] http://elasticsearch-users.115913.n3.nabble.com/Faceted-search-using-RDF-triple-like-related-documents-td4049280.html

[4] https://github.com/jprante/elasticsearch-plugin-rdf-jena

11:32, 19 October 2015