Development notes on SPARQL/RDF store integration

Since version 1.6.0, SMW supports direct SPARQL database integration. See Help:Using SPARQL and RDF stores for details. The article below still contains some unrealized ideas that could be the subject of future development.

This page summarizes the discussions on adding better RDF, SPARQL, and SPARUL support to SMW core, which started in earnest at SMWCon Fall 2010. The core objective of this effort is to consolidate the following features of various existing SMW extensions:

  • synchronise SMW data with various RDF stores that allow more efficient/convenient management of semantic data than MySQL,
  • query SMW wikis using SPARQL, embedded on pages in the same way as #ask,
  • redirect #ask queries to be evaluated based on an RDF store instead of using MySQL for querying,
  • update SMW via a SPARUL web service (this will remain an extension).

Ongoing discussions on this matter will happen via the semediawiki-devel mailing list, using the tag [RDF] to flag related messages.

Participants

Various parties have expressed interest in this topic (feel free to add your name/affiliation/project here):

Comparison of Semantic MediaWiki RDF store connectors

There are already a few extensions that make it possible to connect SMW to an RDF store. These connectors differ in their functionality and scalability. The goals and philosophies of their creators also differ somewhat.

The six rightmost columns list features.

| Extension | Architecture | Underlying store | Open source | Embed query results in a wiki page | Multiple query output formats | Query external endpoints | Expose wiki data via endpoint | Supports multiple endpoint implementations | Import triples (update facts in wiki-text) |
|---|---|---|---|---|---|---|---|---|---|
| SMW+ Triple Store Connector | Java | Jena (free), OntoBroker (commercial) | No | Yes | Yes | No | Yes | Yes | No |
| SparqlExtension | Java/Web services | Jena | Yes | Yes | Yes | Yes | Yes | Yes | No |
| RDFIO | PHP | ARC2 (PHP) | Yes | No | No | No | Yes | No | Yes |
| LinkedWiki | PHP | 4store + arc2 with hacking sparql 1.1 | Yes | Yes | Yes | Yes | Yes | Yes | No |

  • Note that, although it is not a SPARQL connector, the SPARQL Query extension also defines a #sparql parser function to display embedded results of queries inside pages.

Requirements

  • Use RDF stores to answer SMW queries (including existing #ask)
  • Answer SPARQL queries (by a simple web service)
  • Abstraction layer implementing SPARQL and SPARQL/Update (SPARUL) for connecting to standard (W3C) triple stores.
  • Embed SPARQL query results into wiki pages, formatted like #ask can be formatted now
  • Allow for use of any SMW query result format to display query results
  • Support for various RDF stores:

Realisation

The task can be decomposed into a number of sub-projects, some of which can be realised rather independently.

Wiki-to-RDF mapping

SMW core already has a mapping from wiki data to RDF, and this is typically used by RDF store connectors. This mapping might need to be extended or modified in order to realise all relevant queries on the data that is found in these stores.

As an example, the proper interpretation of wiki redirects as synonyms (that is, as statements of equality between two pages) could be achieved in different ways.
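
To make the redirect example concrete, here is a minimal sketch of two possible mappings. It is not SMW's actual export code: the base URI, the helper names, and the use of plain tuples as triples are all assumptions made for illustration. Option A keeps facts as written and adds an owl:sameAs triple per redirect; option B rewrites redirecting titles to their targets at export time.

```python
# Hypothetical sketch: two ways to represent a wiki redirect in RDF.
# BASE and all function names are illustrative, not part of SMW.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BASE = "http://example.org/wiki/"  # assumed base URI of the wiki


def page_uri(title):
    """Build a URI for a page title (spaces become underscores)."""
    return BASE + title.replace(" ", "_")


def export_with_same_as(facts, redirects):
    """Option A: keep facts as written and add one owl:sameAs triple per redirect."""
    triples = [(page_uri(s), page_uri(p), page_uri(o)) for s, p, o in facts]
    for source, target in redirects.items():
        triples.append((page_uri(source), OWL_SAME_AS, page_uri(target)))
    return triples


def export_canonicalised(facts, redirects):
    """Option B: replace every redirecting title by its final target on export."""

    def canon(title):
        seen = set()  # guards against redirect loops
        while title in redirects and title not in seen:
            seen.add(title)
            title = redirects[title]
        return title

    return [(page_uri(canon(s)), page_uri(p), page_uri(canon(o))) for s, p, o in facts]


facts = [("Berlin", "Property:Capital of", "BRD")]
redirects = {"BRD": "Germany"}
same_as_triples = export_with_same_as(facts, redirects)
canonical_triples = export_canonicalised(facts, redirects)
```

Option A only affects query answers if the store applies owl:sameAs reasoning, whereas option B works with any plain triple store but loses the information that the redirect existed.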

Mapping SPARQL results to SMW query results

All RDF storage connectors will have to map SPARQL query results to SMW data. Ideally, SPARQL results should be represented as SMWQueryResult objects that can be displayed by existing result formats without changes. There are some challenges here:

  • SMW queries always return a list of pages only, while SPARQL queries return a table that may or may not contain "pages" in any column. (In both cases, printouts can be applied to get more data to display for the "main page" in each row; this happens not as part of querying but later, during display, and result formats may even execute further requests to get more data.)
  • SPARQL query results contain RDF terms (RDF(S) literals, blank nodes, URIs), but for display in SMW these must be translated back into SMWDataValue objects. Unfortunately, there is not a one-to-one mapping between SMW types and types used in RDF/OWL.
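
The two challenges above can be illustrated with a small sketch that reads a result in the W3C SPARQL JSON results format and turns each row into typed values, with one column chosen as the "main page". The value kinds ("page", "number", and so on) are hypothetical stand-ins for SMWDataValue objects, and the page URI prefix is an assumption.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"
PAGE_BASE = "http://example.org/wiki/"  # assumed URI prefix of wiki pages


def term_to_value(term):
    """Translate one RDF term into a (kind, value) pair.

    The kinds are hypothetical placeholders for SMW data value types.
    """
    if term["type"] == "uri":
        uri = term["value"]
        if uri.startswith(PAGE_BASE):
            return ("page", uri[len(PAGE_BASE):].replace("_", " "))
        return ("uri", uri)
    if term["type"] == "bnode":
        return ("blank", term["value"])
    # Literals: the datatype decides; anything unknown falls back to text,
    # which reflects the lack of a one-to-one type mapping.
    datatype = term.get("datatype", "")
    if datatype in (XSD + "integer", XSD + "decimal", XSD + "double"):
        return ("number", float(term["value"]))
    return ("text", term["value"])


def rows_from_sparql_json(document, main_var=None):
    """Turn a SELECT result into rows; main_var defaults to the first variable."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    main_var = main_var or variables[0]
    rows = []
    for binding in data["results"]["bindings"]:
        # Unbound variables are simply absent from a binding.
        values = {v: term_to_value(binding[v]) for v in variables if v in binding}
        rows.append({"main": values.get(main_var), "columns": values})
    return rows


example = json.dumps({
    "head": {"vars": ["city", "pop"]},
    "results": {"bindings": [{
        "city": {"type": "uri", "value": PAGE_BASE + "Berlin"},
        "pop": {"type": "literal", "value": "3500000", "datatype": XSD + "integer"},
    }]},
})
rows = rows_from_sparql_json(example)
```

A row whose main column is not a page (or is unbound) has no natural "main page", which is exactly the mismatch described in the first bullet.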

Modifying the SMW store API and implementations to use RDF stores

The SMW store interface needs to be extended with a function that takes SPARQL queries and returns SMW query result objects, similar to the current query functions. SPARQL queries will essentially remain strings, since we do not want a PHP-side SPARQL parser in SMW core. Some minor extensions (e.g. adding standard namespace declarations) might be done by the actual implementations that process SPARQL. The function may return false if SPARQL is not supported at all by the selected store implementation.
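
As a rough illustration of this contract, the sketch below (in Python rather than PHP, with entirely hypothetical class and method names) shows a base store that returns false for SPARQL and an RDF-backed store that treats the query as a string, prepends missing standard namespace declarations with a naive string check, and forwards it to a backend.

```python
STANDARD_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


class Store:
    """Hypothetical base store: SPARQL is not supported by default."""

    def get_sparql_query_result(self, sparql):
        return False


class SparqlStore(Store):
    """Hypothetical RDF-backed store that forwards queries to a callable."""

    def __init__(self, run_query):
        # Assumption: run_query takes a SPARQL string and returns result rows.
        self._run_query = run_query

    def add_prefixes(self, sparql):
        """Prepend standard PREFIX declarations that the query lacks.

        This is a plain substring check, not a SPARQL parser.
        """
        declared = sparql.lower()
        lines = [
            "PREFIX %s: <%s>" % (name, uri)
            for name, uri in STANDARD_PREFIXES.items()
            if ("prefix %s:" % name) not in declared
        ]
        return "\n".join(lines + [sparql])

    def get_sparql_query_result(self, sparql):
        return self._run_query(self.add_prefixes(sparql))


sent = []  # records what the fake backend receives
store = SparqlStore(lambda query: sent.append(query) or [])
result = store.get_sparql_query_result(
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "SELECT ?s WHERE { ?s rdf:type owl:Class }"
)
```

Callers can then test the return value against false to decide whether to fall back to another query path.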

The following new classes are needed in core:

  • Base class of RDF-store based SMW stores (should use the simple SMW Light store and provide #ask to SPARQL translation for inline queries, default RDF translations, and default SPARQL-result-to-wiki mappings)
  • Generic SPARQL/SPARUL SMW store implementation that can connect to different RDF stores supporting these standards (SPARUL is currently not a standard).
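
The #ask-to-SPARQL translation mentioned for the base class could look roughly like the following sketch. It handles only a conjunction of category and property-value conditions, treats every value as a page, and uses assumed URIs under example.org; a real translation would have to follow the wiki's actual RDF export and cover the full #ask language.

```python
import re
from urllib.parse import quote

BASE = "http://example.org/wiki/"  # assumed base URI of exported pages
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CONDITION = re.compile(r"\[\[([^\[\]]+)\]\]")


def iri(namespace, title):
    """Build a full IRI reference such as <http://example.org/wiki/Property:Located_in>."""
    return "<%s%s%s>" % (BASE, namespace, quote(title.strip().replace(" ", "_")))


def ask_to_sparql(ask_query):
    """Translate a conjunction of simple #ask conditions into a SELECT query."""
    patterns = []
    for condition in CONDITION.findall(ask_query):
        if "::" in condition:
            # [[Property::Value]] becomes a triple pattern with a page as object.
            prop, value = condition.split("::", 1)
            patterns.append("?page %s %s ." % (iri("Property:", prop), iri("", value)))
        elif condition.startswith("Category:"):
            # [[Category:Name]] becomes an rdf:type pattern.
            name = condition[len("Category:"):]
            patterns.append("?page <%s> %s ." % (RDF_TYPE, iri("Category:", name)))
        else:
            raise ValueError("unsupported condition: " + condition)
    return "SELECT DISTINCT ?page WHERE {\n  " + "\n  ".join(patterns) + "\n}"


sparql = ask_to_sparql("[[Category:City]] [[Located in::Germany]]")
```

Because the result is a plain string, it can be handed to any store that accepts SPARQL without requiring a SPARQL parser on the wiki side.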

Further stores might be provided by extensions or in core:

  • Direct bindings to ARC2 (via PHP)
  • Direct bindings to other stores (using proprietary APIs for efficiency reasons)

Note that a relational database like MySQL might still be used for storing a second copy of that data, but the DB layout can be simplified a lot (as already done in SMW Light), since there would be no need to support efficient SQL queries.

The #sparql parser function

The user interface of a parser function to call SPARQL queries, most likely simply called #sparql, needs to be defined. In general, it should be very similar to #ask, supporting parameters of the same name and with the same meaning. But some of the current implementations may have further features that may or may not be supported.
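
One possible shape for this interface, sketched under the assumption that the parser function receives its pipe-separated arguments as a list: the first argument is the query and the rest are name=value parameters named as in #ask (format, limit, offset). The function name and the default values are illustrative only.

```python
# Illustrative defaults; the real defaults would have to match #ask.
DEFAULTS = {"format": "table", "limit": "50", "offset": "0"}


def parse_sparql_function_args(args):
    """Split parser-function arguments into the query string and named parameters."""
    if not args or not args[0].strip():
        raise ValueError("missing SPARQL query")
    query = args[0].strip()
    params = dict(DEFAULTS)
    for arg in args[1:]:
        name, separator, value = arg.partition("=")
        if not separator:
            continue  # arguments without "=" are ignored in this sketch
        params[name.strip().lower()] = value.strip()
    return query, params


# Corresponds to {{#sparql: SELECT ... | format=ul | limit=10 }}
query, params = parse_sparql_function_args(
    ["SELECT ?city WHERE { ?city ?p ?o }", " format = ul ", "limit=10"]
)
```

Note that a query containing a literal pipe character would need escaping before it reaches such a function, which is one of the details the interface definition would have to settle.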

SPARQL service for the wiki

It would not be hard to make the SPARQL features available to a web service that SMW offers.
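
As a sketch of how small such a service can be, the following WSGI application accepts a query parameter in the URL, as in the SPARQL protocol's GET binding, and answers with the SPARQL JSON results media type. The backend callable run_query is an assumption: it stands for whatever store function evaluates the query and returns a dictionary in the JSON results layout.

```python
import json
from urllib.parse import parse_qs


def make_sparql_service(run_query):
    """Build a WSGI app around run_query (an assumed backend callable)."""

    def app(environ, start_response):
        params = parse_qs(environ.get("QUERY_STRING", ""))
        queries = params.get("query")
        if not queries:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"missing 'query' parameter"]
        body = json.dumps(run_query(queries[0])).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "application/sparql-results+json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


def empty_result(query):
    """Stand-in backend that returns an empty SELECT result."""
    return {"head": {"vars": []}, "results": {"bindings": []}}


service = make_sparql_service(empty_result)
```

For local experiments the app could be served with wsgiref.simple_server.make_server from the Python standard library; access control and query limits are left out of this sketch.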

Timeline

Most of this functionality is planned for release in SMW 1.6.0, though no date has been set yet for that release.


--Markus Krötzsch 20:10, 2 August 2011 (CEST)