Archive:Ontology import

From semantic-mediawiki.org


This page contains outdated information and is thus OBSOLETE!
This documentation page applies to all SMW versions from 0.1 to 0.7.


Semantic MediaWiki has a feature (still in beta, and disabled in version 1.0) for importing ontologies into a Semantic MediaWiki installation. To be useful, the ontologies have to follow a certain format. Further down, this page describes an alternative way of importing data from ontologies (or, in fact, from other formats as well).

Ontology format

The ontology elements (classes, properties and individuals) should all have labels. The labels are used to name the relations, the categories and the article pages, and also to create the appropriate annotations (typed links or categorization links) on the article pages. The import mapping mirrors the export mapping:

OWL construct                  | Semantic MediaWiki
-------------------------------+------------------------------------------------------------------------
Class                          | Category
Datatype property              | Properties and types
Object property                | Properties and types as well (uncertain)
Class instantiation            | Page categorization (e.g. [[Category:X]])
Subclass of                    | Category subcategorization (e.g. [[Category:X]] on a category page)
Individual                     | Article (in Main namespace)
Instantiated datatype property | Attribute annotation (e.g. [[X::Y]])
Instantiated object property   | Typed link (e.g. [[X::Y]])
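
The statement-level rows of the mapping can be sketched as a small Python function. This is only an illustration: the function name and the statement kind strings are hypothetical and not part of Semantic MediaWiki; only the wikitext forms are taken from the table.

```python
# Hypothetical statement kinds; the wikitext forms follow the mapping table.
def annotation_for(kind, name, value=None):
    """Return the wikitext annotation for one explicit ontology statement."""
    if kind in ("class_instantiation", "subclass_of"):
        # Both become a category link; for subclass_of the link is placed
        # on the category page of the subclass.
        return "[[Category:%s]]" % name
    if kind in ("datatype_property", "object_property"):
        # Instantiated properties become [[property::value]] annotations.
        return "[[%s::%s]]" % (name, value)
    raise ValueError("no wiki mapping for statement kind: %r" % kind)


print(annotation_for("class_instantiation", "Man"))
print(annotation_for("object_property", "delegate at", "ESWC2006"))
```

Classes, properties and individuals themselves map to pages (category, property and article pages), so they are not covered by this function.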

Note that the ontology needs to be in OWL DL in an RDF serialization, not just general RDF or RDFS, and that all properties and classes have to be declared as such in order to be recognized. Only explicit statements are imported; no reasoning takes place. So even if it can be inferred from the ontology that Adam is a man, he will not be put into Category:Man unless that triple is in the ontology and Man is defined as an owl:Class. To import implicit statements, you have to make them explicit first. You can use any reasoner that allows you to materialize such statements.
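
To illustrate what "making implicit statements explicit" means, here is a minimal sketch in plain Python. It is not a substitute for an OWL DL reasoner: it applies only one rule (an instance of a class is also an instance of its superclasses), and the short strings stand in for full URIs. All names in it are assumptions made for this example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def materialize_types(triples):
    """Add the rdf:type triples implied by rdfs:subClassOf, up to a fixpoint."""
    closed = set(triples)
    # Map each class to its directly asserted superclasses.
    superclasses = {}
    for s, p, o in closed:
        if p == SUBCLASS_OF:
            superclasses.setdefault(s, set()).add(o)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closed):
            if p != RDF_TYPE:
                continue
            for sup in superclasses.get(o, ()):
                if (s, RDF_TYPE, sup) not in closed:
                    closed.add((s, RDF_TYPE, sup))
                    changed = True
    return closed


triples = {
    ("Adam", RDF_TYPE, "Father"),
    ("Father", SUBCLASS_OF, "Man"),
    ("Man", SUBCLASS_OF, "Person"),
}
explicit = materialize_types(triples)
print(("Adam", RDF_TYPE, "Man") in explicit)  # True
```

After this step the triple stating that Adam is a Man exists explicitly, so an import that ignores inference would still categorize the page.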

Note also that all further constructs from OWL DL, like inverse relations, complement classes, class union, etc., will not be imported into the wiki. If you want to use more complex ontologies with Semantic MediaWiki, check out the publication ow:Reusing Ontological Background Knowledge in Semantic Wikis.

How it works

On the page Special:Import ontology you can upload the ontology file. The special page is only available to users with admin privileges. After you have chosen the ontology file, the system parses it (using RAP, thanks!) and presents you with a list of all importable statements, that is, mainly statements that are not in the wiki already (though this display is a bit buggy, sorry for that). Here you can select each statement you want to import, and you can enter a short text to be imported along with the statements (for example, a template that resolves to a message telling the user that this information was imported from a particular ontology).

After you have chosen the appropriate statements and set all other options, click the import button at the bottom of the page and wait. A few moments later, the statements should have been imported (check Recent changes).

Note that this part is still somewhat buggy. You may want to try smaller portions of the ontology first, or even single statements, to see if it works as you want it to.

Alternative import

As an alternative to this experimental feature, you can use pre-processing tools to generate wiki pages that contain the wiki text for SMW properties, and then import those pages using the MediaWiki import tools.
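
As a rough sketch of such a pre-processing step, the following Python function wraps generated wiki text in an XML file for the MediaWiki import tools. It is modelled on the page/revision/text layout of MediaWiki's XML export format, but it is an assumption that this minimal layout is enough: the elements your MediaWiki version requires (namespace attributes, timestamps, contributors) are not covered here and should be checked against an export from your own wiki. The function name and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_import_xml(pages):
    """Build an XML document with one <page> per (title, wikitext) pair."""
    root = ET.Element("mediawiki")
    for title, wikitext in pages:
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = title
        revision = ET.SubElement(page, "revision")
        # ElementTree escapes special characters in the wiki text for us.
        ET.SubElement(revision, "text").text = wikitext
    return ET.tostring(root, encoding="unicode")


pages = [
    ("Adam", "Adam attended the [[delegate at::ESWC2006]].\n\n[[Category:Person]]"),
]
xml_text = build_import_xml(pages)
print(xml_text)
```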

A more robust way to import ontologies is to use a framework like the Python Wikipedia Bot. It should work with other wikis as well, not just with Wikipedia, but you will have to create a new family file in order to get access to your wiki. With this approach you are not restricted to OWL DL compliant ontologies.

For example, on the Ontoworld wiki we imported the delegates list from the ow:ESWC2006 ontology. The program is sketched below. It uses the rdflib library for parsing the RDF and the Wikipedia Bot framework for working with the wiki. It creates a template call out of the RDF. It could also create sentences with typed links inside (see towards the end of the code for an example), or first check the existing page text to see whether the triple to be added is already included (and thus may be skipped).

from rdflib import Graph, Namespace
import wikipedia, login   # modules of the (legacy) Python Wikipedia Bot framework

family = "ontoworld" # note that you need to set up the appropriate family file

i = Graph()

i.bind("foaf", "http://xmlns.com/foaf/0.1/")
RDF = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")
RDFS = Namespace("http://www.w3.org/2000/01/rdf-schema#")
FOAF = Namespace("http://xmlns.com/foaf/0.1/")
i.load("eswc.rdf")

ow = wikipedia.Site('en', family)   # use the family defined above
login.LoginManager('password', False, ow)

unchanged = list()   # collects the names that already have a page

# iterates through everything that has the type Person
# (note, only explicit assertions: rdflib does not do reasoning here!)
for s in i.subjects(RDF["type"], FOAF["Person"]):
    for n in i.objects(s, FOAF["name"]):  # reads the name
        p = wikipedia.Page(ow, n)         # gets the page with that name
        if p.exists():
            unchanged.append(n)
        else: # create the instantiated template
            h = '{{Person|' + '\n'
            h += ' Name=' + n

            for hp in i.objects(s, FOAF["workplaceHomepage"]):
                h += '|' + '\n'
                hp = hp[7:]   # strip the leading "http://"
                h += ' Homepage=' + hp
                if len(hp) > 23: # if the name of the page is too long,
                    h += '|' + '\n'
                    if "/" in hp: # make something shorter
                        hp = hp[0:hp.find("/")]
                    h += ' Homepage label= at ' + hp

            for hp in i.objects(s, RDFS["seeAlso"]):
                h += '|' + '\n'
                h += ' FOAF=' + hp
            h += '\n' + '}}' # end Person template

            # write a sentence
            h += '\n' + "'''" + n + "''' attended the [[delegate at::ESWC2006]]."

            # add a category
            h += '\n' + '\n' + '[[Category:Person]]'
            print(n + ' changed')
            p.put(h, 'Added from ontology')

wikipedia.stopme()
print(unchanged)
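
The script above skips every page that already exists. The other option mentioned before the script, checking the existing page text and adding only what is missing, could be sketched as follows. This is a minimal illustration with a hypothetical helper name; it uses a plain substring test, which is an assumption and will not recognize the same annotation written with different spacing or capitalization.

```python
def add_missing_annotations(page_text, annotations):
    """Append only those annotations that do not already occur in page_text."""
    missing = [a for a in annotations if a not in page_text]
    if not missing:
        return page_text  # nothing to add, so the page need not be saved
    return page_text.rstrip("\n") + "\n" + "\n".join(missing) + "\n"


text = "'''Adam''' attended the [[delegate at::ESWC2006]].\n"
wanted = ["[[delegate at::ESWC2006]]", "[[Category:Person]]"]
print(add_missing_annotations(text, wanted))
```

In a bot, the result would be written back only when it differs from the text that was fetched.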

As you have the full power of Python available, you can parse basically any machine-readable document and process it any way you like. As of 2006, and as long as the ontology import is still not perfect, this is the recommended way to import data into the wiki (especially since it gives you much more freedom in stating the facts and reusing templates than the ontology import ever will).

There is an alternative description for importing data with a script.