Help:Repairing SMW's data

From semantic-mediawiki.org

All data that Semantic MediaWiki (SMW) uses is stored in wiki pages. If the data ever becomes outdated or contains errors, it can always be rebuilt completely from the wiki, so no data is ever lost. A data refresh is also needed after some software upgrades and after the first installation (since the refresh also gathers some existing metadata).

This page describes ways to repair or initialise the data of any SMW installation. The data of a single page can be refreshed by simply editing and saving it. For many pages, it is more convenient to use a feature of Special:SMWAdmin that does this automatically. There is also a maintenance script for doing this from the command line: "rebuildData.php" (called "SMW_refreshData.php" in SMW 1.9.1 and earlier).

To make sure that all wiki pages display the new data after the repair, you can run
touch LocalSettings.php
(or, if there is no command line access, edit it in some trivial way). This will invalidate any MediaWiki page caches that may otherwise make you see old versions of wiki pages.

Using Special:SMWAdmin

The administration special page "Special:SMWAdmin" offers a feature for repairing all data. This page is only available to wiki users with administrator status. Moreover, the update process can only be started or stopped online if the configuration parameter $smwgAdminRefreshStore is set to true (default).

Once initiated, the update takes time. The progress can be viewed on Special:SMWAdmin. Even if the option $smwgAdminRefreshStore is disabled after starting the update, the ongoing process will continue and can be tracked online. Stopping the process is only possible if $smwgAdminRefreshStore is enabled.

The time the update will take varies from wiki to wiki. The update progresses during each page view. If many people view your wiki, then the update progresses more quickly. If there are a large number of pages, then the update will take longer. It is normal that the update progresses faster until it reaches 50%, since only property and type pages are refreshed during that part. The actual update of all wiki pages starts at 50%.

You can speed up the process by using one of the following options:

  • If you have shell access, you can use the MediaWiki maintenance script "runJobs.php". Please consider specifying a parameter --maxjobs 1000 or similar so that each run of the script is bounded in duration. Otherwise the script tends to occupy increasing amounts of memory.
  • If you do not have shell access, you may use the MediaWiki setting $wgJobRunRate in your "LocalSettings.php" file to increase the number of jobs performed per request. This increases the speed of the update process. Please note that it also increases the load on the system, which may have a negative effect on the performance of your wiki.
  • You can also run a script to automatically hit the web site a certain number of times, so that you don't have to either wait for the site to be hit or keep reloading it in the browser. You can find an example of such a script here: "hitURL.php". This can be done in conjunction with increasing the value of $wgJobRunRate.
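The first option above can be sketched as a small shell function that calls "runJobs.php" repeatedly with a bounded batch size. This is only an illustration under assumptions: the install path MW_ROOT, the function name and the number of batches are hypothetical, and only the --maxjobs 1000 parameter comes from the text above.

```shell
#!/bin/sh
# Sketch only: run the MediaWiki job queue in bounded batches.
# MW_ROOT is a hypothetical install path; adjust it to your wiki.
MW_ROOT="${MW_ROOT:-/var/www/wiki}"

# run_job_batches N: call runJobs.php N times, 1000 jobs at most per call.
run_job_batches() {
    batches="$1"
    i=1
    while [ "$i" -le "$batches" ]; do
        # Every call starts a fresh PHP process, so memory is released
        # between batches. Set PHP=echo to preview the commands only.
        ${PHP:-php} "$MW_ROOT/maintenance/runJobs.php" --maxjobs 1000 || return 1
        i=$((i + 1))
    done
}

# Example: run_job_batches 5
```

Setting PHP=echo before calling the function prints the commands instead of executing them, which is a cheap way to check the path first.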
[Screenshots: Special:SMWAdmin informing about the start of an update process; Special:SMWAdmin during data repair]

Using the SMW maintenance script

The above method does not strictly require a maintenance script, but there is also a script "rebuildData.php" that directly refreshes selected portions of the wiki without any prior access to it. The basic operation of "rebuildData.php" is to go through all pages of the wiki and to re-store the semantic data for each. Normally, the script can be run by changing to the directory [path to SMW]/maintenance of your SMW installation, and then executing

php rebuildData.php -v

where of course PHP needs to be installed on the command line. If this does not work on your site (e.g. due to unusual directory structures), read the file [path to SMW]/maintenance/README in that directory.

The above script goes through all pages in the order they are stored in your wiki database, and refreshes their data. The parameter -v makes sure that the script's progress is printed. The script can be aborted by pressing "CTRL-C" as usual. The index numbers shown by the script refer not only to page indices as used in MediaWiki, but also to indices SMW uses in its semantic data. For this reason, the script may process indices that are higher than the maximal page index in the wiki.

If you have a large number of pages then the script may consume a lot of memory during its execution, and it is better to stop after, say, 2000 pages. This is due to a PHP memory leak. As a workaround, the script can be run for only part of the pages at a time: use the parameters -s and -e to give a first and last page id to be refreshed, e.g.

php rebuildData.php -v -s 1000 -e 2999
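The workaround can be automated with a loop that walks through the id range in fixed-size chunks, starting a new PHP process for each chunk. The following is a minimal sketch, assuming it is run from [path to SMW]/maintenance. The function name and the example bounds are hypothetical, and the last id has to be chosen by the administrator (keeping in mind that the script's indices can exceed the maximal page index).

```shell
#!/bin/sh
# Sketch only: refresh ids FIRST..LAST in chunks of STEP ids each.
# Run from [path to SMW]/maintenance. All three values are chosen by the
# administrator; nothing here detects the highest id automatically.
rebuild_in_chunks() {
    start="$1"
    last="$2"
    step="$3"
    # A non-positive step would never advance, so refuse it.
    [ "$step" -gt 0 ] || return 1
    while [ "$start" -le "$last" ]; do
        end=$((start + step - 1))
        if [ "$end" -gt "$last" ]; then
            end="$last"
        fi
        # One PHP process per chunk keeps memory use bounded.
        # Set PHP=echo to preview the commands without running them.
        ${PHP:-php} rebuildData.php -v -s "$start" -e "$end" || return 1
        start=$((end + 1))
    done
}

# Example: rebuild_in_chunks 1 4500 2000
```

With the example arguments the function issues three runs, covering ids 1 to 2000, 2001 to 4000 and 4001 to 4500.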

Multiple runs of this script might be needed, for example because data for a property can only be stored once the datatype of that property has been stored. You can run the script with parameters -tp to refresh only type and property pages at first, so that these are already available when doing the second refresh. Overall, more than two refreshes should not be required in normal cases.

To make sure that all wiki pages display the new data after the refresh, you can run
touch LocalSettings.php
This will invalidate any MediaWiki page caches that may otherwise make you see old versions of wiki pages.
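The two refresh passes and the cache invalidation can be combined into one function. This is a sketch under assumptions rather than an official procedure: the function name and the LOCAL_SETTINGS path are hypothetical, and the function is meant to be called from [path to SMW]/maintenance.

```shell
#!/bin/sh
# Sketch only. Run from [path to SMW]/maintenance.
# LOCAL_SETTINGS is a hypothetical path to the wiki's LocalSettings.php.
LOCAL_SETTINGS="${LOCAL_SETTINGS:-/var/www/wiki/LocalSettings.php}"

refresh_all() {
    # Refuse to continue if the path is wrong, so that touch cannot
    # create an empty file in an unexpected place.
    if [ ! -f "$LOCAL_SETTINGS" ]; then
        echo "LocalSettings.php not found at $LOCAL_SETTINGS" >&2
        return 1
    fi
    # Pass 1: type and property pages only.
    ${PHP:-php} rebuildData.php -v -tp || return 1
    # Pass 2: all pages, now that property datatypes are stored.
    ${PHP:-php} rebuildData.php -v || return 1
    # Invalidate MediaWiki page caches.
    touch "$LOCAL_SETTINGS"
}

# Example: refresh_all
```

Setting PHP=echo prints the two rebuild commands instead of running them, which helps when checking the configured path.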

Rebuilding everything

The above methods should be able to fix data records in SMW in most cases. However, it is conceivable that some erroneous content of the SMW storage still persists for some reason. In this case, it makes sense to completely delete and reinstall the database structures of SMW before refreshing all data.

To completely delete all SMW data, the setup script "SMW_setup.php" is used with parameter --delete:

php SMW_setup.php --delete
After this, proceed as if re-installing SMW anew by first running
php SMW_setup.php
again, and then triggering the repair of all data using one of the above methods.

The script "rebuildData.php" can also be used with parameter -f to delete and recreate all data in one step. In this case, it is suggested to first rebuild the records for all properties and types, and to process the remaining data afterwards. So one would run:

php rebuildData.php -ftpv
php rebuildData.php -v

Note that of course only the first run uses -f. On large wikis, the parameters -s and -e can again be used as explained in the previous section.
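Because -f deletes all SMW data before rebuilding, it can be useful to wrap the two runs in a function that demands explicit confirmation. The sketch below assumes it is called from [path to SMW]/maintenance. The function name and the --yes argument belong to this wrapper only and are not options of "rebuildData.php".

```shell
#!/bin/sh
# Sketch only. Run from [path to SMW]/maintenance.
# "--yes" is an argument of this hypothetical wrapper, not of rebuildData.php.
full_rebuild() {
    if [ "${1:-}" != "--yes" ]; then
        echo "This deletes all SMW data before rebuilding; pass --yes to proceed." >&2
        return 1
    fi
    # First run: delete everything, then rebuild types and properties.
    ${PHP:-php} rebuildData.php -ftpv || return 1
    # Second run: rebuild the remaining data, without -f.
    ${PHP:-php} rebuildData.php -v
}

# Example: full_rebuild --yes
```

Calling the function without --yes prints a warning and runs nothing.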

Automatic repair features

Some changes on wiki pages require that the data of other pages is updated as well. For example, if a template that contains semantic annotations is changed, then the data for all pages using this template might also require an update. Likewise, if the datatype of some property is changed, all pages using this property should be refreshed. SMW usually takes care of such updates automatically. As in MediaWiki, it may take some time until all required updates are completed. There is no convenient way to review the progress.

Caveats

When a new namespace is added, or an existing namespace is activated by setting it to true in the configuration parameter $smwgNamespacesWithSemanticLinks, the repair and initialisation procedures described on this page do not create the default special property "Modification date" for pages in that namespace. Instead, the property is created each time a page in this namespace is modified after the namespace was added to the wiki.



This documentation page applies to all SMW versions from 1.4.0 to the most current version.