Maintenance script "disposeOutdatedEntities.php"

From semantic-mediawiki.org
disposeOutdatedEntities.php
Allows disposing of outdated entities
Further Information
Provided by: Semantic MediaWiki
Added: 3.2.0
Removed: still in use
Location (path): ./extensions/SemanticMediaWiki/maintenance/

The "disposeOutdatedEntities.php" maintenance script allows disposing of outdated entities, i.e. entities that have been marked as deleted. It was introduced in Semantic MediaWiki 3.2.0 (released on 7 September 2020 and compatible with MW 1.31.0 - 1.35.x).[1][2]

Depending on the size and editing activity of the respective wiki, it is recommended to run this maintenance script either daily or weekly via cron.
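A crontab sketch of such a schedule could look as follows. The wiki root /var/www/wiki, the log file path and the times are assumptions; php is assumed to be on cron's PATH, and the entries belong in the crontab of the user that normally runs the wiki's maintenance scripts.

```sh
# Assumed wiki root: /var/www/wiki (hypothetical path, adjust as needed)
# Daily run at 03:15, output appended to a hypothetical log file
15 3 * * * cd /var/www/wiki/extensions/SemanticMediaWiki/maintenance && php disposeOutdatedEntities.php >> /var/log/smw-dispose-outdated.log 2>&1

# Weekly alternative, Sundays at 03:15 (enable one schedule, not both)
# 15 3 * * 0 cd /var/www/wiki/extensions/SemanticMediaWiki/maintenance && php disposeOutdatedEntities.php >> /var/log/smw-dispose-outdated.log 2>&1
```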

Usage

php disposeOutdatedEntities.php [--with-maintenance-log] [--of <N> --shard <k>]
This shows only the script-specific parameters.

Using this maintenance script is equivalent to the following invocations in earlier versions of Semantic MediaWiki:

Semantic MediaWiki 3.0.x to Semantic MediaWiki 3.1.x
php rebuildData.php --skip-properties --dispose-outdated
Semantic MediaWiki 2.4.x to Semantic MediaWiki 2.5.x
php rebuildData.php --skip-properties -s 1 -e 1
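For deployment tooling that has to support several Semantic MediaWiki versions, the equivalences above can be captured in a small lookup. The following POSIX shell sketch is hypothetical (the function name dispose_command is not part of Semantic MediaWiki); it only prints the command listed for the major.minor version passed in and does not detect the installed version itself.

```sh
#!/bin/sh
# Hypothetical helper (not part of Semantic MediaWiki): print the disposal
# command documented for a version string such as "3.1.6" or "4.1.0".
# Returns 1 for versions older than 2.4 and 2 for malformed input.
dispose_command() {
  local version major rest minor
  version=$1

  # Require at least "<major>.<minor>".
  case $version in
    *.*) ;;
    *) return 2 ;;
  esac
  major=${version%%.*}   # text before the first dot
  rest=${version#*.}     # text after the first dot
  minor=${rest%%.*}      # text before the second dot, if any

  # Both parts must be non-empty and consist of digits only.
  case $major in ''|*[!0-9]*) return 2 ;; esac
  case $minor in ''|*[!0-9]*) return 2 ;; esac

  if [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; }; then
    echo "php disposeOutdatedEntities.php"
  elif [ "$major" -eq 3 ]; then
    # Semantic MediaWiki 3.0.x to 3.1.x
    echo "php rebuildData.php --skip-properties --dispose-outdated"
  elif [ "$major" -eq 2 ] && [ "$minor" -ge 4 ]; then
    # Semantic MediaWiki 2.4.x to 2.5.x
    echo "php rebuildData.php --skip-properties -s 1 -e 1"
  else
    return 1
  fi
}
```

For example, dispose_command 3.1.6 prints the rebuildData.php invocation listed for 3.0.x to 3.1.x.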

Parameters

Maintenance scripts accept generic maintenance parameters, script-dependent parameters and, depending on the maintenance script, script-specific parameters. Script-specific parameters, where provided, are described on this page.

Script-specific parameters
Parameter Description
--with-maintenance-log Adds a log entry to the "Semantic MediaWiki log" on the special page "Log" (&type=smw).[3]

Note: If you use this parameter, make sure that MediaWiki's configuration parameter $wgMaxNameChars (MediaWiki.org) is set to a value of at least "29".[4] Otherwise, an exception is thrown stating the minimum value for this setting ("32" or higher is recommended).[5]

--of <N> Total number of parallel shards to split the disposal across. Used together with --shard to run several processes at once, each handling a disjoint slice of the outdated entities (selected via smw_id % N). Available since Semantic MediaWiki 7.0.0 (released on 4 June 2026 and compatible with MW 1.43.x - 1.46.x).[6]
--shard <k> Zero-based index (0 to N-1) of the shard handled by this process; requires --of. Query link cleanup runs on shard 0 only. Available since Semantic MediaWiki 7.0.0.[6]
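The shard selection described for --of and --shard can be illustrated with a short POSIX shell sketch. It is hypothetical and does not reproduce the script's own code: valid_shard_args checks the documented constraints (N positive, k from 0 to N-1), and ids_for_shard filters a list of example IDs with the same smw_id % N rule, which shows that every ID lands in exactly one shard.

```sh
#!/bin/sh
# Hypothetical illustration of the documented shard rules; not SMW code.

# Succeeds if N (--of) is a positive integer and k (--shard) is an
# integer from 0 to N-1.
valid_shard_args() {
  case $1 in ''|*[!0-9]*) return 1 ;; esac   # --of missing or not a number
  case $2 in ''|*[!0-9]*) return 1 ;; esac   # --shard missing or not a number
  [ "$1" -gt 0 ] && [ "$2" -lt "$1" ]
}

# Usage: ids_for_shard <N> <k> <id>...
# Prints the IDs that shard k of N handles, i.e. those with id % N == k.
ids_for_shard() {
  local n k id selected
  n=$1 k=$2 selected=
  valid_shard_args "$n" "$k" || return 2
  shift 2
  for id in "$@"; do
    if [ $((id % n)) -eq "$k" ]; then
      selected="$selected${selected:+ }$id"   # space-separated result
    fi
  done
  echo "$selected"
}
```

For example, ids_for_shard 4 1 1 2 3 4 5 6 7 8 9 prints 1 5 9, while the other three shards receive the remaining IDs without overlap.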

Parallel disposal

Since Semantic MediaWiki 7.0.0, the disposal removes entity references using batched deletes (a single delete with WHERE ... IN (...) per table for each batch of entities, instead of one delete per entity), which makes a single run substantially faster on large wikis.[6]
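To make the difference between per-entity and batched deletes concrete, the POSIX shell sketch below prints one DELETE statement per batch of IDs for a single table rather than one statement per ID. It only generates SQL text and executes nothing; the function name, the batch size and the table and column names passed in (example_table, subject_id) are placeholders, not Semantic MediaWiki's actual schema or batch size.

```sh
#!/bin/sh
# Hypothetical sketch: print batched DELETE statements as text only.
# Table and column names are placeholders supplied by the caller;
# nothing is sent to a database.

# Usage: batched_deletes <batch-size> <table> <column> <id>...
# Prints one "DELETE ... WHERE <column> IN (...)" per batch of IDs.
batched_deletes() {
  local size table column id list count
  size=$1 table=$2 column=$3
  case $size in ''|*[!0-9]*) return 2 ;; esac
  [ "$size" -gt 0 ] || return 2
  shift 3
  list= count=0
  for id in "$@"; do
    list="$list${list:+,}$id"        # append the ID, comma-separated
    count=$((count + 1))
    if [ "$count" -eq "$size" ]; then
      printf 'DELETE FROM %s WHERE %s IN (%s);\n' "$table" "$column" "$list"
      list= count=0
    fi
  done
  # Flush the last, possibly shorter batch.
  if [ "$count" -gt 0 ]; then
    printf 'DELETE FROM %s WHERE %s IN (%s);\n' "$table" "$column" "$list"
  fi
}
```

With a batch size of 3, the IDs 1 to 5 produce two statements, IN (1,2,3) and IN (4,5), instead of five single-row deletes.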

The disposal can additionally be split across several parallel processes using the --of and --shard parameters. Each process handles a disjoint slice of the outdated entities (those whose smw_id % N equals the shard index k of that process), so the processes do not collide on the same rows:

php disposeOutdatedEntities.php --of 4 --shard 0 &
php disposeOutdatedEntities.php --of 4 --shard 1 &
php disposeOutdatedEntities.php --of 4 --shard 2 &
php disposeOutdatedEntities.php --of 4 --shard 3 &
wait

The number of shards should be tuned to the write load the database can absorb, since disposal is write-bound. Query link cleanup runs on shard 0 only.
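The four backgrounded commands above can also be driven by a small wrapper. A bare wait without arguments returns 0 even if a shard fails, so the hypothetical POSIX shell sketch below (the function names dispose and run_shards are illustrative) waits for each process ID individually and reports failed shards. It assumes it is run from the script's directory, ./extensions/SemanticMediaWiki/maintenance/.

```sh
#!/bin/sh
# Hypothetical wrapper (names are illustrative): run N shards in parallel
# and report failures. Assumes the current directory is
# ./extensions/SemanticMediaWiki/maintenance/.

dispose() {
  php disposeOutdatedEntities.php "$@"
}

# Usage: run_shards <N>
# Returns 0 only if every shard exits successfully.
run_shards() {
  local n k pids pid pair failed
  n=$1
  case $n in ''|*[!0-9]*) echo "usage: run_shards <N>" >&2; return 2 ;; esac
  [ "$n" -gt 0 ] || { echo "usage: run_shards <N>" >&2; return 2; }

  # Start one background process per shard and remember "k:pid" pairs.
  pids= k=0
  while [ "$k" -lt "$n" ]; do
    dispose --of "$n" --shard "$k" &
    pids="$pids $k:$!"
    k=$((k + 1))
  done

  # Wait for each PID individually so that its exit status is checked.
  failed=0
  for pair in $pids; do
    k=${pair%%:*} pid=${pair#*:}
    if ! wait "$pid"; then
      echo "shard $k failed" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Calling run_shards 4 then starts the same four processes as the commands above, except that the exit status is non-zero when any shard fails.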

Note

If this maintenance script has not been run for a while and the wiki consequently reaches a threshold of 20,000 outdated entities, a maintenance alert reminding of this task is added to the Semantic MediaWiki dashboard.[1]

References

  1. Semantic MediaWiki: GitHub pull request gh:smw:4484
  2. Semantic MediaWiki: GitHub pull request gh:smw:4744
  3. Semantic MediaWiki: GitHub pull request gh:smw:4703
  4. Semantic MediaWiki: GitHub issue gh:smw:1983
  5. Semantic MediaWiki: GitHub pull request gh:smw:1985
  6. Semantic MediaWiki: GitHub pull request gh:smw:6972