GSoC 2012

From semantic-mediawiki.org
Semantic MediaWiki (or, more specifically, the Open Semantic Data Association) applied (again) in 2012 to be a mentoring organization for the Google Summer of Code (a program in which Google funds students to work on open-source projects over the summer), but again we were rejected. However, we may be able to do a few projects via the Wikimedia Foundation, which was accepted into GSoC. The set of ideas that was on this page has now been moved to the WMF's list of MediaWiki GSoC project ideas.

Also have a look at the roadmap, which contains todos for both SMW and extensions.

Accepted proposal

Nischay Nahata is working on optimizing the performance of the Semantic MediaWiki extension, with the aim of reducing SMW’s energy consumption and making it greener. The project is mentored by Markus Krötzsch and Jeroen De Dauw. (See also GSoC'12 project - GreenSMW.)

Other GSoC 2012 proposals and ideas

SMW query management and smart updates

Component: Semantic MediaWiki (core)

Expected results: New capabilities added to SMW

Short explanation: Query management is a proposed addition to the capabilities of Semantic MediaWiki that would allow automatic updating of queries and gathering of query statistics. This would work by storing query metadata as semantic properties, which can then be queried. Query management would allow automatic updating of query results when their source data is modified. This ensures up-to-date query results everywhere, without the need for more resource-intensive solutions like disabling the cache or rebuilding all pages via a cron job. This automatic updating is made possible by storing query dependencies among the query metadata. Query management would allow you to query various things about query usage, such as where queries are located, how many dependencies they have, how long or expensive they are, the time of their last update, etc. With this information you can get a better overview of how queries are used across your wiki and pinpoint inefficient usage.
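The dependency tracking described above can be sketched roughly as follows. This is only an illustration in Python, not SMW code: the class `QueryRegistry`, its methods, and the query identifiers such as "Events#q1" are hypothetical, and a real implementation would store this metadata as semantic properties rather than in memory.

```python
from collections import defaultdict


class QueryRegistry:
    """Hypothetical in-memory index of which queries depend on which properties."""

    def __init__(self):
        self._queries_by_property = defaultdict(set)
        self._stale = set()

    def register(self, query_id, properties):
        # Record that the results of query_id depend on each given property.
        for prop in properties:
            self._queries_by_property[prop].add(query_id)

    def data_changed(self, changed_properties):
        # Mark every query that depends on a changed property as stale
        # and return the set of affected queries.
        affected = set()
        for prop in changed_properties:
            affected |= self._queries_by_property.get(prop, set())
        self._stale |= affected
        return affected

    def dependency_count(self, query_id):
        # Number of properties the query depends on (one of the proposed statistics).
        return sum(query_id in qs for qs in self._queries_by_property.values())

    def stale_queries(self):
        return sorted(self._stale)


registry = QueryRegistry()
registry.register("Events#q1", ["Has date", "Has location"])
registry.register("People#q1", ["Has location"])

# Editing a "Has date" value only invalidates the query that uses that property.
affected = registry.data_changed(["Has date"])
```

Only the queries returned by `data_changed` would need their results refreshed, which is the point of avoiding a full cache purge or a rebuild of all pages.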

Prerequisites: prior programming experience, working knowledge of PHP, decent database knowledge is a plus

More powerful result formatting for SMW

Component: Semantic MediaWiki (core)

Expected results: Modifications to SMW and additions to SMW extensions

Short explanation: SMW and its extensions provide many features for displaying data in wiki pages. The goal of this project is to further improve these features to provide more dynamic and flexible result views for users. Concrete tasks include (but are not limited to):

  • Support for interactive result formats that allow you to filter and expand data (obviously each format will have to implement this itself, but a general infrastructure in SMW to handle the HTTP requests needed for this functionality would be very nice)
  • Syntax extension to group results
  • Syntax extension to allow modifying the page name based on properties. Might make sense to implement this as a more general feature.
  • More control over the display of values for properties. For example when using #show to list all attendees of a single event, you can't really change the display right now.
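As a rough illustration of two of the tasks above (grouping results and controlling how values are displayed), here is a small Python sketch. The functions `group_results` and `format_values`, the row layout, and the property name "Attends" are all assumptions made for this example; they do not correspond to existing SMW syntax or APIs.

```python
def group_results(rows, group_by):
    """Group result rows by the value of one property (hypothetical row layout)."""
    groups = {}
    for row in rows:
        key = row.get(group_by)
        groups.setdefault(key, []).append(row["page"])
    return groups


def format_values(values, template="{value}", separator=", "):
    """Render a list of values with a caller-supplied template and separator."""
    return separator.join(template.format(value=v) for v in values)


rows = [
    {"page": "Alice", "Attends": "SMWCon"},
    {"page": "Bob", "Attends": "Wikimania"},
    {"page": "Carol", "Attends": "SMWCon"},
]

# Group pages by the event they attend, then render one group as wiki links.
grouped = group_results(rows, "Attends")
attendees = format_values(grouped["SMWCon"], template="[[{value}]]", separator="; ")
```

A syntax extension could expose the same two knobs (a grouping property and a per-value template) as query parameters.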

Prerequisites: PHP, JavaScript

An elegant and simple database layer for SMW

Component: Semantic MediaWiki (core)

Expected results: A simpler and cleaner database schema for storing SMW's data

Short explanation: The data that SMW stores is very simple, mainly properties and values using a dozen simple datatypes. The relational database storage code of SMW is not simple, mainly because the data model in SMW was simplified only after the current code was written. The proposal is to start from scratch: look at the data model and the required access methods, and create a cleaner database access class that is easy to maintain. Thanks to the new RDF store connectors of SMW, the most complicated code for query answering is not needed, and one can focus on a simple data exchange layer. This project is for those who enjoy streamlining code to make it more elegant and efficient.
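To give an idea of how small such a data exchange layer could be, here is a minimal sketch using Python's built-in sqlite3 module. The single `facts` table, the class `SimpleStore`, and its method names are assumptions for illustration only; they are not SMW's actual schema or API, and a real layer would need typed value columns and indexes.

```python
import sqlite3


class SimpleStore:
    """Hypothetical minimal store: one table of (subject, property, datatype, value)."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS facts ("
            "subject TEXT NOT NULL, property TEXT NOT NULL, "
            "datatype TEXT NOT NULL, value TEXT NOT NULL)"
        )

    def replace_subject(self, subject, facts):
        # Replace all facts of one page in a single transaction.
        # facts is a list of (property, datatype, value) tuples.
        with self._db:
            self._db.execute("DELETE FROM facts WHERE subject = ?", (subject,))
            self._db.executemany(
                "INSERT INTO facts VALUES (?, ?, ?, ?)",
                [(subject, p, t, v) for p, t, v in facts],
            )

    def values(self, subject, prop):
        cur = self._db.execute(
            "SELECT value FROM facts WHERE subject = ? AND property = ? "
            "ORDER BY value",
            (subject, prop),
        )
        return [row[0] for row in cur]

    def subjects_with(self, prop, value):
        cur = self._db.execute(
            "SELECT DISTINCT subject FROM facts WHERE property = ? AND value = ? "
            "ORDER BY subject",
            (prop, value),
        )
        return [row[0] for row in cur]


store = SimpleStore(sqlite3.connect(":memory:"))
store.replace_subject(
    "Berlin",
    [("Located in", "page", "Germany"), ("Population", "number", "3500000")],
)
store.replace_subject("Munich", [("Located in", "page", "Germany")])
```

The access methods are limited to writing a page's facts and two simple lookups, on the assumption that complex query answering is delegated elsewhere, as the proposal suggests.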

Prerequisites: desire to write simple code, prior programming experience, basic knowledge of PHP and SQL

Improving the interplay between Spark and SMW

Component: Extension "Semantic Result Formats" and Spark

Expected results: Being able to use Spark with Semantic MediaWiki as the backend store easily, and using Spark within SMW with data from an external source

Short explanation: Spark is a JavaScript library that takes SPARQL query results and visualizes them within any HTML5 site. It is basically like inline queries in SMW, but against any SPARQL endpoint and with no required backend. The idea would be to extend Spark so that it can be used against SMW data and not only against SPARQL endpoints, explore whether the #ask syntax makes sense, and add a Semantic Result Format that integrates Spark into Semantic MediaWiki.

Prerequisites: JavaScript, PHP (a little)

Adding unit tests to SMW

Component: SMW core, possibly extensions

Expected results: Create unit tests (mainly PHPUnit, possibly also some QUnit) so we notice when something breaks.

Short explanation: SMW currently has no unit tests, and really could use some, as subtle behavior changes get introduced over time that are not documented and often not intended. A good place to start adding tests is the DataValue classes, whose parsing and formatting methods could use testing. More details on SMW unit testing can be found here.
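The kind of test meant here could look like the following sketch. It uses Python's unittest purely for illustration (the actual tests would be written with PHPUnit), and `NumberValue` is a hypothetical stand-in with made-up `parse` and `format` methods, not one of SMW's DataValue classes.

```python
import unittest


class NumberValue:
    """Hypothetical stand-in for a data value class; not SMW's actual API."""

    def __init__(self, number):
        self.number = number

    @classmethod
    def parse(cls, text):
        # Accept surrounding whitespace and thousands separators;
        # float() raises ValueError for non-numeric input.
        cleaned = text.strip().replace(",", "")
        return cls(float(cleaned))

    def format(self):
        # Thousands separators and two decimal places.
        return f"{self.number:,.2f}"


class NumberValueTest(unittest.TestCase):
    def test_parse_accepts_thousands_separators(self):
        self.assertEqual(NumberValue.parse(" 1,234.5 ").number, 1234.5)

    def test_format_round_trips_through_parse(self):
        value = NumberValue(1234.5)
        self.assertEqual(NumberValue.parse(value.format()).number, value.number)

    def test_parse_rejects_non_numeric_input(self):
        with self.assertRaises(ValueError):
            NumberValue.parse("not a number")
```

Saved as a module, these tests can be run with `python -m unittest`. The round-trip test is the sort of check that would catch unintended changes to parsing or formatting behavior.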

Prerequisites: PHP

Semantic Drilldown improvements

Component: Extension "Semantic Drilldown"

Expected results: Various improvements to Semantic Drilldown.

Short explanation: Semantic Drilldown is an extension that lets users drill down on pages via semantic properties. It is a popular extension (and one of only a handful of SMW extensions enabled on Wikia), but it has a number of important weaknesses:

  • Compound data defined within pages, using either subobjects or Extension "Semantic Internal Objects", cannot be filtered on.
  • "Concepts" cannot be filtered on.
  • Results can't be shown in multiple formats at the same time (like a map and a list).
  • The display of results in columns can be awkward (see here, for example).
  • When drilling through subcategories, the full path of subcategories isn't shown.
  • The interface currently doesn't let users choose between "AND" and "OR" (or "NOT", for that matter) when combining different values.
  • It may be possible to improve the extension's performance, using some sort of caching system.
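The "AND"/"OR"/"NOT" item in the list above could be modelled along the following lines. This is a hedged Python sketch only: the functions `matches` and `drill_down`, the parameter names `all_of`, `any_of`, and `none_of`, and the sample data are invented for illustration and are not part of Semantic Drilldown.

```python
def matches(page_values, all_of=(), any_of=(), none_of=()):
    """Check one page's values for a property against AND / OR / NOT filters."""
    values = set(page_values)
    if not set(all_of) <= values:               # AND: every value must be present
        return False
    if any_of and not (set(any_of) & values):   # OR: at least one must be present
        return False
    return not (set(none_of) & values)          # NOT: none may be present


def drill_down(pages, prop, **filters):
    """Return the sorted titles of pages whose values for prop pass the filters."""
    return sorted(
        title
        for title, props in pages.items()
        if matches(props.get(prop, []), **filters)
    )


pages = {
    "Alice": {"Speaks": ["English", "German"]},
    "Bob": {"Speaks": ["English"]},
    "Carol": {"Speaks": ["French", "German"]},
}

both = drill_down(pages, "Speaks", all_of=["English", "German"])
either = drill_down(pages, "Speaks", any_of=["English", "French"])
german_not_french = drill_down(pages, "Speaks", any_of=["German"], none_of=["French"])
```

With the sample data, `both` and `german_not_french` contain only "Alice", while `either` contains all three pages.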

Google Spreadsheet Result Format for SRF

Component: Extension "Semantic Result Formats"

Expected results: Exports inline query results to a worksheet in a given Google Docs spreadsheet.

Short explanation: Knowledge workers use (and abuse) spreadsheets for a lot of their data processing needs. Having the ability to save query results to spreadsheets should increase adoption and has the added benefit of giving access to additional spreadsheet features for data post-processing (e.g. formulas, pivot tables, etc.). In addition, Google Charts can use a Google Spreadsheet as a data source. This makes it possible to create complex dynamic charts with full access to the Google Charts feature set that can live outside of SMW but still use data from the wiki. Right now, SRF visualizations are constrained to a very limited subset of their underlying visualization frameworks. Perhaps a complementary Google Charts extension could even be made to use query result data that was initially populated by this result format and then post-processed in a Google Spreadsheet.

Finally, using Google APIs/resources should garner some brownie points from GSoC administrators.

Prerequisites: PHP, Google Spreadsheets API, Google Documents List API

Mentor: Joel Natividad

Semantic Forms Rules

Component: New extension

Expected results: Enable a much more dynamic behavior of Extension "Page Forms" than currently possible

Short explanation: Currently, dynamic behavior of Semantic Forms is achieved by adding additional parameters to inputs (or by having dedicated inputs). This is a good approach where the dynamic behavior only concerns one input. Autocompletion would be an example. It becomes awkward when more than one field is concerned, e.g. with show on select. And it fails altogether if more than two inputs are involved or if the desired behavior is more complex. Generally, any behavior involving two or more inputs should not be handled by one of the affected inputs, but by a dedicated controlling entity that exists once for every form. In fact, in many cases this would even be beneficial where only one input is involved, as it would enable one behavior for multiple input types without having to duplicate code.

The idea now is to define rules consisting of triggers, conditions and actions. The conditions are evaluated when certain triggers are detected. Depending on the result the associated actions would be performed. Individual rules could be kept very simple, but the result of the evaluation of a rule would be stored and could later be referenced in other rules. This way rules could be chained and more complex rules could be constructed from simple ones. See a preliminary spec.
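The trigger/condition/action scheme with chained results can be sketched as below. This is an assumption-laden illustration in Python rather than the JavaScript the extension would use; the classes `Rule` and `RuleEngine`, the trigger names such as "change:type", and the form fields are all hypothetical and not taken from the preliminary spec.

```python
class Rule:
    """One rule: evaluated on its triggers, with its outcome stored under its name."""

    def __init__(self, name, triggers, condition, action):
        self.name = name
        self.triggers = set(triggers)
        self.condition = condition  # callable(form, results) -> bool
        self.action = action        # callable(form, outcome) -> None


class RuleEngine:
    """Single controlling entity per form; keeps rule results for later reference."""

    def __init__(self, rules):
        self.rules = list(rules)
        self.results = {}

    def fire(self, trigger, form):
        # Rules are evaluated in list order, so later rules can use earlier results.
        for rule in self.rules:
            if trigger in rule.triggers:
                outcome = bool(rule.condition(form, self.results))
                self.results[rule.name] = outcome
                rule.action(form, outcome)


def show_vat_field(form, outcome):
    form["vat_id"]["visible"] = outcome


def require_vat_field(form, outcome):
    form["vat_id"]["required"] = outcome


rules = [
    Rule("is_company", ["change:type"],
         lambda form, results: form["type"]["value"] == "Company",
         show_vat_field),
    # This rule chains on the stored result of "is_company".
    Rule("needs_vat", ["change:type", "change:country"],
         lambda form, results: results.get("is_company", False)
         and form["country"]["value"] == "Germany",
         require_vat_field),
]

form = {
    "type": {"value": "Company"},
    "country": {"value": "Germany"},
    "vat_id": {"visible": False, "required": False},
}
engine = RuleEngine(rules)
engine.fire("change:type", form)
```

Because "needs_vat" reads the stored outcome of "is_company" instead of re-checking the type input, each rule stays simple while the combined behavior spans three inputs.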

Prerequisites: JavaScript (incl. jQuery), PHP

Mentor signup: Stephan Gambke

Extend and improve RDBMS support of SMW

Component: Semantic MediaWiki

Expected results: Additions to the relational database store support of SMW, and improvements to the existing implementation.

Short explanation: SMW currently has support for MySQL and, to some degree, PostgreSQL. This is done via a single "SQL store", which varies its behaviour slightly based on the type of database used. Having two separate stores would likely be better. This store also lacks support for special data types such as geographical entities: working with them would require wrapping field names or values in SQL functions within SQL statements, which is currently not possible. Adding support for the other RDBMSes (partially) supported by MediaWiki core (mainly Oracle and MSSQL) would also be nice.

Prerequisites: PHP, SQL, the RDBMS you want to work with

Mobile support for GUIs for SMW and SF

Component: Semantic MediaWiki core, Semantic Forms, possibly other SMW extensions with GUIs

Expected results: Make versions of SMW's and related extensions' graphical user interfaces that are optimized for mobile devices. In particular, SMW's Special:Ask and Semantic Forms' Special:FormEdit.

Short explanation: Currently SMW lacks specific support for mobile. Although the interfaces are usable on mobile devices, they could be much better optimized for them.

Prerequisites: PHP, UI experience

Interactive Ontology Visualizer/Navigator

Component: New Extension

Expected results: Visualize the ontology using D3.js or a similar JavaScript framework, and use it to navigate the wiki

Short explanation: There is currently no graphical, interactive way to visualize the wiki ontology. In the Halo project, the Data Explorer (née Ontology Browser) and the Semantic Treeview extension have served the purpose to a limited extent. Perhaps we can use the D3 Graph format, with each node represented by a specially declared instance image (perhaps through an instance avatar image property) and/or a category avatar image. The visualization also doubles as a navigator that automatically adjusts as you browse the wiki.

Prerequisites: PHP, JavaScript, D3.js

OData Result Format

Component: Extension "Semantic Result Formats"

Expected results: Query Results are published using the OData specification at persistent URLs

Short explanation: Even though SMW primarily publishes RDF data, in compliance with true open standards, it would be great to support the Open Data Protocol as well to "embrace and co-opt" this Microsoft specification for Open Data as a lot of enterprise-class tools and customers use OData. Of primary interest is Tableau Public, a very powerful visualization tool, which supports OData data sources even for the Free Public version.

Prerequisites: PHP

Further ideas