SEQwiki

From semantic-mediawiki.org
Wiki of the Month January 2012
This website no longer exists.
Statistics (2011/12/13)
Pages: 700
Users: 1000 (79 active)
Properties: 50
Categories: 10
Templates: 30
Forms: 3

SEQwiki is a wiki database of the available tools for analyzing high-throughput sequencing (HTS) data (it currently includes over 500 such tools), a global listing of HTS service providers (it includes over 100), and a set of tutorials explaining how to analyze HTS data to address specific biological questions.

DNA carries the information of life, and DNA sequencing allows us to read it. HTS is a catch-all term for a set of new technologies that are revolutionizing our ability to sequence DNA. For example, sequencing an entire human genome using HTS today costs just $4000 and takes only a few weeks. In contrast, the first human genome sequence (finished in 2000) cost three billion dollars and took over 10 years!

SEQwiki homepage

SEQwiki is built and hosted by the SEQanswers community, a forum dedicated to sharing knowledge on all aspects of HTS. The wiki was conceived and launched in late 2009 to replace a static list of software in a thread on the forum.

The wiki is used by those who analyze HTS data, and by anyone interested in keeping track of the tools and developments in sequencing technology.

View of a typical page

view of a typical page

The largest collection of pages in the wiki describes software tools for HTS analysis.

The free-text summary and description of a tool are shown in the middle of the page. The info box on the right shows the standard data for the tool, for example the input and output formats, software licence, and other important facts for users. The description is followed by links to the tool's homepage and then the references (listings of related publications in peer-reviewed journals).

At the bottom of the page, users can search for more information about a tool on the web (Google and Clusty), on Wikipedia, and in scientific databases. Links are also provided for searching for references to the tool in the SEQanswers forum and BioStar.

Using the collected data, users can query for tools within SEQwiki.

Use of SMW

Geographical distribution of Sequencing facilities on record in SEQwiki

The number of different tools available for HTS analysis is almost overwhelming. An ever-growing list of tools was being maintained on the SEQanswers forum, as this was seen as an important resource for the growing community. However, compiling a static list from forum posts had several serious problems:

  • There was an ever-increasing number of tools with a variety of different application domains.
  • The burden of curation was falling on just a few list moderators.
  • Only a few different columns of information were being collected for each tool.
  • The list was only searchable by free text, and couldn't be queried or filtered.

By adopting SMW, we solved all of these problems in a single stroke, allowing the detailed collection of tools to be maintained by the whole community. Many different users have now contributed to the list, often adding just one tool, but some adding more than 100. Many tools have only a few key facts annotated, but many fields have been defined for tools, such as the function, operating system, programming language, method, provenance, logo, author, and type of HTS data that the tool is designed to work with.
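To illustrate what structured fields offer over a free-text list, here is a minimal Python sketch. It is not how SEQwiki worked internally (the wiki relied on SMW properties and queries). The `Tool` record, its field names, the `filter_tools` helper, and the sample entries are all hypothetical, chosen to mirror a few of the fields listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Hypothetical structured record for one tool page."""
    name: str
    language: set = field(default_factory=set)
    operating_system: set = field(default_factory=set)
    method: set = field(default_factory=set)


def filter_tools(tools, **criteria):
    """Return the tools whose fields contain every requested value.

    Each keyword names a field of Tool; a tool matches when the requested
    value is a member of that field's set of values.
    """
    return [
        tool for tool in tools
        if all(value in getattr(tool, field_name)
               for field_name, value in criteria.items())
    ]


# Made-up sample data, for illustration only.
tools = [
    Tool("aligner-a", {"C"}, {"Linux", "Mac OS X"}, {"Read mapping"}),
    Tool("assembler-b", {"C++"}, {"Linux"}, {"De novo assembly"}),
    Tool("viewer-c", {"Java"}, {"Linux", "Windows"}, {"Visualization"}),
]

linux_c_tools = filter_tools(tools, language="C", operating_system="Linux")
print([tool.name for tool in linux_c_tools])  # ['aligner-a']
```

Because each field holds discrete values rather than free text, a request for tools written in C does not accidentally match tools written in C++, which a plain text search over a forum post could not guarantee.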

SMW allowed us to annotate tools with complex data types, such as publications. By using links between tool pages and publication pages, we can, for example, query and present the number of tools published in the journals Science and Nature, the most popular journals for publication of HTS tools, the growth in the number of published tools by date, and even the number of references for each tool.
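The kinds of aggregate queries described above can be sketched in Python as follows. This is only an illustration of the logic, assuming each publication is available as a simple record with `tool`, `journal`, and `year` keys; those key names, the function names, and the sample records are hypothetical and the numbers are made up. In the wiki itself such results came from SMW queries over linked pages.

```python
from collections import Counter

# Made-up publication records, each linked to one tool.
publications = [
    {"tool": "aligner-a", "journal": "Bioinformatics", "year": 2009},
    {"tool": "aligner-a", "journal": "Genome Research", "year": 2010},
    {"tool": "assembler-b", "journal": "Bioinformatics", "year": 2010},
    {"tool": "viewer-c", "journal": "Nature", "year": 2011},
]


def tools_per_journal(pubs):
    """Count the distinct tools with at least one publication in each journal."""
    pairs = {(p["journal"], p["tool"]) for p in pubs}
    return Counter(journal for journal, _ in pairs)


def tools_first_published_per_year(pubs):
    """Count tools by the year of their earliest linked publication."""
    first_year = {}
    for p in pubs:
        known = first_year.get(p["tool"], p["year"])
        first_year[p["tool"]] = min(p["year"], known)
    return Counter(first_year.values())


def references_per_tool(pubs):
    """Count the publications linked to each tool."""
    return Counter(p["tool"] for p in pubs)


print(tools_per_journal(publications).most_common(1))
print(sorted(tools_first_published_per_year(publications).items()))
print(references_per_tool(publications)["aligner-a"])
```

Counting distinct (journal, tool) pairs keeps a tool with two papers in the same journal from being counted twice, and dating each tool by its earliest publication gives a growth curve in which every tool appears once.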

Extensions used in addition to Semantic MediaWiki

Figure 1. Users can narrow down the search by software language, purpose and application domain
Semantic Drilldown (Figure 1)
The Semantic Drilldown view allows users to narrow down a massive collection of tools according to 1) the programming language, 2) the operating system the software is designed for, 3) the sequencing platform technologies the software is compatible with, 4) the analysis method and 5) the biological domain.
Figure 2. Users are guided in creating and editing a new tool
Semantic Forms (Figure 2)
Through the powerful Semantic Forms, the user is guided through the process of adding standardized data for a tool, including the summary, authorship, technical information and references. Semantic Forms uses jQuery-based autocompletion to help users discover existing tools and common values for the form fields.
External Data
Used to query an API for the number of citations of each publication linked to a tool, and to aggregate those counts into a citation total for the tool.
Parser Functions / Variables
Used to extend the wiki syntax and allow a richer 'language' with which to create applications within SMW.
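The citation aggregation handled with External Data above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the wiki's actual code: the citation source is represented by a caller-supplied function, `fetch_citation_count`, which is a hypothetical stand-in for whatever API was queried, and the publication identifiers and counts are made up.

```python
from typing import Callable, Dict, Iterable


def citations_per_tool(
    tool_publications: Dict[str, Iterable[str]],
    fetch_citation_count: Callable[[str], int],
) -> Dict[str, int]:
    """Sum citation counts over every publication linked to each tool.

    Counts are cached so that a publication shared by several tools is
    looked up only once.
    """
    cache: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for tool, publication_ids in tool_publications.items():
        total = 0
        for pub_id in publication_ids:
            if pub_id not in cache:
                cache[pub_id] = fetch_citation_count(pub_id)
            total += cache[pub_id]
        totals[tool] = total
    return totals


# Stand-in for the external API: a fixed lookup table with made-up numbers.
FAKE_COUNTS = {"pub-1": 120, "pub-2": 30, "pub-3": 5}


def fake_fetch(pub_id: str) -> int:
    return FAKE_COUNTS.get(pub_id, 0)


tool_publications = {
    "aligner-a": ["pub-1", "pub-2"],
    "assembler-b": ["pub-3"],
    "viewer-c": [],
}

print(citations_per_tool(tool_publications, fake_fetch))
# {'aligner-a': 150, 'assembler-b': 5, 'viewer-c': 0}
```

Passing the lookup in as a function keeps the aggregation logic separate from the data source, so the same code can be exercised with a fixed table, as here, or wired to a real service.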

Perspectives

SMW is a powerful extension, but it is very complex to program. An IDE for developing SMW-powered sites would be great, allowing editors to coordinate the design and editing of forms, pages and displays (often these elements all need to be edited in concert).

Features

We developed some really nifty, complex applications with SMW, Semantic Forms (SF) and Semantic Drilldown (SD).

  1. Building an article score based on the number of fields that users had edited. This allowed us to rank articles by score for the attention of the editors [1]. See the code here: [2]
  2. Building a referencing system for adding references to a page and listing the added references to that page: [3] [4]
  3. Collecting the number of citations for each publication and aggregating it per tool.
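The article score in item 1 can be illustrated with a short Python sketch. The original code is only linked above and is not reproduced here, so this is an assumption about the general idea: count how many tracked fields of a page have been filled in, then list the pages with the fewest filled fields first so editors see them first. The field names, the sample pages, and the lowest-first ordering are all hypothetical.

```python
def article_score(fields, tracked_fields):
    """Count how many of the tracked fields have a non-empty value."""
    return sum(1 for name in tracked_fields if fields.get(name))


def rank_for_attention(articles, tracked_fields):
    """Return (title, score) pairs, lowest score first.

    Ties are broken alphabetically by title so the order is stable.
    """
    scored = [
        (title, article_score(fields, tracked_fields))
        for title, fields in articles.items()
    ]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


# Hypothetical field names and made-up pages, for illustration only.
TRACKED = ["summary", "language", "operating_system", "licence", "reference"]

articles = {
    "aligner-a": {
        "summary": "Maps reads", "language": "C",
        "operating_system": "Linux", "licence": "GPL", "reference": "pub-1",
    },
    "assembler-b": {"summary": "Assembles reads", "language": ""},
    "viewer-c": {"summary": "Shows alignments", "language": "Java", "licence": "LGPL"},
}

print(rank_for_attention(articles, TRACKED))
# [('assembler-b', 1), ('viewer-c', 3), ('aligner-a', 5)]
```

An empty string counts as unfilled, so a field that was created by a form but never edited does not raise the score.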


Contributions

Usage of SEQwiki over the years (1)
  • NGS analysts: If you come across a good tool, suggest it to your peers here
  • Developers: Academic and commercial developers, advertise your tools objectively here


References

(1) Li JW, Robison K, Martin M, Sjodin A, Usadel B, Young M, Olivares EC, Bolser DM: The SEQanswers wiki: a wiki database of tools for high-throughput sequencing analysis. Nucleic Acids Res 2011.