Magus

Magus, the Magnome genome understanding system, is a web-based collaborative system for comparative analysis and annotation of complete, related genomes.

Overview

The MAGUS genome annotation system integrates genome sequences and sequence features, in silico analyses, and views of external data resources into a familiar user interface requiring only a Web browser. MAGUS implements annotation workflows and enforces curation standards to guarantee consistency and integrity. As a novel feature, the system provides a workflow for simultaneous annotation of related genomes through the use of protein families identified by in silico analyses; this has resulted in a three-fold increase in curation speed compared to one-at-a-time curation of individual genes. This allows us to maintain standards of high-quality manual annotation while efficiently using the time of volunteer curators.

MAGUS is built on: a standard sequence feature database, the Stein lab generic genome browser, various biomedical ontologies (http://obo.sf.net), and a web interface implementing a representational state transfer (REST) architecture.

Gene locus annotation

Figure: magus_locus_sm.png

An annotation locus in Magus is a region that includes one or several overlapping genes or gene models. The first task in annotating a locus is to choose between competing gene models. Annotation loci are drawn in a separate track in the browser view on the right, above the gene models and the other analysis results. The competing models and their sizes are shown in the clickable list on the left. When a gene model is validated, it is shown separately at the beginning of the list, as shown in the picture.

Even after annotation, a locus may contain several validated genes. This is because a locus is defined by overlapping predictions, for example, if a prediction spans two validated genes. A locus can also contain splice variants of the same gene; each variant is treated as a separate object since it is the predicted mature mRNA and not the gene itself that is annotated (see below).
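The way overlapping predictions define a locus can be illustrated with a minimal sketch. This is not the Magus implementation: the class names, field names, and coordinates are hypothetical, and strand is ignored for simplicity. It shows how a single prediction spanning two validated genes pulls all three into one locus.

```python
from dataclasses import dataclass, field


@dataclass
class GeneModel:
    # Hypothetical minimal record for a gene or gene model.
    name: str
    start: int
    end: int
    validated: bool = False


@dataclass
class Locus:
    start: int
    end: int
    models: list = field(default_factory=list)


def build_loci(models):
    """Group gene models into loci by merging overlapping intervals."""
    loci = []
    for m in sorted(models, key=lambda m: (m.start, m.end)):
        if loci and m.start <= loci[-1].end:
            # Overlaps the current locus: extend it and add the model.
            loci[-1].end = max(loci[-1].end, m.end)
            loci[-1].models.append(m)
        else:
            # No overlap: this model starts a new locus.
            loci.append(Locus(m.start, m.end, [m]))
    return loci


# Illustrative data: pred1 spans the gap between two validated genes.
models = [
    GeneModel("geneA", 100, 500, validated=True),
    GeneModel("geneB", 600, 900, validated=True),
    GeneModel("pred1", 450, 650),
    GeneModel("geneC", 2000, 2400),
]
loci = build_loci(models)
```

In this example the first locus covers 100 to 900 and holds geneA, pred1, and geneB, so it contains two validated genes; geneC forms a locus of its own.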

A locus may also contain a pseudogene, which is not transcribed and produces no mRNA. Such elements are annotated as loci, not genes. They may have strict or fuzzy bounds.

Figure: magus_locus_sm.png

Individual transcribable gene elements are explored and annotated using a gene page composed of five parts organized on one web page. The cartouche in the upper left describes the element and various simple analyses of the gene and its predicted translation product. It also contains a "quick annotation" box with a suggested defline. The gene neighborhood in the upper right shows a clickable inset produced by the genome browser, showing neighboring genes, GeneMark analyses, sequence alignments, and competing models. Following that is a table of available in silico analyses for this gene model, including detailed sequence alignment reports and GeneMark reports, searches for transmembrane spans, and InterPro domain searches. Finally, the last two parts (not shown in the picture) are an annotation area for tag-value pairs using a controlled vocabulary, and a listing of sequence data (DNA, predicted mature mRNA, predicted translation product) with links to a custom blast search for each sequence.

Homolog group annotation

Figure: magus_group_sm.png

Magus provides a unique facility for annotating homolog groups in several genomes. A homolog group model is a collection of gene loci containing genes thought to be homologous. Group models may be defined explicitly by the administrator using an arbitrary in silico analysis, or generated on the fly from PSITBLASTN searches against protein family PSSMs. Loci may be removed from consideration in the group by unchecking a selection box on the left. When a homolog group model is validated, it becomes a homolog group.
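The relationship between a homolog group model and a validated homolog group can be sketched as a small data structure. The class, method names, and locus identifiers below are assumptions made for illustration and do not reflect the actual Magus schema.

```python
from dataclasses import dataclass, field


@dataclass
class HomologGroupModel:
    # Hypothetical structure: a named collection of candidate loci.
    name: str
    loci: list
    excluded: set = field(default_factory=set)
    validated: bool = False

    def exclude(self, locus_id):
        """Equivalent of unchecking the selection box for a locus."""
        self.excluded.add(locus_id)

    def members(self):
        """Loci still under consideration, in their original order."""
        return [l for l in self.loci if l not in self.excluded]

    def validate(self):
        """Mark the model as validated; it is now a homolog group."""
        self.validated = True
        return self.members()


# Illustrative identifiers only.
model = HomologGroupModel("family_0042", ["sp1_L10", "sp2_L77", "sp3_L05", "sp4_L31"])
model.exclude("sp3_L05")
group_members = model.validate()
```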

The first step in annotating a putative homolog group is to resolve any loci that do not contain a validated gene. Typically this is very easy, since the context of the homolog group gives strong clues about which gene model is likely to be the right one to validate. Once all of the relevant loci contain validated genes, the putative homolog group can be analyzed and annotated as a group. The homolog group annotation page also contains five sections. The cartouche describes the group and the way it was identified. The top right corner contains a multiple alignment of the predicted translation products. Next is an annotation area for the putative group members. A suggested group defline is provided based on group members that have already been annotated; this group defline can be propagated to individual members by clicking a button, and subsequently edited individually. When the group is validated or updated, all of the gene annotations are updated simultaneously.
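One plausible way to derive a suggested group defline from members that are already annotated is a simple majority vote. The sketch below assumes that approach; the source does not say how Magus computes its suggestion, and the function names and gene identifiers are hypothetical.

```python
from collections import Counter


def suggest_group_defline(member_deflines):
    """Return the most common defline among annotated members, or None.

    member_deflines maps a gene identifier to its defline, with None
    for members that have not been annotated yet.
    """
    annotated = [d for d in member_deflines.values() if d]
    if not annotated:
        return None
    return Counter(annotated).most_common(1)[0][0]


def propagate(member_deflines, defline):
    """Copy the group defline to every member (the 'propagate' button).

    Assumption: propagation overwrites existing deflines, which can
    then be edited individually.
    """
    return {gene: defline for gene in member_deflines}


# Illustrative data only.
members = {
    "g1": "hexose transporter",
    "g2": None,
    "g3": "hexose transporter",
    "g4": "sugar transporter",
}
suggestion = suggest_group_defline(members)
updated = propagate(members, suggestion)
```

Here the suggestion is "hexose transporter", and after propagation all four members carry it, including the previously unannotated g2.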

Figure: magus_gp_neighborhood_sm.png

Following the annotation area is a comparison of gene neighborhoods in the different genomes. A wide genomic neighborhood around each group member is shown in a vertical list, centered around the putative group member. This allows rapid comparison of the gene neighborhoods in each region, and visual estimation of synteny conservation in each of the genomes. This can be used at first to choose the correct gene model in a locus. Later, it can be used to navigate from one group to the next: within syntenic regions, one can expect to find several groups of homologs conserved from one species to the next.

The final section contains sequence data for the predicted translation products of the group members, and a link for constructing custom multiple alignments.

In this way, teams of annotators can work together on a collection of related genomes, annotating homologs from several genomes simultaneously, with a corresponding improvement both in speed and in consistency.

Management tools

Magus provides a general system of todo lists created by the administrator. Gene loci and homolog groups in particular can be assigned to individual annotators, and their status can be tracked on a per-list dashboard page. Typically this is used to follow the progress during an annotation phase. In Génolevures we used four distinct phases with a data freeze between each one:
  1. Putative homolog groups (simultaneous annotation of 4 species)
  2. Singleton gene loci (individual gene annotation)
  3. Complete chromosomes (sequence walk)
  4. Complete genomes (sequence walk)
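
The per-list dashboard described above amounts to counting todo items by annotator and status. The following is a minimal sketch under that assumption; the item fields, status names, and annotator names are hypothetical and not taken from Magus.

```python
from collections import Counter, defaultdict


def dashboard(todo_items):
    """Summarize a todo list as {annotator: {status: count}}."""
    summary = defaultdict(Counter)
    for item in todo_items:
        summary[item["annotator"]][item["status"]] += 1
    return {annotator: dict(counts) for annotator, counts in summary.items()}


# Illustrative todo list mixing gene loci and homolog groups.
items = [
    {"target": "locus_0001", "annotator": "alice", "status": "validated"},
    {"target": "locus_0002", "annotator": "alice", "status": "pending"},
    {"target": "group_0042", "annotator": "bob", "status": "pending"},
]
board = dashboard(items)
```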

Technical details

See also the working documents MvcModel, MvcControllers, MvcViews

Contact

-- DavidSherman - 19 Oct 2010