Cassava Online Archive

Cassava (Manihot esculenta, Euphorbiaceae) is an important source of energy for humans and animals and is used as a raw material for many industrial processes. It is therefore considered one of the most useful starch crops and is expected to serve as a source of both food and industrial raw material. There is an increasing need for information on cassava, such as published literature and sequences registered in public databanks.
 The Cassava Online Archive provides the cassava mRNA sequences and ESTs currently available from NCBI (GenBank/EMBL/DDBJ), together with their annotations. The database can be searched by gene function, accession number, and sequence similarity (BLAST). The annotations are based on similarity search results collated from several protein databases and on the results of mapping the transcripts to the cassava (Manihot esculenta, Euphorbiaceae), castor bean (Ricinus communis, Euphorbiaceae), poplar (Populus trichocarpa, Salicaceae), grape (Vitis vinifera, Vitaceae), and Arabidopsis thaliana (Brassicaceae) genome sequences. To improve the annotations, the domain organization predicted by InterProScan and Gene Ontology (GO) terms are also included in this database.
 We plan to expand the contents of the Cassava Online Archive by including information on cassava biochemical pathways, diverse genetic types with useful traits, molecular markers that can be used for mapping, etc.

  Assembly of cassava transcript sequences

Expressed sequence tags (ESTs) and mRNA sequences of cassava available in the National Center for Biotechnology Information (NCBI) database were assembled using the CAP3 program (Huang and Madan, 1999); redundant sequences were omitted from the database. The non-redundant sequences were used as query sequences in the annotation process for this database.
 We ran CAP3 with the "-p 95 -z 1" options (relatively stringent conditions) because the cassava transcripts available in the NCBI database are derived from various cassava varieties.
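
 As a minimal sketch of how the options quoted above could be passed to CAP3 from a script, the following Python snippet builds the command line. The input file name cassava_transcripts.fasta and the helper name build_cap3_command are hypothetical, and the snippet assumes the cap3 executable is installed and on the PATH; it is an illustration, not the pipeline actually used for this database.

```python
import shlex


def build_cap3_command(fasta_path, percent_identity=95, good_reads_at_clip=1):
    """Return the CAP3 argument list for the "-p 95 -z 1" options.

    fasta_path is a hypothetical FASTA file of ESTs and mRNA sequences.
    """
    return [
        "cap3",                          # assumes the cap3 binary is on PATH
        fasta_path,                      # CAP3 takes the input file first
        "-p", str(percent_identity),     # overlap percent identity cutoff
        "-z", str(good_reads_at_clip),   # number of good reads at clip position
    ]


command = build_cap3_command("cassava_transcripts.fasta")
print(shlex.join(command))
# To actually run the assembly (requires CAP3 to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

 Building the argument list separately from running it keeps the options easy to inspect and log before a long assembly is started.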


To predict the functions of the genes encoding the cassava transcripts included in this database, we performed similarity searches against several datasets. The following datasets were used: (1) the green plant (Viridiplantae) mRNA sequences (not ESTs) in the NCBI database, as a representative transcript dataset of plants; (2) the green plant (Viridiplantae) protein sequences in the NCBI database (organized by The Arabidopsis Information Resource (TAIR)) and the UniProt/TrEMBL database of the European Bioinformatics Institute (EBI), as representative protein datasets of plants; and (3) the protein sequences of cassava (JGI annotation v4.1), castor bean (Ricinus communis; The Institute for Genomic Research (TIGR)), poplar (Populus trichocarpa; Joint Genome Institute (JGI)), grape (Vitis vinifera; International Grape Genome Program (IGGP)), and Arabidopsis thaliana (TAIR), derived from the predicted coding sequences (CDS) and captured full-length cDNAs, as the annotated dicot-plant protein dataset.
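
One common way to turn similarity search results into a functional annotation is to keep the best-scoring hit for each query transcript. The Python sketch below does this for BLAST tabular output (the standard 12-column format: query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). It is only an illustration under that assumption: the text above does not state which output format or selection rule was used for this database, and the sequence identifiers in the example are invented.

```python
def best_hits(tabular_lines):
    """Keep the highest-bit-score hit per query from BLAST tabular lines."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Replace the stored hit only when the new one scores higher.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Invented example rows; identifiers and scores are placeholders.
example = [
    "contig_1\tsubject_A\t82.5\t200\t35\t0\t1\t200\t5\t204\t1e-50\t190.0",
    "contig_1\tsubject_B\t90.1\t210\t20\t1\t1\t210\t3\t212\t1e-70\t260.0",
    "singlet_7\tsubject_C\t75.0\t120\t30\t0\t10\t129\t40\t159\t2e-20\t95.5",
]
hits = best_hits(example)
for query, (subject, evalue, bitscore) in hits.items():
    print(query, subject, evalue, bitscore)
```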

  Mapping to genome sequence of cassava, castor bean (R. communis), poplar (P. trichocarpa), grape (V. vinifera), and A. thaliana

To obtain the gene structures of the cassava transcripts, we mapped them to the cassava, castor bean, poplar, grape, and Arabidopsis genome sequences. The exon-intron structures and associated genome annotation data are displayed using the Generic Genome Browser (GBrowse).

  RIKEN ver.1 microarray

We developed a 60-mer oligonucleotide Agilent microarray comprising 21,522 probes designed from the transcript sequences (11,422 contigs and 18,214 singlets) in this database.

  DNA Polymorphism Discovery and Primer Design

Using an assembly of the transcribed sequences predicted from the cassava draft genome sequence and the expressed sequence tags from GenBank, we discovered 10,546 single nucleotide polymorphisms (SNPs) and 756 insertions and deletions (InDels). To facilitate the development of molecular markers, we designed 9,383 PCR primer pairs to amplify the genomic region around each DNA polymorphism.
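
As a rough illustration of the first step in designing primers around a polymorphism, the Python sketch below extracts the region flanking a given position from a sequence, clipping the window at the sequence ends. The function name flanking_region, the 1-based coordinate convention, and the flank size are assumptions made for this example; the text above does not specify the window size or the primer design software that was used.

```python
def flanking_region(sequence, position, flank=100):
    """Return (start, end, subsequence) around a 1-based position.

    start and end are 1-based, inclusive, and clipped to the sequence.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    start = max(1, position - flank)
    end = min(len(sequence), position + flank)
    return start, end, sequence[start - 1:end]


# Toy 40 bp sequence; real input would be a genomic sequence.
seq = "ACGT" * 10
start, end, region = flanking_region(seq, position=20, flank=5)
print(start, end, region)
```

The extracted region would then be passed to a primer design tool, with the forward and reverse primers placed on either side of the polymorphic position.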


Genome-wide discovery and information resource development of DNA polymorphisms in cassava.
Sakurai T, Mochida K, Yoshida T, Akiyama K, Ishitani M, Seki M, Shinozaki K.
 PLoS One. 2013 Sep 11