We retrieved cassava sequences from the GenBank EST archive, cleaned them, and classified them by cassava variety, obtaining 80,523 sequences from 17 cassava varieties or libraries (CAS36.01, CAS36.04, CM21772, CM523-7, MCol22, IAC 12.829, KU50/MTAI16, MBra685, MCol1522, MNga2, MPer183, Mirassol, SG107-35, ‘Sauti, Gomani, Mbundumali, TME 1, and Mkondezi’, ‘TMS30572 and CM2177-2’, and variety unknown). We then added the annotated transcript sequences from the cassava draft genome sequence (variety AM560-2) to this classified set, giving a total of 114,674 sequences as the starting data for this study.
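The abstract does not describe how the ESTs were cleaned or classified. The following is a minimal sketch that assumes each EST record carries its GenBank library name and that a lookup table maps libraries to varieties. `LIBRARY_TO_VARIETY`, the library names, and the 100-nt `MIN_LENGTH` filter are hypothetical placeholders, not the authors' settings. The reported totals also imply that 114,674 − 80,523 = 34,151 transcript sequences were appended from the AM560-2 draft genome.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical mapping from GenBank EST library names to cassava varieties.
# The actual library identifiers are not listed in the abstract.
LIBRARY_TO_VARIETY: Dict[str, str] = {
    "lib_A": "MCol22",
    "lib_B": "TME 1",
}

# Assumed cleaning threshold (illustration only): discard ESTs shorter than this.
MIN_LENGTH = 100


def clean(seq: str) -> Optional[str]:
    """Uppercase a sequence; return None if it is shorter than MIN_LENGTH."""
    seq = seq.strip().upper()
    return seq if len(seq) >= MIN_LENGTH else None


def classify(ests: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group cleaned ESTs, given as (accession, library, sequence), by variety.

    ESTs from libraries missing from the mapping go to "variety unknown".
    """
    groups: Dict[str, List[str]] = {}
    for _accession, library, seq in ests:
        cleaned = clean(seq)
        if cleaned is None:
            continue
        variety = LIBRARY_TO_VARIETY.get(library, "variety unknown")
        groups.setdefault(variety, []).append(cleaned)
    return groups


def build_starting_set(
    groups: Dict[str, List[str]], genome_transcripts: Iterable[str]
) -> Dict[str, List[str]]:
    """Return a new dict with the draft-genome (AM560-2) transcripts added as their own group."""
    combined = {variety: list(seqs) for variety, seqs in groups.items()}
    combined["AM560-2"] = [t.upper() for t in genome_transcripts]
    return combined


if __name__ == "__main__":
    ests = [
        ("EST1", "lib_A", "acgt" * 30),  # 120 nt, kept as MCol22
        ("EST2", "lib_B", "ACGT" * 10),  # 40 nt, removed by the length filter
        ("EST3", "lib_Z", "TTGA" * 40),  # unmapped library -> "variety unknown"
    ]
    combined = build_starting_set(classify(ests), ["ATG" * 50])
    print({variety: len(seqs) for variety, seqs in combined.items()})
    # {'MCol22': 1, 'variety unknown': 1, 'AM560-2': 1}
```

Keeping the genome transcripts as a separate group means they can later serve as the reference side of the EST alignments.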
By aligning transcript sequences predicted from the cassava draft genome sequence with expressed sequence tags from GenBank, we discovered 10,546 single nucleotide polymorphisms (SNPs) and 756 insertions and deletions (InDels). To facilitate the development of molecular markers, we designed 9,383 PCR primer pairs to amplify the genomic regions surrounding these DNA polymorphisms.
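The abstract does not specify the variant-calling or primer-design procedure. The following sketch assumes that each EST has already been aligned to a predicted transcript and that the alignment is stored as two equal-length gapped strings. It classifies mismatched columns as SNPs and runs of gap columns as InDels, then cuts out a reference window around each polymorphism that could be passed to a primer-design program (not shown). The names `Variant`, `call_variants`, and `primer_template`, and the 150-nt default flank, are hypothetical. A real pipeline would also need filters such as minimum EST support, which are omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    kind: str     # "SNP" or "InDel"
    ref_pos: int  # 0-based position in the ungapped reference (genome transcript)
    ref: str      # reference allele ("" for an insertion in the EST)
    alt: str      # EST allele ("" for a deletion in the EST)


def call_variants(ref_aln: str, est_aln: str) -> List[Variant]:
    """Scan a gapped pairwise alignment column by column.

    Mismatched bases become SNPs (columns containing an ambiguous N are skipped),
    and each run of consecutive gap-containing columns becomes one InDel.
    """
    if len(ref_aln) != len(est_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_aln, est_aln = ref_aln.upper(), est_aln.upper()
    variants: List[Variant] = []
    ref_pos = 0
    i = 0
    while i < len(ref_aln):
        r, q = ref_aln[i], est_aln[i]
        if r == "-" or q == "-":
            start, ref_allele, alt_allele = ref_pos, "", ""
            # Extend over the whole run of gap-containing columns.
            while i < len(ref_aln) and (ref_aln[i] == "-" or est_aln[i] == "-"):
                if ref_aln[i] != "-":
                    ref_allele += ref_aln[i]
                    ref_pos += 1
                if est_aln[i] != "-":
                    alt_allele += est_aln[i]
                i += 1
            if ref_allele or alt_allele:  # ignore columns that are gaps in both
                variants.append(Variant("InDel", start, ref_allele, alt_allele))
            continue
        if r != q and r != "N" and q != "N":
            variants.append(Variant("SNP", ref_pos, r, q))
        ref_pos += 1
        i += 1
    return variants


def primer_template(ref: str, variant: Variant, flank: int = 150) -> str:
    """Return the ungapped reference around a variant, padded by `flank` nt on each side."""
    start = max(0, variant.ref_pos - flank)
    end = variant.ref_pos + len(variant.ref) + flank
    return ref[start:end]


if __name__ == "__main__":
    for v in call_variants("ACGTACGT", "ACCTAC-T"):
        print(v.kind, v.ref_pos, v.ref or "-", v.alt or "-",
              primer_template("ACGTACGT", v, flank=2))
    # SNP 2 G C ACGTA
    # InDel 6 G - ACGT
```

Merging adjacent gap columns into one event keeps a multi-base deletion from being counted as several InDels. The flanking window is simply the sequence a primer designer would search for primer sites on either side of the polymorphism.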
Genome-wide discovery and information resource development of DNA polymorphisms in cassava.
Sakurai T, Mochida K, Yoshida T, Akiyama K, Ishitani M, Seki M, Shinozaki K.