Large number of repetitive elements in the draft genome assembly of Dipteryx alata (Fabaceae)
Dipteryx alata (Fabaceae), locally known as Baru, is a non-model tree species endemic to the Brazilian savanna (Cerrado). It has economic potential as a source of food, medicine, animal forage, and lumber, and is also used in landscaping and in the recovery of degraded areas. Although D. alata is recognized as an important Brazilian resource, no genomic information is currently available for this species. We generated 22 Gb of raw reads from the genomes of D. alata trees using the Illumina MiSeq platform. These reads were assembled into 275,707 nuclear genomic sequences (N50 = 1598 bp) totaling 355 Mb, which corresponds to 44% of the whole genome. We detected 21,981 microsatellite regions, of which 49.3% were dinucleotide and 42.7% trinucleotide repeats. We found 421,701 transposable elements (TEs) in 39.29% of the sequences. Long terminal repeat retrotransposons were the most abundant TEs. This is one of the first genome-scale studies of a native Cerrado species. The results can be used to develop molecular markers for studies on the evolution, population genetics, and conservation of D. alata.
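
For readers unfamiliar with the assembly statistics quoted above, the following is a minimal sketch of how an N50 value is commonly computed: it is the length L such that sequences of length L or longer account for at least half of the total assembled length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the D. alata assembly. The last line is simple arithmetic on the reported figures (355 Mb covering 44% of the genome); the resulting genome size is implied by those figures, not a value reported in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length until
    # at least half of the total assembled length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical, NOT from the D. alata assembly).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)

# Genome size implied by the reported figures: 355 Mb is stated to be
# 44% of the whole genome (derived here, not reported in the abstract).
implied_genome_mb = 355 / 0.44

print(f"toy N50: {toy_n50} bp")
print(f"implied genome size: {implied_genome_mb:.0f} Mb")
```

A low N50 relative to the total assembled length, as reported here, indicates an assembly made up of many short sequences, which is typical when repetitive elements are abundant.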