Cleistogenes songorica v1.1 Assembly & Annotation

Overview

Analysis Name Cleistogenes songorica v1.1 Assembly & Annotation
Sequencing technology PacBio
Assembly method GS De Novo Assembler v1.01
Release Date 2020-11-26
Reference Publication(s)

Zhang J, Wu F, Yan Q, John UP, Cao M, Xu P, Zhang Z, Ma T, Zong X, Li J, Liu R, Zhang Y, Zhao Y, Kanzana G, Lv Y, Nan Z, Spangenberg G, Wang Y. The genome of Cleistogenes songorica provides a blueprint for functional dissection of dimorphic flower differentiation and drought adaptability. Plant Biotechnol J. 2021 Mar;19(3):532-547. doi: 10.1111/pbi.13483.

Summary

Cleistogenes songorica (2n = 4x = 40) is a desert grass with a unique dimorphic flowering mechanism and an ability to survive extreme drought. Little is known about the genetics underlying its drought tolerance and reproductive adaptability. Here, we sequenced and assembled a high-quality chromosome-level C. songorica genome (contig N50 = 21.28 Mb). Complete assemblies of all telomeres and of ten chromosomes were derived. C. songorica underwent a recent tetraploidization (~19 million years ago) and four major chromosomal rearrangements. Expanded genes were significantly enriched in fatty acid elongation, phenylpropanoid biosynthesis, starch and sucrose metabolism, and circadian rhythm pathways. By comparative transcriptomic analysis, we found that conserved drought-tolerance-related genes were expanded. Transcription of CsMYB genes was associated with differential development of chasmogamous and cleistogamous flowers, as well as drought tolerance. Furthermore, we found that regulation modules encompassing miRNA, transcription factors and target genes are involved in dimorphic flower development, validated by overexpression of CsAP2_9 and its targeted miR172 in rice. Our findings enable further understanding of the mechanisms of drought tolerance and flowering in C. songorica, and provide new insights into the adaptability of native grass species in evolution, along with potential resources for trait improvement in agronomically important species.

Assembly statistics

Genome size (bp) 540,115,686
GC content 45.01%
Genome sequence No. 103
Maximum genome sequence length (bp) 38,883,845
Minimum genome sequence length (bp) 44,100
Average genome sequence length (bp) 5,243,842
Genome sequence N50 (bp) 28,480,272
Genome sequence N90 (bp) 19,463,092
Assembly level Chromosome
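
The N50 and N90 figures above are conventionally defined as the sequence length at which the sequences of that length or longer cover at least 50% or 90% of the total assembly size. The page does not state which definition was used, so the following is a minimal Python sketch under that assumption. The `nx` helper and the toy lengths are illustrative and are not part of the database.

```python
def nx(lengths, fraction):
    """Return the Nx length: the smallest length L such that sequences of
    length >= L together cover at least `fraction` of the total size."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    raise ValueError("lengths must be non-empty")


# Toy lengths (hypothetical, not taken from the assembly).
toy_lengths = [50, 40, 30, 20, 10]
print(nx(toy_lengths, 0.5))  # 40
print(nx(toy_lengths, 0.9))  # 20

# The average length in the table is the genome size divided by the
# number of sequences, rounded to the nearest base pair.
print(round(540_115_686 / 103))  # 5243842
```

Applied to the 103 sequence lengths of the real assembly, `nx(lengths, 0.5)` and `nx(lengths, 0.9)` should correspond to the N50 and N90 rows, provided the same definition was used.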

Assembly

The Cleistogenes songorica v1.1 Assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file) GWHANUQ00000000.genome.fasta.gz

Gene Predictions

The Cleistogenes songorica v1.1 gene prediction files are available in GFF3 and FASTA formats.

Downloads

Genes (GFF3 file) GWHANUQ00000000.gff.gz
CDS sequences (FASTA file) GWHANUQ00000000.RNA.fasta.gz
Protein sequences (FASTA file) GWHANUQ00000000.Protein.faa.gz
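
As a quick sanity check after download, the feature types in the GFF3 file can be tallied. The sketch below relies only on the GFF3 standard (nine tab-separated columns, with the feature type in column 3). The example records and the `count_feature_types` and `count_in_gzipped_gff` helpers are hypothetical, and the feature types actually present in GWHANUQ00000000.gff.gz are not documented on this page.

```python
import gzip
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 feature types (column 3) from an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # Skip blank lines, comments, and directives such as ##gff-version.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GFF3 feature line has exactly nine tab-separated columns.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


def count_in_gzipped_gff(path):
    """Open a gzip-compressed GFF3 file in text mode and count its features."""
    with gzip.open(path, "rt") as handle:
        return count_feature_types(handle)


# Hypothetical records, for illustration only.
example_lines = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=t1;Parent=g1\n",
    "chr1\tsrc\tCDS\t1\t100\t.\t+\t0\tID=c1;Parent=t1\n",
]
print(dict(count_feature_types(example_lines)))
```

For the real file, `count_in_gzipped_gff("GWHANUQ00000000.gff.gz")` would return the same kind of tally, assuming the file has been downloaded to the working directory.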

Functional Analysis

Functional annotation for the Cleistogenes songorica v1.1 genome is available for download below. The proteins were analyzed using InterProScan to assign InterPro domains (Pfam).

Downloads

Domain from InterProScan Cleistogenes_songorica.Pfam.tsv.gz

S genes

Summary

Query         Chromosome       Size (bp)  Coordinates                       tBLASTn Hit       tBLASTn %ID  Domain
DUF247II-Z1Ψ  GWHANUQ00000003  26015858   2135864-2135947                   Mlutarioriparius  76           DUF247
DUF247II-Z2Ψ  GWHANUQ00000017  31529402   1899121-1899405                   Ecrus-galli       80           DUF247
HPS10-Z1      GWHANUQ00000003  26015858   2138346-2138478, 2138623-2138732  LpsZ_chromosome2  26           -
HPS10-Z2      GWHANUQ00000017  31529402   1891555-1891652, 1891795-1891927  AerianthaHPS10-Z  51           -
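
The coordinates in the table can be turned into intervals for extracting the corresponding regions from the chromosome FASTA. The sketch below assumes the coordinates are 1-based and inclusive, which the page does not state. It also ignores strand, which the table does not give, so a region on the reverse strand would still need reverse-complementing. The helper names are hypothetical.

```python
def parse_intervals(text):
    """Parse a coordinate string such as '10-20, 30-40' into (start, end) tuples."""
    intervals = []
    for part in text.split(","):
        start, end = part.strip().split("-")
        intervals.append((int(start), int(end)))
    return intervals


def total_length(intervals):
    """Summed length in bp, assuming 1-based inclusive coordinates."""
    return sum(end - start + 1 for start, end in intervals)


def extract(sequence, intervals):
    """Concatenate the forward-strand subsequences for each interval.

    Assumes 1-based inclusive coordinates; no reverse-complementing is done.
    """
    return "".join(sequence[start - 1:end] for start, end in intervals)


# Coordinates listed for HPS10-Z1 on GWHANUQ00000003.
hps10_z1 = parse_intervals("2138346-2138478, 2138623-2138732")
print(total_length(hps10_z1))  # 243

# Toy sequence (hypothetical) to show the slicing convention.
print(extract("ACGTACGTAC", [(2, 4), (7, 8)]))  # CGTGT
```

Under the inclusive assumption, the two HPS10-Z1 segments span 133 bp and 110 bp.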


© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences