Arabidopsis arenosa UiO_Aaren_v1.0 Assembly & Annotation

Overview

Analysis name: Arabidopsis arenosa UiO_Aaren_v1.0 Assembly & Annotation
Sequencing technology: PacBio Sequel; Illumina HiSeq
Assembly method: Canu v2.1
Release date: 2022-11-14
Reference publication(s):

Bramsiepe J, Krabberød AK, Bjerkan KN, Alling RM, Johannessen IM, Hornslien KS, Miller JR, Brysting AK, Grini PE. Structural evidence for MADS-box type I family expansion seen in new assemblies of Arabidopsis arenosa and A. lyrata. Plant J. 2023 Nov;116(3):942-961. doi: 10.1111/tpj.16401.

SUMMARY

Arabidopsis thaliana diverged from A. arenosa and A. lyrata at least 6 million years ago. The three species differ in genome-wide polymorphisms and morphological traits. They are largely reproductively isolated, but their hybridization barriers are incomplete. One particular type of hybridization barrier acts through the triploid endosperm of the seed: the embryo dies because the endosperm fails to support its development. The MADS-box type I family of transcription factors is specifically expressed in the endosperm and has been proposed to play a role in endosperm-based hybridization barriers. The gene family is known for its high evolutionary duplication rate and for being regulated by genomic imprinting. Here we address the evolution of the MADS-box type I gene family and the role of type I genes in hybridization. Using two de novo assembled and annotated chromosome-level genomes of A. arenosa and A. lyrata ssp. petraea, we analyzed the MADS-box type I gene family in Arabidopsis to predict orthologs, copy number, and structural genomic variation related to the type I loci. We compared these findings with gene expression profiles sampled before and after the transition to endosperm cellularization to investigate the involvement of MADS-box type I loci in endosperm-based hybridization barriers. We observed substantial differences in type I expression in the endosperm of A. arenosa and A. lyrata ssp. petraea, suggesting a genetic cause for the endosperm-based hybridization barrier between the two species.

Assembly statistics

Genome size: 153 Mb
Total ungapped length: 152.9 Mb
Number of chromosomes: 8
Number of scaffolds: 264
Scaffold N50: 19.2 Mb
Scaffold L50: 4
Number of contigs: 403
Contig N50: 6.5 Mb
Contig L50: 8
GC content: 36%
Genome coverage: 78.0x
Assembly level: Chromosome
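N50 and L50 are usually defined as follows: sort the sequences from longest to shortest and add up their lengths until at least half of the total assembly length is covered; N50 is the length of the sequence that crosses that halfway mark, and L50 is the number of sequences needed to reach it. The Python sketch below implements this common definition. It is not the tool that produced the statistics above, and the input lengths in the example are made-up toy values, not the UiO_Aaren_v1.0 scaffolds.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of the
    longest sequences that together cover at least half of the total length;
    L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must not be empty")


# Hypothetical toy lengths in Mb (total 120 Mb, so half is 60 Mb).
print(n50_l50([25, 22, 20, 19, 15, 10, 5, 4]))  # → (20, 3)
```

Running the same function on scaffold lengths or on contig lengths gives the scaffold or contig values respectively.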

Assembly

The Arabidopsis arenosa UiO_Aaren_v1.0 assembly is available as a soft-masked FASTA file.

Downloads

Chromosomes (FASTA file) Arabidopsis_arenosa_genome.softmasked.fna.gz
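Basic figures such as total length, ungapped length and GC content can be recomputed from the downloaded FASTA file. The minimal Python sketch below assumes the file has been saved locally under its listed name and follows the usual soft-masking convention (repeat-masked bases in lowercase, gaps written as N or n). Its results may differ slightly from the statistics above, depending on how the original pipeline handled gaps and ambiguity codes.

```python
import gzip
from collections import Counter


def fasta_stats(path):
    """Summarise a (possibly gzip-compressed) nucleotide FASTA file.

    Assumes soft-masking: lowercase letters mark repeat-masked bases.
    GC percent is computed over ungapped bases (everything except N/n).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    n_records = 0
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                n_records += 1  # header line starts a new sequence
                continue
            counts.update(line)  # count every base character
    total = sum(counts.values())
    ungapped = total - counts["N"] - counts["n"]
    gc = sum(counts[base] for base in "GCgc")
    masked = sum(n for base, n in counts.items() if base.islower())
    return {
        "records": n_records,
        "total_length": total,
        "ungapped_length": ungapped,
        "gc_percent": 100 * gc / ungapped if ungapped else 0.0,
        "softmasked_percent": 100 * masked / total if total else 0.0,
    }


# Example (assumes the file was downloaded to the working directory):
# fasta_stats("Arabidopsis_arenosa_genome.softmasked.fna.gz")
```

Reading the whole 153 Mb assembly this way is slow but needs little memory, because the file is streamed line by line.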

Gene Predictions

The Arabidopsis arenosa UiO_Aaren_v1.0 genome gene prediction files are available in GFF3 and FASTA format.

Downloads

Genes (GFF3 file) A.arenosa.gff.gz
CDS sequences (FASTA file) A_arenosa.cds.fa.gz
Protein sequences (FASTA file) A_arenosa.pep.fa.gz
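GFF3 is a tab-separated format with nine columns per feature line (sequence ID, source, feature type, start, end, score, strand, phase, attributes), with attributes written as percent-encoded key=value pairs separated by semicolons. The Python sketch below counts features by type and lists gene coordinates. It assumes the gene models in A.arenosa.gff.gz use the feature type "gene" and carry an ID attribute, which is common practice but not confirmed here; the function names are hypothetical.

```python
import gzip
from collections import Counter
from urllib.parse import unquote


def _records(path):
    """Yield the nine tab-separated columns of each GFF3 feature line."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an embedded FASTA section ends the feature lines
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, directives and comments
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 9:
                yield fields


def parse_attributes(column):
    """Turn 'ID=g1;Name=A%3BB' into {'ID': 'g1', 'Name': 'A;B'}."""
    attrs = {}
    for item in column.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = unquote(value)  # undo GFF3 percent-encoding
    return attrs


def feature_counts(path):
    """Count features by type (column 3), e.g. gene, mRNA, exon, CDS."""
    return Counter(fields[2] for fields in _records(path))


def genes(path, feature_type="gene"):
    """Yield (seqid, start, end, strand, ID) for features of one type."""
    for fields in _records(path):
        if fields[2] == feature_type:
            attrs = parse_attributes(fields[8])
            yield fields[0], int(fields[3]), int(fields[4]), fields[6], attrs.get("ID")


# Example (assumes the file was downloaded locally):
# print(feature_counts("A.arenosa.gff.gz").most_common())
```

Checking feature_counts first is a quick way to confirm which feature types the file actually uses before filtering with genes().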

Functional Analysis

Functional annotation for the Arabidopsis arenosa UiO_Aaren_v1.0 proteins is available for download below. The predicted proteins were analyzed with InterProScan to assign Pfam domains.

Downloads

Pfam domains from InterProScan (TSV file) Arabidopsis_arenosa_UiO_Aaren_v1.0.Pfam.tsv.gz
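InterProScan's TSV output has one line per match; in the standard layout, column 1 is the protein accession, column 4 the member database (analysis) and column 5 the signature accession, followed by the description, match coordinates and score. The Python sketch below groups proteins by Pfam accession, assuming this file keeps that standard column order. PF00319 (SRF-TF), the Pfam model of the MADS domain, is used in the usage note only as an illustration relevant to the MADS-box analysis described above.

```python
import gzip
from collections import defaultdict


def proteins_by_signature(path, analysis="Pfam"):
    """Map each signature accession to the set of protein IDs that carry it.

    Assumes the standard InterProScan TSV column order: 1 protein accession,
    4 analysis (member database), 5 signature accession.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    hits = defaultdict(set)
    with opener(path, "rt") as handle:
        for line in handle:
            row = line.rstrip("\n").split("\t")
            if len(row) < 5 or row[3] != analysis:
                continue  # skip short lines and other member databases
            hits[row[4]].add(row[0])  # a set counts each protein once
    return hits
```

For example, hits = proteins_by_signature("Arabidopsis_arenosa_UiO_Aaren_v1.0.Pfam.tsv.gz") followed by len(hits["PF00319"]) would give the number of proteins with at least one PF00319 match, assuming the file has been downloaded locally. Collecting protein IDs in sets avoids double-counting proteins with several matches to the same domain.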

S genes

Summary

Query: SRK
Chromosome: scaffold_7 (size 21,434,880 bp)
Coordinates: 8227742-8229044, 8229931-8230065, 8230163-8230344, 8230422-8230632, 8230720-8230957, 8231047-8231197, 8231284-8231613
BLASTp hit: sp|P0DH86|SRK_ARATH
BLASTp identity: 65%

Query: SCR
Chromosome: scaffold_7 (size 21,434,880 bp)
Coordinates: 11026945-11027014, 11026539-11026768
BLASTp hit: XP_006414465.1
BLASTp identity: 78%
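The coordinates above are listed as comma-separated start-end pairs. The Python sketch below parses such a list and sums the interval lengths, assuming 1-based, inclusive coordinates (the GFF3 convention), which the table does not state explicitly. Because the SCR pairs are listed in descending order, each pair is normalised so that start ≤ end. The function names are hypothetical.

```python
def parse_intervals(text):
    """Parse '8227742-8229044, 8229931-8230065' into [(8227742, 8229044), ...].

    Whitespace and line breaks are ignored; each pair is normalised so that
    start <= end, whichever order it was written in.
    """
    intervals = []
    for chunk in "".join(text.split()).split(","):
        if chunk:
            a, b = (int(x) for x in chunk.split("-"))
            intervals.append((min(a, b), max(a, b)))
    return intervals


def total_length(intervals):
    """Sum interval lengths, assuming 1-based, inclusive coordinates."""
    return sum(end - start + 1 for start, end in intervals)


def genomic_span(intervals):
    """Return the (first, last) coordinate covered by the intervals."""
    return min(s for s, _ in intervals), max(e for _, e in intervals)


SRK = ("8227742-8229044, 8229931-8230065, 8230163-8230344, 8230422-8230632, "
       "8230720-8230957, 8231047-8231197, 8231284-8231613")
SCR = "11026945-11027014, 11026539-11026768"

print(total_length(parse_intervals(SRK)))  # → 2550
print(total_length(parse_intervals(SCR)))  # → 300
print(genomic_span(parse_intervals(SCR)))  # → (11026539, 11027014)
```

Both totals are multiples of three, which would fit the intervals being coding segments, but that is an inference from the arithmetic, not something the table states.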


© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences