Hordeum vulgare MorexV3 Assembly & Annotation

Overview

Analysis Name: Hordeum vulgare MorexV3 Assembly & Annotation
Sequencing technology: HiFi, Nanopore, Bionano, Illumina
Assembly method: CANU
Release Date: 2021-03-12
Reference Publication(s):

Mascher M, Wicker T, Jenkins J, Plott C, Lux T, Koh CS, Ens J, Gundlach H, Boston LB, Tulpová Z, Holden S, Hernández-Pinzón I, Scholz U, Mayer KFX, Spannagl M, Pozniak CJ, Sharpe AG, Šimková H, Moscou MJ, Grimwood J, Schmutz J, Stein N. Long-read sequence assembly: a technical evaluation in barley. Plant Cell. 2021 Jul 19;33(6):1888-1906. doi: 10.1093/plcell/koab077.

Abstract

Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short-reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.

Assembly statistics

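Assembly size: 4.2 Gb
Number of scaffolds: 386
Number of contigs: 588 (before gap-filling), 439 (after gap-filling)
Scaffold N50: 118.9 Mb
Scaffold N90: 21.9 Mb
Contig N50: 31.9 Mb (before gap-filling), 69.6 Mb (after gap-filling)
Contig N90: 7.2 Mb (before gap-filling), 19.3 Mb (after gap-filling)
Gap size: 3.37 Mb (before gap-filling), 1.32 Mb (after gap-filling)
Assembly level: Chromosome (before and after gap-filling)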
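
For readers unfamiliar with the N50 and N90 metrics, the following minimal sketch shows the usual definition: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the assembly. The function name `nx` and the toy lengths are illustrative assumptions; they are not taken from the MorexV3 assembly or from the pipeline used by the authors.

```python
def nx(lengths, fraction):
    """Return the Nx length: the largest L such that sequences of
    length >= L together cover at least `fraction` of the total."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy contig lengths (arbitrary units), not MorexV3 data.
toy_lengths = [100, 80, 60, 40, 20]
n50 = nx(toy_lengths, 0.5)  # running total reaches 150 at the 80-unit contig
n90 = nx(toy_lengths, 0.9)  # running total reaches 270 at the 40-unit contig
print(n50, n90)
```

Because N50 depends only on the sorted lengths, merging contigs through gap-filling can raise contig N50 without changing the total assembly size.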

Assembly

The Hordeum vulgare MorexV3 Assembly file is available in FASTA format.

Downloads

Chromosomes (FASTA file): Hordeum_vulgare.MorexV3_pseudomolecules_assembly.dna.toplevel.fa.gz
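
As a quick sanity check after downloading, the sequence lengths in the gzipped FASTA file can be tallied without loading the whole genome into memory. The sketch below uses only the Python standard library; the function name `fasta_lengths` is an illustrative assumption, and the code assumes a conventional FASTA layout (header lines starting with `>`, sequence on the following lines).

```python
import gzip


def fasta_lengths(path):
    """Yield (sequence_id, length) for each record in a gzipped FASTA file.

    The file is streamed line by line, so memory use stays small even
    for multi-gigabase assemblies.
    """
    name, length = None, 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, length
                parts = line[1:].split()
                # The sequence ID is the first whitespace-delimited token.
                name, length = (parts[0] if parts else ""), 0
            elif name is not None:
                length += len(line)
    if name is not None:
        yield name, length
```

Calling `dict(fasta_lengths("Hordeum_vulgare.MorexV3_pseudomolecules_assembly.dna.toplevel.fa.gz"))` on a local copy would give a mapping from sequence ID to length, which can then be compared with the chromosome sizes listed elsewhere on this page.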

Gene Predictions

The Hordeum vulgare MorexV3 genome gene prediction files are available in GFF3 and FASTA format.

Downloads

Genes (GFF3 file): Hordeum_vulgare.MorexV3_pseudomolecules_assembly.53.gff3.gz
CDS sequences (FASTA file): Hordeum_vulgare.MorexV3_pseudomolecules_assembly.cds.all.fa.gz
Protein sequences (FASTA file): Hordeum_vulgare.MorexV3_pseudomolecules_assembly.pep.all.fa.gz
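
One way to get a first impression of the gene predictions is to count features per chromosome. The sketch below assumes the file follows the standard nine-column, tab-separated GFF3 layout and that gene models use the feature type `gene`; both are assumptions about this particular file, and the function name `count_features` is illustrative.

```python
import gzip
from collections import Counter


def count_features(path, feature_type="gene"):
    """Count GFF3 features of one type per sequence ID (column 1)."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip directives, comments, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            # Standard GFF3 rows have exactly nine columns;
            # column 3 holds the feature type.
            if len(cols) == 9 and cols[2] == feature_type:
                counts[cols[0]] += 1
    return counts
```

If the counts come back empty, inspecting the distinct values in column 3 is a reasonable next step, since annotation sources differ in the feature types they emit.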

Functional Analysis

Functional annotation for the Hordeum vulgare MorexV3 assembly is available for download below. The predicted proteins were analyzed with InterProScan to assign InterPro domains (Pfam).

Downloads

Domains from InterProScan: Hordeum_vulgare.Pfam.tsv.gz
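
The column layout of the Pfam table is not documented on this page, so the following is only a sketch under stated assumptions: that the first tab-separated column is a protein identifier, and that Pfam accessions appear somewhere in the remaining columns in the usual `PF` plus five digits form. The function name `pfam_by_protein` is illustrative. It collects the set of Pfam accessions found for each protein.

```python
import gzip
import re
from collections import defaultdict

# Pfam accessions look like PF00069; tolerate an optional version suffix.
PFAM_ACCESSION = re.compile(r"^(PF\d{5})(?:\.\d+)?$")


def pfam_by_protein(path):
    """Map protein ID (assumed to be column 1) to its set of Pfam accessions."""
    domains = defaultdict(set)
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue
            for field in fields[1:]:
                match = PFAM_ACCESSION.match(field.strip())
                if match:
                    domains[fields[0]].add(match.group(1))
    return dict(domains)
```

Scanning every column for accession-shaped values avoids hard-coding a column index, at the cost of silently skipping rows that use a different identifier style; checking the first few lines of the file by eye is advisable before relying on the result.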

S genes

Summary

Query        Chromosome  Size (bp)  Coordinates                               tBLASTn Hit               tBLASTn %ID  Domain
DUF247II-S   1H          516505932  83113352-83115031                         LpSDUF247-II_chromosome1  75           DUF247
DUF247II-ZΨ  2H          665585731  636298586-636299215                       LrDUF247II-Z              58           DUF247
HPS10-Z      2H          665585731  622994754-622994892, 622995019-622995107  Amyosuroides              37           -
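
The coordinates in the summary can be turned into sequence slices once a chromosome is loaded as a string. The sketch below assumes the coordinates are 1-based and inclusive and that comma-separated ranges are segments to be joined in the order listed; the page does not state either convention, nor the strand, so no reverse-complementing is attempted. The helper names are illustrative.

```python
def parse_intervals(text):
    """Parse 'start-end, start-end' into a list of (start, end) tuples."""
    intervals = []
    for part in text.replace("\n", " ").split(","):
        part = part.strip()
        if not part:
            continue
        start, end = (int(value) for value in part.split("-"))
        intervals.append((start, end))
    return intervals


def total_length(intervals):
    """Total bases covered, assuming 1-based inclusive coordinates."""
    return sum(end - start + 1 for start, end in intervals)


def extract(sequence, intervals):
    """Join the segments of `sequence` covered by 1-based inclusive intervals."""
    return "".join(sequence[start - 1:end] for start, end in intervals)


# Coordinates copied from the summary above.
duf247_s = parse_intervals("83113352-83115031")
hps10_z = parse_intervals("622994754-622994892, 622995019-622995107")
print(total_length(duf247_s))  # 1680
print(total_length(hps10_z))   # 228 (139 + 89)
```

Under these assumptions the DUF247II-S hit spans 1,680 bp, while the HPS10-Z hit is split into two segments of 139 bp and 89 bp.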


© 2023 National Genomics Data Center, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences