Analysis Name | Hordeum vulgare MorexV3 Assembly & Annotation |
Sequencing technology | HiFi, Nanopore, Bionano, Illumina |
Assembly method | CANU |
Release Date | 2021-03-12 |
Mascher M, Wicker T, Jenkins J, Plott C, Lux T, Koh CS, Ens J, Gundlach H, Boston LB, Tulpová Z, Holden S, Hernández-Pinzón I, Scholz U, Mayer KFX, Spannagl M, Pozniak CJ, Sharpe AG, Šimková H, Moscou MJ, Grimwood J, Schmutz J, Stein N. Long-read sequence assembly: a technical evaluation in barley. Plant Cell. 2021 Jul 19;33(6):1888-1906. doi: 10.1093/plcell/koab077.
Abstract
Sequence assembly of large and repeat-rich plant genomes has been challenging, requiring substantial computational resources and often several complementary sequence assembly and genome mapping approaches. The recent development of fast and accurate long-read sequencing by circular consensus sequencing (CCS) on the PacBio platform may greatly increase the scope of plant pan-genome projects. Here, we compare current long-read sequencing platforms regarding their ability to rapidly generate contiguous sequence assemblies in pan-genome studies of barley (Hordeum vulgare). Most long-read assemblies are clearly superior to the current barley reference sequence based on short-reads. Assemblies derived from accurate long reads excel in most metrics, but the CCS approach was the most cost-effective strategy for assembling tens of barley genomes. A downsampling analysis indicated that 20-fold CCS coverage can yield very good sequence assemblies, while even five-fold CCS data may capture the complete sequence of most genes. We present an updated reference genome assembly for barley with near-complete representation of the repeat-rich intergenic space. Long-read assembly can underpin the construction of accurate and complete sequences of multiple genomes of a species to build pan-genome infrastructures in Triticeae crops and their wild relatives.
Assembly statistics
Metric | Before gap-filling | After gap-filling |
Assembly size | 4.2 Gb | |
Number of scaffolds | 386 | |
Number of contigs | 588 | 439 |
Scaffold N50 | 118.9 Mb | |
Scaffold N90 | 21.9 Mb | |
Contig N50 | 31.9 Mb | 69.6 Mb |
Contig N90 | 7.2 Mb | 19.3 Mb |
Gap size | 3.37 Mb | 1.32 Mb |
Assembly level | Chromosome | Chromosome |
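The N50 and N90 values in the table follow the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches 50% (or 90%) of the assembly size. A minimal sketch of that calculation is shown here; the function name `nx` and the toy lengths are illustrative assumptions, not MorexV3 data.

```python
def nx(lengths, x=50):
    """Return the Nx value of a set of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    cover at least x percent of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Hypothetical contig lengths, used only to illustrate the definition.
toy_contigs = [50, 40, 30, 20, 10]
n50 = nx(toy_contigs, 50)  # 50 + 40 = 90 reaches half of 150, so N50 is 40
n90 = nx(toy_contigs, 90)  # 50 + 40 + 30 + 20 = 140 reaches 135, so N90 is 20
```

Applied to real contig or scaffold lengths, the same function would be expected to give values comparable to those in the table, although tools differ slightly in how they treat gaps and very short sequences.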
The Hordeum vulgare MorexV3 assembly file is available in FASTA format.
Downloads
Chromosomes (FASTA file) | Hordeum_vulgare.MorexV3_pseudomolecules_assembly.dna.toplevel.fa.gz |
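Once downloaded, the compressed FASTA file can be read without unpacking it first. The sketch below streams the file and records the length of each sequence, which is one way to check the download against the chromosome sizes listed on this page. The function name `fasta_lengths` is an assumption for illustration, and the record identifiers used inside the file (for example whether chromosome 1H is named `1H`) are not confirmed here.

```python
import gzip


def fasta_lengths(path):
    """Map each FASTA record ID to its sequence length, reading line by line."""
    lengths = {}
    current = None
    # Use gzip for .gz files and the plain opener otherwise.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The record ID is the first whitespace-delimited token.
                tokens = line[1:].split()
                current = tokens[0] if tokens else ""
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths
```

For example, `fasta_lengths("Hordeum_vulgare.MorexV3_pseudomolecules_assembly.dna.toplevel.fa.gz")` would return a dictionary of sequence names and lengths. Because the file is multi-gigabase, the function keeps only the running lengths in memory, not the sequences themselves.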
The Hordeum vulgare MorexV3 genome gene prediction files are available in GFF3 and FASTA format.
Downloads
Genes (GFF3 file) | Hordeum_vulgare.MorexV3_pseudomolecules_assembly.53.gff3.gz |
CDS sequences (FASTA file) | Hordeum_vulgare.MorexV3_pseudomolecules_assembly.cds.all.fa.gz |
Protein sequences (FASTA file) | Hordeum_vulgare.MorexV3_pseudomolecules_assembly.pep.all.fa.gz |
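A quick sanity check on the gene prediction file is to tally how many features of each type it contains. GFF3 is a tab-separated format with nine columns, the third of which holds the feature type; the sketch below relies only on that. The function name `count_feature_types` is an assumption, and the feature types actually present in this release (for example `gene`, `mRNA`, `CDS`) are not verified here.

```python
import gzip
from collections import Counter


def count_feature_types(path):
    """Count GFF3 features by type (column 3) in a gzip-compressed file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip directives, comments, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # Ignore anything that is not a nine-column feature line,
            # such as sequence lines in an embedded FASTA section.
            if len(fields) != 9:
                continue
            counts[fields[2]] += 1
    return counts
```

Calling `count_feature_types("Hordeum_vulgare.MorexV3_pseudomolecules_assembly.53.gff3.gz")` would return a `Counter` keyed by feature type.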
Functional annotation for Hordeum vulgare MorexV3 is available for download below. The proteins were analyzed using InterProScan to assign InterPro domains (Pfam).
Downloads
Domain from InterProScan | Hordeum_vulgare.Pfam.tsv.gz |
Summary
Query | Chromosome | Size(bp) | Coordinates | tBLASTn Hit | tBLASTn %ID | Domain |
DUF247II-S | 1H | 516505932 | 83113352-83115031 | LpSDUF247-II_chromosome1 | 75 | DUF247 |
DUF247II-ZΨ | 2H | 665585731 | 636298586-636299215 | LrDUF247II-Z | 58 | DUF247 |
HPS10-Z | 2H | 665585731 | 622994754-622994892,622995019-622995107 | Amyosuroides | 37 | - |
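The Coordinates column mixes single ranges with comma-separated multi-part ranges, so it helps to parse them into explicit intervals before extracting sequence. The sketch below does this for the three rows above and computes the total length of each hit. It assumes the coordinates are 1-based and inclusive, which the table does not state; the function names are illustrative.

```python
def parse_coordinates(text):
    """Parse 'start-end' or 'start-end,start-end' into a list of (start, end)."""
    intervals = []
    for part in text.split(","):
        start, end = (int(value) for value in part.strip().split("-"))
        if start > end:
            raise ValueError(f"start exceeds end in {part!r}")
        intervals.append((start, end))
    return intervals


def total_length(intervals):
    """Sum interval lengths, assuming 1-based inclusive coordinates."""
    return sum(end - start + 1 for start, end in intervals)


# Coordinates copied from the Summary table.
hits = {
    "DUF247II-S": "83113352-83115031",
    "DUF247II-ZΨ": "636298586-636299215",
    "HPS10-Z": "622994754-622994892,622995019-622995107",
}
hit_lengths = {name: total_length(parse_coordinates(c)) for name, c in hits.items()}
```

Under the inclusive assumption the lengths come out as 1680, 630, and 228 bp respectively, each a multiple of three.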
Nucleotide
Protein