
The aim of VGP Phase 1 is to generate near error-free reference genomes of 271 species representing all vertebrate orders with a divergence time of ~50 million years ago (MYA) or greater from their most recent common ordinal ancestor, including human and some species on the brink of extinction. We sequence the heterogametic sex (when it exists) so that both sex chromosomes can be assembled for each species. We have currently completed 85% of the planned species' genomes. For the next phases, our goal is to generate ~12 genomes per week.
Biological analyses for publications are ongoing, including a reference-free alignment of all species. Analyses are led by dedicated VGP working groups, including the VGP Assembly working group, the VGP Comparative Genomics working group, and the VGP Repeat working group. The VGP Assembly working group is planning a major genome release publication by the end of 2025.
All reference genomes were assembled between 2018 and 2025, a period of unparalleled evolution in genome sequencing and assembly methods. Multiple scientific initiatives have also contributed, fostering international partnerships. For assembly, we combine accurate long reads and improved assembly algorithms to generate an initial contig-level assembly. We then scaffold the contigs with long-range information to generate chromosome-level scaffolds.
Phase I Pipeline
Version 1.0 of our pipeline featured PacBio CLR reads and scaffolding with 10X Genomics linked reads, Bionano optical maps, and Arima Hi-C sequencing. With the advent of PacBio HiFi, Version 2.0 of our pipeline now uses PacBio HiFi reads for the initial assembly, before scaffolding with Bionano optical maps and Arima Hi-C reads.
This upgrade in the initial long-read data has markedly improved assembly quality over CLR assemblies. We are also generating Telomere-to-Telomere (T2T) reference genomes for a selected subset of organisms.
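The difference between the two pipeline versions described above can be summarized as a small data structure. The following is only an illustrative sketch: the class name `PipelineVersion`, the `PIPELINES` mapping, and the helper `dropped_scaffolding` are hypothetical names introduced here, not part of any VGP software, and the entries simply restate the data types listed in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PipelineVersion:
    """Hypothetical record of the data types one pipeline version consumes."""
    version: str
    contig_reads: str              # reads used for the initial contig-level assembly
    scaffolding: Tuple[str, ...]   # long-range data used for scaffolding, in listed order


# Entries restate the text above; nothing here is read from a real config file.
PIPELINES = {
    "1.0": PipelineVersion(
        version="1.0",
        contig_reads="PacBio CLR",
        scaffolding=("10X Genomics linked reads", "Bionano optical maps", "Arima Hi-C"),
    ),
    "2.0": PipelineVersion(
        version="2.0",
        contig_reads="PacBio HiFi",
        scaffolding=("Bionano optical maps", "Arima Hi-C"),
    ),
}


def dropped_scaffolding(old: PipelineVersion, new: PipelineVersion) -> List[str]:
    """Return scaffolding data types used by `old` but no longer used by `new`."""
    return [step for step in old.scaffolding if step not in new.scaffolding]


print(dropped_scaffolding(PIPELINES["1.0"], PIPELINES["2.0"]))
# → ['10X Genomics linked reads']
```

Read this way, Version 2.0 changes the contig-level read type from CLR to HiFi and no longer lists 10X Genomics linked reads among the scaffolding inputs.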
Current Project Status
Phase 1 Goal: 271 species
Please visit our GenomeArk GitHub for public access to genomes meeting our VGP metrics and for genomes in progress.
In the face of the current sixth mass extinction, some species are extremely challenging to collect. The goal was to consider Phase I complete once 80% coverage of the ordinal species was reached. All remaining species lack either an identified sample or identified funding.
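As a rough illustration of the completion criterion, the percentages quoted on this page can be converted to species counts. This sketch assumes that both the 80% threshold and the reported 85% completion are measured against the 271 planned species, which the text does not state explicitly; the function names are hypothetical.

```python
PLANNED_SPECIES = 271          # Phase 1 goal stated on this page
COMPLETION_THRESHOLD_PCT = 80  # coverage at which Phase I is considered complete
REPORTED_COMPLETED_PCT = 85    # completion reported on this page


def species_needed(planned: int, pct: int) -> int:
    """Smallest whole number of species that reaches `pct` percent of `planned`."""
    return (planned * pct + 99) // 100  # integer ceiling, avoids float rounding


def approx_completed(planned: int, pct: int) -> int:
    """Whole number of species implied by `pct` percent of `planned` (rounded down)."""
    return planned * pct // 100


print(species_needed(PLANNED_SPECIES, COMPLETION_THRESHOLD_PCT))   # → 217
print(approx_completed(PLANNED_SPECIES, REPORTED_COMPLETED_PCT))   # → 230
```

Under that assumption, the threshold corresponds to 217 species and the reported completion to roughly 230. These are derived estimates, not official counts.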
Expected Outcomes
for VGP Phase I data freeze:
