PROJECT PHASE I

 

PHASE 1 OF THE VGP WILL GENERATE NEAR ERROR-FREE REFERENCE GENOMES OF 260 SPECIES REPRESENTING ALL VERTEBRATE ORDERS WITH A DIVERGENCE TIME OF ~50 MILLION YEARS AGO (MYA) OR GREATER FROM THEIR MOST RECENT COMMON ORDINAL ANCESTOR, INCLUDING HUMAN AND SOME SPECIES ON THE BRINK OF EXTINCTION. WE WILL SEQUENCE THE HETEROGAMETIC SEX (WHEN IT EXISTS) SO THAT BOTH SEX CHROMOSOMES CAN BE ASSEMBLED FOR EACH SPECIES.

 
 

Once funding is secured for all 260 species in Phase 1, we will be able to generate ~12 genomes per week. Sequencing and assembling the Phase 1 genomes will take ~6-8 months. Sample collection will occur in parallel, adding another ~4-6 months. Alignment and annotation will also proceed simultaneously and, at 10 genomes per week, will add another ~6 months.

We expect to complete all 260 species plus the 4 invertebrate outgroups within 1.5 years of securing a major source of funding. Biological analyses for publications will occur simultaneously, although some analyses can only begin after annotation and alignment of all 260 species, which will add another 12 months before papers are submitted for publication.
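As a rough check on the timeline above, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of the VGP plan: it assumes a constant weekly throughput, an average of 52/12 weeks per month, and that the stated durations are simply summed (as the "adding another" wording suggests). All variable and function names are hypothetical.

```python
SPECIES = 260                 # Phase 1 species count from the text
WEEKS_PER_MONTH = 52 / 12     # assumption: average weeks per calendar month
ANALYSIS_MONTHS = 12          # post-annotation analyses, from the text


def months_at_rate(genomes: int, per_week: float) -> float:
    """Months needed to process `genomes` at a constant weekly throughput."""
    return genomes / per_week / WEEKS_PER_MONTH


# Throughput-only estimates (no ramp-up or downtime is modeled).
sequencing_months = months_at_rate(SPECIES, 12)   # ~12 genomes per week
annotation_months = months_at_rate(SPECIES, 10)   # 10 genomes per week

# Durations in months as stated in the text: (low, high).
stated = {
    "sequencing and assembly": (6, 8),
    "sample collection": (4, 6),
    "alignment and annotation": (6, 6),
}
low_months = sum(low for low, _ in stated.values())
high_months = sum(high for _, high in stated.values())

if __name__ == "__main__":
    print(f"Sequencing at 12/week: {sequencing_months:.1f} months")
    print(f"Annotation at 10/week: {annotation_months:.1f} months")
    print(f"Data generation total: {low_months}-{high_months} months")
    print(
        "Including analyses: "
        f"{low_months + ANALYSIS_MONTHS}-{high_months + ANALYSIS_MONTHS} months"
    )
```

Under these assumptions the summed range of 16-20 months brackets the stated 1.5 years, and adding 12 months of analysis gives 28-32 months, which brackets 2.5 years. The throughput-only sequencing estimate (about 5 months) is shorter than the stated ~6-8 months, which presumably allows for overhead not modeled here.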

 

Phase I Pipeline

 

For Phase 1, we are combining accurate long reads and improved assembly algorithms to generate an initial contig-level assembly. We then scaffold the contigs with long-range information to generate chromosome-level scaffolds. Version 1.0 of our pipeline featured PacBio CLR reads and scaffolding with 10X Genomics linked reads, Bionano optical maps, and Arima Hi-C sequencing.

With the advent of PacBio HiFi, Version 2.0 of our pipeline now uses PacBio HiFi reads for the initial assembly, before scaffolding with Bionano optical maps and Arima Hi-C reads. This improvement in the initial long-read data has led to marked improvements in assembly quality over CLR assemblies, and with further development of our algorithms, we believe that we will be able to achieve near error-free genome assemblies with no additional sequencing.

 
 

Current Project Status

Phase 1 Goal: 268 species

Please visit our GenomeArk GitHub for public access to genomes meeting our VGP metrics and for genomes in progress.

 *** as of Jan 2021 ***

All 268 designated species either have a sample identified but no funding yet, or are in progress with both a sample and funding secured.

 
 

Expected Outcomes

AFTER ~2.5 YEARS FROM INITIAL FUNDING, WE EXPECT THE FOLLOWING OUTCOMES:

 
[Figure: Phase 1 expected outcomes circle chart]