Resources

Download UHGV data

Download UHGV data, including genome and protein sequences, predicted protein structures, annotations, host predictions, metagenomic profiles, metadata, and microdiversity summaries.


Metadata

metadata/
Genome, vOTU, host, and sample metadata tables. 6 files
uhgv_metadata.tsv Detailed information on each of the 873,994 UHGV genomes
207 MB
uhgv_extended_metadata.tsv Detailed information on a set of 212,778 additional human gut viral genomes not included in the main catalog
65 MB
votus_metadata.tsv Detailed information on each of the 168,536 vOTUs
51 MB
votus_metadata_extra.tsv Additional information on each vOTU
128 MB
host_metadata.tsv Information for prokaryotic genomes (taxonomy, completeness, contamination, N50)
131 MB
source_biosample_metadata.tsv Information for the samples from which virus genomes were obtained
5.3 MB
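
The metadata tables are plain tab-separated files, some over 100 MB, so it can help to inspect the header and stream rows rather than load everything at once. The sketch below uses only the Python standard library. The function names `read_header` and `count_values` are hypothetical helpers, and no column names are assumed, because this page does not list them: read the header first, then pass a column that actually exists.

```python
import csv
from collections import Counter


def read_header(tsv_path):
    """Return the column names from the first line of a tab-separated file."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter="\t"))


def count_values(tsv_path, column):
    """Stream a tab-separated file and tally the values found in one column."""
    counts = Counter()
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # Fail early with the real header if the requested column is absent.
        if column not in (reader.fieldnames or []):
            raise KeyError(f"{column!r} not in header: {reader.fieldnames}")
        for row in reader:
            counts[row[column]] += 1
    return counts
```

For example, `read_header("uhgv_metadata.tsv")` would list the available columns once the file has been downloaded, and `count_values` can then summarize any one of them without holding the whole table in memory.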

Genome and protein catalogs

genome_catalogs/
Genome and protein FASTA catalogs of all UHGV genomes, vOTU representatives, and host genomes. 16 files
uhgv_full.fna.gz Genomic sequences of all genomes ≥10kb or ≥50% completeness
6.5 GB
uhgv_full.faa.gz Protein sequences of all genomes ≥10kb or ≥50% completeness
5.6 GB
uhgv_mq_plus.fna.gz Genomic sequences of all genomes with ≥50% completeness
5.1 GB
uhgv_mq_plus.faa.gz Protein sequences of all genomes with ≥50% completeness
3.8 GB
uhgv_hq_plus.fna.gz Genomic sequences of all genomes with ≥90% completeness
3.0 GB
uhgv_hq_plus.faa.gz Protein sequences of all genomes with ≥90% completeness
2.2 GB
uhgv_extended.fna.gz Genomic sequences of additional human gut viral genomes from VIRE that are not included in the main catalog
1.9 GB
uhgv_extended.faa.gz Protein sequences of additional human gut viral genomes from VIRE that are not included in the main catalog
1.1 GB
votus_full.fna.gz Genomic sequences of vOTU representatives ≥10kb or ≥50% completeness
1.3 GB
votus_full.faa.gz Protein sequences of vOTU representatives ≥10kb or ≥50% completeness
1.0 GB
votus_mq_plus.fna.gz Genomic sequences of vOTU representatives with ≥50% completeness
MQ+ 1.0 GB
votus_mq_plus.faa.gz Protein sequences of vOTU representatives with ≥50% completeness
MQ+ 796 MB
votus_hq_plus.fna.gz Genomic sequences of vOTU representatives with ≥90% completeness
MQ+ 701 MB
votus_hq_plus.faa.gz Protein sequences of vOTU representatives with ≥90% completeness
MQ+ 522 MB
host_genomes_full.tar.zst Genomic sequences of gut prokaryotes
53 GB
host_genomes_otus.tar.zst Genomic sequences of gut prokaryote OTU representatives
3.5 GB
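
The FASTA catalogs are gzip-compressed and several gigabytes each, so reading them record by record avoids decompressing to disk. The following is a minimal sketch using only the standard library. `iter_fasta` and `filter_by_length` are hypothetical helper names, and the length cutoff is only an illustration: completeness is not stored in the FASTA files, so selecting on completeness would need the metadata tables.

```python
import gzip


def iter_fasta(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    # Emit the final record after the file is exhausted.
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(path, min_length=10_000):
    """Yield only the records whose sequence has at least min_length characters."""
    for header, seq in iter_fasta(path):
        if len(seq) >= min_length:
            yield header, seq
```

Both functions are generators, so only one record is held in memory at a time. They work the same way on the `.fna.gz` and `.faa.gz` files.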

Phylogeny

phylogeny/
Reference tree files for viral phylogenetic analysis. 1 file
caudoviricetes_tree.nwk.gz Phylogenetic tree of Caudoviricetes genomes
3.4 MB

Protein clusters

protein_clusters/
Cluster membership, consensus taxonomy, and multiple sequence alignments. 3 files
cluster_membership.tsv.gz Cluster membership of all UHGV proteins
217 MB
cluster_taxonomy.tsv.gz Consensus taxonomy (both UHGV and ICTV) for each protein cluster
60 MB
MSAs.tar.gz Multiple sequence alignments of protein clusters with ≥15 members
1.3 GB

Structures

structures/
Predicted protein structures and structure-domain annotations. 3 files
PDB.tar.gz PDB files of UHGV predicted protein structures
1.5 GB
PDB_references.tar.gz PDB files of predicted protein structures of COG, HAMAP, NCBIfam, and Pfam entries
1.4 GB
domains.tsv Domain segmentation of UHGV protein structures
2.6 MB

Annotations

annotations/
Functional annotations and mobile-element features for MQ+ vOTU representatives. 4 files
protein_annotations.tsv.gz Functional annotations for proteins encoded by MQ+ vOTU representatives
MQ+ 329 MB
structure_annotations.tsv.gz Structure-based annotations for protein clusters with predicted structures
MQ+ 1.8 MB
tRNAs.tsv.gz tRNAs predicted in MQ+ vOTU representatives
MQ+ 1.4 MB
DGRs.tsv.gz Diversity-generating retroelements predicted in MQ+ vOTU representatives
MQ+ 143 KB

vOTU representatives

votu_reps/
Per-genome files, organized in one folder per MQ+ vOTU representative. 6 files
votu_reps_list.txt List of the paths to each MQ+ vOTU representative folder
MQ+ 2.2 MB
UHGV-*/UHGV-*/ Example path: UHGV-081/UHGV-0814053/
genome_id.fna DNA sequence FASTA file of the genome assembly of the vOTU representative
MQ+
genome_id.faa Protein sequence FASTA file of the vOTU representative
MQ+
genome_id.gff Genome GFF file with various sequence annotations
MQ+
genome_id_emapper.tsv eggNOG-mapper annotations for proteins
MQ+
genome_id_annotations.tsv Protein annotations from diverse reference databases (Pfam, UniRef90, NCBIfam, etc.)
MQ+
* : path wildcard
genome_id : genome identifier
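
The folder layout above can be turned into relative paths programmatically. The sketch below is based on the single example path on this page (UHGV-081/UHGV-0814053/) and assumes that the first-level folder is always the first eight characters of the genome identifier. That rule is an inference, not something the page confirms, so votu_reps_list.txt remains the authoritative list of folders. `votu_rep_dir` and `votu_rep_files` are hypothetical helper names.

```python
import posixpath

# File-name suffixes listed for each vOTU representative folder on this page.
SUFFIXES = [".fna", ".faa", ".gff", "_emapper.tsv", "_annotations.tsv"]


def votu_rep_dir(genome_id, prefix_len=8):
    """Return the relative folder for a genome ID.

    ASSUMPTION: the parent folder is the first `prefix_len` characters of the
    ID, inferred from the one example UHGV-081/UHGV-0814053/.
    """
    if not genome_id.startswith("UHGV-") or len(genome_id) <= prefix_len:
        raise ValueError(f"unexpected genome identifier: {genome_id!r}")
    return posixpath.join(genome_id[:prefix_len], genome_id)


def votu_rep_files(genome_id):
    """Return the relative paths of the per-genome files for one representative."""
    base = votu_rep_dir(genome_id)
    return [posixpath.join(base, genome_id + suffix) for suffix in SUFFIXES]
```

With the example identifier from this page, `votu_rep_dir("UHGV-0814053")` gives `UHGV-081/UHGV-0814053`. Checking the result against votu_reps_list.txt before downloading is a sensible safeguard in case the prefix rule does not hold for every genome.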

Host prediction

host_predictions/
Host genome metadata and host predictions for MQ+ vOTU representatives. 5 files
crispr_spacers.fna.gz 5,271,034 CRISPR spacers extracted from UHGG, GenBank, and Hadza prokaryotic genomes
MQ+ 69 MB
host_genomes_info.tsv GTDB r207 taxonomy for prokaryotic genomes from UHGG, NCBI, and Hadza
MQ+ 128 MB
host_assignment_crispr.tsv Detailed information for host prediction with CRISPR spacers
MQ+ 311 MB
host_assignment_kmers.tsv Detailed information for host prediction with PHIST k-mer matching
MQ+ 372 MB
host_assignment_crispr_extended.tsv Detailed information for host prediction with CRISPR spacers for additional vOTUs not included in the main catalog
MQ+ 207 MB

Read mapping

read_mapping/
Relative abundance, coverage, and sample metadata from metagenome and virome read mapping. 7 files
relative_abundance.tsv Relative abundances of viruses and bacteria across bulk metagenomes and viromes
MQ+ 50 MB
metagenomes_coverm.tsv.gz CoverM mapping statistics for viruses and bacteria across bulk metagenomes
MQ+ 2.7 GB
viromes_coverm.tsv.gz CoverM mapping statistics for viruses and bacteria across viral-enriched metagenomes
MQ+ 125 MB
sample_alpha_diversity.tsv Alpha diversity metrics computed for the prokaryotic and viral components of each sample
MQ+ 535 KB
sample_metadata.tsv Human sample metadata (study, country, lifestyle, age, gender, BMI, etc.)
MQ+ 584 KB
fastq_summary.tsv Information on sequencing reads (SRA accession, read count, ViromeQC enrichment, etc.)
MQ+ 382 KB
study_metadata.tsv Information on individual studies
MQ+ 16 KB

Microdiversity

microdiversity/
Variant and codon-level diversity summaries from read mapping. 2 files
SNVs.tsv.zst Single nucleotide variants identified through read mapping
MQ+ 118 MB
codon_pN_pS.tsv.zst Polymorphic codons and their synonymous/nonsynonymous substitution potentials (pS and pN)
MQ+ 121 MB