Report: PGS Catalog Calculator Pipeline

Pipeline command

nextflow run pgscatalog/pgsc_calc -profile docker --input samplesheet.csv --target_build GRCh38 --pgs_id PGS002785,PGS002786,PGS002789,PGS002787,PGS002788,PGS002790 --parallel --max_memory 30.GB --max_cpus 14 --min_overlap 0.4

Scoring file metadata

Additional documentation is available that explains some of the terms used in this report in more detail.

Scoring files

Variant matching

Parameters

## keep_multiallelic: false
## keep_ambiguous   : false
## min_overlap      : 0.4
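
The min_overlap parameter can be read as a threshold on the proportion of a scoring file's variants that are also found in the target genotypes. The sketch below illustrates only that reading. The function names and the variant counts are hypothetical, and the pipeline's own matching logic (including the handling of ambiguous and multiallelic variants) is not reproduced here.

```python
def overlap_fraction(n_matched: int, n_scoring_file: int) -> float:
    """Proportion of scoring-file variants that were matched in the target data."""
    if n_scoring_file <= 0:
        raise ValueError("scoring file must contain at least one variant")
    return n_matched / n_scoring_file


def passes_min_overlap(n_matched: int, n_scoring_file: int,
                       min_overlap: float = 0.4) -> bool:
    """True if the matched proportion reaches the min_overlap threshold."""
    return overlap_fraction(n_matched, n_scoring_file) >= min_overlap


if __name__ == "__main__":
    # Hypothetical counts, not taken from this report.
    print(passes_min_overlap(450, 1000))  # 0.45 >= 0.4 -> True
    print(passes_min_overlap(350, 1000))  # 0.35 <  0.4 -> False
```

Under this reading, a scoring file whose matched proportion falls below 0.4 would not pass the threshold used in this run.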

Summary

Detailed results

Scores

⚠️ Warning: small sampleset size (n < 50)

plink2 uses allele frequency data to mean-impute the dosages of missing genotypes.
Currently the pipeline disables mean-imputation in these small samplesets so that the calculated PGS is as consistent with the genotype data as possible.
With a small sample size, the resulting score sums may be inconsistent between samples.
The average ([scorename]_AVG) may be more applicable, as it calculates an average weighting over all genotypes present.

In the future, mean-imputation will be supported in small samplesets using ancestry-matched reference samplesets (e.g. 1000 Genomes) to ensure consistent calculation of score sums.
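
The difference between the sum and the average described in the warning above can be illustrated with a toy calculation. This is a simplified sketch, not plink2's implementation. It assumes that missing genotypes are skipped when mean-imputation is disabled and that the average divides the sum by the number of non-missing alleles (two per diploid variant). The function name, weights and dosages are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def score_sum_and_avg(dosages: Sequence[Optional[float]],
                      weights: Sequence[float]) -> Tuple[float, float]:
    """Return (sum, average) over variants with a non-missing dosage.

    Missing genotypes (None) are skipped rather than mean-imputed.
    Assumption: the average is the sum divided by the number of
    non-missing alleles, counting 2 alleles per variant.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    total = 0.0
    n_alleles = 0
    for dosage, weight in zip(dosages, weights):
        if dosage is None:
            continue  # no mean-imputation: the variant contributes nothing
        total += dosage * weight
        n_alleles += 2
    average = total / n_alleles if n_alleles else float("nan")
    return total, average


if __name__ == "__main__":
    weights = [0.5, 0.5, 0.5, 0.5]      # hypothetical effect weights
    complete = [1, 1, 1, 1]             # all genotypes called
    incomplete = [1, 1, None, None]     # two genotypes missing
    print(score_sum_and_avg(complete, weights))    # (2.0, 0.25)
    print(score_sum_and_avg(incomplete, weights))  # (1.0, 0.25)
```

In this toy case the sum halves when half of the genotypes are missing, while the average is unchanged. With unequal weights the average would also shift, but it is not driven directly by the number of missing genotypes.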

6 scores for 1 sample processed

Score data

Score extract

Below is a summary of the aggregated scores, which might be useful for debugging.

## # A tibble: 1 × 8
##   sampleset  IID        PGS002790_hmPO…¹ PGS00…² PGS00…³ PGS00…⁴ PGS00…⁵ PGS00…⁶
##   <chr>      <chr>                 <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 NG1RDRPK1V NG1RDRPK1V            0.183   0.263   0.301   0.788   0.842   0.195
## # … with abbreviated variable names ¹PGS002790_hmPOS_GRCh38_SUM,
## #   ²PGS002785_hmPOS_GRCh38_SUM, ³PGS002786_hmPOS_GRCh38_SUM,
## #   ⁴PGS002788_hmPOS_GRCh38_SUM, ⁵PGS002787_hmPOS_GRCh38_SUM,
## #   ⁶PGS002789_hmPOS_GRCh38_SUM

See here for an explanation of plink2 column names

Density plot

The summary density plots show up to six scoring files

Get scores

All scores can be found in “aggregated_scores.txt.gz”, in the results folder output by the pipeline.
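
For downstream analysis, the aggregated scores can be loaded with the Python standard library. The sketch below assumes the file is a gzip-compressed, tab-delimited table with a header row, and that every column other than sampleset and IID (the identifier columns shown in the score extract above) holds a numeric score. The function name load_scores is hypothetical, and the delimiter should be checked against the actual file.

```python
import csv
import gzip
from typing import Dict, List, Union

ID_COLUMNS = {"sampleset", "IID"}  # identifier columns seen in the score extract


def load_scores(path: str, delimiter: str = "\t") -> List[Dict[str, Union[str, float]]]:
    """Read an aggregated scores file into a list of row dictionaries.

    Identifier columns are kept as strings; all other columns are
    converted to float. The tab delimiter is an assumption.
    """
    rows = []
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        for record in reader:
            rows.append({
                column: (value if column in ID_COLUMNS else float(value))
                for column, value in record.items()
            })
    return rows
```

Calling load_scores("aggregated_scores.txt.gz") from within the results folder would then return one dictionary per sample, keyed by column names such as PGS002790_hmPOS_GRCh38_SUM.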

Citations

For scores from the PGS Catalog, please remember to cite the original publications from which they came (these are listed in the metadata table).

PGS Catalog Calculator (in development). PGS Catalog Team. https://github.com/PGScatalog/pgsc_calc

Lambert et al. (2021) The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics 53:420–425. doi:10.1038/s41588-021-00783-5.