Report: PGS Catalog Calculator Pipeline

Pipeline command

nextflow run pgscatalog/pgsc_calc -profile docker --input samplesheet.csv --target_build GRCh38 --pgs_id PGS000327,PGS000135,PGS002746,PGS000907,PGS003333 --parallel --max_memory 30.GB --max_cpus 14 --min_overlap 0.3
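The --input flag points the pipeline at a samplesheet that describes where the genotype files live. Below is a minimal Python sketch that produces a one-row samplesheet. The four column names (sampleset, path_prefix, chrom, format) are our reading of the pgsc_calc documentation and should be checked against the pipeline version you run. The path_prefix and format values are hypothetical placeholders, and samplesheet_text is an illustrative helper, not part of the pipeline.

```python
import csv
import io

# Assumed samplesheet columns (check against the pgsc_calc docs for your version).
FIELDS = ["sampleset", "path_prefix", "chrom", "format"]


def samplesheet_text(rows):
    """Render samplesheet rows (dicts keyed by FIELDS) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {
        "sampleset": "NG1RDRPK1V",                    # sampleset name shown in this report
        "path_prefix": "/data/genotypes/NG1RDRPK1V",  # hypothetical path, no file extension
        "chrom": "",                                  # empty: all chromosomes in one file set
        "format": "pfile",                            # assumed value for plink2 pgen/pvar/psam
    }
]

print(samplesheet_text(rows), end="")
```

To produce the file passed to --input, write the returned text to samplesheet.csv with a normal open(..., "w") call.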

Scoring file metadata

Additional documentation is available that explains some of the terms used in this report in more detail.

Scoring files

Variant matching

Parameters

## keep_multiallelic: false
## keep_ambiguous   : false
## min_overlap      : 0.3
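The three matching parameters can be illustrated with a short Python sketch. It assumes that "ambiguous" means a strand-ambiguous A/T or C/G SNP, that a matched variant flagged as multiallelic is dropped when keep_multiallelic is false, and that min_overlap is compared with the proportion of scoring-file variants that survive matching. The function names and the dictionary layout of a match are hypothetical; this is not the pipeline's own matching code.

```python
# Base complements used to spot strand-ambiguous SNPs.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(effect_allele, other_allele):
    """A/T and C/G SNPs read the same on both strands, so strand cannot be resolved."""
    return COMPLEMENT.get(effect_allele) == other_allele


def usable_matches(matches, keep_ambiguous=False, keep_multiallelic=False):
    """Drop matched variants that the keep_* parameters exclude."""
    kept = []
    for m in matches:
        if not keep_ambiguous and is_strand_ambiguous(m["effect_allele"], m["other_allele"]):
            continue
        if not keep_multiallelic and m["multiallelic"]:
            continue
        kept.append(m)
    return kept


def overlap_passes(n_scoring_variants, n_matched, min_overlap=0.3):
    """Return (proportion matched, whether it reaches min_overlap)."""
    if n_scoring_variants == 0:
        return 0.0, False
    proportion = n_matched / n_scoring_variants
    return proportion, proportion >= min_overlap


# Toy scoring file with 5 variants, 4 of which were found in the target data.
matches = [
    {"effect_allele": "A", "other_allele": "G", "multiallelic": False},
    {"effect_allele": "A", "other_allele": "T", "multiallelic": False},  # ambiguous
    {"effect_allele": "C", "other_allele": "T", "multiallelic": True},   # multiallelic
    {"effect_allele": "G", "other_allele": "A", "multiallelic": False},
]
kept = usable_matches(matches)
print(overlap_passes(5, len(kept)))  # (0.4, True)
```

With 2 usable matches out of 5 scoring-file variants the proportion is 0.4, so under these assumptions the score would pass the 0.3 threshold used in this run.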

Summary

Detailed results

Scores

⚠️ Warning: small sampleset size (n < 50)

plink2 uses allele frequency data to mean-impute the dosages of missing genotypes.
Currently, the pipeline disables mean-imputation in small sample sets to keep the calculated PGS as consistent with the genotype data as possible.
With a small sample size, the resulting score sums may therefore be inconsistent between samples, because a missing genotype contributes nothing to that sample's sum.
The average ([scorename]_AVG) may be more appropriate, as it divides the score sum by the number of alleles actually scored, giving an average weighting over the genotypes present.
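The effect of disabling mean-imputation can be seen with a toy calculation. The Python sketch below is a simplified model, not plink2's exact accounting: SUM adds weight × dosage over the scored variants, AVG divides SUM by the number of alleles scored (two per diploid variant), and, only when frequencies are supplied, a missing dosage is replaced by twice the effect-allele frequency. The score function, weights, dosages and frequencies are all made up for illustration.

```python
def score(weights, dosages, freqs=None):
    """Return (SUM, AVG) for one sample under a simplified scoring model.

    dosages: effect-allele dosage (0 to 2) per variant, or None if missing.
    freqs=None: missing genotypes are skipped (no mean-imputation).
    freqs given: missing genotypes are imputed as 2 * effect-allele frequency.
    """
    total = 0.0
    allele_ct = 0
    for i, (weight, dosage) in enumerate(zip(weights, dosages)):
        if dosage is None:
            if freqs is None:
                continue  # missing genotype contributes nothing
            dosage = 2 * freqs[i]  # mean-imputed dosage
        total += weight * dosage
        allele_ct += 2  # two alleles per diploid variant
    avg = total / allele_ct if allele_ct else float("nan")
    return total, avg


weights = [0.5, 1.0, -0.2]
freqs = [0.1, 0.4, 0.3]
print(score(weights, [2, 1, 0]))            # SUM 2.0, AVG about 0.333
print(score(weights, [2, None, 0]))         # (1.0, 0.25)
print(score(weights, [2, None, 0], freqs))  # SUM about 1.8, AVG about 0.3
```

In this toy example one missing genotype halves SUM (2.0 to 1.0), while AVG moves less (about 0.333 to 0.25) because its denominator shrinks as well; this is the reasoning behind preferring the average when mean-imputation is off.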

In the future, mean-imputation will be supported in small sample sets by using ancestry-matched reference sample sets (e.g. 1000 Genomes) to ensure consistent calculation of score sums.

5 scores for 1 sample processed

Score data

Score extract

Below is a summary of the aggregated scores, which might be useful for debugging.

## # A tibble: 1 × 7
##   sampleset  IID        PGS000135_hmPOS_GRCh38…¹ PGS00…² PGS00…³ PGS00…⁴ PGS00…⁵
##   <chr>      <chr>                         <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 NG1RDRPK1V NG1RDRPK1V                     11.5    5.28   0.418    35.1  -0.126
## # … with abbreviated variable names ¹PGS000135_hmPOS_GRCh38_SUM,
## #   ²PGS000327_hmPOS_GRCh38_SUM, ³PGS000907_hmPOS_GRCh38_SUM,
## #   ⁴PGS002746_hmPOS_GRCh38_SUM, ⁵PGS003333_hmPOS_GRCh38_SUM

See the plink2 documentation for an explanation of the plink2 column names.
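The score columns in the extract follow a regular naming pattern. The Python sketch below splits a column name into its parts. The regular expression is inferred only from the five names shown in this report (PGS ID, a harmonisation tag such as hmPOS, the genome build, and the statistic SUM or AVG), so it may not cover every name the pipeline can produce; parse_score_column is a hypothetical helper.

```python
import re

# Inferred pattern: <PGS ID>_<harmonisation tag>_<genome build>_<statistic>
COLUMN_RE = re.compile(r"^(PGS\d{6})_(\w+?)_(GRCh\d+)_(SUM|AVG)$")


def parse_score_column(name):
    """Split a score column name into its parts, or return None for other columns."""
    match = COLUMN_RE.match(name)
    if match is None:
        return None
    pgs_id, harmonisation, build, statistic = match.groups()
    return {
        "pgs_id": pgs_id,
        "harmonisation": harmonisation,
        "build": build,
        "statistic": statistic,
    }


for name in ["sampleset", "IID", "PGS000135_hmPOS_GRCh38_SUM"]:
    print(name, parse_score_column(name))
```

Non-score columns such as sampleset and IID return None, which makes it easy to separate identifiers from scores when post-processing the output.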

Density plot

The summary density plots show up to six scoring files.

Get scores

All scores can be found in “aggregated_scores.txt.gz”, in the results folder output by the pipeline.
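A minimal Python sketch for loading aggregated_scores.txt.gz follows. It assumes the file is tab-separated with a header row whose columns match the score extract above (sampleset, IID, then one column per score); the delimiter and layout may differ between pipeline versions, so check them against your own output. read_aggregated_scores and score_sums are hypothetical helpers.

```python
import csv
import gzip


def read_aggregated_scores(path, delimiter="\t"):
    """Return one dict per sample from a gzipped, delimited file with a header row.

    The tab delimiter is an assumption; pass another if your file differs.
    """
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def score_sums(rows, suffix="_SUM"):
    """Map (sampleset, IID) to {column: value} for columns ending in suffix."""
    result = {}
    for row in rows:
        key = (row["sampleset"], row["IID"])
        result[key] = {
            column: float(value)
            for column, value in row.items()
            if column.endswith(suffix)
        }
    return result
```

For example, score_sums(read_aggregated_scores("aggregated_scores.txt.gz")) would return the score sums keyed by sample, with the path adjusted to wherever the file sits in your results folder.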

Citations

For scores from the PGS Catalog, please remember to cite the original publications from which they came (these are listed in the metadata table).

PGS Catalog Calculator (in development). PGS Catalog Team. https://github.com/PGScatalog/pgsc_calc

Lambert et al. (2021) The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics. 53:420–425 doi:10.1038/s41588-021-00783-5.