Quickstart

This Quickstart gives a brief overview of getting started with EIR-auto-GP by training deep learning models to predict coronary artery disease (CAD) from genomic data. For a more detailed tutorial, see 01 – Genomic Prediction for Coronary Artery Disease.

A - Setup

Download the processed PennCATH data and set up your folders.

$ mkdir -p eir_auto_gp_tutorials/01_basic_tutorial/data
$ mkdir -p eir_auto_gp_tutorials/tutorial_runs/01_basic_tutorial

The downloaded data should have the following structure:

eir_auto_gp_tutorials/01_basic_tutorial/data
├── penncath
│   ├── penncath.bed
│   ├── penncath.bim
│   ├── penncath.csv
│   └── penncath.fam
└── penncath.zip
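
Before moving on, it can help to confirm that the extracted files match the listing above. The following is a minimal sketch using only the Python standard library; the helper name missing_penncath_files is hypothetical and not part of EIR-auto-GP.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("penncath.bed", "penncath.bim", "penncath.fam", "penncath.csv")


def missing_penncath_files(data_dir: str) -> list[str]:
    """Return the expected files that are absent from <data_dir>/penncath.

    Hypothetical helper for illustration; not part of EIR-auto-GP.
    """
    folder = Path(data_dir) / "penncath"
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

Calling missing_penncath_files("eir_auto_gp_tutorials/01_basic_tutorial/data") returns an empty list when all four files are in place, and otherwise the names of the files that still need to be downloaded or extracted.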

The ID column in the label file must be named “ID”. A sample label file looks like this:

"ID","CAD","sex","age","tg","hdl","ldl"
"10002",1,1,60,NA,NA,NA
"10004",1,2,50,55,23,75
"10005",1,1,55,105,37,69
"10007",1,1,52,314,54,108
"10008",1,1,58,161,40,94
"10009",1,1,59,171,46,92
"10010",1,1,54,147,69,109
"10011",1,2,66,124,47,84
"10012",1,1,58,60,114,67

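Since a misnamed ID column is an easy mistake to make, the header of the label file can be checked before training. This is a minimal sketch using Python's csv module; check_label_file is a hypothetical helper, not an EIR-auto-GP function, and it only inspects the header row.

```python
import csv


def check_label_file(path: str, required_columns: list[str]) -> list[str]:
    """Return a list of problems found in the label file header (empty if none).

    Hypothetical helper for illustration; not part of EIR-auto-GP.
    """
    # Read only the first row; an empty file yields an empty header.
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])

    problems = []
    if "ID" not in header:
        problems.append('missing required "ID" column')
    for column in required_columns:
        if column not in header:
            problems.append(f"missing column: {column}")
    return problems
```

For the sample file above, check_label_file(path, ["CAD", "sex", "age", "tg", "hdl", "ldl"]) returns an empty list. The second argument should list whichever columns you intend to pass to the training command.
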
B - Training

To process data and train models, run the following command:

eirautogp \
--genotype_data_path eir_auto_gp_tutorials/01_basic_tutorial/data/penncath \
--label_file_path eir_auto_gp_tutorials/01_basic_tutorial/data/penncath/penncath.csv \
--global_output_folder eir_auto_gp_tutorials/tutorial_runs/01_basic_tutorial_1 \
--output_cat_columns CAD \
--input_con_columns tg hdl ldl age \
--input_cat_columns sex \
--folds 5 \
--feature_selection "gwas->dl" \
--n_dl_feature_selection_setup_folds 2 \
--do_test

The command above trains models to predict CAD risk, using genotype and clinical data as inputs. To adjust settings such as the number of folds, the feature selection strategy, or the GWAS p-value threshold, refer to 01 – Genomic Prediction for Coronary Artery Disease.
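
If you prefer to launch runs from a script, for example to vary the number of folds, the same command can be assembled as an argument list. The sketch below reuses only the flags and values shown in the command above; build_eirautogp_command is a hypothetical helper, and the sketch assumes the eirautogp executable is on your PATH.

```python
import shlex


def build_eirautogp_command(output_folder: str, folds: int = 5) -> list[str]:
    """Assemble the Quickstart training command as an argument list.

    Hypothetical helper; flags and values are copied from the command above.
    """
    data_dir = "eir_auto_gp_tutorials/01_basic_tutorial/data/penncath"
    return [
        "eirautogp",
        "--genotype_data_path", data_dir,
        "--label_file_path", f"{data_dir}/penncath.csv",
        "--global_output_folder", output_folder,
        "--output_cat_columns", "CAD",
        "--input_con_columns", "tg", "hdl", "ldl", "age",
        "--input_cat_columns", "sex",
        "--folds", str(folds),
        "--feature_selection", "gwas->dl",
        "--n_dl_feature_selection_setup_folds", "2",
        "--do_test",
    ]


def as_shell_string(command: list[str]) -> str:
    """Render the argument list as a safely quoted shell command line."""
    return shlex.join(command)
```

To execute the run, pass the list to subprocess.run(command, check=True). Passing arguments as a list avoids shell interpretation of characters such as ">", which an unquoted shell command would treat as output redirection.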

After running the command, the output is written to the folder given by --global_output_folder, here eir_auto_gp_tutorials/tutorial_runs/01_basic_tutorial_1. For a detailed overview of the outputs, go through them in the order data -> modelling -> feature_selection -> analysis.
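
To see at a glance which outputs a run has produced so far, the folder can be inspected in the suggested reading order. This is a minimal sketch that assumes data, modelling, feature_selection and analysis are direct subfolders of the run folder passed as --global_output_folder; the exact layout may differ, and output_folders_in_review_order is a hypothetical helper.

```python
from pathlib import Path

# Names and reading order as given in the text above.
# Assumption: each is a direct subfolder of the run folder.
REVIEW_ORDER = ("data", "modelling", "feature_selection", "analysis")


def output_folders_in_review_order(run_folder: str) -> list[tuple[str, bool]]:
    """Pair each expected subfolder name with whether it exists in run_folder.

    Hypothetical helper for illustration; not part of EIR-auto-GP.
    """
    root = Path(run_folder)
    return [(name, (root / name).is_dir()) for name in REVIEW_ORDER]
```

Each returned pair holds a folder name and a flag that is True when that folder is present, so a run that is still in progress shows False for the later stages.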

That’s it for the Quickstart. Thank you for trying out EIR-auto-GP; we hope you find it useful and informative.