Quickstart
This quickstart gives a brief overview of how to get started
with EIR-auto-GP by training deep learning models to predict
coronary artery disease (CAD) from genomic data. For a more
detailed walkthrough, see the tutorial
01 – Genomic Prediction for Coronary Artery Disease.
A - Setup
Download the processed PennCATH data and set up your folders.
$ mkdir -p eir_auto_gp_tutorials/01_basic_tutorial/data
$ mkdir -p eir_auto_gp_tutorials/tutorial_runs/01_basic_tutorial
After downloading penncath.zip into the data folder and extracting it, the folder should have the following structure:
eir_auto_gp_tutorials/01_basic_tutorial/data
├── penncath
│ ├── penncath.bed
│ ├── penncath.bim
│ ├── penncath.csv
│ └── penncath.fam
└── penncath.zip
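Before training, it can help to confirm that the extracted folder holds a complete PLINK fileset. The snippet below is a minimal sketch, not part of EIR-auto-GP: the helper name missing_plink_files is hypothetical, and it only checks that the three files from the listing above exist.

```python
from pathlib import Path

# Extensions of the PLINK binary fileset shown in the directory listing above.
PLINK_EXTENSIONS = (".bed", ".bim", ".fam")


def missing_plink_files(folder, prefix="penncath"):
    """Return the names of expected PLINK files that are absent from `folder`.

    Hypothetical helper for this quickstart; it checks existence only,
    not file contents.
    """
    expected = [f"{prefix}{ext}" for ext in PLINK_EXTENSIONS]
    return [name for name in expected if not (Path(folder) / name).is_file()]
```

Calling missing_plink_files("eir_auto_gp_tutorials/01_basic_tutorial/data/penncath") should return an empty list when the data is in place.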
The label file ID column must be called “ID”. A sample label file would look like this:
"ID","CAD","sex","age","tg","hdl","ldl"
"10002",1,1,60,NA,NA,NA
"10004",1,2,50,55,23,75
"10005",1,1,55,105,37,69
"10007",1,1,52,314,54,108
"10008",1,1,58,161,40,94
"10009",1,1,59,171,46,92
"10010",1,1,54,147,69,109
"10011",1,2,66,124,47,84
"10012",1,1,58,60,114,67
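The "ID" requirement above can be checked before starting a run. The following is a small sketch using only the Python standard library; read_label_columns is a hypothetical helper, and the sample text is copied from the first rows of the label file shown above.

```python
import csv
import io

# First rows of the sample label file shown above.
SAMPLE = '''"ID","CAD","sex","age","tg","hdl","ldl"
"10002",1,1,60,NA,NA,NA
"10004",1,2,50,55,23,75
'''


def read_label_columns(handle):
    """Return the header columns, raising if the required "ID" column is absent."""
    header = next(csv.reader(handle), None)
    if header is None:
        raise ValueError("label file is empty")
    if "ID" not in header:
        raise ValueError(f'label file needs an "ID" column, found: {header}')
    return header


columns = read_label_columns(io.StringIO(SAMPLE))
print(columns)  # ['ID', 'CAD', 'sex', 'age', 'tg', 'hdl', 'ldl']
```

To check the real file, pass an open file handle for penncath.csv instead of the in-memory sample.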
B - Training
To process data and train models, run the following command:
eirautogp \
--genotype_data_path eir_auto_gp_tutorials/01_basic_tutorial/data/penncath \
--label_file_path eir_auto_gp_tutorials/01_basic_tutorial/data/penncath/penncath.csv \
--global_output_folder eir_auto_gp_tutorials/tutorial_runs/01_basic_tutorial_1 \
--output_cat_columns CAD \
--input_con_columns tg hdl ldl age \
--input_cat_columns sex \
--folds 5 \
--feature_selection "gwas->dl" \
--n_dl_feature_selection_setup_folds 2 \
--do_test
The command above trains models to predict CAD risk, using genotype and clinical data as inputs. To adjust settings such as the number of folds, the feature selection strategy, or the GWAS p-value threshold, refer to the tutorial 01 – Genomic Prediction for Coronary Artery Disease.
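If you plan to repeat the run with different output folders, assembling the arguments in Python avoids retyping the paths. This is only a sketch: build_command is a hypothetical helper, every flag and value is taken from the command above, and nothing is executed. Note that shlex.join quotes gwas->dl, so a shell will not treat the > character as an output redirection.

```python
import shlex
from pathlib import Path


def build_command(base, run_name, folds=5):
    """Assemble the eirautogp invocation from this quickstart as an argument list.

    Hypothetical helper; it only rearranges the flags shown in the command above.
    """
    base = Path(base)
    data = base / "01_basic_tutorial" / "data" / "penncath"
    return [
        "eirautogp",
        "--genotype_data_path", data.as_posix(),
        "--label_file_path", (data / "penncath.csv").as_posix(),
        "--global_output_folder", (base / "tutorial_runs" / run_name).as_posix(),
        "--output_cat_columns", "CAD",
        "--input_con_columns", "tg", "hdl", "ldl", "age",
        "--input_cat_columns", "sex",
        "--folds", str(folds),
        "--feature_selection", "gwas->dl",
        "--n_dl_feature_selection_setup_folds", "2",
        "--do_test",
    ]


cmd = build_command("eir_auto_gp_tutorials", "01_basic_tutorial_1")
print(shlex.join(cmd))  # a single shell-safe command line
```

The resulting list can be passed to subprocess.run(cmd, check=True) once eirautogp is installed.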
After the command finishes, the output is written to the folder
passed as --global_output_folder, which in the command above is
eir_auto_gp_tutorials/tutorial_runs/01_basic_tutorial_1.
For a detailed overview of the outputs, go through them in the order
data -> modelling -> feature_selection -> analysis.
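To get a quick look at what a run produced, the outputs can be listed in the suggested reading order. The sketch below assumes that data, modelling, feature_selection, and analysis are subfolders directly under the run's output folder; summarize_outputs is a hypothetical helper, not part of EIR-auto-GP.

```python
from pathlib import Path

# Reading order suggested in the text above.
OUTPUT_ORDER = ("data", "modelling", "feature_selection", "analysis")


def summarize_outputs(run_folder):
    """Return (subfolder, file count) pairs in the suggested order.

    Assumes the four names are direct subfolders of `run_folder`;
    subfolders that do not exist are skipped.
    """
    summary = []
    for name in OUTPUT_ORDER:
        sub = Path(run_folder) / name
        if sub.is_dir():
            # Count regular files at any depth below this subfolder.
            n_files = sum(1 for p in sub.rglob("*") if p.is_file())
            summary.append((name, n_files))
    return summary
```

For example, summarize_outputs("eir_auto_gp_tutorials/tutorial_runs/01_basic_tutorial_1") lists whichever of the four folders are present, together with how many files each contains.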
That’s it for the quickstart, thank you for trying out EIR-auto-GP!
Hopefully you find it useful and informative.