Running the Pipeline
This tutorial uses the continuum_imaging pipeline to create a clean image of 3C123; running the other pipelines follows a similar process. The MSv2 dataset was converted to MSv4 with the same xradio package that is installed with this project, and the tutorial was run on an HPC cluster with SLURM scheduling.
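The conversion step itself is a single call to the xradio converter. A minimal sketch, assuming xradio's convert_msv2_to_processing_set function (the import path and keyword names vary between xradio versions, and the paths are illustrative):

from xradio.measurement_set import convert_msv2_to_processing_set

# Convert the MSv2 MeasurementSet to an MSv4 (zarr) processing set.
# The import path and keyword names are assumptions for the installed
# xradio version; the paths are placeholders.
convert_msv2_to_processing_set(
    in_file="/path/to/3C123.ms",
    out_file="/path/to/3C123.ps",
)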
1. The first step is to create the YAML config file, called 3C123_continuum_imaging_config.yaml. All config options are
explained here. A single dataset key is selected for processing (one way to list the available keys is sketched after the config):
ps_dir: "/path/to/3C123.ps"  # path to the MSv4 processing set
# Dataset within the processing set to image
dataset_key: "3C123.ms.zarr_ddi_32_intent_OBSERVE_TARGET#UNSPECIFIED_field_id_0"
output_dir: "/path/to/output/"  # directory for the pipeline products
swiftly_config: "4k[1]-n2k-256"  # SwiFTly distributed FT configuration
wtower_size: 100  # size of the w-towers used for gridding
niter: 3  # number of major loops
fracthresh: 0.9  # fractional cleaning threshold
pixel_scale: "0.03asec"  # angular size of an image pixel
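If the exact dataset key is not known in advance, the processing set can be opened with xradio to list the keys it contains. A minimal sketch, assuming xradio's read_processing_set function (the import path varies between xradio versions):

from xradio.measurement_set import read_processing_set

# Open the processing set and print the available dataset keys.
# The import path is an assumption for the installed xradio version.
ps = read_processing_set("/path/to/3C123.ps")
for key in ps.keys():
    print(key)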
2. Next, create a SLURM job script called 3C123_continuum_imaging.sh that runs the continuum_imaging pipeline with the config file.
The following script uses a conda environment called self-cal on CSD3; edit the script for use on another cluster (e.g. AWS).
Spack can also be used as an alternative package and environment manager (see the instructions for installing conda and Spack here,
and adjust the script accordingly):
#!/bin/bash
#SBATCH --partition=icelake-himem
#SBATCH --job-name=continuum_imaging_3C123
#SBATCH --output=continuum_imaging_3C123_output.log
#SBATCH --error=continuum_imaging_3C123_error.log
#SBATCH --nodes=1
#SBATCH --cpus-per-task=38
#SBATCH --time=0:30:00
# Activate the conda environment
CONDA_PATH="$HOME/miniconda3"
source "$CONDA_PATH/etc/profile.d/conda.sh"
conda activate self-cal

# Run the continuum_imaging pipeline with the config file from step 1
ska_sdp_distributed_self_cal_prototype --config-file=3C123_continuum_imaging_config.yaml --pipeline-name="continuum_imaging"

conda deactivate
The SLURM job script can then be submitted:
sbatch 3C123_continuum_imaging.sh
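While the job runs, its status can be checked with squeue -u $USER, and the pipeline's progress can be followed by tailing the log file named in the job script, e.g. tail -f continuum_imaging_3C123_output.log.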
Once the job is complete, the dirty image, PSF, and clean image are output, along with the residual and model images for each major loop.
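If the image products are exported as FITS files (the exact filenames and formats depend on the pipeline version), they can be inspected with astropy. A minimal sketch with a hypothetical filename:

from astropy.io import fits

# Hypothetical product name; the actual filenames written to output_dir
# depend on the pipeline version.
with fits.open("/path/to/output/clean_image.fits") as hdul:
    image = hdul[0].data
    print(image.shape, image.max())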