Usage

User can choose among 4 ways to simulate template reads. - use a real count matrix - estimated the parameter from a real count matrix to simulate synthetic count matrix - specified by his/her own the input parameter - a combination of the above options

We use SPARSIM tools to simulate count matrix. for more information a bout synthetic count matrix, please read SPARSIM documentaion.

EXAMPLES

Sample data

A demonstration dataset to initiate this workflow is accessible on zenodo DOI : 10.5281/zenodo.12731408. This dataset is a subsample from a Nanopore run of the 10X 5k human pbmcs.

The human GRCh38 reference transcriptome, gtf annotation and fasta referance genome can be downloaded from Ensembl.

BASIC WORKFLOW
 nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
                      --features gene_name \
                      --gtf dataset/genes.gtf
WITH PCR AMPLIFICTION
 nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
                      --features gene_name \
                      --gtf dataset/GRCh38-2020-A-genes.gtf \
                      --pcr_cycles 2 \
                      --pcr_dup_rate 0.7 \
                      --pcr_error_rate 0.00003
WITH SIMULATED CELL TYPE COUNTS
 nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
                      --features gene_name \
                      --gtf dataset/GRCh38-2020-A-genes.gtf \
                      --sim_celltypes true \
                      --cell_types_annotation dataset/sub_pbmc_cell_type.csv
WITH PERSONALIZED ERROR MODEL
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
                     --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
                     --features gene_name \
                     --gtf dataset/GRCh38-2020-A-genes.gtf \
                     --build_model true \
                     --fastq_model dataset/sub_pbmc_reads.fq \
                     --ref_genome dataset/GRCh38-2020-A-genome.fa 
COMPLETE WORKFLOW
 nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
                      --transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
                      --features gene_name \
                      --gtf dataset/GRCh38-2020-A-genes.gtf \
                      --sim_celltypes true \
                      --cell_types_annotation dataset/sub_pbmc_cell_type.csv
                      --build_model true \
                      --fastq_model dataset/sub_pbmc_reads.fq \
                      --ref_genome dataset/GRCh38-2020-A-genome.fa 
                      --pcr_cycles 2 \
                      --pcr_dup_rate 0.7 \
                      --pcr_error_rate 0.00003

Results

After execution, results will be available in the specified --outdir. This includes simulated Nanopore reads .fastq, along with log files and QC report.

Cleaning Up

To clean up temporary files generated by Nextflow:

nextflow clean -f