Usage
Users can choose among four ways to simulate template reads:
- use a real count matrix;
- estimate the parameters from a real count matrix to simulate a synthetic count matrix;
- specify the input parameters themselves;
- combine the above options.
We use the SPARSIM tool to simulate count matrices. For more information about synthetic count matrices, please read the SPARSIM documentation.
EXAMPLES
Sample data
A demonstration dataset for running this workflow is available on Zenodo (DOI: 10.5281/zenodo.12731408). This dataset is a subsample of a Nanopore run of the 10x Genomics 5k human PBMC sample.
The human GRCh38 reference transcriptome, GTF annotation and FASTA reference genome can be downloaded from Ensembl.
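Before launching the pipeline, it can help to confirm the size of the input count matrix. The helper below is a hypothetical sketch, not part of the workflow. It assumes the CSV has a header row of cell barcodes preceded by a gene-name column (e.g. `,cell1,cell2`), one gene per row, and no quoted commas; if your matrix is laid out differently, adjust the field logic.

```shell
# Hypothetical helper (not part of the pipeline): print the dimensions of a
# count matrix CSV. ASSUMES: first row = header whose first field is the
# gene-name column, one gene per subsequent row, no quoted commas.
matrix_dims() {
  awk -F',' '
    NR == 1 { cells = NF - 1; next }   # header: drop the gene-name column
    NF == 0 { next }                   # ignore blank lines
    { genes++ }                        # every other line is one gene
    END { printf "%d genes x %d cells\n", genes, cells }
  ' "$1"
}
```

For example, `matrix_dims dataset/sub_pbmc_matrice.csv` prints a line of the form `<genes> genes x <cells> cells`.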
BASIC WORKFLOW
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
--transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
--features gene_name \
--gtf dataset/GRCh38-2020-A-genes.gtf
WITH PCR AMPLIFICATION
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
--transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
--features gene_name \
--gtf dataset/GRCh38-2020-A-genes.gtf \
--pcr_cycles 2 \
--pcr_dup_rate 0.7 \
--pcr_error_rate 0.00003
WITH SIMULATED CELL TYPE COUNTS
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
--transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
--features gene_name \
--gtf dataset/GRCh38-2020-A-genes.gtf \
--sim_celltypes true \
--cell_types_annotation dataset/sub_pbmc_cell_type.csv
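To see how many cells each type contributes in the annotation file passed to `--cell_types_annotation`, a quick tally can be done with awk. This is a hypothetical sketch: it assumes a header row and the cell-type label in the second column (e.g. `barcode,cell_type`), which may not match the actual layout of `sub_pbmc_cell_type.csv`.

```shell
# Hypothetical helper: count cells per cell type in an annotation CSV.
# ASSUMES a header row and the cell-type label in column 2; change $2 if
# your file stores the label elsewhere.
celltype_counts() {
  awk -F',' '
    NR > 1 && NF >= 2 { n[$2]++ }              # skip header, tally labels
    END { for (t in n) print t "\t" n[t] }     # one "type<TAB>count" line per type
  ' "$1" | sort
}
```

Usage: `celltype_counts dataset/sub_pbmc_cell_type.csv`.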
WITH PERSONALIZED ERROR MODEL
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
--transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
--features gene_name \
--gtf dataset/GRCh38-2020-A-genes.gtf \
--build_model true \
--fastq_model dataset/sub_pbmc_reads.fq \
--ref_genome dataset/GRCh38-2020-A-genome.fa
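Building a personalized error model depends on the real reads given to `--fastq_model`, so a malformed file is worth catching early. The check below is a minimal sketch, not part of the pipeline, assuming a plain (uncompressed) FASTQ with strict 4-line records.

```shell
# Minimal FASTQ sanity check (sketch). ASSUMES uncompressed, 4-line records:
# '@' header, sequence, '+' separator, quality string of the same length.
check_fastq() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "bad header at line " NR; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { print "bad separator at line " NR; bad = 1; exit }
    NR % 4 == 0 && length($0) != seqlen { print "length mismatch at line " NR; bad = 1; exit }
    END {
      if (bad) exit 1                                   # an error was already reported
      if (NR % 4 != 0) { print "truncated record"; exit 1 }
      print "OK: " NR / 4 " reads"
    }
  ' "$1"
}
```

Usage: `check_fastq dataset/sub_pbmc_reads.fq`; the function returns a non-zero status on the first problem it finds.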
COMPLETE WORKFLOW
nextflow run main.nf --matrix dataset/sub_pbmc_matrice.csv \
--transcriptome dataset/Homo_sapiens.GRCh38.cdna.all.fa \
--features gene_name \
--gtf dataset/GRCh38-2020-A-genes.gtf \
--sim_celltypes true \
--cell_types_annotation dataset/sub_pbmc_cell_type.csv \
--build_model true \
--fastq_model dataset/sub_pbmc_reads.fq \
--ref_genome dataset/GRCh38-2020-A-genome.fa \
--pcr_cycles 2 \
--pcr_dup_rate 0.7 \
--pcr_error_rate 0.00003
Results
After execution, results will be available in the directory specified by --outdir. This includes the simulated Nanopore reads (.fastq), along with log files and a QC report.
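For a quick look at the simulated reads alongside the QC report, the helper below counts reads and reports their mean length. It is a hypothetical sketch that assumes 4-line FASTQ records and accepts plain or gzip-compressed files; the exact output file name depends on your run, so check your --outdir.

```shell
# Hypothetical helper: read count and mean read length of a FASTQ file.
# ASSUMES 4-line records; files ending in .gz are decompressed on the fly.
fastq_stats() {
  case "$1" in
    *.gz) gzip -cd -- "$1" ;;
    *)    cat -- "$1" ;;
  esac | awk '
    NR % 4 == 2 { n++; total += length($0) }   # line 2 of each record = sequence
    END {
      if (n) printf "%d reads, mean length %.1f nt\n", n, total / n
      else print "0 reads"
    }
  '
}
```

Usage: `fastq_stats <outdir>/<simulated_reads>.fastq`, where both path components are placeholders for your own run.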
Cleaning Up
To clean up temporary files generated by Nextflow:
nextflow clean -f