Indexing might take some time but only has to be run once per fasta file. Make sure to reuse already computed indices if possible.
DICAST checks whether $indexdir/$indexname exists and builds the index automatically if it does not. To force a rebuild even though an index exists, set $recompute_index=true in scripts/mapping_config.sh.
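The check described above can be sketched as a small shell function. This is only an illustration of the decision logic, not the code DICAST actually runs; the function name index_action and the demo file name are hypothetical.

```shell
# Sketch of the index check; not the actual DICAST implementation.
# Prints "build" when the index has to be (re)computed, "reuse" otherwise.
index_action() {
    local index_path="$1"          # stands in for "$indexdir/$indexname"
    local recompute="${2:-false}"  # mirrors recompute_index from the config
    if [ "$recompute" = "true" ] || [ ! -e "$index_path" ]; then
        echo "build"
    else
        echo "reuse"
    fi
}

# Demo with a throwaway directory standing in for $indexdir.
demo_dir="$(mktemp -d)"
index_action "$demo_dir/genome.fa.idx"        # no index yet
touch "$demo_dir/genome.fa.idx"
index_action "$demo_dir/genome.fa.idx"        # index present
index_action "$demo_dir/genome.fa.idx" true   # rebuild forced
```

The three demo calls print build, reuse and build, in that order.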
If you want to use your own precomputed index file, copy it to index/segemehl-index/ and make sure the index is complete and named according to the parameters set in the config files.
We recommend including the name of the fasta file in the index name to avoid overwriting an existing index. This is already the case by default, so no parameter changes are needed.
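As a rough sketch of this naming recommendation, the helpers below derive an index name from the fasta file name and copy a precomputed index into place without overwriting an existing one. The function names and the ".idx" suffix are assumptions made for illustration; the name DICAST expects is whatever $indexname is set to in your config files.

```shell
# Hypothetical helper: index name that contains the fasta file name.
# The ".idx" suffix is an assumption, not a DICAST requirement.
index_name_for() {
    local fasta="$1"
    echo "$(basename "$fasta").idx"
}

# Hypothetical helper: copy a precomputed index into the index folder,
# refusing to overwrite a file that is already there.
install_index() {
    local src="$1" fasta="$2" index_dir="${3:-index/segemehl-index}"
    local dest="$index_dir/$(index_name_for "$fasta")"
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    mkdir -p "$index_dir"
    cp "$src" "$dest"
}

index_name_for "/data/genome.fa"
# Usage: install_index /path/to/precomputed.idx /data/genome.fa
```

Because the fasta name is part of the index name, indices for different reference genomes can sit side by side in the same folder.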
These are the default parameters set in the src/segemehl/ENTRYPOINT.sh script. If you want to change them, you can do so directly in the ENTRYPOINT script. Please refer to the segemehl manual for details on each option.
-d $fasta: Reference genome in fasta format.
-q *yourFastqFile1_*1.fastq: Fastq filename of paired end read 1.
-p *yourFastqFile1_*2.fastq: Fastq filename of paired end read 2.
-i $indexdir/$indexname: Base name of the index folder and files.
--splits: Use split read alignment.
-o $outdir/$controlfolder/*yourFastqFile1_*segemehl.sam: The path to the mapped output file in sam format. The output is separated into case and control folders based on the base folder of the corresponding fastq file.
-t $ncores: Number of threads to be used during the computation.
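To see how these parameters fit together, the sketch below assembles a segemehl call from them and prints it instead of running it. It is not the command from ENTRYPOINT.sh: the function name, the sample file names and the binary name segemehl.x are assumptions, and the second read file is passed with -p, segemehl's mate option, which is also an assumption about how paired-end data is handled.

```shell
# Sketch only: prints a segemehl command line built from the parameters
# listed above. Nothing is executed; file names below are placeholders.
build_segemehl_cmd() {
    local fasta="$1" index="$2" fq1="$3" fq2="$4" out="$5" ncores="${6:-1}"
    # -q takes read 1; -p (mate file) for read 2 is an assumption.
    echo "segemehl.x -d $fasta -i $index -q $fq1 -p $fq2 --splits -o $out -t $ncores"
}

build_segemehl_cmd \
    genome.fa \
    index/segemehl-index/genome.fa.idx \
    sample_1.fastq \
    sample_2.fastq \
    output/control/sample_segemehl.sam \
    4
```

Printing the command first makes it easy to check paths and thread count before changing anything in the ENTRYPOINT script.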