Although our mapping scripts work well regardless of capitalization of sequence headers, we found that it may affect presentation of visualization tools. For example, reads mapped to "chrm" may not been shown in some visualization tools simply because its corresponding chromosome inside was named "ChrM". So time would be saved if some check and/or modifications could have been done on (1) reference sequence headers, (2) chromosome names in GFF3 or canonical GFF files, and (3) the visualization tool you use.
In some circumstance, especially when running multiple mapping tasks under a cluster environment like the PBS, we need to build indexes at first to avoid race condition on index building, which may cause some error when accessing reference sequences. This script simply builds indexes using Bio::DB::Fasta and faidx of SAMtools.
The faidx program of SAMtools may have problem when handling a FASTA file with sequence lines of inconsistent lengths. This script fixes a FASTA file so that the faidx program shall index the fixed version without any problem.
All our mapping scripts read three fixed parameters from the command line: (1) target, (2) query, and (3) output. The target is a string indicating which genome to be mapped but not a file nor a path to index prefix. Before executing Bowtie2 or BLAT, the target string will be translated into an appropriate parameter so that users can always use the same target string for all our mapping scripts.
To do this, a mapping table containing the translation of the target and actual target file (or index prefix) should be maintained and indicated by the environment variable TargetTableFile. Below is an example mapping table, in which <TAB> means the tab character.
<BLAT parameters>: parameters to be passed to BLAT
This script automatically adds -out=pslx for BLAT. DO NOT add any other format options for BLAT.
Options -q=type -t=type were suggested for BLAT. For example, -q=rna -t=dna would be good for RNAseq, -q=dna -t=dna would be good for DNAseq, and -q=dnax -t=dnax would be good for translation-level mapping (but time-consuming).
<Bowtie2 parameters>: parameters to be passed to Bowtie2
This script automatically adds -f --quiet for Bowtie2, so the query file must be in FASTA format.
The filtering procedure exactly processes what outputted by Bowtie2. So, for example, -ID 0.9 -SamTmp tmp.sam would let all reads without alignment of at least 0.9 identity recorded be in tmp.sam, including reads without any alignment. And -ID 0.9 -SamTmp tmp.sam plus --no-unal for Bowtie2 would let only aligned reads without alignment at least 0.9 identity recorded be in tmp.sam.
This script wraps MappingBowtie.pl and MappingBlat.pl. All query reads will be mapped by Bowtie through MappingBowtie.pl, and rest reads (reads not passing the identity filter) will be mapped by BLAT through MappingBlat.pl.
This script wraps MappingBowtie.pl and MappingBlat.pl. All query reads will be mapped by Bowtie through MappingBowtie.pl, and rest reads (reads not passing the identity filter) will be processed by a two-pass application of MappingBlat.pl.
<MappingBowtie.pl options>: options to be passed to MappingBowtie.pl, including those to be passed to Bowtie2
<MappingBlat.pl 1st-pass options>: options to be passed to the first pass of MappingBlat.pl, including those to be passed to BLAT
<MappingBlat.pl 2nd-pass options>: options to be passed to the second pass of MappingBlat.pl, including those to be passed to BLAT
This script automatically applies -SamTmp when invoking MappingBowtie.pl and first pass of MappingBlat.pl, which means, for these two invocations, -ID should be appropriately set and -SamTmp should NOT be applied manually.
This script will append -ID 0 to <MappingBowtie.pl options> and <MappingBlat.pl 1st-pass options> when there is no -ID option.
A reasonable usage of this script is to apply different search strategies on the two BLAT invocations. For example, default parameters for the first, and finer parameters (e.g. -tilSize=9 -stepSize=3) for the second.
This script is a super-wrapper of above four mapping scripts, it (1) splits an input FASTA/FASTQ file into the specified number of parts, (2) concurrently maps all parts using the specified mapping script, and (3) outputs mapping results in SAM or BAM (binary SAM) format.
-ID <idThreshold>: identity threshold for filtering. Setting this threshold 0 will let those unmapped records filtered. (default: 0.9)
<passedSAM>: the output SAM file for reads with only qualified records
<filteredSAM>: the output SAM file for reads without any qualified records
This script works properly with SAM files whose records of the same read ID are consecutive. Generally speaking, SAM or BAM files made directly by most aligners and BAM files sorted by names should fit this requirement.
There will be only one record for each read in <filteredSAM>.
SAM headers from the input will be written to both output SAM files.
Again, it would be better to make sure that -p 8 for Bowtie2 can be performed appropriately. Also note that this command creates only one process for BLAT. The following steps do the same thing and run BLAT faster.
Map reads with bowtie, filter alignments with identity threshold 0.9, and save unqualified reads/records in Result.bowtieTmp.sam. It is important not to apply the Bowtie2 option --no-unal so that unmapped reads would be also unqualified.
The following command runs translation level BLAT against transcript sequences of TAIR10. Note that BLAT options -stepSize=1 minIdentity=15 -minScore=20 refine default BLAT parameters and cost much more running time. These BLAT parameters are all adjustable.