n_filter.Rd
This program is a wrapper to nFilter.
It removes the sequences whose number of N's is above
a threshold value 'rm.N'.
All the sequences with a number of N > rm.N (N >= rm.N + 1) will be removed.
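The threshold rule above can be illustrated with a minimal base-R sketch on plain character strings. The helpers `count_n` and `keep_by_n` are hypothetical names introduced only for this illustration; they are not part of the package and do not reproduce how n_filter is implemented.

```r
# Hypothetical helper: count the N's in each sequence of a character vector.
count_n <- function(seqs) {
  lengths(regmatches(seqs, gregexpr("N", seqs, fixed = TRUE)))
}

# Hypothetical helper: keep sequences with at most rm.N N's,
# i.e. drop those with a number of N's > rm.N.
keep_by_n <- function(seqs, rm.N) {
  seqs[count_n(seqs) <= rm.N]
}

seqs <- c("ACGT", "ANNT", "NNNN", "NNNA", "NNNNN")
count_n(seqs)                      # 0 2 4 3 5
kept <- keep_by_n(seqs, rm.N = 3)
kept                               # "ACGT" "ANNT" "NNNA"
```

A sequence with exactly rm.N N's ("NNNA" here) is kept; only sequences with rm.N + 1 or more N's are dropped.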
n_filter(input, rm.N)
Argument | Description |
---|---|
input | ShortReadQ object |
rm.N | Threshold value of N's to remove a sequence from the output (sequences with a number of Ns > threshold are removed). For example, if rm.N is 3, all the sequences with a number of Ns > 3 (Ns >= 4) will be removed |
Filtered ShortReadQ object
require('Biostrings')
require('ShortRead')

# create 50 sequences of width 20
set.seed(10)
input <- random_seq(50, 20)

# inject N's
set.seed(10)
input <- inject_letter_random(input, how_many_seqs = 1:30, how_many = 1:10)
input <- DNAStringSet(input)

# watch the N's frequency
hist(letterFrequency(input, 'N'), breaks = 0:10,
     main = 'Ns Frequency', xlab = '# Ns')

# create qualities of width 20
set.seed(10)
input_q <- random_qual(50, 20)

# create names
input_names <- seq_names(50)

# create ShortReadQ object
my_read <- ShortReadQ(sread = input, quality = input_q, id = input_names)

# apply the filter
filtered <- n_filter(my_read, rm.N = 3)

# watch the filtered sequences
sread(filtered)
#> A DNAStringSet instance of length 40
#>      width seq
#>  [1]    20 TGGTCCGGTGTTCTGGCGGA
#>  [2]    20 ATAGGTACAGTCCAGTAATT
#>  [3]    20 GCCTCCCGCAGACGCTGGGT
#>  [4]    20 CCGGAATGCCCTTTCTGAGC
#>  [5]    20 AGCTCCAGCCGTTTGACTTC
#>  ...   ... ...
#> [36]    20 ATCAATTCGTCCTGAGTTCA
#> [37]    20 AGCCCACTGGGGGAGAACGC
#> [38]    20 CGAGGGAAGCCAAGCAAAGC
#> [39]    20 GGGAAGATCCGTTACTCTTT
#> [40]    20 AGGAATTCCCGGAAGTCGCA

# watch the N's frequency
hist(letterFrequency(sread(filtered), 'N'),
     main = 'Ns distribution', xlab = '')