This function is a wrapper around nFilter. It removes sequences whose number of N's exceeds the threshold value 'rm.N': every sequence with N > rm.N (i.e., N >= rm.N + 1) is removed.

n_filter(input, rm.N)

Arguments

input

ShortReadQ object

rm.N

Threshold number of N's above which a sequence is removed from the output (sequences with more than rm.N N's are removed). For example, if rm.N is 3, all sequences with more than 3 N's (Ns >= 4) are removed.
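The threshold rule can be illustrated without Bioconductor. The sketch below uses hypothetical base-R helpers (count_n and keep_by_n are illustrative names, not part of this package or of ShortRead) that count the literal letter 'N' in plain character strings and keep only those with at most rm.N of them. It shows the "N > rm.N is removed" semantics only and is not how n_filter works internally.

```r
# Hypothetical helpers illustrating the rm.N rule; not the package's implementation.

# Number of 'N' characters in each string (uppercase 'N' only).
count_n <- function(seqs) {
  nchar(seqs) - nchar(gsub("N", "", seqs, fixed = TRUE))
}

# Keep sequences with at most rm.N N's, i.e. drop those with N > rm.N.
keep_by_n <- function(seqs, rm.N) {
  seqs[count_n(seqs) <= rm.N]
}

seqs <- c("ACGTACGT", "NNNACGTA", "NNNNACGT", "ANNNNNGT")
count_n(seqs)
#> [1] 0 3 4 5
keep_by_n(seqs, rm.N = 3)
#> [1] "ACGTACGT" "NNNACGTA"
```

With rm.N = 3, the sequences with 4 and 5 N's are dropped and the one with exactly 3 N's is kept; setting rm.N = 0 would keep only N-free sequences.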

Value

Filtered ShortReadQ object

Examples

require('Biostrings')
require('ShortRead')

# create 50 sequences of width 20
set.seed(10)
input <- random_seq(50, 20)

# inject N's
set.seed(10)
input <- inject_letter_random(input, how_many_seqs = 1:30, how_many = 1:10)
input <- DNAStringSet(input)

# watch the N's frequency
hist(letterFrequency(input, 'N'), breaks = 0:10, main = 'Ns Frequency', xlab = '# Ns')

# create qualities of width 20
set.seed(10)
input_q <- random_qual(50, 20)

# create names
input_names <- seq_names(50)

# create ShortReadQ object
my_read <- ShortReadQ(sread = input, quality = input_q, id = input_names)

# apply the filter
filtered <- n_filter(my_read, rm.N = 3)

# watch the filtered sequences
sread(filtered)
#> A DNAStringSet instance of length 40
#>      width seq
#>  [1]    20 TGGTCCGGTGTTCTGGCGGA
#>  [2]    20 ATAGGTACAGTCCAGTAATT
#>  [3]    20 GCCTCCCGCAGACGCTGGGT
#>  [4]    20 CCGGAATGCCCTTTCTGAGC
#>  [5]    20 AGCTCCAGCCGTTTGACTTC
#>  ...   ... ...
#> [36]    20 ATCAATTCGTCCTGAGTTCA
#> [37]    20 AGCCCACTGGGGGAGAACGC
#> [38]    20 CGAGGGAAGCCAAGCAAAGC
#> [39]    20 GGGAAGATCCGTTACTCTTT
#> [40]    20 AGGAATTCCCGGAAGTCGCA

# watch the N's frequency
hist(letterFrequency(sread(filtered), 'N'), main = 'Ns distribution', xlab = '')