trimseq

trimseq is used to tidy up the ends of sequences, removing all the bits that you would really rather were not published.

Tidy up the sequence ends, stopping at the first wanted code:

% trimseq xyz.seq xyz_clean.seq -window 1 -percent 100

Tidy up the sequence ends, removing poor bits at the ends:

% trimseq xyz.seq xyz_clean.seq -window 5 -percent 40

Tidy up the sequence ends, removing very poor bits at the ends:

% trimseq xyz.seq xyz_clean.seq -window 20 -percent 80

Tidy up the sequence ends, removing even marginally poor bits at the ends:

% trimseq xyz.seq xyz_clean.seq -window 20 -percent 10

Tidy up the sequence ends, removing poor bits including ambiguity codes:

% trimseq xyz.seq xyz_clean.seq -window 20 -percent 50 -strict

Tidy up the sequence ends, removing asterisks from a protein end:

% trimseq xyz.seq xyz_clean.seq -window 1 -percent 100 -star

Tidy up the sequence ends, removing poor bits at only the left end:

% trimseq xyz.seq xyz_clean.seq -window 20 -percent 50 -noright

-window (integer): The size of the region considered when deciding whether the percentage of ambiguity reaches the threshold. A value of 5 means that a window of 5 letters is moved along the sequence from the ends, and trimming is done only if the percentage of ambiguity in the window is greater than or equal to the threshold percentage.
-percent (float): The threshold percentage of ambiguity that a window must contain for the sequence to be trimmed.
-strict (boolean): In nucleic sequences, trim off not only Ns and Xs, but also the nucleotide IUPAC ambiguity codes M, R, W, S, Y, K, V, H, D and B. In protein sequences, trim off not only Xs but also B and Z.
-star (boolean): In protein sequences, trim off not only Xs but also asterisks.
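
The way -window, -percent, -strict and -star interact can be easier to follow in code. The Python below is only a rough sketch of the behaviour described above, not the trimseq source: the function names (unwanted_codes, left_trim_count, trim_ends) are made up for illustration, and details such as stopping at the first window that falls below the threshold, or leaving sequences shorter than the window untouched, are assumptions.

```python
def unwanted_codes(is_protein, strict=False, star=False):
    """Return the set of letters counted as ambiguous (hypothetical helper)."""
    codes = set("X") if is_protein else set("NX")
    if strict:
        # -strict adds the remaining IUPAC ambiguity codes.
        codes |= set("BZ") if is_protein else set("MRWSYKVHDB")
    if star and is_protein:
        # -star also counts asterisks in protein sequences.
        codes.add("*")
    return codes


def left_trim_count(seq, window, percent, codes):
    """Count how many leading letters to trim (assumed behaviour)."""
    seq = seq.upper()
    trim = 0
    for start in range(len(seq) - window + 1):
        win = seq[start:start + window]
        bad = [i for i, ch in enumerate(win) if ch in codes]
        # Assumption: stop at the first window below the threshold.
        if not bad or 100.0 * len(bad) / window < percent:
            break
        # Trim up to and including the last unwanted letter in the window.
        trim = start + bad[-1] + 1
    return trim


def trim_ends(seq, window, percent, codes, left=True, right=True):
    """Trim both ends; left=False or right=False mimics -noleft / -noright."""
    n_left = left_trim_count(seq, window, percent, codes) if left else 0
    rest = seq[n_left:]
    # The right end is handled by scanning the reversed sequence.
    n_right = left_trim_count(rest[::-1], window, percent, codes) if right else 0
    return rest[:len(rest) - n_right]


if __name__ == "__main__":
    nuc = unwanted_codes(is_protein=False)
    print(trim_ends("NNACNGTACGNN", 1, 100, nuc))
    print(trim_ends("NNACNGTACG", 5, 40, nuc))
```

With a window of 1 and a threshold of 100, the sketch removes only the unbroken run of unwanted letters at each end, which matches the first usage example. A wider window with a lower threshold also removes good letters that are mixed in with poor ones near the ends.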