compseq counts the composition of words
(dimers, trimers and so on) in a sequence.
Here is a sample session with compseq. To count
the frequencies of dinucleotides in a file:
% compseq embl:hsfau 2 result3.comp
To count the frequencies of hexanucleotides, without outputting the
results of hexanucleotides that do not occur in the sequence:
% compseq embl:hsfau 6 result6.comp -nozero
To count the frequencies of trinucleotides
in frame 2 of a sequence using a previously prepared
compseq output to show the expected frequencies:
% compseq embl:hsfau 3 result3.comp -frame 2 -in prev.comp
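As a rough illustration of what these commands compute, the following Python sketch counts every overlapping word of a given size and reports each word's observed frequency. It is not the compseq implementation; the function name count_words and the toy sequence are assumptions made for this example only.

```python
from collections import Counter


def count_words(seq: str, word: int) -> Counter:
    """Count every overlapping word of length `word` in `seq`.

    Hypothetical helper for illustration; not part of compseq.
    """
    seq = seq.upper()
    # The window starts at every position and moves on by one each time.
    return Counter(seq[i:i + word] for i in range(len(seq) - word + 1))


# Toy sequence chosen for this example, not taken from embl:hsfau.
counts = count_words("ACGTACGT", 2)
total = sum(counts.values())
for w in sorted(counts):
    # Word, observed count, observed frequency (count / total words).
    print(w, counts[w], f"{counts[w] / total:.4f}")
```

A sequence of length n contains n - word + 1 overlapping words, so the eight-base toy sequence yields seven dinucleotides.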
Mandatory qualifiers:
- [-sequence] (seqall)
Sequence database USA (Uniform Sequence Address).
- [-word] (integer)
The size of word (n-mer) to count. If you want to count codon
frequencies, enter 3 here.
- [-outfile] (outfile)
The results file.
Optional qualifiers:
- -infile (infile)
This is a file previously produced by compseq
that can be used to set the expected frequencies of words in an
analysis. The word size in the current run must be the same as the
word size in this results file. Use a file produced from protein
sequences if you are counting protein sequence word frequencies,
or a file produced from nucleotide sequences if you are analyzing
a nucleotide sequence.
- -frame (integer)
By default, compseq counts the frequencies of all words by moving
a window of the word length along the sequence one position at a
time. This option instead moves the window along by the length of
the word each time, skipping the intervening words, so that only
the words in a single frame are counted. Set it to a number other
than 0 to select the frame: 1 counts only the words in frame 1,
2 counts only the words in frame 2, and so on.
- -[no]ignorebz (boolean)
The amino acid code B represents Asparagine or Aspartic acid, and the
code Z represents Glutamine or Glutamic acid. These codes are not
commonly used, and you may not want to count words containing them.
With this option set, words containing B or Z are not counted
individually; they are only noted in the count of "Other" words.
- -reverse (boolean)
Set this option to true if you want to count words in the reverse
complement of a nucleic acid sequence.
- -[no]zerocount (boolean)
You can make the output results file much smaller if you do not
display the words with a zero count.
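
The effect of -frame, -reverse and -[no]zerocount can be sketched in Python as follows. This is an illustration under stated assumptions, not the compseq source: it assumes that frame f starts at 1-based position f and then advances by the word length, and that the alphabet is the four unambiguous bases. The names windowed_counts, reverse_complement and report are hypothetical.

```python
from collections import Counter
from itertools import product

# Complement table for the four unambiguous bases (assumption: no IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def windowed_counts(seq: str, word: int, frame: int = 0,
                    reverse: bool = False) -> Counter:
    """Count words, optionally in one frame or on the reverse complement.

    frame == 0: the window moves on by one position (all words).
    frame >= 1: assumed to start at 1-based position `frame` and to move
                on by `word` positions, so only that frame is counted.
    """
    seq = reverse_complement(seq) if reverse else seq.upper()
    start, step = (0, 1) if frame == 0 else (frame - 1, word)
    return Counter(seq[i:i + word]
                   for i in range(start, len(seq) - word + 1, step))


def report(counts: Counter, word: int, zerocount: bool = True,
           alphabet: str = "ACGT") -> list:
    """List (word, count) rows; drop zero-count rows if zerocount is False."""
    rows = []
    for letters in product(alphabet, repeat=word):
        w = "".join(letters)
        n = counts.get(w, 0)
        if n or zerocount:
            rows.append((w, n))
    return rows


# Toy sequence chosen for this example.
frame1 = windowed_counts("ATGAAATGA", 3, frame=1)
print(report(frame1, 3, zerocount=False))
```

With a word size of 3 there are 64 possible words, so suppressing the zero-count rows shortens the listing considerably for a short sequence; the saving grows quickly with the word size.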