infoalign is a small utility that lists some
simple properties of the sequences in an alignment.
Here is a sample session with infoalign:
% infoalign globin.seq
Don't display the USA of a sequence:
% infoalign globin.seq -nousa -out stdout
Display only the name and sequence length of a sequence:
% infoalign globin.seq -only -name -seqlength -out stdout
Display only the name, number of gap characters and differences to the consensus sequence:
% infoalign globin.seq -only -name -gapcount -diffcount -out stdout
Display the name and number of gaps within a sequence:
% infoalign globin.seq -only -name -gaps -out stdout
Display information formatted with HTML:
% infoalign globin.seq -html -out stdout
Use the first sequence as the reference sequence to compare to:
% infoalign globin.seq -refseq 1 -out stdout
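The invocations in the sample session can also be assembled from a script. The following is a minimal Python sketch, not part of infoalign itself: the helper name build_infoalign_command and its parameters are hypothetical, and it only combines qualifiers that appear in the session (-only, the column switches, -refseq, -html and -out).

```python
def build_infoalign_command(alignment, columns=None, refseq=None, html=False):
    """Assemble an infoalign command line as a list of arguments.

    Hypothetical helper for illustration: it only combines qualifiers
    that are shown in the sample session.
    """
    cmd = ["infoalign", alignment]
    if columns:
        # -only switches all columns off; the named ones are switched back on.
        cmd.append("-only")
        cmd.extend("-" + name for name in columns)
    if refseq is not None:
        # Sequence number or name to compare against.
        cmd.extend(["-refseq", str(refseq)])
    if html:
        cmd.append("-html")
    # Write the report to standard output, as in the sample session.
    cmd.extend(["-out", "stdout"])
    return cmd


if __name__ == "__main__":
    # Reproduces: infoalign globin.seq -only -name -seqlength -out stdout
    print(" ".join(build_infoalign_command("globin.seq",
                                           columns=["name", "seqlength"])))
```

The returned list can be passed to Python's subprocess.run on a machine where infoalign is installed and on the PATH; the sketch itself only builds and prints the command.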
Mandatory qualifiers:
- [-sequence] (seqset)
-
The sequence alignment to display.
- [-outfile] (outfile)
-
If you enter the name of a file here, this program will write the
sequence details into the specified file.
Optional qualifiers:
- -refseq (string)
-
If you give the number of a sequence in the alignment, or its name,
that sequence is taken to be the reference sequence. The reference
sequence is the one to which all the other sequences are compared.
If this is set to 0, or if the qualifier is not given, the consensus
sequence is used as the reference sequence.
- -matrix (matrix)
-
This is the scoring matrix file used when comparing sequences. By
default, it is the file EBLOSUM62 (for proteins) or the file EDNAFULL
(for nucleic acid sequences). These files are found in the data
directory of the EMBOSS installation.
- -html (boolean)
-
Format output as an HTML table.
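The way a -refseq value selects the reference sequence, as described for that qualifier above, can be sketched as follows. This is an illustration in Python and not the EMBOSS code: the function name resolve_reference is hypothetical, the sequence names in the usage lines are made up, and exact (case-sensitive) name matching is an assumption.

```python
def resolve_reference(refseq, names):
    """Return the 0-based index of the reference sequence, or None
    when the consensus sequence should be used as the reference.

    Illustrative only: `names` is the list of sequence names in
    alignment order; exact name matching is an assumption.
    """
    refseq = str(refseq)
    if refseq == "0":
        # 0 selects the consensus sequence.
        return None
    if refseq.isdecimal():
        # A number is the 1-based position in the alignment,
        # so "-refseq 1" means the first sequence.
        position = int(refseq)
        if 1 <= position <= len(names):
            return position - 1
        raise ValueError(f"no sequence number {position} in the alignment")
    if refseq in names:
        return names.index(refseq)
    raise ValueError(f"no sequence named {refseq!r} in the alignment")


if __name__ == "__main__":
    names = ["seqA", "seqB", "seqC"]  # made-up names for the example
    print(resolve_reference("0", names))     # None (use the consensus)
    print(resolve_reference("1", names))     # 0 (first sequence)
    print(resolve_reference("seqC", names))  # 2
```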
Advanced qualifiers:
- -plurality (float)
-
Set a cutoff for the percentage of positive scoring matches below
which there is no consensus. The default plurality is taken as 50% of
the total weight of all sequences in the alignment.
- -identity (float)
-
Sets the proportion of identities, as a percentage, required at a
position in order to return a consensus. If this is set to 100%, only
columns in which all residues are identical contribute to the consensus.
- -only (boolean)
-
This is a way of shortening the command line if you want only a few
things to be displayed. Instead of using the options -nohead
-nousa -noname -noalign -nogaps -nogapcount -nosimcount -noidcount
-nodiffcount to get only the sequence length output, you
can specify -only -seqlength.
- -heading (boolean)
-
Display column headings.
- -usa (boolean)
-
Display the USA of the sequence.
- -name (boolean)
-
Display name column.
- -seqlength (boolean)
-
Display the sequence length column.
- -alignlength (boolean)
-
Display the alignment length column.
- -gaps (boolean)
-
Display number of gaps.
- -gapcount (boolean)
-
Display number of gap positions.
- -idcount (boolean)
-
Display number of identical positions.
- -simcount (boolean)
-
Display number of similar positions.
- -diffcount (boolean)
-
Display number of different positions.
- -change (boolean)
-
Display percentage of changed positions.
- -description (boolean)
-
Display description column.
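The effect of the -identity cutoff on the consensus, described for that qualifier above, can be illustrated with a simplified sketch. It is not the EMBOSS implementation: every sequence is given equal weight, no scoring matrix or plurality test is applied, gap characters are treated like any other symbol, and the function name identity_consensus and the "?" placeholder for "no consensus" are choices made for this example only.

```python
from collections import Counter


def identity_consensus(rows, identity=0.0, no_consensus="?"):
    """Column-wise consensus under an identity threshold.

    Simplified illustration: all sequences have equal weight and no
    scoring matrix is consulted. `identity` is the percentage of
    sequences that must share the most common symbol in a column;
    `no_consensus` is a placeholder symbol chosen for this sketch.
    """
    if not rows or len({len(row) for row in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    consensus = []
    for column in zip(*rows):
        # Most frequent symbol in this column and how often it occurs.
        symbol, count = Counter(column).most_common(1)[0]
        percent = 100.0 * count / len(rows)
        consensus.append(symbol if percent >= identity else no_consensus)
    return "".join(consensus)


if __name__ == "__main__":
    rows = ["ACGT", "ACGA", "ACTA"]  # toy alignment, three sequences
    print(identity_consensus(rows, identity=100))  # AC??
    print(identity_consensus(rows, identity=60))   # ACGA
```

With the cutoff at 100, only the two fully identical columns yield a consensus symbol; lowering it to 60 lets the columns where two of the three sequences agree contribute as well.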