infoalign

infoalign is a small utility that lists some simple properties of the sequences in an alignment.

Here is a sample session with infoalign:

% infoalign globin.seq

Don't display the USA (Uniform Sequence Address) of a sequence:

% infoalign globin.seq -nousa -out stdout

Display only the name and sequence length of a sequence:

% infoalign globin.seq -only -name -seqlength -out stdout

Display only the name, the number of gap characters and the number of differences from the consensus sequence:

% infoalign globin.seq -only -name -gapcount -diffcount -out stdout

Display the name and number of gaps within a sequence:

% infoalign globin.seq -only -name -gaps -out stdout

Display information formatted with HTML:

% infoalign globin.seq -html -out stdout

Use the first sequence as the reference sequence to compare to:

% infoalign globin.seq -refseq 1 -out stdout
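
The sample commands above all follow the same pattern, so they can also be assembled by a script. The following is a minimal Python sketch, not part of infoalign itself: the helper name build_infoalign_args is hypothetical, and it assumes that every column name passed in corresponds to one of the boolean qualifiers documented on this page.

```python
import shutil
import subprocess


def build_infoalign_args(alignment, columns, outfile="stdout"):
    """Return an argument list asking infoalign for only the given columns.

    Hypothetical helper: it only assembles qualifiers that appear in the
    sample session (-only, -name, -seqlength, -out, ...).
    """
    args = ["infoalign", alignment, "-only"]
    # Each column name becomes a boolean qualifier, e.g. "name" -> "-name".
    args += ["-" + column for column in columns]
    args += ["-out", outfile]
    return args


args = build_infoalign_args("globin.seq", ["name", "seqlength"])
print(" ".join(args))
# prints: infoalign globin.seq -only -name -seqlength -out stdout

# Run the command only if infoalign is installed and on the PATH.
if shutil.which("infoalign"):
    subprocess.run(args, check=False)
```

Building the arguments as a list, rather than as one string, avoids shell quoting problems when the alignment file name contains spaces.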

[-sequence] (seqset): The sequence alignment to display.
[-outfile] (outfile): The name of the file to which the sequence details are written.

-refseq (string): The number of a sequence in the alignment, or the name of a sequence, selects the reference sequence. The reference sequence is the one to which all the other sequences are compared. If this is set to 0, or left at its default, the consensus sequence is used as the reference sequence.
-matrix (matrix): This is the scoring matrix file used when comparing sequences. By default, it is the file EBLOSUM62 (for proteins) or the file EDNAFULL (for nucleic acid sequences). These files are found in the data directory of the EMBOSS installation.
-html (boolean): Format output as an HTML table.
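
The way a -refseq value is interpreted can be illustrated with a short Python sketch. This is not the infoalign source code: the function resolve_refseq and the sequence names are hypothetical, and the sketch assumes that positions are counted from 1 (as in the "-refseq 1" example) and that a purely numeric value is always treated as a position rather than a name.

```python
def resolve_refseq(value, names):
    """Return None for the consensus, or the 0-based index of the reference.

    value: the -refseq setting, either a position or a sequence name.
    names: the sequence names in alignment order.
    """
    value = str(value).strip()
    if value.isdigit():
        number = int(value)
        if number == 0:
            return None  # 0 selects the consensus sequence
        if 1 <= number <= len(names):
            return number - 1  # assumption: positions are counted from 1
        raise ValueError("no sequence number %d in the alignment" % number)
    if value in names:
        return names.index(value)
    raise ValueError("no sequence named %r in the alignment" % value)


names = ["seqA", "seqB", "seqC"]      # made-up names for illustration
print(resolve_refseq("0", names))     # None: use the consensus
print(resolve_refseq("1", names))     # 0: first sequence
print(resolve_refseq("seqC", names))  # 2: matched by name
```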

-plurality (float): Set a cutoff for the percentage of positive scoring matches below which there is no consensus. The default plurality is taken as 50% of the total weight of all sequences in the alignment.
-identity (float): Sets the proportion of identities required at a position in order to return a consensus. If this is set to 100%, only columns of identities contribute to the consensus.
-only (boolean): This is a way of shortening the command line if you want only a few things to be displayed. Instead of using the options -nohead -nousa -noname -noalign -nogaps -nogapcount -nosimcount -noidcount -nodiffcount to get only the sequence length output, you can specify -only -seqlength.
-heading (boolean): Display column headings.
-usa (boolean): Display the USA of the sequence.
-name (boolean): Display name column.
-seqlength (boolean): Display seqlength column.
-alignlength (boolean): Display alignlength column.
-gaps (boolean): Display number of gaps.
-gapcount (boolean): Display number of gap positions.
-idcount (boolean): Display number of identical positions.
-simcount (boolean): Display number of similar positions.
-diffcount (boolean): Display number of different positions.
-change (boolean): Display percentage of changed positions.
-description (boolean): Display description column.
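
To make the difference between the counting columns concrete, here is a rough Python sketch of how such values could be derived for one aligned sequence and a reference. It is an illustration under stated assumptions, not the infoalign implementation: the function name column_stats is hypothetical, "-" and "." are assumed to be the gap characters, columns with a gap in either sequence are assumed to count as neither identical nor different, and the percentage change is assumed to be the share of alignment columns that are not identical. The similarity count is omitted because it depends on the scoring matrix.

```python
import re

GAP_CHARS = "-."  # assumption: these characters mark gaps


def column_stats(seq, ref):
    """Count simple properties of an aligned sequence against a reference."""
    if len(seq) != len(ref):
        raise ValueError("aligned sequences must have the same length")
    alignlength = len(seq)
    # gapcount: every individual gap character.
    gapcount = sum(1 for c in seq if c in GAP_CHARS)
    # gaps: each unbroken run of gap characters counts once.
    gaps = len(re.findall(r"[-.]+", seq))
    idcount = 0
    diffcount = 0
    for a, b in zip(seq, ref):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # assumption: gap columns are not compared
        if a.upper() == b.upper():
            idcount += 1
        else:
            diffcount += 1
    # Assumed definition: percentage of columns that are not identical.
    change = 100.0 * (alignlength - idcount) / alignlength if alignlength else 0.0
    return {
        "seqlength": alignlength - gapcount,
        "alignlength": alignlength,
        "gaps": gaps,
        "gapcount": gapcount,
        "idcount": idcount,
        "diffcount": diffcount,
        "change": change,
    }


stats = column_stats("AC--GA-A", "ACTTGTCA")
print(stats["gaps"], stats["gapcount"])       # 2 3
print(stats["idcount"], stats["diffcount"])   # 4 1
print(stats["change"])                        # 50.0
```

In this example the sequence has two gaps (runs) but three gap characters, which is the distinction between the -gaps and -gapcount columns.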