extractfeat
is a simple utility for extracting parts of
a sequence that have been annotated as a specific type of feature.
These subsequences are written to the output sequence file.
Here is a sample session with extractfeat to
write out the exons of a sequence:
% extractfeat embl:hsfau1 -type exon stdout
To write out the exons with 10 extra bases at the start and end so
that you can inspect the splice sites:
% extractfeat embl:hsfau1 -type exon -before 10 -after 10 stdout
To write out the 10 bases around the start of all "exon" features in
the EMBL database:
% extractfeat embl:\* -type exon -before 5 -after -5 stdout
To write out the 7 residues around all phosphorylated residues in
SWISS-PROT:
% extractfeat sw:\* -type mod_res -value phosphorylation -before 3 -after -4 stdout
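The arithmetic behind -before and -after (both described under Optional qualifiers) can be sketched in Python. This is a minimal, hypothetical illustration, not EMBOSS code: it assumes 1-based inclusive positions on the forward strand, and it assumes that a negative value keeps exactly that many positions of the feature, a convention chosen because it reproduces the 10-base and 7-residue counts in the examples above. The names extract and extraction_bounds are illustrative only.

```python
def extraction_bounds(feat_start: int, feat_end: int,
                      before: int = 0, after: int = 0) -> tuple[int, int]:
    """Return the 1-based, inclusive (start, end) region to extract.

    Hypothetical convention, chosen to match the example counts:
      before > 0: start `before` positions ahead of the feature start
      before < 0: start so that the last |before| feature positions are kept
      after > 0:  end `after` positions past the feature end
      after < 0:  end so that the first |after| feature positions are kept
    """
    start = feat_start - before if before >= 0 else feat_end + before + 1
    end = feat_end + after if after >= 0 else feat_start - after - 1
    return start, end


def extract(seq: str, feat_start: int, feat_end: int,
            before: int = 0, after: int = 0, protein: bool = False) -> str:
    """Extract a feature plus context, padding positions outside the sequence."""
    pad = "X" if protein else "N"
    start, end = extraction_bounds(feat_start, feat_end, before, after)
    # Positions before 1 or past len(seq) are filled with the pad character.
    return "".join(seq[pos - 1] if 1 <= pos <= len(seq) else pad
                   for pos in range(start, end + 1))


if __name__ == "__main__":
    dna = "ACGTACGTAC"
    print(extract(dna, 4, 6, before=2, after=2))    # CGTACGT
    print(extract(dna, 4, 6, before=5))             # NNACGTAC
    print(extract(dna, 5, 5, before=3, after=-4))   # CGTACGT (7 bases)
```

Under this assumed convention, a single-residue feature at position 5 with -before 3 -after -4 yields positions 2 to 8, which is seven residues, as in the SWISS-PROT example.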
Mandatory qualifiers:
- [-sequence] (seqall)
Sequence database USA.
- [-outseq] (seqout)
Output sequence USA.
Optional qualifiers:
- -before (integer)
If this value is greater than 0, that number of bases or residues
before the feature is included in the extracted sequence. This allows
you to see the context of the feature. If this value is negative, the
start of the extracted sequence will be this number of bases/residues
before the end of the feature. For example, a value of 10 will start
the extraction 10 bases/residues before the start of the feature, and
a value of -10 will start the extraction 10 bases or residues before
the end of the feature. The output sequence will be padded with "N"
or "X" characters if the sequence starts after the required start of
the extraction.
- -after (integer)
If this value is greater than 0, that number of bases or residues
after the feature is included in the extracted sequence. This allows
you to see the context of the feature. If this value is negative, the
end of the extracted sequence will be this number of bases/residues
after the start of the feature. For example, a value of 10 will end
the extraction 10 bases/residues after the end of the feature, and a
value of -10 will end the extraction 10 bases or residues after the
start of the feature. The output sequence will be padded with "N" or
"X" characters if the sequence ends before the required end of the
extraction.
- -source (string)
By default, any feature source in the feature table is shown. You can
set this to match any feature source you want to show. The source
name is usually either the name of the program that detected the
feature, or the feature table (e.g., EMBL) that the feature came
from. The source may be wildcarded by using *. If
you want to show more than one source, separate their names with the
character |, e.g., gene* |
embl.
- -type (string)
By default, every feature in the feature table is extracted. You can
set this to be any feature type you want to extract. See Chapter 2 for
a list of the EMBL feature types, and Chapter 3 for a list of the
SWISS-PROT feature types. The type may be wildcarded by using *. If
you want to extract more than one type, separate their names with the
| character. For example:
*UTR | intron
- -sense (integer)
By default, features of any sense in the feature table are extracted.
You can set this to match any feature sense you want.
0 matches any sense, 1 matches
forward sense, and -1 matches reverse sense.
- -minscore (float)
Minimum score of a feature to extract (see also -maxscore). If this
is greater than or equal to the maximum score, any score is
permitted.
- -maxscore (float)
Maximum score of a feature to extract (see also -minscore). If this
is less than or equal to the minimum score, any score is permitted.
- -tag (string)
Tags are the types of extra values that a feature may have. For
example, in the EMBL feature table, a CDS type of feature may have
the tags /codon, /codon_start,
/db_xref, /EC_number,
/evidence, /exception,
/function, /gene,
/label, /map,
/note, /number,
/partial, /product,
/protein_id, /pseudo,
/standard_name, /translation,
/transl_except, /transl_table,
or /usedin. Some of these tags also have values
(e.g., /gene can have the value of the gene name).
By default, any feature tag in the feature table is extracted. You
can set this to match any feature tag you want to show. The tag may
be wildcarded by using *. If you want to extract
more than one tag, separate their names with the |
character. For example:
gene | label
- -value (string)
Tag values are the values associated with a feature tag, as described
under -tag above; for example, /gene can have the value of the gene
name. By default, features with any tag value are extracted. You can
set this to match any feature tag value you want to show. The value
may be wildcarded by using *. If you want to extract more than one
value, separate them with the | character. For example:
pax* | 10
- -join (boolean)
Some features, such as coding sequence (CDS) and mRNA, are composed
of exons concatenated together. There may be other forms of joined
sequence, depending on the feature table. If this option is set TRUE,
any group of these features will be output as a single sequence. If
the -before and -after qualifiers have been set, only the sequences
before the first feature and after the last feature are added.
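The selection qualifiers (-source, -type, -sense, -minscore, -maxscore, -tag and -value) act together as a filter on each feature. The sketch below is a hypothetical Python model of that filter, not the EMBOSS implementation: the Feature record, the case-insensitive matching and the rule that -tag and -value must match the same tag are all assumptions. It uses the standard fnmatch module for * wildcards and splits each pattern on |.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


def matches(pattern: str, text: str) -> bool:
    """True if text matches any '|'-separated alternative in pattern.

    Matching is case-insensitive here (an assumption); fnmatch also
    accepts ? and [...], which goes beyond the documented * wildcard.
    """
    return any(fnmatchcase(text.lower(), alt.strip().lower())
               for alt in pattern.split("|"))


@dataclass
class Feature:
    """Hypothetical feature record, not an EMBOSS data structure."""
    source: str
    ftype: str
    sense: int                      # 1 = forward, -1 = reverse
    score: float = 0.0
    tags: list[tuple[str, str]] = field(default_factory=list)  # (name, value)


def selected(f: Feature, source: str = "*", ftype: str = "*", sense: int = 0,
             minscore: float = 0.0, maxscore: float = 0.0,
             tag: str = "*", value: str = "*") -> bool:
    """Apply the selection qualifiers to one feature."""
    if not (matches(source, f.source) and matches(ftype, f.ftype)):
        return False
    if sense != 0 and f.sense != sense:     # 0 matches any sense
        return False
    # If minscore >= maxscore, any score is permitted.
    if minscore < maxscore and not (minscore <= f.score <= maxscore):
        return False
    if tag == "*" and value == "*":         # no tag filtering requested
        return True
    # Assumed: a single tag must match both the tag and the value pattern.
    return any(matches(tag, name) and matches(value, val)
               for name, val in f.tags)


if __name__ == "__main__":
    utr = Feature("embl", "5'UTR", 1, tags=[("gene", "PAX6")])
    print(selected(utr, ftype="*UTR | intron"))                   # True
    print(selected(utr, tag="gene | label", value="pax* | 10"))   # True
    print(selected(utr, sense=-1))                                # False
```

Because the defaults are "*" patterns, sense 0 and equal score bounds, a call with no qualifiers selects every feature, mirroring the "by default" behaviour described for each option.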
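Assuming a joined feature is given as a list of 1-based, forward-strand (start, end) segments, the -join behaviour with positive -before and -after values might be sketched as follows. The function name join_segments and the segment representation are hypothetical; negative -before/-after values and reverse-strand features are deliberately not handled.

```python
def join_segments(seq: str, segments: list[tuple[int, int]],
                  before: int = 0, after: int = 0, pad: str = "N") -> str:
    """Join 1-based inclusive (start, end) segments into one sequence.

    Hypothetical sketch: forward strand only, and only positive before/after
    values are applied, to the first and last segment respectively.
    """
    ordered = sorted(segments)
    pieces = []
    for i, (start, end) in enumerate(ordered):
        if i == 0:
            start -= max(before, 0)          # context before the first segment
        if i == len(ordered) - 1:
            end += max(after, 0)             # context after the last segment
        # Positions outside the sequence are filled with the pad character.
        pieces.append("".join(seq[pos - 1] if 1 <= pos <= len(seq) else pad
                              for pos in range(start, end + 1)))
    return "".join(pieces)


if __name__ == "__main__":
    seq = "AAACCCGGGTTT"
    print(join_segments(seq, [(7, 9), (1, 3)]))                     # AAAGGG
    print(join_segments(seq, [(1, 3), (7, 9)], before=1, after=2))  # NAAAGGGTT
```

The intervening segment (positions 4 to 6 here) is skipped, and context is added only at the outer ends of the joined sequence, not around each internal junction.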