1.1 NCBI's Sequence Identifier Syntax

The National Center for Biotechnology Information (NCBI) uses the following syntax for its BLAST server. NCBI is part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). The following (including the table) is NCBI's description. See ftp://ftp.ncbi.nih.gov/blast/db/README for details.

The syntax of sequence header lines used by the NCBI BLAST server depends on the database from which each sequence was obtained. The table below lists the identifiers for the databases from which the sequences were derived.

Database name                   Identifier syntax
GenBank                         gb|accession|locus
EMBL Data Library               emb|accession|locus
DDBJ, DNA Database of Japan     dbj|accession|locus
NBRF PIR                        pir||entry
Protein Research Foundation     prf||name
SWISS-PROT                      sp|accession|entry name
Brookhaven Protein Data Bank    pdb|entry|chain
Patents                         pat|country|number
GenInfo Backbone Id             bbs|number
General database identifier     gnl|database|identifier
NCBI Reference Sequence         ref|accession|locus
Local Sequence identifier       lcl|identifier
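Every identifier in the table is a database tag followed by `|`-separated fields, so the table itself can serve as a lookup for parsing. The following is a minimal Python sketch, assuming that no field value contains a `|` character. The names `SEQID_FIELDS` and `parse_seqid` and the `"tag"` key are hypothetical choices made here. They are not part of any NCBI tool.

```python
from typing import Dict, Optional, Tuple

# Field layout for each database tag, transcribed from the table above.
# None marks a slot the table leaves empty (e.g. the middle of "pir||entry").
# SEQID_FIELDS and parse_seqid are hypothetical names used for illustration.
SEQID_FIELDS: Dict[str, Tuple[Optional[str], ...]] = {
    "gb": ("accession", "locus"),
    "emb": ("accession", "locus"),
    "dbj": ("accession", "locus"),
    "pir": (None, "entry"),
    "prf": (None, "name"),
    "sp": ("accession", "entry name"),
    "pdb": ("entry", "chain"),
    "pat": ("country", "number"),
    "bbs": ("number",),
    "gnl": ("database", "identifier"),
    "ref": ("accession", "locus"),
    "lcl": ("identifier",),
}


def parse_seqid(seqid: str) -> Dict[str, str]:
    """Split an identifier such as 'gb|M73307|AGMA13GT' into named fields.

    Assumes no field value itself contains '|'.
    """
    tag, *values = seqid.split("|")
    if tag not in SEQID_FIELDS:
        raise ValueError(f"unknown database tag: {tag!r}")
    names = SEQID_FIELDS[tag]
    if len(values) != len(names):
        raise ValueError(
            f"{tag!r} expects {len(names)} field(s), got {len(values)}: {seqid!r}"
        )
    fields = {"tag": tag}
    for name, value in zip(names, values):
        if name is not None:  # skip slots the table leaves blank
            fields[name] = value
    return fields
```

For instance, `parse_seqid("gb|M73307|AGMA13GT")` returns `{"tag": "gb", "accession": "M73307", "locus": "AGMA13GT"}`. Blank slots such as the middle field of `pir||entry` still count toward the expected number of fields but are not stored.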

For example, an identifier might be "gb|M73307|AGMA13GT", where the "gb" tag indicates that the identifier refers to a GenBank sequence, "M73307" is its GenBank ACCESSION, and "AGMA13GT" is the GenBank LOCUS.
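In a sequence file, an identifier like the GenBank example above is normally the first word of a header line. The following minimal Python sketch extracts it and splits it at the `|` separators. It assumes FASTA-style headers: the line begins with `>`, the identifier runs up to the first whitespace, and any free-text description follows. `header_identifier` and `split_identifier` are hypothetical helper names.

```python
from typing import List, Tuple


def header_identifier(header: str) -> str:
    """Return the identifier token from a FASTA-style header line (hypothetical helper).

    Assumes the line starts with '>' and the identifier ends at the first
    whitespace; anything after it is treated as a description and ignored.
    """
    if not header.startswith(">"):
        raise ValueError(f"not a header line: {header!r}")
    parts = header[1:].split(maxsplit=1)
    if not parts:
        raise ValueError("header line has no identifier")
    return parts[0]


def split_identifier(identifier: str) -> Tuple[str, List[str]]:
    """Split 'gb|M73307|AGMA13GT' into ('gb', ['M73307', 'AGMA13GT'])."""
    tag, *fields = identifier.split("|")
    return tag, fields
```

Applied to a header line `>gb|M73307|AGMA13GT`, the two helpers yield the tag `gb`, the accession `M73307`, and the locus `AGMA13GT`, in the same order described in the example above.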

"gi" identifiers are being assigned by NCBI for all sequences contained within NCBI's sequence databases. This identifier provides a uniform and stable naming convention whereby a specific sequence is assigned its unique gi identifier. If a nucleotide or protein sequence changes, however, a new gi identifier is assigned, even if the accession number of the record remains unchanged. Thus, gi identifiers provide a mechanism for identifying the exact sequence that was used or retrieved in a given search.

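In NCBI-formatted files, a gi number is often written in front of one of the identifiers from the table, in the form `gi|<number>|gb|<accession>|<locus>`. Treat that chained layout as an assumption about the files you are reading. The minimal Python sketch below peels off such a prefix, so that the gi number can serve as a key for the exact sequence while the remainder is handled like any other identifier in the table. `split_gi` and the number `12345` used below are hypothetical.

```python
from typing import Optional, Tuple


def split_gi(identifier: str) -> Tuple[Optional[int], str]:
    """Peel a leading 'gi|<number>|' prefix off an identifier (hypothetical helper).

    Returns (gi, remainder). Assumes the chained 'gi|<number>|<other id>'
    layout; if there is no gi prefix, gi is None and the identifier is
    returned unchanged.
    """
    if not identifier.startswith("gi|"):
        return None, identifier
    # Split at most twice so the remainder keeps its own '|' separators.
    parts = identifier.split("|", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"malformed gi identifier: {identifier!r}")
    gi = int(parts[1])
    remainder = parts[2] if len(parts) == 3 else ""
    return gi, remainder
```

For example, with the made-up gi `12345`, `split_gi("gi|12345|gb|M73307|AGMA13GT")` returns `(12345, "gb|M73307|AGMA13GT")`. Keying records on the gi rather than on the accession follows from the paragraph above: the accession can stay the same when a sequence is revised, but the gi changes.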