Section: User Commands (1)
Updated: September 2000



getseqs - performs BLAST searches and GenBank retrieval over the internet, and prepares the data for further refinement



[options] [file]



For a set of sequences, getseqs performs BLAST searches and GenBank retrieval over the internet to obtain a raw core of sequences. This core can then be refined automatically by discarding already known hits and by applying programs such as align0 and qrna.

getseqs requires the following to be installed:

  • BLAST: either a local version of BLAST or the netblast client blastcl3.
  • lynx (works with version 2.8.3dev.9).
  • blast2col, which is part of this package.
  • extendlist, which is specifically designed as part of this program.
  • align0 from the Pearson FASTA package. (This is only needed if you wish to use align0 as part of the refinement.)
  • qrna (by Rivas and Eddy) to search for RNA structure. As this program is not yet publicly available, this option is suppressed for the time being. (This is only needed if you wish to use qrna as part of the refinement.)



getseqs accepts the following options.

-nseq <number>
Performs the BLAST search on nseq sequences at a time. Default is 25.
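The batching controlled by -nseq can be pictured with the following standalone shell sketch. It is not taken from getseqs; the function name split_fasta and the batch_N.fasta file names are hypothetical. It splits a FASTA file into files of at most N sequences each, assuming every record starts with a '>' header line.

```sh
# Hypothetical sketch (not getseqs code): split a FASTA file into batch
# files of at most NSEQ sequences each, named batch_1.fasta, batch_2.fasta, ...
split_fasta() {
  infile=$1   # input FASTA file
  nseq=$2     # sequences per batch (getseqs default for -nseq: 25)
  outdir=$3   # directory that receives the batch files
  mkdir -p "$outdir"
  awk -v n="$nseq" -v dir="$outdir" '
    /^>/ {
      # start a new batch file every n headers
      if (count % n == 0) {
        if (out != "") close(out)
        batch++
        out = dir "/batch_" batch ".fasta"
      }
      count++
    }
    # copy every line (header or sequence) into the current batch file;
    # lines before the first header are skipped
    out != "" { print > out }
  ' "$infile"
}
```

For example, split_fasta foo.fasta 25 batches would write batches/batch_1.fasta, batches/batch_2.fasta, and so on, each of which could then be piped to the BLAST command, e.g. blastcl3 -p blastn -d nr < batches/batch_1.fasta.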

Reads sequences in column (col) format instead of FASTA format. Default is FASTA format.

-runname <string>
The name of the temporary data directory. By default, the name combines the date, time and process ID to create a unique identifier. If a directory named runname already exists, a time extension is appended before the new runname directory is created. Retrieved GenBank entries are stored in the file entries.gb in the runname directory. All FASTA files used are stored in its subdirectory fasta.
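A default run name of this kind can be sketched in shell as below. The exact name format used by getseqs is not given in this page, so the date/time layout and the helper make_rundir are hypothetical. The sketch only shows the idea: combine date, time and process ID, append a time extension when the directory already exists, and create the fasta subdirectory.

```sh
# Hypothetical sketch (not getseqs code): create a run directory and print
# its name. The name layout DATE_TIME_PID is an assumption.
make_rundir() {
  if [ -n "$1" ]; then
    name=$1                                        # explicit -runname value
  else
    name="$(date +%Y%m%d)_$(date +%H%M%S)_$$"      # date, time, process ID
  fi
  if [ -e "$name" ]; then
    name="${name}_$(date +%H%M%S)"                 # add a time extension
  fi
  mkdir -p "$name/fasta"                           # run dir plus fasta subdir
  printf '%s\n' "$name"
}
```

For example, d=$(make_rundir foodatadir) creates foodatadir/fasta on the first run and a time-suffixed directory such as foodatadir_HHMMSS on a later run.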

-blast <'string'>
The BLAST command line to execute. Default is 'blastcl3 -p blastn -d nr'. The query sequences are piped to this command. The netblast client blastcl3 can be replaced by a local version of BLAST, given either as a command name or as the complete path to the executable. Whichever program is used must be installed locally in order to be used by getseqs. The results of the BLAST search are stored in the runname directory as search.blast.

-align0 <'string'>
The align0 command line to execute. Default is 'align0'. To turn off the use of align0, use -align0 ''. The command is run on the query and subject data files. This program must be installed locally in order to be used by getseqs. The output of align0 is stored in the runname directory as align0.out.

-qrna <'string'>
The qrna command line to execute. Default is -qrna '', that is, the program is not used by default (see the man page for details). To turn off the use of qrna, use -qrna ''. This program must be installed locally in order to be used by getseqs. The output of qrna is stored in the runname directory as qrna.out.

-alength <number>
Filters the BLAST hits by a minimum alignment length. Default is zero, so no hits are removed.
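Filtering by minimum alignment length amounts to a threshold on one column of the hit table. The sketch below assumes a hypothetical tab-separated table with the subject accession in column 1 and the alignment length in column 2; the actual layout of the getseqs column files is not documented here, and filter_alength is not part of getseqs.

```sh
# Hypothetical sketch (not getseqs code): print only hits whose alignment
# length (assumed column 2 of a tab-separated table) is at least MINLEN.
filter_alength() {
  minlen=$1   # minimum alignment length (getseqs default for -alength: 0)
  hits=$2     # hit table file
  # "+ 0" forces a numeric rather than string comparison
  awk -F '\t' -v min="$minlen" '$2 + 0 >= min + 0' "$hits"
}
```

With a minimum of 0, every hit passes, which matches the default behaviour described above.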

-discgb <file>
A file listing the GenBank entries to be discarded from the BLAST search results. The file search.blast.col contains only the hits that remain after filtering, in column format.
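The discard step can be sketched as a set lookup: load the accessions from the discard file, then print only the hits whose accession is not in that set. The one-accession-per-line discard file and the tab-separated hit table with the accession in column 1 are assumptions, and discard_hits is a hypothetical helper, not part of getseqs.

```sh
# Hypothetical sketch (not getseqs code): drop hits whose accession
# (assumed column 1) is listed in the discard file (one accession per line).
discard_hits() {
  discard=$1  # e.g. foo.discard
  hits=$2     # tab-separated hit table
  awk -F '\t' '
    FILENAME == ARGV[1] { skip[$1] = 1; next }   # remember discarded accessions
    !($1 in skip)                                # print hits not in the set
  ' "$discard" "$hits"
}
```

Comparing FILENAME with ARGV[1], rather than the common NR == FNR idiom, keeps the sketch correct when the discard file is empty.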

-crange <number>
The amount of sequence context by which each GenBank hit is extended, in both directions. Default is 100, but it is recommended to set it to roughly the length of the search sequence.
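The context extension can be expressed as simple coordinate arithmetic. The sketch assumes 1-based, inclusive coordinates with start <= end (hits on the reverse strand would need their coordinates swapped first) and clips the extended region to the length of the GenBank entry; whether getseqs clips in the same way is an assumption, and extend_range is a hypothetical helper.

```sh
# Hypothetical sketch (not getseqs code): extend the hit START..END by
# CRANGE positions on each side, clipped to 1..SEQLEN. Prints "from to".
extend_range() {
  start=$1 end=$2 crange=$3 seqlen=$4
  from=$((start - crange))
  to=$((end + crange))
  [ "$from" -lt 1 ] && from=1              # do not run off the 5' end
  [ "$to" -gt "$seqlen" ] && to=$seqlen    # do not run off the 3' end
  printf '%s %s\n' "$from" "$to"
}
```

With the default crange of 100, a hit at positions 500 to 620 in an entry of length 10000 becomes 400 to 720.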

Prints this list.



To BLAST the sequences in foo.fasta against GenBank, discard the already known hits listed in foo.discard, realign query and hit with align0, and store all data in the directory foodatadir:

getseqs.awk -runname foodatadir -discgb foo.discard foo.fasta

To extend the region around each GenBank hit by 200 in both directions when realigning with align0:

getseqs.awk -runname foodatadir -discgb foo.discard -crange 200 foo.fasta

To avoid doing align0 realign:

getseqs.awk -runname foodatadir -discgb foo.discard -align0 '' foo.fasta
('' is two single quotes.)



Report bugs to col-bugs@bioinf.au.dk.



Bjarne Knudsen (bk@daimi.au.dk)






Comments, questions, etc., email gorodkin@genome.ku.dk.

Last updated March 26th, 2007 by Jan Gorodkin