We present a collection of tools that can be used to perform semi-automated updates of RNA structural databases. The procedure has three steps: first, search for new sequences in GenBank; second, update the database with the new sequences; and third, check for consistent assignment of base pairs. The first step involves a program that applies external programs such as BLAST (Altschul et al., 1997) and align0 (Myers & Miller, 1988; Pearson, 1990). The second step is carried out by the user, most likely manually, assisted by the output produced in step 1. [Computational approaches such as patscan and energy folding, structural alignment approaches, e.g. stochastic context-free grammars (COVE and RNACAD), and FOLDALIGN can be applied by the user. The user can also use sequence editors, e.g. BioEdit. However, it is up to the user to include these approaches in this step.] In the third step we apply a number of programs to check the consistency of the base pair assignment and to find possible improvements (extensions) of the manually assigned pairs. In particular, the programs work on a column format that simplifies data manipulation on the Unix/Linux command line. We have a collection of examples of how to use the programs. A list of the main programs is given below. A complete collection is listed here. More background can be found in the reference below. For usage of any of the programs listed here, please cite:
Semi-automated update and cleanup of structural RNA databases.
J. Gorodkin, C. Zwieb, and B. Knudsen. Bioinformatics, 17:642-645, 2001.
For usage of any of the programs that utilize BLAST, align0, qrna (see man page for getseqs) or perform GenBank sequence retrieval, the relevant references should also be cited.

Database update

The update essentially involves working with two data formats, one suitable for manual intervention and one suitable for computational processing. The format for manual intervention is a plain text (txt) format listing the alignment with one sequence per line. The semi-automated update of the databases can look like this:

Main programs and licensing

The program package with the documentation listed above comes with a standard GNU license. The main programs are described in their corresponding man pages, and the whole package is available for download.
Main programs

Program      Short description
getseqs      Automated search and align0 realignment of global regions of BLAST hits.
stdpair      Highlights any type of non-standard RNA pair.
extendstem   Extends RNA stems where possible.
support      Calculates which base pairs have support.
shiftchk     Colors nucleotide pairs in different colors.
ct2col       Converts the ct format used by MFOLD to col format.
addparen     Inserts parenthesis structure lines.
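
To make the idea behind checking base pair assignments more concrete, the following is a minimal Python sketch of how non-standard pairs could be flagged from a sequence and a parenthesis structure line. It is not the stdpair implementation: the function names are hypothetical, the input is a plain sequence string rather than the package's col format, and "standard" is assumed here to mean Watson-Crick pairs plus the G-U wobble pair.

```python
# Illustrative sketch only; not the package's stdpair implementation.
# Assumption: "standard" means Watson-Crick pairs plus the G-U wobble pair.
STANDARD_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def pairs_from_parens(structure):
    """Return sorted 0-based (i, j) pairs from a parenthesis structure line."""
    stack = []
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(pairs)


def nonstandard_pairs(sequence, structure):
    """List paired positions whose nucleotides do not form a standard pair."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = sequence.upper().replace("T", "U")  # treat DNA-style T as U
    return [
        (i, j, seq[i] + seq[j])
        for i, j in pairs_from_parens(structure)
        if (seq[i], seq[j]) not in STANDARD_PAIRS
    ]


if __name__ == "__main__":
    # The outermost pair (G, A) is flagged; G-C and G-U are accepted.
    print(nonstandard_pairs("GGGAAAUCA", "(((...)))"))
```

The sketch reports positions rather than coloring them, since the coloring done by the real programs depends on the col format and the PostScript output stage.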


For more options, download the software. The quick-server generates the equivalent of the following command line:
txt2col -m <widetxtfile> | grepcol -r1,<srange> | unknown |
          stdpair --color | extendstem --color -g | support -s |
          col2psalign --space --letter > widetxt.<srange>.ps
where <srange> is, for example, "1-40" for sequences 1 to 40 of the alignment. (Hint: if this is your first time, try pasting in the widetxt example above.)
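
For scripted use, the command line above could be assembled for a given input file and sequence range. The Python sketch below only builds the command string and does not run it. The function name `quick_server_command` is hypothetical, and the stage names and flags are copied verbatim from the command line above rather than checked against the man pages.

```python
import shlex


def quick_server_command(widetxtfile, srange):
    """Build the pipeline shown above as a single shell command string.

    Hypothetical helper: the stages and flags are taken verbatim from the
    documented command line; nothing is executed here.
    """
    stages = [
        ["txt2col", "-m", widetxtfile],
        ["grepcol", f"-r1,{srange}"],
        ["unknown"],  # stage name exactly as it appears in the command line
        ["stdpair", "--color"],
        ["extendstem", "--color", "-g"],
        ["support", "-s"],
        ["col2psalign", "--space", "--letter"],
    ]
    # shlex.join quotes each argument so unusual file names stay intact.
    pipeline = " | ".join(shlex.join(stage) for stage in stages)
    outfile = f"widetxt.{srange}.ps"
    return f"{pipeline} > {shlex.quote(outfile)}"


if __name__ == "__main__":
    print(quick_server_command("example.txt", "1-40"))
```

Returning a string rather than executing the pipeline keeps the sketch usable without the package installed; the result can be inspected first and then passed to a shell.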

