pairing_mask --3---311111111----222222-----111111111--2222222-----
orga         -gAcugUcugUCGAUgauu-AAcGuac--cAUCGAccgug-aCaUUuagua--
orgb         agAccgUcg-UCGAUGauu-AAcGuaca-CAUCGA-cgug-aCaUUuagua--
orgc         cgAcacUcuguCCAUgauu-AAcGuac--aAUCGaacggu-aCaUUuaguacc
orgd         -gAcchUgcguCGAUgauu-AAcGuac-uaAUCGaacgug-aCaUUuagua--

In this format the structure is given by a pairing mask as the first sequence. The bases that form pairs are written as capital letters in the RNA sequences. Pairs are formed between capitalized bases that have the same symbol in the pairing mask; for example, in orga the A at alignment position 3 pairs with the U at position 7, since both are marked 3 in the mask. From this format, a col file can be made by writing:
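As a rough illustration of how the mask and the capital letters together determine the pairs, the Python sketch below derives the base pairs of one aligned sequence. The function name `pairs_from_mask` is hypothetical, and it relies on two assumptions read off the example rather than taken from txt2col: capital letters under a '-' in the mask are ignored, and within one mask symbol the capitals pair in nested order (first with last, second with second-to-last).

```python
from collections import defaultdict


def pairs_from_mask(mask: str, seq: str) -> dict[int, int]:
    """Return {alignpos: partner_alignpos}, 1-based, for one aligned sequence.

    Hypothetical helper. Assumptions: a position can pair only if the mask
    symbol there is not '-' AND the residue is a capital letter; within one
    mask symbol the capitals pair in nested order (first with last, ...).
    """
    if len(mask) != len(seq):
        raise ValueError("mask and sequence must have the same length")
    # Collect the capitalized positions under each mask symbol, left to right.
    by_symbol = defaultdict(list)
    for pos, (m, base) in enumerate(zip(mask, seq), start=1):
        if m != "-" and base.isupper():
            by_symbol[m].append(pos)
    pairs = {}
    for symbol, positions in by_symbol.items():
        if len(positions) % 2:
            raise ValueError(f"odd number of paired bases for symbol {symbol!r}")
        half = len(positions) // 2
        # Nested pairing: first half matched against the reversed second half.
        for left, right in zip(positions[:half], reversed(positions[half:])):
            pairs[left] = right
            pairs[right] = left
    return pairs


mask = "--3---311111111----222222-----111111111--2222222-----"
orga = "-gAcugUcugUCGAUgauu-AAcGuac--cAUCGAccgug-aCaUUuagua--"
p = pairs_from_mask(mask, orga)
print(sorted((i, j) for i, j in p.items() if i < j))
# [(3, 7), (11, 35), (12, 34), (13, 33), (14, 32), (15, 31), (21, 46), (22, 45), (24, 43)]
```

On orga this agrees with the align_bp values in the col file excerpt, for example 3 with 7, 11 with 35 and 12 with 34.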
txt2col -m example.txt > example.col
Now example.col contains the col file. The start of the resulting col file looks like this:
; Generated by txt2col
; ========================================================================
; TYPE pairingmask
; COL 1 label
; COL 2 residue
; COL 3 alignpos
; ENTRY pairingmask
; ----------
M - 1
M - 2
M 3 3
M - 4
M - 5
M - 6
M 3 7
M 1 8
M 1 9
M 1 10
M 1 11
M 1 12

The file starts with a header that describes where the file came from; this one was generated by txt2col. The header can also contain information written by the person who made the file. In a database, this could be a reference to the article describing the data, the version of the file, and so on. The header ends with the line of equals signs. Notice that all lines starting with a semicolon are comments.
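A program reading a col file first has to separate this header from the entries. The hypothetical helper below sketches that step in Python, assuming only what the example shows: the header is a run of semicolon lines closed by a line of equals signs.

```python
def split_header(lines):
    """Split col-file lines into (header_lines, remaining_lines).

    Hypothetical helper. Assumes the header is a run of lines starting with
    ';' that ends with a ';' line made only of '=' characters.
    """
    header = []
    for i, line in enumerate(lines):
        if not line.startswith(";"):
            raise ValueError(f"line {i + 1} inside the header is not a comment")
        body = line[1:].strip()
        # The line of equals signs closes the header.
        if body and set(body) == {"="}:
            return header, lines[i + 1:]
        header.append(body)
    raise ValueError("no line of '=' characters ending the header")


text = """; Generated by txt2col
; ========================================================================
; TYPE pairingmask
; COL 1 label"""
header, rest = split_header(text.splitlines())
print(header)   # ['Generated by txt2col']
print(rest[0])  # ; TYPE pairingmask
```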
The first entry of the file has the type pairingmask, which describes the RNA structure. The -m option to txt2col is what made the program include the pairing mask as an entry. This entry does not define the structure of the following RNA sequences; that information is kept in each sequence's own entry, as shown below. The pairing mask entry is therefore not necessary and is kept only for reference. The first column in this entry is a label that describes what is in each position; here every position is labelled M, for pairing mask. The second column, residue, contains the symbols of the mask. The third column, alignpos, contains the position of each symbol in the alignment.
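To make the layout concrete, the sketch below writes a pairing mask in the same three-column form as the excerpt. It is not txt2col's code: the function name is hypothetical, the single-space column separator is an assumption, and anything that may follow the data rows in a complete entry is not shown in the excerpt and is therefore omitted.

```python
def pairingmask_entry(mask: str, name: str = "pairingmask") -> str:
    """Format a pairing mask as a col entry shaped like the excerpt.

    Hypothetical sketch: header lines and the 'M symbol alignpos' rows are
    copied from the example; exact spacing and any entry terminator are not.
    """
    lines = [
        "; TYPE pairingmask",
        "; COL 1 label",
        "; COL 2 residue",
        "; COL 3 alignpos",
        f"; ENTRY {name}",
        "; ----------",
    ]
    # One row per alignment column: label M, the mask symbol, its position.
    for pos, symbol in enumerate(mask, start=1):
        lines.append(f"M {symbol} {pos}")
    return "\n".join(lines)


entry = pairingmask_entry("--3---311111111----222222-----111111111--2222222-----")
print("\n".join(entry.splitlines()[6:9]))
# M - 1
# M - 2
# M 3 3
```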
The next entry in the col file is the first real sequence, which starts like this:
; TYPE RNA
; COL 1 label
; COL 2 residue
; COL 3 seqpos
; COL 4 alignpos
; COL 5 align_bp
; ENTRY orga
; ----------
G - . 1 .
N g 1 2 .
N A 2 3 7
N c 3 4 .
N u 4 5 .
N g 5 6 .
N U 6 7 3
N c 7 8 .
N u 8 9 .
N g 9 10 .
N U 10 11 35
N C 11 12 34

This entry is of type RNA. The first column is again a label: nucleotides are labelled N and gaps are labelled G. Column two contains the sequence symbols and column three the position in the ungapped sequence; gaps have no sequence position and get a dot. The fourth column gives the position in the alignment. The fifth column is called align_bp, for alignment base pair. It holds the secondary structure of the RNA: for each paired nucleotide it gives the alignment position (the alignpos value) of its partner. For example, the A at alignment position 3 has 7 in this column because it pairs with the U at alignment position 7. A dot in the column means that the nucleotide is unpaired.
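Because align_bp refers to alignment positions, a reader that wants base pairs in the coordinates of the ungapped sequence has to translate through the alignpos and seqpos columns. The Python sketch below does this for the excerpt. `parse_entry` and `seq_pairs` are hypothetical names, and the parser assumes the entry layout shown above: TYPE, COL and ENTRY lines, a dashed separator, then whitespace-separated rows. Partners that fall outside the parsed rows are skipped, which is why only one pair is found in this short excerpt.

```python
def parse_entry(lines):
    """Parse one col entry into (type, name, rows); rows are dicts keyed by COL names.

    Hypothetical sketch assuming the layout of the example entry.
    """
    etype = name = None
    columns = {}
    rows = []
    in_data = False
    for line in lines:
        if line.startswith(";"):
            fields = line[1:].split()
            if not fields:
                continue
            if fields[0] == "TYPE":
                etype = fields[1]
            elif fields[0] == "COL":
                columns[int(fields[1])] = fields[2]
            elif fields[0] == "ENTRY":
                name = fields[1]
            elif set(fields[0]) == {"-"}:
                in_data = True  # the dashed line separates definitions from data
            continue
        if in_data and line.strip():
            values = line.split()
            rows.append({columns[i + 1]: v for i, v in enumerate(values)})
    return etype, name, rows


def seq_pairs(rows):
    """Translate align_bp (alignment coordinates) into sorted (seqpos, seqpos) pairs."""
    align_to_seq = {r["alignpos"]: r["seqpos"] for r in rows if r["seqpos"] != "."}
    pairs = set()
    for r in rows:
        partner = r["align_bp"]
        # Skip gaps, unpaired bases, and partners not present in these rows.
        if r["seqpos"] == "." or partner == "." or partner not in align_to_seq:
            continue
        i, j = int(r["seqpos"]), int(align_to_seq[partner])
        pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


sample = """; TYPE RNA
; COL 1 label
; COL 2 residue
; COL 3 seqpos
; COL 4 alignpos
; COL 5 align_bp
; ENTRY orga
; ----------
G - . 1 .
N g 1 2 .
N A 2 3 7
N c 3 4 .
N u 4 5 .
N g 5 6 .
N U 6 7 3
N c 7 8 .
N u 8 9 .
N g 9 10 .
N U 10 11 35
N C 11 12 34"""
etype, name, rows = parse_entry(sample.splitlines())
print(etype, name, len(rows))  # RNA orga 12
print(seq_pairs(rows))         # [(2, 6)]
```

Since the column names are taken from the COL lines rather than hard-coded, parse_entry reads the pairingmask entry as well; only seq_pairs is specific to RNA entries.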
The entire col file can be found here.