The program SigStb explores a sequence by sliding a fixed-length window along it and computing two z-scores, the significance score (SIGSCR) and the stability score (STBSCR), for every local segment of that size. The maximum sequence length allowed is 29995000 bp. The window size is typically in the range of 40-300 bp, and the maximum window size is 800 bp. In this version, random permutation is performed only in the extreme cases where the empirical coefficients are not suitable for computing the mean and standard deviation of the lowest free energies from a large sample of randomly shuffled sequences. As a result, the program generally does not require heavy computation.
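The sliding-window scoring described above can be illustrated with a minimal Python sketch. It is not the SigStb implementation. It assumes the usual z-score form, with SIGSCR comparing a segment's energy against shuffled copies of that segment and STBSCR comparing it against all windows of the same size in the sequence. The function toy_energy is a hypothetical stand-in for a real lowest-free-energy calculation (it uses neither the Turner nor the Tinoco rules), and the sketch always shuffles explicitly, whereas SigStb relies on empirical coefficients in most cases.

```python
import random
import statistics

# Hypothetical pairing energies for illustration only; NOT Turner or Tinoco rules.
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0,
               ("A", "U"): -2.0, ("U", "A"): -2.0,
               ("G", "U"): -1.0, ("U", "G"): -1.0}

def toy_energy(segment):
    """Stand-in for the lowest free energy: pairs base i with its mirror base."""
    n = len(segment)
    return sum(PAIR_ENERGY.get((segment[i], segment[n - 1 - i]), 0.0)
               for i in range(n // 2))

def zscore(value, sample):
    """(value - sample mean) / sample standard deviation; 0.0 if undefined."""
    if len(sample) < 2:
        return 0.0
    sd = statistics.stdev(sample)
    if sd == 0:
        return 0.0
    return (value - statistics.mean(sample)) / sd

def sliding_scores(seq, window, n_shuffles=200, seed=0):
    """Return (1-based start, SIGSCR, STBSCR) for every window in seq."""
    rng = random.Random(seed)
    segments = [seq[i:i + window] for i in range(len(seq) - window + 1)]
    energies = [toy_energy(s) for s in segments]
    results = []
    for start, (seg, energy) in enumerate(zip(segments, energies), start=1):
        shuffled = []
        for _ in range(n_shuffles):
            bases = list(seg)
            rng.shuffle(bases)  # keeps base composition, destroys order
            shuffled.append(toy_energy("".join(bases)))
        sigscr = zscore(energy, shuffled)   # versus shuffled copies of this segment
        stbscr = zscore(energy, energies)   # versus all windows of the sequence
        results.append((start, sigscr, stbscr))
    return results

if __name__ == "__main__":
    for start, sig, stb in sliding_scores("GGGGAAAACCCCUUUUGGGGAAAACCCC", 12):
        print(f"{start:4d} {sig:8.2f} {stb:8.2f}")
```

Replacing toy_energy with a real folding routine would leave the windowing and z-score logic unchanged. The shuffling loop is the expensive part, which is why precomputed means and standard deviations are attractive.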
A statistically significant unusual folding region (UFR), whose SIGSCR and/or STBSCR is unusually small or large, can be discovered if the distributions of the two scores in the sample are known. The lower the SIGSCR, the more statistically significant a folded RNA structure is judged to be. Similarly, a lower STBSCR corresponds to a greater likelihood of a structure relative to alternative structures in the same sequence. In addition, significant open regions can be identified in the sequence by their large SIGSCR and STBSCR. The term "significant open region" means that the folded structure is very unstable, whereas the corresponding structures folded from randomly shuffled sequences are more stable than that of the natural sequence. In many cases, UFRs correlate with biologically interesting properties. In general, SIGSCR and STBSCR are well represented by a non-central Student's t distribution.
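As a rough illustration of how windows might be sorted into significant folding regions and significant open regions once both scores are available, consider the sketch below. The cutoffs low and high, the labels, and the rule for combining the two scores are all assumptions made for this example. They are not taken from SigStb, and in practice the cutoffs would be chosen from the observed score distributions.

```python
def classify_window(sigscr, stbscr, low=-2.0, high=2.0):
    """Label one window; the cutoffs are hypothetical placeholders."""
    if sigscr >= high and stbscr >= high:
        return "open"      # both scores unusually large
    if sigscr <= low or stbscr <= low:
        return "folded"    # at least one score unusually small
    return "none"

def select_regions(scores, low=-2.0, high=2.0):
    """Split (start, SIGSCR, STBSCR) tuples into folded and open start positions."""
    folded, opened = [], []
    for start, sigscr, stbscr in scores:
        label = classify_window(sigscr, stbscr, low, high)
        if label == "folded":
            folded.append(start)
        elif label == "open":
            opened.append(start)
    return folded, opened

if __name__ == "__main__":
    example = [(1, -3.1, -0.5), (2, 0.2, 0.1), (3, 2.4, 2.9), (4, -1.0, -2.5)]
    folded, opened = select_regions(example)
    print("folded:", folded)   # folded: [1, 4]
    print("open:", opened)     # open: [3]
```

Requiring both scores to be large for an open region, while accepting either small score for a folded region, mirrors the "and/or" wording above. A stricter rule could require both scores to be small.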
The SigStb package includes the following files:
1. SigStb source code:
     segfill.f
     msigfd.f
     sigstb.f
   It can be compiled with:
     f77 -col120 -vms_cc -O3 -static -o sigstb sigstb.f segfill.f msigfd.f

2. Energy data files for SigStb:
     turner.tbl  (Turner energy rules for RNA folding)
     tinoco.tbl  (Tinoco energy rules for RNA folding)
     ene.tbl, std.tbl, ene2.tbl, and std2.tbl
       (These four files are used to compute the sample mean and standard deviation, respectively, of the lowest free energies from randomly shuffled sequences, based on the size and base composition of the random sequences. The computed values are compatible with those computed directly with the Tinoco energy rules.)
     cf20, cf30, cf50, cf100, and cf150
       (These five files are used to compute the sample mean and standard deviation of the lowest free energies from randomly shuffled sequences, based on the size and base composition of the random sequences. The computed values are compatible with those computed directly with the Turner energy rules.)

3. Test data files:
     Sequence file: sigstb.seq
     Input control file: sigstb.in
     Output files for sliding a window of 100 bp: W100.scr, W100.sig, and W100.open
     Output files for sliding a window of 63 bp: W63.scr, W63.sig, and W63.open