About the Splicing Predictor
An interpretable deep learning model for predicting RNA alternative splicing outcomes
What is PSI?
PSI (Percent Spliced In) is a measure of how often an exon is included in the mature mRNA transcript during alternative splicing. It ranges from 0 to 1:
- PSI = 1: Exon always included
- PSI = 0.5: 50/50 inclusion
- PSI = 0: Exon always skipped
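As a rough illustration of the definition above, PSI is commonly estimated as the share of reads that support exon inclusion out of all reads that support either outcome. The sketch below assumes read counts as input; the function name and the example counts are hypothetical, and this is not necessarily how the training PSI values were measured.

```python
def percent_spliced_in(inclusion_reads: int, skipping_reads: int) -> float:
    """Return PSI as the fraction of reads that support exon inclusion.

    Assumption: PSI is estimated from read counts; the inputs are illustrative.
    """
    if inclusion_reads < 0 or skipping_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + skipping_reads
    if total == 0:
        raise ValueError("PSI is undefined when there are no supporting reads")
    return inclusion_reads / total


# 90 reads include the exon and 10 skip it, so the exon is mostly included.
print(percent_spliced_in(90, 10))  # 0.9
```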
Alternative splicing is a key regulatory mechanism that allows a single gene to produce multiple protein variants. Understanding and predicting splicing outcomes is crucial for studying gene regulation and disease mechanisms.
How It Works
Input Your Sequence
Enter a 70-nucleotide exon sequence (A, C, G, T only)
Add Flanking Sequences
The model adds 10 nucleotides on each side from the original experimental context, giving a 90-nucleotide sequence
Predict RNA Structure
ViennaRNA predicts the secondary structure and identifies wobble (G-U) base pairs
Neural Network Prediction
The deep learning model predicts PSI based on sequence, structure, and wobble features
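The flanking and wobble-detection steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, the flank sequences are placeholders (the real flanks are not given here), and the dot-bracket structure is written by hand. In practice the structure would come from ViennaRNA, for example via `RNA.fold` in its Python bindings.

```python
def add_flanks(exon: str, upstream: str, downstream: str) -> str:
    """Surround the exon with fixed flanking sequences."""
    return upstream + exon + downstream


def parse_pairs(structure: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) index pairs from a dot-bracket string."""
    stack, pairs = [], []
    for idx, char in enumerate(structure):
        if char == "(":
            stack.append(idx)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), idx))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


def wobble_pairs(sequence: str, structure: str) -> list[tuple[int, int]]:
    """Return the base pairs that are G-U wobbles (G-T in DNA notation)."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    rna = sequence.upper().replace("T", "U")
    return [(i, j) for i, j in parse_pairs(structure)
            if {rna[i], rna[j]} == {"G", "U"}]


# Placeholder flanks: NOT the real experimental context.
flanked = add_flanks("A" * 70, "C" * 10, "G" * 10)
print(len(flanked))  # 90

# Toy hairpin with a hand-written structure: two G-U wobbles, one C-G pair.
print(wobble_pairs("GGCAAAGTT", "(((...)))"))  # [(0, 8), (1, 7)]
```

Indices are 0-based positions in the sequence; how the model actually encodes the wobble features is not specified here.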
Who Should Use This Tool?
Researchers
Studying alternative splicing mechanisms and regulation
Synthetic Biologists
Designing synthetic exons with specific splicing behavior
Clinicians
Investigating potential splicing effects of genetic variants
Educators & Students
Learning about splicing regulation and computational biology
Limitations
- Fixed sequence length: Accepts only exon sequences of exactly 70 nucleotides
- Training data: Model was trained on HeLa cell data from the ES7 library
- Cell type specificity: Predictions may not generalize to all cell types or tissues
- Cis-regulatory only: Does not consider trans-acting factors or cellular context
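The fixed-length and alphabet constraints can be checked before a sequence is submitted. The sketch below is one possible pre-check; the function name is hypothetical, and trimming whitespace and uppercasing the input are assumptions rather than documented behavior of the tool.

```python
EXON_LENGTH = 70
VALID_BASES = set("ACGT")


def validate_exon(sequence: str) -> str:
    """Return the cleaned sequence, or raise ValueError if it is not usable.

    Assumption: surrounding whitespace is ignored and lowercase is accepted.
    """
    cleaned = sequence.strip().upper()
    if len(cleaned) != EXON_LENGTH:
        raise ValueError(
            f"expected {EXON_LENGTH} nucleotides, got {len(cleaned)}"
        )
    invalid = sorted(set(cleaned) - VALID_BASES)
    if invalid:
        raise ValueError(f"invalid characters: {', '.join(invalid)}")
    return cleaned


# A 70-nucleotide example built from a repeated 10-mer.
print(validate_exon("acgtacgtac" * 7))
```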
Model Performance
| Metric | Value | Description |
|---|---|---|
| Test R² | ~0.85 | Variance explained on held-out test set |
| Correlation | ~0.92 | Pearson correlation with experimental PSI |
| Test RMSE | ~0.12 | Root mean squared error on test set |
| Training Data | ~150,000 | Synthetic exon sequences from ES7_HeLa |
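For readers who want to reproduce metrics of this kind on their own predictions, the sketch below computes R², Pearson correlation, and RMSE from paired lists of measured and predicted PSI values using their standard definitions. The function name and the example numbers are illustrative assumptions; they are not model outputs, and the exact evaluation procedure behind the table is not described here.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Return R², Pearson correlation, and RMSE for paired values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    if ss_tot == 0 or ss_pred == 0:
        raise ValueError("metrics are undefined for constant inputs")
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    return {
        "r2": 1 - ss_res / ss_tot,              # variance explained
        "pearson": cov / math.sqrt(ss_tot * ss_pred),
        "rmse": math.sqrt(ss_res / n),          # same units as PSI
    }


# Made-up values for illustration only.
measured = [0.1, 0.5, 0.9]
predicted = [0.2, 0.6, 1.0]
print(regression_metrics(measured, predicted))
```

Note that R² and Pearson correlation measure different things: predictions with a constant offset, as in this toy example, stay perfectly correlated with the measurements while R² and RMSE are penalized.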