Statistical implementation of the SEG algorithm for masking low-complexity amino acid and nucleic acid sequences.
- Clone the repo:
  git clone https://github.com/jszym/statseg
- Install requirements:
  pip install -r requirements.txt
It's as easy as that.
Using StatSEG is straightforward: pass the FASTA file containing the sequence you want to mask with the --infile flag.
$ python -m statseg --infile prion.fasta
>sp|P04156|PRIO_HUMAN
MANLGCWMLVLFVATWSDLGLCKKRPKPGGxxxxxxxxPxxxSPGGNRYPPQGxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxQWNKPSKPKTNMKHMxxxxxxxxxxxxxxxYMLGSAMSRPIIHFGSDYEDRYYRENMHRYPNQVYYRPMDEYSNQNNFVHDCVNITIKQHTVTTTTKGExxxETDVKMMERVVEQMCITQYERESQAYYQRGSSMVLFSxxPVILLISFLIFLxxG
You can also write the masked sequence to a new FASTA file with the --outfile flag instead of printing it to the console.
$ python -m statseg --infile prion.fasta --outfile prion.masked.fasta
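Once the masked FASTA file is written, it can be handy to check where the masked regions fall and how much of each sequence was masked. The following is a minimal sketch that does this with the standard library only. It is not part of StatSEG: the helper names (read_fasta, masked_regions, summarize) are hypothetical, and it assumes masked residues are written as a lowercase x, as the example output above suggests. Matching is case-sensitive so that an uppercase X (the standard code for an unknown residue) is not counted as masked.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def masked_regions(sequence: str, mask_char: str = "x") -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of each masked run.

    Assumption: masked residues are written as `mask_char` (lowercase x here).
    """
    pattern = re.escape(mask_char) + "+"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, sequence)]


def summarize(path: str) -> List[str]:
    """Build one summary line per record in a masked FASTA file."""
    summary = []
    with open(path) as handle:
        for header, seq in read_fasta(handle):
            regions = masked_regions(seq)
            n_masked = sum(end - start + 1 for start, end in regions)
            fraction = n_masked / len(seq) if seq else 0.0
            summary.append(
                f"{header}: {n_masked}/{len(seq)} residues masked "
                f"({fraction:.1%}) in {len(regions)} region(s): {regions}"
            )
    return summary
```

For example, calling summarize("prion.masked.fasta") on the file produced above returns one line per sequence, which can then be printed or logged.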
API & CLI documentation is available here. An explanatory blog post is available here.