Question: How to set up and use local MMseqs2 server with ColabFold Docker image❓ #636

Closed
mavericb opened this issue Jul 17, 2024 · 8 comments

Comments

@mavericb

I'm trying to use a local MMseqs2 server with ColabFold running in a Docker container. However, I'm encountering several issues:

  1. It's not clear if ColabFold is using the default server or a local one. How can I verify this and ensure it's using a local server?

  2. I tried setting up a local server using https://github.com/soedinglab/MMseqs2-App, but I'm getting MMseqs2 API errors (see attached image).

  3. When attempting to set up the database using the setup_databases.sh script, I get the error:

    + mmseqs tsv2exprofiledb uniref30_2302 uniref30_2302_db
    ./setup_databases.sh: line 65: mmseqs: command not found
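
The `command not found` error above means the shell running setup_databases.sh could not find an `mmseqs` binary on its PATH. A minimal sketch of a pre-flight check is below; the function names `have_tool` and `check_tools` are hypothetical and not part of ColabFold.

```shell
#!/bin/sh
# Return 0 if the named executable can be found on PATH, non-zero otherwise.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report each required tool that is missing; return non-zero if any is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! have_tool "$tool"; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_tools mmseqs` before the setup script shows whether the binary is visible. If it is not, prepending the directory that contains `mmseqs` to PATH (for example `export PATH="/path/to/mmseqs/bin:$PATH"`, a placeholder path) is one way to make it visible.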
    

Any guidance on configuring the Docker environment to work with a local MMseqs2 server would be greatly appreciated.

Thanks!!
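
On the first question: `colabfold_batch` accepts a `--host-url` option that selects the MSA server, and without it the public default server is used. The sketch below only assembles and prints such a command. The port 8080, the file names, and the helper `batch_cmd` are assumptions for illustration. Inside a container, `localhost` refers to the container itself, so a server running on the host has to be reached another way, for example by starting the container with Docker's `--network host` on Linux.

```shell
#!/bin/sh
# Build a colabfold_batch command line that targets a given MSA server.
# $1 = server base URL, $2 = input FASTA, $3 = output directory.
batch_cmd() {
    # Strip one trailing slash so the URL has a consistent form.
    host="${1%/}"
    printf 'colabfold_batch --host-url %s %s %s\n' "$host" "$2" "$3"
}

# Hypothetical values: adjust to wherever the local server actually listens.
batch_cmd "http://localhost:8080/" input.fasta results
```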

@milot-mirdita
Collaborator

There are instructions here: https://github.com/sokrypton/ColabFold/tree/main/MsaServer

On how to set it up correctly

@mavericb
Author

> There are instructions here: https://github.com/sokrypton/ColabFold/tree/main/MsaServer
>
> On how to set it up correctly

Sorry to bother you again. I see a setup_databases.sh in the main folder and a separate setup-and-start-local.sh in MsaServer.
I already ran setup_databases.sh successfully, and then tried to run setup-and-start-local.sh, but got the error "PDB rsync server was not chosen, please edit this script to choose which PDB download server you want to use".

I think it would be very helpful to add step-by-step instructions to the README on how to use ColabFold with a local MsaServer, ideally with extra explanation for running the MsaServer when ColabFold is used via Docker.

I'm very confused now and don't know how to proceed further :(

@mavericb
Author

setup_databases.sh and MsaServer/setup-and-start-local.sh seem to be different scripts. So the plan is to use MsaServer/setup-and-start-local.sh and, hopefully, the server will then be up and usable from the ColabFold Docker image.

I had to uncomment a line to select the PDB server:

PDB_SERVER=rsync.wwpdb.org::ftp                                   # RCSB PDB server name
PDB_PORT=33444     

but I'm not sure if this is the right thing to do.
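
The error quoted earlier indicates that the script refuses to run until a PDB rsync server has been chosen. A sketch of that kind of guard, plus an optional reachability probe, is below. It is an illustration and not the script's actual code; the function names are hypothetical, and the server and port are the values uncommented above.

```shell
#!/bin/sh
# Fail early when no PDB rsync server has been selected.
require_pdb_server() {
    if [ -z "${PDB_SERVER:-}" ]; then
        echo "PDB_SERVER is not set" >&2
        return 1
    fi
}

# List the top level of the selected rsync module to confirm it is reachable.
# Not called here because it needs network access and rsync installed.
probe_pdb_server() {
    rsync --port="${PDB_PORT}" "${PDB_SERVER}/"
}

PDB_SERVER=rsync.wwpdb.org::ftp
PDB_PORT=33444
require_pdb_server && echo "using ${PDB_SERVER} on port ${PDB_PORT}"
```

If `probe_pdb_server` prints a directory listing, the chosen server and port are reachable from this machine.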

Then I had to install Go and aria2 via apt-get install.

Now it's downloading a 95 GB file. I'm not sure whether I already downloaded it while running setup_databases.sh:

 *** Download Progress Summary as of Thu Jul 18 20:59:13 2024 ***                                                  
===================================================================================================================
[#3ec324 9.0GiB/95GiB(9%) CN:5 DL:10MiB ETA:2h15m4s]
FILE: ./uniref30_2302.tar.gz
-------------------------------------------------------------------------------------------------------------------

[#3ec324 9.3GiB/95GiB(9%) CN:5 DL:11MiB ETA:2h13m38s] 
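
Whether the archive was already fetched by the earlier run can be checked before waiting for the transfer. A minimal sketch, assuming the earlier run left `uniref30_2302.tar.gz` or extracted `uniref30_2302*` files in a directory you know; the function name and the directory argument are hypothetical.

```shell
#!/bin/sh
# Print any existing files in directory $1 whose names start with prefix $2.
# Returns 0 if at least one such file exists, 1 otherwise.
find_existing() {
    found=1
    for f in "$1"/"$2"*; do
        # An unmatched glob stays literal, so confirm the path really exists.
        if [ -e "$f" ]; then
            ls -ld "$f"
            found=0
        fi
    done
    return "$found"
}
```

For example, `find_existing /path/to/databases uniref30_2302` lists matching files with their sizes, which makes it easy to see whether a complete copy is already on disk.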

@mavericb
Author

I cloned a new repository and followed the instructions here: https://github.com/sokrypton/ColabFold/tree/main/MsaServer.

However, I encountered two problems:

  • The server crashes with the error: 2024/07/19 10:01:05 open ~/databases/pdb70_a3m.ffdata: no such file or directory

  • The instructions claim that "The script can be called repeatedly to start the server. It will avoid doing any unnecessary setup work." However, when I call the script again, I get the error:

~/amelie/Workspace/ColabFold/MsaServer/mmseqs-server ~/amelie/Workspace/ColabFold/MsaServer
You are not currently on a branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

    git pull <remote> <branch>

:(
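
The `git pull` message means the `mmseqs-server` checkout is in a detached-HEAD state, so Git has no branch to merge into. A small sketch for detecting that condition before pulling is below; the function names are hypothetical, and whether the script pins a specific commit on purpose is not confirmed in this thread.

```shell
#!/bin/sh
# Return 0 if the repository at $1 has a branch checked out,
# non-zero if HEAD is detached or $1 is not a Git repository.
on_branch() {
    git -C "$1" symbolic-ref -q HEAD >/dev/null 2>&1
}

# Pull only when a branch is checked out; otherwise leave the checkout alone.
safe_pull() {
    if on_branch "$1"; then
        git -C "$1" pull
    else
        echo "skipping pull in $1: not on a branch" >&2
    fi
}
```

Calling `safe_pull MsaServer/mmseqs-server` would then skip the update instead of aborting when the checkout is detached.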

@mavericb
Author

Hmmm, maybe it's the config.json that is outdated. It refers to pdb70, but the downloaded files are for pdb100, and the UniRef names differ in the same way. So I am trying to update config.json to match the downloaded files.
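
One way to test that hypothesis is to compare the database names the server expects with the files actually on disk. The sketch below assumes each database is stored as `<name>.ffdata` plus `<name>.ffindex`, a layout inferred from the `pdb70_a3m.ffdata` error quoted earlier; the function name, the directory, and the names passed in are hypothetical.

```shell
#!/bin/sh
# For each database name after $1, report expected files missing from dir $1.
# Returns 0 when nothing is missing, 1 otherwise.
report_missing() {
    dir="$1"
    shift
    rc=0
    for name in "$@"; do
        for ext in ffdata ffindex; do
            if [ ! -e "$dir/$name.$ext" ]; then
                echo "missing: $dir/$name.$ext"
                rc=1
            fi
        done
    done
    return "$rc"
}
```

A call such as `report_missing "$HOME/databases" pdb70_a3m pdb100_a3m` (example names only) shows which of the two naming schemes matches the download. `$HOME` is used here because the shell does not expand `~` inside double quotes.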

@mavericb
Author

mavericb commented Jul 19, 2024

I used the fork referenced in #534, and now the server is working.
But new errors have appeared:

 File "/usr/local/envs/colabfold/lib/python3.9/site-packages/colabfold/colabfold.py", line 209, in run_mmseqs2
    raise Exception(f'MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.')
Exception: MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.
2024-07-19 19:30:23,169 Query 10/10: run-1_102__id_10__T_0.05__seed_111__overall_confidence_0.2730__ligand_confidence_0.2730__seq_rec_0.0412 (length 257)
2024-07-19 19:30:23,170 Server didn't reply with json: 404 page not found   
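
A `404 page not found` reply suggests that something is listening at the configured address but does not serve the path the client requested. The sketch below prints the HTTP status for a URL so the base address can be checked by hand. The `ticket/msa` path and the port are assumptions about how the client and the local server are set up, not values confirmed in this thread. `probe_status` is a hypothetical helper and is not called here because it needs a running server; note that a route accepting only POST may answer a GET with a different error code, which still shows the route exists.

```shell
#!/bin/sh
# Join a base URL and a path with exactly one slash between them.
join_url() {
    base="${1%/}"
    path="${2#/}"
    printf '%s/%s\n' "$base" "$path"
}

# Print only the HTTP status code returned for a GET on $1.
probe_status() {
    curl -s -o /dev/null -w '%{http_code}\n' "$1"
}

# Hypothetical address: replace with the one passed to ColabFold.
join_url "http://localhost:8080/" "/ticket/msa"
```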

@mavericb
Author

@tamuanand

Hi @mavericb

Thanks a lot for the detailed blog post:

https://www.blopig.com/blog/2024/04/dockerized-colabfold-for-large-scale-batch-predictions/

A newbie question: I came across the example below from @YoshitakaMo for localcolabfold, where --use-env 1 --use-templates 1 --db2 pdb100_230517 are passed to colabfold_search, but your colabfold_search call does not use the same arguments.

MMSEQS_PATH="/path/to/your/mmseqs2/for_colabfold"
DATABASE_PATH="/mnt/databases"
INPUTFILE="ras_raf.fasta"
OUTPUTDIR="ras_raf"

colabfold_search \
  --use-env 1 \
  --use-templates 1 \
  --db-load-mode 2 \
  --db2 pdb100_230517 \
  --mmseqs ${MMSEQS_PATH}/bin/mmseqs \
  --threads 4 \
  ${INPUTFILE} \
  ${DATABASE_PATH} \
  ${OUTPUTDIR}

Appreciate your inputs and help here.

Thanks in advance.
