
[question]: command line params needed to run localcolabfold on a local server #260

Closed
tamuanand opened this issue Sep 24, 2024 · 2 comments

@tamuanand

Hi @YoshitakaMo and localcolabfold team,

First of all, huge thanks for providing this tool.

I have a question: what set of parameters should I use with colabfold_batch and colabfold_search when running everything on a local server with GPUs, with no data sent to the MSA server? (See ColabFold on Google Colaboratory - #258 (comment).)

In this regard, I came across an issue in the ColabFold repository, and within it a comment from @mavericb linking to a detailed blog post: https://www.blopig.com/blog/2024/04/dockerized-colabfold-for-large-scale-batch-predictions/

I notice that @mavericb uses colabfold_search and colabfold_batch this way (with Docker):

```
colabfold_search \
  --mmseqs /usr/local/envs/colabfold/bin/mmseqs \
  input.fasta database msas \
  > /search.log 2>&1 \
&& colabfold_batch msas predictions \
  > batch.log 2>&1
```

Questions:

  1. Is the command line above from @mavericb the one to use (I would probably add --amber --use-gpu-relax), or should I use the command line shown here, with --use-env 1 --use-templates 1 --db-load-mode 2 for colabfold_search and --pdb-hit-file ... --local-pdb-path for colabfold_batch? What is the main difference between the two approaches?
  2. The FAQ here suggests that ColabFold does not support multiple GPUs. Is this also true for localcolabfold, where I want to run everything on my local server via a Slurm queue that has some single-GPU machines and some multi-GPU machines? Related to this, what happens if a job lands on a multi-GPU machine? (I am assuming the job will only run on 1 GPU - is my assumption correct?)

Thanks in advance.

@YoshitakaMo
Owner

Answer to Q1.

Please see the help messages (colabfold_batch --help or colabfold_search --help), the FAQ of this repo, and the ColabFold paper. --amber and --use-gpu-relax are optional flags of colabfold_batch; whether you use them depends on your purpose.
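For reference, a minimal sketch of adding those two flags (the msas and predictions directory names are placeholders):

```
# Predict from precomputed MSAs, then relax the models with AMBER,
# running the relaxation on the GPU rather than the CPU.
colabfold_batch --amber --use-gpu-relax msas predictions
```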

--use-env 1, --use-templates 1, and --db-load-mode 2 are optional flags of colabfold_search. --use-env 1 yields more diverse MSAs from the metagenomic database (colabfold_envdb_202108_db), and --use-templates 1 additionally writes out template information (as an output file foo.m8). --db-load-mode 2 is an MMseqs2 flag; see its documentation.
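Putting those together, a sketch of a template-enabled local pipeline could look like this (the database path, the .m8 file name, and the local PDB path are assumptions; check the actual file names colabfold_search writes into your output directory):

```
# Search against UniRef plus the metagenomic DB, and also report
# template hits (--use-templates 1 writes a .m8 hit file).
colabfold_search \
  --mmseqs /usr/local/envs/colabfold/bin/mmseqs \
  --use-env 1 --use-templates 1 --db-load-mode 2 \
  input.fasta /path/to/database msas

# Predict from the precomputed MSAs, using the template hit file and
# a local copy of the PDB instead of fetching templates remotely.
colabfold_batch \
  --pdb-hit-file msas/pdb70.m8 \
  --local-pdb-path /path/to/pdb/divided \
  msas predictions
```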

Answer to Q2.

> Is it true for localcolabfold too, where I want to run everything on my local server via a Slurm queue that has some single-GPU machines and some multi-GPU machines?

Yes.

> Related to this, what happens if a job lands on a multi-GPU machine?

Yes, your assumption is correct: the job will run on only one GPU, so the remaining GPUs on that node will sit idle - a waste of GPU resources.
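If you do run outside of Slurm on a multi-GPU node, one general workaround (not specific to localcolabfold) is to expose only a single GPU to the process via CUDA_VISIBLE_DEVICES:

```
# Make only GPU 0 visible; the other GPUs stay free for other jobs.
CUDA_VISIBLE_DEVICES=0 colabfold_batch msas predictions
```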

You might want to specify the number of GPUs and CPUs when submitting a job with Slurm. For example, if a compute node foo01 has a 16-core CPU and 4 GPUs, you can run 4 different jobs on foo01 simultaneously by requesting 4 cores and 1 GPU per job. Since colabfold_search and colabfold_batch do not need more than 4-8 CPU cores, this is the best option.
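As an illustration, a minimal Slurm batch script along those lines might look like the following (the partition name and directories are placeholders for your site's configuration):

```
#!/bin/bash
#SBATCH --job-name=colabfold
#SBATCH --partition=gpu        # placeholder partition name
#SBATCH --gres=gpu:1           # request exactly one GPU per job
#SBATCH --cpus-per-task=4      # 4 cores are plenty for colabfold_batch
#SBATCH --output=colabfold_%j.log

colabfold_batch msas predictions
```

Submitting four such jobs lets Slurm pack them onto foo01, one per GPU.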

@tamuanand
Author

Thanks @YoshitakaMo
