UserWarning: 1Torch was not compiled with flash attention. #850

Open
stromyu520 opened this issue May 10, 2024 · 2 comments
@stromyu520

Loading pipeline components...: 100%|██████████| 7/7 [00:02<00:00, 2.48it/s]
0%| | 0/50 [00:00<?, ?it/s]D:\ProgramData\envs\pytorch\Lib\site-packages\diffusers\models\attention_processor.py:1279: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
hidden_states = F.scaled_dot_product_attention(
100%|██████████| 50/50 [00:05<00:00, 8.44it/s]

@lamguy

lamguy commented May 30, 2024

+1 I am experiencing the exact same issue

@gaoming714

Warning: 1Torch was not compiled with flash attention.

First of all, some good news: this failure usually does not stop the program from running, it just falls back to a slower attention implementation.

The warning comes from the torch 2.2 update: FlashAttention-2 is now the preferred backend behind scaled_dot_product_attention, but it failed to start in this environment.

The PyTorch 2.2 release blog (https://pytorch.org/blog/pytorch2-2/) lists this as one of the major updates:

scaled_dot_product_attention (SDPA) now supports FlashAttention-2, yielding around 2x speedups compared to previous versions.

Normally the backend selection order is FlashAttention > Memory-Efficient Attention (xformers-style) > the PyTorch C++ implementation (math).

(I don't understand why it is designed this way, and the warning message itself makes none of this clear. I hope a future release improves it.)
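
If you want to see which backends your build actually supports, you can restrict SDPA to a single backend and watch what happens. The following is only a minimal sketch, assuming PyTorch 2.2 on a CUDA device; newer releases expose the same control through torch.nn.attention.sdpa_kernel.

# Minimal sketch (assumes PyTorch 2.2 + a CUDA GPU): allow only one SDPA
# backend at a time to see which ones actually work on this machine.
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Report what the current build claims to support.
print("flash enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math enabled:         ", torch.backends.cuda.math_sdp_enabled())

# Allow only FlashAttention; if it cannot run here, this raises a RuntimeError
# instead of silently falling back (the silent fallback is what produces the warning).
with torch.backends.cuda.sdp_kernel(enable_flash=True,
                                    enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v)
print("flash attention ran, output shape:", tuple(out.shape))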

But the pitfalls I want to point out are the following:

  1. FlashAttention-2 is built into PyTorch and is the first choice. The logic is that this warning is issued whenever FlashAttention-2 fails to run, even though SDPA then falls back to another backend. (Some people have benchmarked it and found that FlashAttention-2 does not improve things much for their workloads.)

  2. FlashAttention-2 does not have a complete ecosystem. The current official package (https://github.com/Dao-AILab/flash-attention) only ships Linux builds; Windows users have to compile it themselves, which is very slow even with ninja installed. Third-party prebuilt Windows wheels are available at https://github.com/bdashore3/flash-attention/releases.

  3. Hardware support starts at the RTX 30 series: FlashAttention only supports Ampere GPUs or newer. In other words, it can run on a 3060 (see the check sketched after this list).

  4. There is also a small chance that the CUDA version in your environment is incompatible with the CUDA version your torch wheel was compiled against. The official torch 2.* builds target CUDA 12.1 (torch2.*+cu121).
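
For points 3 and 4, a quick sanity check is to print the GPU's compute capability (Ampere is 8.0 or higher) and the CUDA version the installed wheel was built against. This is just a sketch using standard torch introspection calls:

# Sketch: verify GPU generation (point 3) and the wheel's CUDA version (point 4).
import torch

print("torch version:", torch.__version__)
print("compiled against CUDA:", torch.version.cuda)   # e.g. "12.1" for +cu121 wheels

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print("GPU:", torch.cuda.get_device_name(0),
          f"(compute capability {major}.{minor})")
    if major < 8:
        print("FlashAttention-2 needs Ampere (compute capability >= 8.0) or newer.")
else:
    print("No CUDA device visible to this PyTorch build.")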
