UserWarning: 1Torch was not compiled with flash attention. #850
+1 I am experiencing the exact same issue.
Warning: 1Torch was not compiled with flash attention.

First, the good news: this warning usually does not stop the program from running; it just runs more slowly. The warning appears because, starting with torch 2.2, FlashAttention-2 is supposed to be selected as the optimal backend, but it fails to start. The PyTorch 2.2 release blog (https://pytorch.org/blog/pytorch2-2/) lists this as a major update: scaled_dot_product_attention (SDPA) now supports FlashAttention-2, yielding around 2x speedups compared to previous versions. The backend selection order is FlashAttention > Memory-Efficient Attention (xformers) > PyTorch C++ implementation (math). (I don't understand why it is designed this way, and the warning message makes none of this clear; I hope the next official release improves it.) The part I actually want to fix is the following:
Loading pipeline components...: 100%|██████████| 7/7 [00:02<00:00, 2.48it/s]
0%| | 0/50 [00:00<?, ?it/s]D:\ProgramData\envs\pytorch\Lib\site-packages\diffusers\models\attention_processor.py:1279: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
hidden_states = F.scaled_dot_product_attention(
100%|██████████| 50/50 [00:05<00:00, 8.44it/s]
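For anyone who wants to silence the warning or confirm what their build supports, here is a minimal sketch using the SDPA backend controls that exist in torch 2.0–2.2 (`torch.backends.cuda.sdp_kernel` and the `*_sdp_enabled()` queries). The tensor shapes are placeholders, and this assumes a CUDA build; it is not the diffusers code path itself, just a way to reproduce and control the same `F.scaled_dot_product_attention` call.

```python
import torch
import torch.nn.functional as F

# Check which SDPA backends this PyTorch build has enabled (CUDA build assumed)
print("flash:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem_efficient:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math:         ", torch.backends.cuda.math_sdp_enabled())

# Placeholder query/key/value tensors: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Explicitly pick the memory-efficient backend so SDPA never tries (and warns
# about) flash attention on builds where it was not compiled in.
with torch.backends.cuda.sdp_kernel(
    enable_flash=False,
    enable_math=False,
    enable_mem_efficient=True,
):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)
```

If `flash_sdp_enabled()` returns False, the warning simply means your binary falls back to the next backend in the order described above, which matches the behavior in the log: the pipeline still finishes, only slower.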