Some of the TOP50 supercomputers (Summit, among others) run OpenPOWER ISA-compatible CPUs such as POWER9. Given that, and my personal desire to run inference and training on my own OpenPOWER-based systems, it would be extremely useful for NNPACK to support these massively multi-threaded CPUs (POWER9 has 24 cores with 4 threads per core, for example) and their very high memory bandwidth (200 GB/s+ per socket). Supporting them would require adding AltiVec-compatible implementations of the NNPACK algorithms. A first step might be to implement Intel-compatible shims for the SSE intrinsic primitives. I would be interested in doing this and then proceeding to a full implementation. Would you be willing to consider accepting such additions into the project (assuming ppc support is also provided for the cpuinfo library per pytorch/cpuinfo#2)?
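To make the "intrinsic shim" idea concrete, here is a minimal sketch of what such a layer might look like. Everything named `shim_*` is hypothetical and is not part of NNPACK; the sketch assumes a GCC-compatible compiler with AltiVec and VSX enabled (as on POWER9) and falls back to plain scalar code elsewhere so it can be built on any machine.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ALTIVEC__) && defined(__VSX__)
/* POWER path: map SSE-style operations onto AltiVec/VSX built-ins. */
#include <altivec.h>
typedef __vector float shim_m128;  /* hypothetical stand-in for __m128 */

static inline shim_m128 shim_loadu_ps(const float *p) { return vec_vsx_ld(0, p); }
static inline void shim_storeu_ps(float *p, shim_m128 v) { vec_vsx_st(v, 0, p); }
static inline shim_m128 shim_set1_ps(float x) { return vec_splats(x); }
static inline shim_m128 shim_add_ps(shim_m128 a, shim_m128 b) { return vec_add(a, b); }
static inline shim_m128 shim_mul_ps(shim_m128 a, shim_m128 b) { return vec_mul(a, b); }
#else
/* Portable fallback: four scalar lanes, same interface. */
typedef struct { float lane[4]; } shim_m128;

static inline shim_m128 shim_loadu_ps(const float *p) {
    shim_m128 v;
    for (int i = 0; i < 4; i++) v.lane[i] = p[i];
    return v;
}
static inline void shim_storeu_ps(float *p, shim_m128 v) {
    for (int i = 0; i < 4; i++) p[i] = v.lane[i];
}
static inline shim_m128 shim_set1_ps(float x) {
    shim_m128 v;
    for (int i = 0; i < 4; i++) v.lane[i] = x;
    return v;
}
static inline shim_m128 shim_add_ps(shim_m128 a, shim_m128 b) {
    shim_m128 v;
    for (int i = 0; i < 4; i++) v.lane[i] = a.lane[i] + b.lane[i];
    return v;
}
static inline shim_m128 shim_mul_ps(shim_m128 a, shim_m128 b) {
    shim_m128 v;
    for (int i = 0; i < 4; i++) v.lane[i] = a.lane[i] * b.lane[i];
    return v;
}
#endif

/* Example kernel written once against the shim: out[i] = alpha * x[i] + y[i]. */
static void shim_saxpy(float *out, const float *x, const float *y,
                       float alpha, size_t n) {
    assert(out != NULL && x != NULL && y != NULL);
    const shim_m128 va = shim_set1_ps(alpha);
    size_t i = 0;
    /* Process four floats per iteration through the vector interface. */
    for (; i + 4 <= n; i += 4) {
        shim_m128 vx = shim_loadu_ps(x + i);
        shim_m128 vy = shim_loadu_ps(y + i);
        shim_storeu_ps(out + i, shim_add_ps(shim_mul_ps(va, vx), vy));
    }
    /* Scalar tail for lengths that are not a multiple of four. */
    for (; i < n; i++) out[i] = alpha * x[i] + y[i];
}
```

A kernel written this way would not need a separate POWER source file, although hand-tuned AltiVec/VSX kernels would likely still be needed to get close to peak throughput. As far as I know, GCC also ships compatibility versions of some x86 intrinsic headers for 64-bit little-endian POWER, which may cover part of this first step.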
jaesharp changed the title from "Altivec/PowerPC Support" to "NNPACK AltiVec/PowerPC (OpenPOWER ISA 3.0B or greater) Acceleration Support" on Dec 7, 2020
jaesharp changed the title from "NNPACK AltiVec/PowerPC (OpenPOWER ISA 3.0B or greater) Acceleration Support" to "AltiVec/PowerPC (OpenPOWER ISA 3.0B or greater) Acceleration Support" on Dec 7, 2020
Also, I can provide ongoing test/development/continuous integration resources for NNPACK on several Raptor Computing Systems Talos II (IBM POWER9-based) systems I own and operate.