Potential improvement to set/restore_diag in GEQR2 #826

Open
wants to merge 3 commits into
base: develop


AGonzales-amd

This PR aims to reduce the share of GEQR2 runtime spent in the set_diag and restore_diag kernels, as indicated by profiling. This is achieved by:

  1. Combining larfg and set_diag to reduce the number of global memory reads and writes:
    • larfg is modified to write both the unit diagonal and the non-unit diagonal values, which eliminates the call to set_diag.
  2. Reducing the kernel launch overhead of set_diag and restore_diag:
    • The set_diag launch is eliminated as described above. The launch overhead of restore_diag is reduced by launching the kernel once to restore all diagonal values, at the expense of a larger memory footprint.
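The two changes above can be sketched on the CPU as follows. This is only an illustrative sketch, not the rocSOLVER implementation: `larfg_fused` and `geqr2_sketch` are hypothetical names, the matrix is assumed to be real, column-major with a leading dimension equal to `m`, and the rescaling safeguards of a production larfg are omitted. The fused larfg writes the unit diagonal in place and stores the non-unit diagonal value in a side buffer, so no separate set_diag step is needed, and a single pass at the end restores every diagonal entry.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical fused larfg for the real case (no rescaling safeguards).
// alpha is the diagonal entry A(j,j); x points to the nx entries below it.
// On exit: x holds the tail of the Householder vector, alpha holds the unit
// diagonal (1), saved_diag holds the non-unit diagonal value (beta).
// Returns tau.
double larfg_fused(double& alpha, double* x, std::size_t nx, double& saved_diag)
{
    double xnorm = 0.0;
    for(std::size_t i = 0; i < nx; ++i)
        xnorm = std::hypot(xnorm, x[i]);

    double tau = 0.0;
    double beta = alpha; // if x is zero, H is the identity and beta = alpha
    if(xnorm != 0.0)
    {
        beta = -std::copysign(std::hypot(alpha, xnorm), alpha);
        tau = (beta - alpha) / beta;
        const double scale = 1.0 / (alpha - beta);
        for(std::size_t i = 0; i < nx; ++i)
            x[i] *= scale;
    }

    saved_diag = beta; // written to the side buffer instead of the matrix
    alpha = 1.0;       // unit diagonal written here, replacing set_diag
    return tau;
}

// Unblocked QR sketch: a is m-by-n, column-major, leading dimension m.
void geqr2_sketch(std::size_t m, std::size_t n, std::vector<double>& a, std::vector<double>& tau)
{
    assert(a.size() == m * n);
    const std::size_t k = std::min(m, n);
    std::vector<double> diag(k); // extra footprint: one entry per reflector
    tau.assign(k, 0.0);

    for(std::size_t j = 0; j < k; ++j)
    {
        double* col = &a[j + j * m];
        tau[j] = larfg_fused(col[0], col + 1, m - j - 1, diag[j]);

        // larf: apply H = I - tau * v * v^T to the trailing columns.
        // v is read directly from col because its diagonal entry is already 1.
        for(std::size_t c = j + 1; c < n; ++c)
        {
            double* t = &a[j + c * m];
            double w = 0.0;
            for(std::size_t i = 0; i < m - j; ++i)
                w += col[i] * t[i];
            for(std::size_t i = 0; i < m - j; ++i)
                t[i] -= tau[j] * col[i] * w;
        }
    }

    // Single restore pass for all diagonal values (one "launch" instead of k).
    for(std::size_t j = 0; j < k; ++j)
        a[j + j * m] = diag[j];
}
```

In a GPU version, the final loop would correspond to the single restore_diag launch, and `diag` would be the additional device buffer mentioned above.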

The following chart shows the speedup of geqrf with these changes on real single-precision square matrices.
[Figure: log_compare_sgeqrf_m]

Note:

  • I tried the suggestion of using larfb instead of larf, but it performed worse due to increased global memory access. I got similar results when I modified larf to assume an implicit unit diagonal.
  • This is my attempt at a solution to this problem, and I am open to trying other suggestions.

AGonzales-amd added the noOptimizations label (Disable optimized kernels for small sizes for some routines) on Sep 30, 2024