Skip to content

Commit

Permalink
SWDEV-329687 - Porting HIP documents update requested in ROCM 5.2
Browse files Browse the repository at this point in the history
Change-Id: I84692f96f6535de58756605d601f1147aff31f9c
  • Loading branch information
jujiang-del authored and zhang2amd committed Jun 6, 2022
1 parent 8722b40 commit d6224a5
Show file tree
Hide file tree
Showing 3 changed files with 36 additions and 4 deletions.
2 changes: 1 addition & 1 deletion docs/markdown/hip_faq.md
Original file line number Diff line number Diff line change
Expand Up @@ -144,7 +144,7 @@ The tools also struggle with more complex CUDA applications, in particular, thos
- For Nvidia platforms, HIP requires Unified Memory and should run on any device supporting CUDA SDK 6.0 or newer. We have tested the Nvidia Titan and Tesla K40.

### Do HIPIFY tools automatically convert all source code?
Typically, HIPIFY tools can automatically convert almost all run-time code, and the coordinate indexing device code ( threadIdx.x -> hipThreadIdx_x ).
Typically, HIPIFY tools can automatically convert almost all run-time code.
Most device code needs no additional conversion since HIP and CUDA have similar names for math and built-in functions.
The hipify-clang tool will automatically modify the kernel signature as needed (automating a step that used to be done manually).
Additional porting may be required to deal with architecture feature queries or with CUDA capabilities that HIP doesn't support.
Expand Down
30 changes: 29 additions & 1 deletion docs/markdown/hip_porting_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -467,7 +467,8 @@ int main()
```

## CU_POINTER_ATTRIBUTE_MEMORY_TYPE
To get pointer's memory type in HIP/HIP-Clang one should use hipPointerGetAttributes API. First parameter of the API is hipPointerAttribute_t which has 'memoryType' as member variable. 'memoryType' indicates input pointer is allocated on device or host.

To get pointer's memory type in HIP/HIP-Clang, developers should use hipPointerGetAttributes API. First parameter of the API is hipPointerAttribute_t which has 'memoryType' as member variable. 'memoryType' indicates input pointer is allocated on device or host.

For example:
```
Expand All @@ -481,6 +482,33 @@ hipHostMalloc(&ptrHost, sizeof(double));
hipPointerAttribute_t attr;
hipPointerGetAttributes(&attr, ptrHost); /*attr.memoryType will have value as hipMemoryTypeHost*/
```
Please note, hipMemoryType enum values are different from cudaMemoryType enum values.

For example, on AMD platform, memoryType is defined in hip_runtime_api.h,
typedef enum hipMemoryType {
hipMemoryTypeHost, ///< Memory is physically located on host
hipMemoryTypeDevice, ///< Memory is physically located on device.
hipMemoryTypeArray, ///< Array memory, physically located on device.
hipMemoryTypeUnified ///< Not used currently
} hipMemoryType;

Looking into CUDA toolkit, it defines memoryType as following,
enum cudaMemoryType
{
cudaMemoryTypeUnregistered = 0, // Unregistered memory.
cudaMemoryTypeHost = 1, // Host memory.
cudaMemoryTypeDevice = 2, // Device memory.
cudaMemoryTypeManaged = 3, // Managed memory
}

In this case, memoryType translation for hipPointerGetAttributes needs to be handled properly on nvidia platform to get the correct memory type in CUDA, which is done in the file nvidia_hip_runtime_api.h.

So in any HIP applications which use HIP APIs involving memory types, developers should use #ifdef in order to assign the correct enum values depending on Nvidia or AMD platform.

As an example, please see the code from the link,
github.com/ROCm-Developer-Tools/HIP/blob/develop/tests/catch/unit/memory/hipMemcpyParam2D.cc#L77-L96.

With the #ifdef condition, HIP APIs work as expected on both AMD and NVIDIA platforms.

## threadfence_system
Threadfence_system makes all device memory writes, all writes to mapped host memory, and all writes to peer memory visible to CPU and other GPU devices.
Expand Down
8 changes: 6 additions & 2 deletions docs/markdown/hip_programming_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,9 +60,10 @@ HIP supports Stream Memory Operations to enable direct synchronization between N
hipStreamWriteValue64

Note, CPU access to the semaphore's memory requires volatile keyword to disable CPU compiler's optimizations on memory access.

For more details, please check the documentation HIP-API.pdf.

Please note, HIP stream does not gurantee concurrency on AMD hardware for the case of multiple (at least 6) long running streams executing concurrently, using hipStreamSynchronize(nullptr) for synchronization.

### Coherency Controls
ROCm defines two coherency options for host memory:
- Coherent memory : Supports fine-grain synchronization while the kernel is running.  For example, a kernel can perform atomic operations that are visible to the host CPU or to other (peer) GPUs.  Synchronization instructions include threadfence_system and C++11-style atomic operations.
Expand Down Expand Up @@ -130,7 +131,10 @@ The link here(https://github.com/ROCm-Developer-Tools/HIP/blob/main/tests/src/hi

## Device-Side Malloc

HIP-Clang currently doesn't supports device-side malloc and free.
HIP-Clang now supports device-side malloc and free.
This implementation does not require the use of `hipDeviceSetLimit(hipLimitMallocHeapSize,value)` nor respects any setting. The heap is fully dynamic and can grow until the available free memory on the device is consumed.

The test codes in the link (https://github.com/ROCm-Developer-Tools/HIP/blob/develop/tests/src/deviceLib/hipDeviceMalloc.cpp) show how to implement application using malloc and free functions in device kernels.

## Use of Long Double Type

Expand Down

0 comments on commit d6224a5

Please sign in to comment.