From d6224a55390bf2d8fd0180c21bae44f0d718d1eb Mon Sep 17 00:00:00 2001
From: Julia Jiang
Date: Wed, 4 May 2022 17:17:25 -0400
Subject: [PATCH] SWDEV-329687 - Porting HIP documents update requested in ROCM 5.2

Change-Id: I84692f96f6535de58756605d601f1147aff31f9c
---
 docs/markdown/hip_faq.md               |  2 +-
 docs/markdown/hip_porting_guide.md     | 30 +++++++++++++++++++++++++-
 docs/markdown/hip_programming_guide.md |  8 +++++--
 3 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/docs/markdown/hip_faq.md b/docs/markdown/hip_faq.md
index a26597ca67..8107ecc148 100644
--- a/docs/markdown/hip_faq.md
+++ b/docs/markdown/hip_faq.md
@@ -144,7 +144,7 @@ The tools also struggle with more complex CUDA applications, in particular, thos
 - For Nvidia platforms, HIP requires Unified Memory and should run on any device supporting CUDA SDK 6.0 or newer. We have tested the Nvidia Titan and Tesla K40.
 
 ### Do HIPIFY tools automatically convert all source code?
-Typically, HIPIFY tools can automatically convert almost all run-time code, and the coordinate indexing device code ( threadIdx.x -> hipThreadIdx_x ).
+Typically, HIPIFY tools can automatically convert almost all run-time code. Most device code needs no additional conversion since HIP and CUDA have similar names for math and built-in functions.
 The hipify-clang tool will automatically modify the kernel signature as needed (automating a step that used to be done manually).
 Additional porting may be required to deal with architecture feature queries or with CUDA capabilities that HIP doesn't support.
 
diff --git a/docs/markdown/hip_porting_guide.md b/docs/markdown/hip_porting_guide.md
index 33f6847f75..179ad64a2f 100644
--- a/docs/markdown/hip_porting_guide.md
+++ b/docs/markdown/hip_porting_guide.md
@@ -467,7 +467,8 @@ int main()
 ```
 
 ## CU_POINTER_ATTRIBUTE_MEMORY_TYPE
-To get pointer's memory type in HIP/HIP-Clang one should use hipPointerGetAttributes API. First parameter of the API is hipPointerAttribute_t which has 'memoryType' as member variable. 'memoryType' indicates input pointer is allocated on device or host.
+
+To get a pointer's memory type in HIP/HIP-Clang, developers should use the hipPointerGetAttributes API. The first parameter of the API is hipPointerAttribute_t, which has 'memoryType' as a member variable. 'memoryType' indicates whether the input pointer is allocated on the device or the host.
 
 For example:
 ```
@@ -481,6 +482,33 @@ hipHostMalloc(&ptrHost, sizeof(double));
 hipPointerAttribute_t attr;
 hipPointerGetAttributes(&attr, ptrHost); /*attr.memoryType will have value as hipMemoryTypeHost*/
 ```
+Please note that the hipMemoryType enum values are different from the cudaMemoryType enum values.
+
+For example, on the AMD platform, memoryType is defined in hip_runtime_api.h:
+typedef enum hipMemoryType {
+    hipMemoryTypeHost,    ///< Memory is physically located on host.
+    hipMemoryTypeDevice,  ///< Memory is physically located on device.
+    hipMemoryTypeArray,   ///< Array memory, physically located on device.
+    hipMemoryTypeUnified  ///< Not used currently.
+} hipMemoryType;
+
+The CUDA toolkit, in contrast, defines memoryType as follows:
+enum cudaMemoryType
+{
+    cudaMemoryTypeUnregistered = 0,  // Unregistered memory.
+    cudaMemoryTypeHost         = 1,  // Host memory.
+    cudaMemoryTypeDevice       = 2,  // Device memory.
+    cudaMemoryTypeManaged      = 3   // Managed memory.
+};
+
+In this case, the memoryType translation for hipPointerGetAttributes must be handled properly on the NVIDIA platform to return the correct memory type in CUDA; this is done in the file nvidia_hip_runtime_api.h.
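+
+A minimal sketch of this #ifdef-based selection follows (illustrative only; the constant names are placeholders, and `__HIP_PLATFORM_NVIDIA__` is the platform macro defined when compiling HIP for NVIDIA):
+```
+#include <hip/hip_runtime.h>
+#include <cstdio>
+
+// Placeholder constants for illustration: the numeric values differ between
+// platforms (hipMemoryTypeHost == 0 on AMD, cudaMemoryTypeHost == 1 in CUDA),
+// so select them per platform before assigning them to any HIP API field
+// that expects a memory type.
+#ifdef __HIP_PLATFORM_NVIDIA__
+static const unsigned kHostMemoryType   = cudaMemoryTypeHost;
+static const unsigned kDeviceMemoryType = cudaMemoryTypeDevice;
+#else
+static const unsigned kHostMemoryType   = hipMemoryTypeHost;
+static const unsigned kDeviceMemoryType = hipMemoryTypeDevice;
+#endif
+
+int main() {
+    printf("host = %u, device = %u\n", kHostMemoryType, kDeviceMemoryType);
+    return 0;
+}
+```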
+
+So, in any HIP application that uses HIP APIs involving memory types, developers should use #ifdef to assign the correct enum values, depending on whether the platform is NVIDIA or AMD.
+
+As an example, please see the code at
+github.com/ROCm-Developer-Tools/HIP/blob/develop/tests/catch/unit/memory/hipMemcpyParam2D.cc#L77-L96.
+
+With the #ifdef condition, HIP APIs work as expected on both AMD and NVIDIA platforms.
 
 ## threadfence_system
 Threadfence_system makes all device memory writes, all writes to mapped host memory, and all writes to peer memory visible to CPU and other GPU devices.
diff --git a/docs/markdown/hip_programming_guide.md b/docs/markdown/hip_programming_guide.md
index d5549659da..e9d25b06bc 100644
--- a/docs/markdown/hip_programming_guide.md
+++ b/docs/markdown/hip_programming_guide.md
@@ -60,9 +60,10 @@ HIP supports Stream Memory Operations to enable direct synchronization between N
 hipStreamWriteValue64
 
 Note, CPU access to the semaphore's memory requires volatile keyword to disable CPU compiler's optimizations on memory access.
-
 For more details, please check the documentation HIP-API.pdf.
 
+Please note that HIP streams do not guarantee concurrency on AMD hardware when multiple (at least 6) long-running streams execute concurrently and hipStreamSynchronize(nullptr) is used for synchronization.
+
 ### Coherency Controls
 ROCm defines two coherency options for host memory:
 - Coherent memory : Supports fine-grain synchronization while the kernel is running.  For example, a kernel can perform atomic operations that are visible to the host CPU or to other (peer) GPUs.  Synchronization instructions include threadfence_system and C++11-style atomic operations.
@@ -130,7 +131,10 @@ The link here(https://github.com/ROCm-Developer-Tools/HIP/blob/main/tests/src/hi
 
 
 ## Device-Side Malloc
-HIP-Clang currently doesn't supports device-side malloc and free.
+HIP-Clang now supports device-side malloc and free.
+This implementation neither requires the use of `hipDeviceSetLimit(hipLimitMallocHeapSize,value)` nor respects any such setting. The heap is fully dynamic and can grow until the available free memory on the device is consumed.
+
+The test code at the link (https://github.com/ROCm-Developer-Tools/HIP/blob/develop/tests/src/deviceLib/hipDeviceMalloc.cpp) shows how to implement an application that uses the malloc and free functions in device kernels.
 
 
 ## Use of Long Double Type
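
For the device-side malloc change in hip_programming_guide.md above, a minimal sketch of a kernel that allocates and frees memory on the device, in the spirit of the referenced hipDeviceMalloc.cpp test (the kernel and variable names here are illustrative):
```
#include <hip/hip_runtime.h>
#include <cstdio>

// Illustrative kernel: each thread allocates a small scratch buffer with
// device-side malloc, uses it, and releases it with device-side free.
__global__ void scratchKernel(int* out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int* scratch = (int*)malloc(4 * sizeof(int));
    if (scratch == nullptr) {  // malloc returns nullptr if the heap is exhausted
        out[tid] = -1;
        return;
    }
    for (int i = 0; i < 4; ++i) scratch[i] = tid + i;
    out[tid] = scratch[0] + scratch[3];
    free(scratch);
}

int main() {
    const int n = 256;
    int* dOut = nullptr;
    hipMalloc(&dOut, n * sizeof(int));

    hipLaunchKernelGGL(scratchKernel, dim3((n + 63) / 64), dim3(64), 0, 0, dOut, n);
    hipDeviceSynchronize();

    int hOut[n];
    hipMemcpy(hOut, dOut, n * sizeof(int), hipMemcpyDeviceToHost);
    printf("out[0] = %d, out[%d] = %d\n", hOut[0], n - 1, hOut[n - 1]);

    hipFree(dOut);
    return 0;
}
```
Since the HIP-Clang device heap is fully dynamic, the kernel only needs to check the pointer returned by malloc; no hipDeviceSetLimit call is required beforehand.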