Increase thread safety to fix RBatchGenerator tutorials segfaults #13302
@phsft-bot build with flags -DCTEST_TEST_EXCLUDE_NONE=ON
Starting build on
Build failed on mac11/noimt. Failing tests:
Build failed on windows10/cxx14. Failing tests:
@phsft-bot build just on mac11/noimt with flags -DCTEST_TEST_EXCLUDE_NONE=ON -DCMAKE_BUILD_TYPE=Debug -DLLVM_BUILD_TYPE=Debug
Starting build on
The failures in the new CI only concern Windows; the other builds are "green", but the tutorials are not actually run there. I need to understand how to force these tutorials to run in the new CI as well.
This absolutely needs @pcanal's review.
Build failed on mac11/noimt.
@phsft-bot build just on mac11/noimt with flags -DCTEST_TEST_EXCLUDE_NONE=ON -DCMAKE_BUILD_TYPE=Debug -DLLVM_BUILD_TYPE=Debug
Starting build on
Build failed on mac11/noimt.
@phsft-bot build with flags -DCTEST_TEST_EXCLUDE_NONE=ON
Starting build on
Build failed on ROOT-ubuntu2004/python3. Failing tests:
Build failed on mac12arm/cxx20. Failing tests:
Starting build on
Build failed on ROOT-ubuntu2204/nortcxxmod. Failing tests:
Build failed on mac11/noimt. Failing tests:
Build failed on ROOT-ubuntu2004/python3. Failing tests:
See comments
9a08347 to 5e03e20
Starting build on
Build failed on ROOT-ubuntu2204/nortcxxmod.
@phsft-bot build with flags -DCTEST_TEST_EXCLUDE_NONE=ON
Starting build on
Build failed on ROOT-ubuntu2204/nortcxxmod.
TFunction is a higher-level class and should not lock directly. As a general rule, thread safety should be ensured by the interfaces it uses. At the same time, if a TFunction method needs to call several different locking functions in its body, a single R__LOCKGUARD in that method is better than separate lock acquisitions in each of the called functions. The strategy adopted to improve thread safety is to move the locks present in TFunction to lower-level interfaces in TCling or TClingMethodInfo whenever possible. In a few cases, a lock is kept in the TFunction method because it needs to call multiple locking functions. The locks still remain in the called functions, so that they stay thread-safe interfaces for other callers.
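The locking strategy described in this commit message can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical and is not ROOT code: `gSketchMutex` stands in for the interpreter mutex, `LowLevelInfo` for a lower-level interface such as TClingMethodInfo, and `HighLevelFunction` for TFunction. The sketch assumes the mutex is re-entrant, so that a thread already holding it can lock it again in a nested call; `std::lock_guard` plays the role of R__LOCKGUARD.

```cpp
#include <mutex>
#include <string>

// Hypothetical stand-in for the global interpreter mutex (assumed re-entrant).
static std::recursive_mutex gSketchMutex;

// Hypothetical lower-level interface: every method takes the lock itself,
// so it is thread-safe for any caller.
class LowLevelInfo {
public:
   std::string Name() const
   {
      std::lock_guard<std::recursive_mutex> guard(gSketchMutex);
      return fName;
   }
   std::string Signature() const
   {
      std::lock_guard<std::recursive_mutex> guard(gSketchMutex);
      return fSignature;
   }

private:
   std::string fName = "f";
   std::string fSignature = "(int)";
};

// Hypothetical higher-level class built on top of LowLevelInfo.
class HighLevelFunction {
public:
   // Calls a single locking function: no lock here, the callee handles it.
   std::string GetName() const { return fInfo.Name(); }

   // Calls several locking functions: take the lock once up front, so the
   // nested acquisitions are re-entrant locks on a mutex this thread already
   // holds and no other thread can interleave between the two calls.
   std::string GetPrototype() const
   {
      std::lock_guard<std::recursive_mutex> guard(gSketchMutex);
      return fInfo.Name() + fInfo.Signature();
   }

private:
   LowLevelInfo fInfo;
};
```

In this sketch, `HighLevelFunction{}.GetPrototype()` acquires the mutex once at the outer level and then re-enters it twice, while `GetName()` relies entirely on the callee's guard. The callee guards are kept so that code using `LowLevelInfo` directly remains safe.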
Co-authored-by: Jonas Hahnfeld <[email protected]>
Even though it would be better to place the locks in TMetaUtils directly, TClingUtils.cxx cannot depend on TInterpreter.h (for the declaration of gInterpreterMutex), as that would cause a circular dependency. For now, leave the locks in TClingTypeInfo.
Build failed on ROOT-ubuntu2004/python3. Failing tests:
5e03e20 to dd3a6cc
@phsft-bot build with flags -DCTEST_TEST_EXCLUDE_NONE=ON
Starting build on
Build failed on ROOT-ubuntu2204/nortcxxmod. Failing tests:
Build failed on ROOT-ubuntu2004/python3. Failing tests:
Build failed on windows10/default. Failing tests:
Re-requesting review from @pcanal after applying the suggestions. The RBatchGenerator tutorials are passing on Linux and mac nodes; the other failing TMVA tests are being handled separately and are not related to this PR.
LGTM. Thanks.
Original message of upstream commit by Richard Smith, llvm/llvm-project@61c7a9140b:
---
Commit to a primary definition for a class when we load its first member.
Previously, we wouldn't do this if the first member loaded is within a definition that's added to a class via an update record, which happens when template instantiation adds a class definition to a declaration that was imported from an AST file. This would lead to classes having member functions whose getParent returned a class declaration that wasn't the primary definition, which in turn caused the vtable builder to build broken vtables.
I don't yet have a reduced testcase for the wrong-code bug here, because the setup required to get us into the broken state is very subtle, but have confirmed that this fixes it.
---
This fixes an assertion in CodeGenFunction::EmitCXXDestructorCall():
Assertion `ThisTy->getAsCXXRecordDecl() == DtorDecl->getParent() && "Pointer/Object mixup"' failed.
which was already seen during the upgrade to LLVM 13 in one tutorial on CentOS 8 and "solved" by commit ffe8679 ("Relax assertion on generating destructor call"). Due to the nature of this problem, the assertion failure went away with unrelated changes so I reverted the change in 2b997ad. Now the problem comes back with the upgrade to LLVM 16 and also in master when trying to enable the RBatchGenerator tutorials in root-project#13302, both on macOS this time. Luckily, the underlying cause was properly fixed in upstream LLVM just last week, so backport that commit.
Attempting to improve the situation regarding segfaults seen in the RBatchGenerator tutorials. Generally, the situation triggering the segfaults is:
UPDATE:
The commits were updated after discussion with @pcanal. The general strategy is to place the locks deeper in the call chain when possible (e.g. in the context of this PR, TFunction is a higher-level class than TClingMethodInfo, so the locks are better placed in the latter).