-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ffi: Add support for serializing/deserializing auto-generated and user generated schema tree node IDs. #557
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThe pull request introduces significant updates to the Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (16)
components/core/src/clp/ffi/ir_stream/utils.cpp (1)
57-58
: LGTM! Consider adding a comment for clarity.The addition of the
IRErrorCode_Success
case improves the completeness of the error handling. Returning an emptystd::errc
for success is appropriate, as it indicates no error.Consider adding a brief comment to explain that an empty
std::errc
represents success:case IRErrorCode_Success: + // Return an empty std::errc to indicate success (no error) return {};
components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (3)
35-36
: LGTM! Consider adding a tracking issue for the TODO.The addition of the new error code for auto-generated keys is appropriate and aligns with the current state of implementation. The TODO comment provides clarity on the temporary nature of this change.
Consider creating a GitHub issue to track the removal of this error code once auto-generated keys are fully supported. This will help ensure it's not overlooked in future development.
68-69
: LGTM! Ensure consistency in error handling.The addition of the new error code for auto-generated keys is appropriate and consistent with the changes made to the
deserialize_ir_unit_schema_tree_node_insertion
function.For consistency, consider using the same wording for the error description as in the
deserialize_ir_unit_schema_tree_node_insertion
function. This will make it easier to search for and remove these temporary measures in the future.
Line range hint
1-84
: Summary: Changes align with PR objectives and provide clear path for future development.The modifications to this file are minimal and focused, adding error codes for handling auto-generated keys in two deserialization functions. These changes align well with the PR objectives, particularly the note that support for auto-generated keys is not yet fully integrated.
The use of TODO comments provides clear guidance for future development, ensuring that these temporary measures are removed once full support is implemented.
As the project moves forward with implementing support for auto-generated keys, consider creating a centralized list or document of all places where temporary measures have been put in place. This will facilitate a smoother transition when the feature is fully implemented and help ensure no temporary code is left behind.
components/core/src/clp/ffi/ir_stream/protocol_constants.hpp (4)
16-16
: LGTM: Version update aligns with PR objectives.The update to
BetaVersionValue
from "0.1.0-beta" to "0.1.0-beta.1" is consistent with the PR objectives. This change indicates a minor update to the beta version.Consider updating any relevant documentation or changelogs to reflect this version change and its implications for IR stream compatibility.
70-72
: LGTM: Improved naming convention and added support for larger IDs.The renaming of
SchemaTreeNodeParentIdUByte
andSchemaTreeNodeParentIdUShort
toEncodedSchemaTreeNodeParentIdByte
andEncodedSchemaTreeNodeParentIdShort
respectively, along with the addition ofEncodedSchemaTreeNodeParentIdInt
, improves clarity and extends support for larger IDs. These changes align well with the PR objectives.Consider adding a brief comment explaining the purpose of these constants and when each should be used, to improve code maintainability.
74-76
: LGTM: Consistent naming improvements and extended support for larger key IDs.The renaming of
KeyIdUByte
andKeyIdUShort
toEncodedKeyIdByte
andEncodedKeyIdShort
respectively, along with the addition ofEncodedKeyIdInt
, maintains consistency with the SchemaTreeNodeParentId constants and extends support for larger key IDs. These changes are in line with the PR objectives.Consider grouping these constants with the SchemaTreeNodeParentId constants and adding a brief comment explaining their relationship and usage, to improve code organization and maintainability.
Line range hint
1-101
: Summary: Consistent and purposeful updates to protocol constants.The changes to this file are well-aligned with the PR objectives:
- The beta version has been updated, reflecting the changes to the IR format.
- Constants for SchemaTreeNodeParentId and KeyId have been renamed for clarity and consistency.
- New constants have been added to support larger IDs (Int versions).
These modifications enhance the protocol's capabilities while maintaining its overall structure. The changes appear to be consistent and purposeful.
As the protocol evolves, consider creating a separate document or comment block that outlines the purpose and relationships between these constants. This would help maintain a clear understanding of the protocol's structure and aid in future development.
components/core/src/clp/type_utils.hpp (2)
97-102
: Use '!' operator instead of 'false ==' for clarityIn the definition of the
IntegerType
concept, it's more idiomatic in C++ to use the logical NOT operator!
rather than comparing tofalse
. This enhances readability and follows common C++ coding practices.Apply this diff to make the change:
template <typename integer_t> -concept IntegerType = std::is_integral_v<integer_t> && false == std::is_same_v<integer_t, bool>; +concept IntegerType = std::is_integral_v<integer_t> && !std::is_same_v<integer_t, bool>;
97-102
: Clarify the documentation to mention exclusion ofbool
The comment for the
IntegerType
concept could be expanded to explicitly mention that it excludesbool
, asbool
is considered an integral type in C++. This clarification will improve understanding for other developers reviewing the code.Consider updating the comment as follows:
/** - * Concept for integer types. + * Concept for integer types excluding `bool`. */components/core/src/clp/ffi/ir_stream/utils.hpp (2)
Line range hint
161-179
: Consider unifying error handling mechanisms for consistencyIn
deserialize_int
, the function returns abool
to indicate success or failure, while higher-level deserialization functions returnOUTCOME_V2_NAMESPACE::std_result
types for error handling. For consistency and improved error propagation, consider modifyingdeserialize_int
to return astd_result
. This change would allow the function to convey specific error codes and integrate seamlessly with the overall error handling strategy.
289-298
: Handling unexpectedlength_indicator_tag
values gracefullyIn
deserialize_and_decode_schema_tree_node_id
, when an unknownlength_indicator_tag
is encountered, the function returnsstd::errc::protocol_error
. While this indicates an error, consider providing more detailed diagnostic information or logging to facilitate debugging. This could help identify issues with corrupted data or incompatible protocol versions more effectively.components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (4)
47-50
: Align function documentation withstd_result
return typeThe documentation refers to returning an error code, but since the function now returns a
std_result
, consider updating the@return
comments to reflect this. This clarifies that the function returns either a value or an error encapsulated withinstd_result
.Suggestion: Modify the
@return
section to indicate the use ofstd_result
and describe what the pair contains upon success.
101-103
: Update documentation to reflectstd_result
usageThe function
deserialize_schema
now returns astd_result<Schema>
. To maintain clarity, update the documentation to reflect that it returns astd_result
containing either the deserialized schema or an error.Suggestion: Revise the
@return
comment to accurately describe thestd_result
return type and possible error conditions.
170-170
: Incomplete function documentationThe documentation for
is_log_event_ir_unit_tag
lacks a brief description of the function's purpose. Including this would enhance readability and maintain consistency with the rest of the codebase.Suggestion: Add a brief description of what the function checks for.
Example:
/** * Checks if the given tag represents the start of a log event IR unit. * @param tag * @return Whether the given tag can be a valid leading tag of a log event IR unit. */
175-178
: Inconsistent comment style for function documentationThe function
is_encoded_key_id_tag
uses/* ... */
for comments. To maintain consistency and support automatic documentation generation, consider using Doxygen-style comments/** ... */
.Suggestion: Update the comment block to Doxygen style.
/** * Determines if the given tag represents a valid encoded key ID. * @param tag * @return Whether the given tag represents a valid encoded key ID. */
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
- components/core/src/clp/ffi/ir_stream/Serializer.cpp (3 hunks)
- components/core/src/clp/ffi/ir_stream/Serializer.hpp (1 hunks)
- components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (9 hunks)
- components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp (2 hunks)
- components/core/src/clp/ffi/ir_stream/protocol_constants.hpp (2 hunks)
- components/core/src/clp/ffi/ir_stream/utils.cpp (1 hunks)
- components/core/src/clp/ffi/ir_stream/utils.hpp (6 hunks)
- components/core/src/clp/type_utils.hpp (1 hunks)
- components/core/tests/test-ir_encoding_methods.cpp (3 hunks)
🧰 Additional context used
📓 Learnings (1)
components/core/src/clp/ffi/ir_stream/utils.cpp (2)
Learnt from: LinZhihao-723 PR: y-scope/clp#549 File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210 Timestamp: 2024-10-08T15:52:50.753Z Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.
Learnt from: LinZhihao-723 PR: y-scope/clp#549 File: components/core/src/clp/ffi/ir_stream/Deserializer.hpp:209-210 Timestamp: 2024-10-01T07:59:15.290Z Learning: Until #486 is merged with customized error code support, introducing a layer from `IRErrorCode` to `std::errc` is unnecessary.
🔇 Additional comments (22)
components/core/src/clp/ffi/ir_stream/protocol_constants.hpp (1)
70-76
: Verify the impact of constant changes on the codebase.The renaming of existing constants and addition of new ones may have a significant impact on the codebase. Please ensure that:
- All references to the old constant names (
SchemaTreeNodeParentIdUByte
,SchemaTreeNodeParentIdUShort
,KeyIdUByte
,KeyIdUShort
) have been updated throughout the codebase.- The new constants (
EncodedSchemaTreeNodeParentIdInt
,EncodedKeyIdInt
) are properly utilized in the serialization/deserialization logic.- Any documentation or comments referencing these constants have been updated.
To assist in this verification, you can run the following script:
✅ Verification successful
Impact of Constant Changes Verified Successfully
All references to the old constant names have been removed, and the new constants are correctly utilized in the serialization/deserialization logic. No issues were found in the documentation or comments regarding these changes.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any remaining references to old constant names and verify usage of new constants # Check for old constant names echo "Checking for old constant names:" rg --type cpp "SchemaTreeNodeParentIdUByte|SchemaTreeNodeParentIdUShort|KeyIdUByte|KeyIdUShort" # Check usage of new constants echo "Checking usage of new constants:" rg --type cpp "EncodedSchemaTreeNodeParentIdInt|EncodedKeyIdInt" # Check for potential missed updates in comments or documentation echo "Checking for potential missed updates in comments or documentation:" rg --type cpp -i "schema.*tree.*node.*parent.*id|key.*id"Length of output: 17483
components/core/src/clp/ffi/ir_stream/Serializer.hpp (1)
106-106
: Improved documentation, but potential inconsistency with method signatureThe updated documentation provides better clarity on the method's behaviour. However, there seems to be an inconsistency between the new documentation and the method signature.
- The documentation now states that the method forwards the return values of
encode_and_serialize_schema_tree_node_id
.- The method signature still shows a boolean return type:
[[nodiscard]] auto serialize_key(SchemaTree::Node::id_t id) -> bool;
Please verify the following:
- Does the method implementation match the new documentation (i.e., does it forward the return values of
encode_and_serialize_schema_tree_node_id
)?- If so, should the method signature be updated to reflect this change?
To assist in verification, you can run the following script:
If the implementation doesn't match the new documentation or if the method signature needs to be updated, please make the necessary adjustments to ensure consistency across the documentation, implementation, and method signature.
✅ Verification successful
Documentation and implementation are consistent. Verification successful.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check the implementation of serialize_key method # Search for the serialize_key method implementation rg --type cpp -A 10 "auto.*serialize_key.*SchemaTree::Node::id_t.*\)" components/core/src/clp/ffi/ir_stream/Length of output: 2190
components/core/src/clp/type_utils.hpp (1)
103-107
: Definition ofSignedIntegerType
concept is correctThe
SignedIntegerType
concept correctly extendsIntegerType
by ensuring the type is signed. The implementation is accurate and aligns with the intended functionality.components/core/src/clp/ffi/ir_stream/utils.hpp (6)
9-9
: Appropriate inclusion of necessary headersThe inclusion of
<utility>
,<outcome/single-header/outcome.hpp>
,"../../type_utils.hpp"
, and"../SchemaTree.hpp"
provides the required declarations and functionalities used in this file.Also applies to: 13-13, 18-19
41-41
: Enhancing type safety with conceptsBy changing the template parameters from
typename integer_t
toIntegerType integer_t
, you are utilizing concepts to enforce thatinteger_t
satisfies theIntegerType
constraint. This enhances type safety and reduces the likelihood of improper types being used with these serialization functions.Also applies to: 51-51
77-84
: Addition of utility function for one's complement computationThe introduction of the
get_ones_complement
function offers a clear and concise way to compute the one's complement of an integer without implicit promotions. This utility function improves code readability and reusability.
225-280
: Well-structured functions for schema tree node ID serialization and deserializationThe addition of templated functions for encoding and decoding schema tree node IDs enhances flexibility and code reuse. Utilizing template parameters to handle auto-generated versus user-generated IDs, along with varying length indicators, is a robust and scalable approach.
165-166
:⚠️ Potential issueVerify availability of
bit_cast
in the target C++ standardThe use of
bit_cast<int8_t>(value)
in the serialization of one-byte integers depends on the availability ofstd::bit_cast
, which is a C++20 feature. Please ensure that:
- The project is compiled with C++20 support.
- The
<bit>
header is included to providestd::bit_cast
.If C++20 is not being used, consider providing a custom implementation of
bit_cast
or using an alternative method to achieve the desired functionality.
247-264
:⚠️ Potential issuePotential issues with signedness and integer overflows
In
encode_and_serialize_schema_tree_node_id
, there may be concerns regarding:
- Signedness Mismatch: Comparing
node_id
(possibly an unsigned type) withINT8_MAX
,INT16_MAX
, andINT32_MAX
could lead to unexpected behaviour ifnode_id
exceeds these limits. Ensure thatnode_id
is of a signed type compatible with these comparisons.- Integer Overflows: Casting
node_id
toint8_t
,int16_t
, orint32_t
without ensuring it fits within these types might result in overflows or data loss.Consider adding static assertions or explicit checks to prevent integer overflows and ensure that
node_id
falls within the representable range of the target types before casting.components/core/src/clp/ffi/ir_stream/Serializer.cpp (3)
406-415
: Refactored serialization of schema tree node parent IDsThe code now uses the templated function
encode_and_serialize_schema_tree_node_id
to serialize schema tree node parent IDs. This change reduces code duplication and enhances maintainability by consolidating serialization logic.
424-428
: Simplified key serialization with templated functionThe
serialize_key
method now employsencode_and_serialize_schema_tree_node_id
, streamlining the serialization process for key IDs. This improves code consistency and reusability across the serialization functions.
492-528
: Added explicit template instantiations for serializer methodsExplicitly instantiating the
Serializer
methods for botheight_byte_encoded_variable_t
andfour_byte_encoded_variable_t
ensures that all necessary template specializations are generated. This prevents potential linker errors and clarifies the intended instantiations.components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp (10)
8-8
: Inclusion of<system_error>
is necessary forstd::errc
The addition of
#include <system_error>
is appropriate as it is required forstd::errc
used in error handling throughout the file.
52-53
: Function signature updated for improved error handlingChanging the return type of
deserialize_schema_tree_node_parent_id
tostd_result
enhances error handling by encapsulating both the result and potential errors.
105-106
: Enhanced error handling in function signatureThe updated signature of
deserialize_schema
returning astd_result<Schema>
improves error propagation and handling consistency across the codebase.
179-180
: Missing[[nodiscard]]
attribute justificationWhile the
[[nodiscard]]
attribute is used foris_encoded_key_id_tag
, it's not clear why it's necessary for a simple boolean function. Ensure this attribute is intentionally applied.Consider whether the
[[nodiscard]]
attribute is needed here. If it is, perhaps add a comment explaining its importance to prevent unintended misuse.
200-209
: Correct implementation of error handling indeserialize_schema_tree_node_parent_id
The function now correctly propagates errors using
std_result
, and the use ofir_error_code_to_errc(err)
ensures error codes are appropriately converted.
290-311
: Efficient schema deserialization with proper error handlingThe refactored
deserialize_schema
function enhances readability and error management. The check for auto-generated keys and returningstd::errc::protocol_not_supported
aligns with current support limitations.
464-464
: Correct usage ofis_encoded_key_id_tag
in conditionThe condition correctly uses
is_encoded_key_id_tag(tag)
to determine if the tag signifies the start of a log event, improving code clarity.
471-480
: Optimized and justified implementation inis_encoded_key_id_tag
The function provides a clear rationale for not using a range check, optimizing for streams with few key IDs. The explicit checks enhance performance and maintainability given the constraints.
514-521
: Appropriate handling of unsupported auto-generated keysIn
deserialize_ir_unit_schema_tree_node_insertion
, the code correctly identifies unsupported auto-generated keys and returnsstd::errc::protocol_not_supported
. This aligns with the current implementation status outlined in the PR objectives.
550-554
: Improved error propagation in log event deserializationThe use of
std_result
forschema_result
enhances error handling indeserialize_ir_unit_kv_pair_log_event
. This ensures that errors are properly propagated upwards.
Description
As explained in #556, we want to add support for auto-generated key-value pairs in our current kv pair IR format. This PR implements one's complement encoding to serialize/deserialize encoded schema tree node IDs, and use these new methods to replace the old methods used in the serializer/deserializer implementations.
The serialization/deserialization methods in this PR use templates to reduce code duplication but without introducing runtime overhead. The template is mainly used for:
Notice that the current implementation only uses user-generated node IDs' code path as the auto-generated node IDs are not yet supported. However, this PR changes the IR formats, meaning that the IR format serialized by the old serializer may not be correctly processed by the latest deserializer. As a result, this PR elevates the IR stream version number to
0.1.0-beta.1
.Validation performed
Summary by CodeRabbit
New Features
Bug Fixes
Tests