
ARROW-238 Add Support for nested ObjectIDs in polars conversion #220

Merged: 6 commits, merged on Aug 7, 2024

@sibbiii (Contributor) commented Jun 17, 2024

Hi,

_arrow_to_polars currently does not support casting extension types in nested fields.
This prevents ObjectIDs from being read when they are inside nested fields.

I could not get the conversion to work with the original code,
but I found a way using arrow_table_without_extensions = arrow_table.cast(schema_without_extensions),
which casts the schema of the whole table in one go.

The schema_without_extensions is created recursively from the old schema.
Support for lists is still to be added; it should not be that hard, and I may try tomorrow.

I am not an expert in Apache Arrow; my world is Pandas and Polars.
I have written some unit tests locally to test the code, but I am not confident that I haven't overlooked
something, so please review carefully.

#219

@caseyclements (Contributor) commented

Thank you for your submission. It looks good to me. We are waiting on Polars to support ExtensionTypes, but in the meantime I don't see why we wouldn't add this. I cannot recall why we commented out the list and struct cases before. Please give us a few days to review.

Here is the link to the mongo-arrow task: https://jira.mongodb.org/browse/ARROW-202. It contains links to the Polars issues.

@caseyclements (Contributor) commented

Hi @sibbiii, I'm sorry for the delay; I've been very busy. Would you please add a couple of tests for this new functionality?

@lazargugleta force-pushed the patch-1 branch 2 times, most recently from 14933f8 to c771c0d, on July 1, 2024 18:43

@lazargugleta (Contributor) commented Jul 1, 2024

Hey @caseyclements,
I extended the existing test for _arrow_to_polars with lists and structs.
Feel free to let me know if you need anything else.

@blink1073 (Member) left a comment

LGTM, thank you!

@blink1073 merged commit c58ed2f into mongodb-labs:main on Aug 7, 2024
36 of 41 checks passed
@blink1073 changed the title from "Support for nested ObjectIDs in polars conversion" to "ARROW-238 Add Support for nested ObjectIDs in polars conversion" on Aug 7, 2024
4 participants