Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT adapt for scikit-learn 1.6 #1135

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

glemaitre
Copy link
Member

Adapt the code with changes of related to some changes in the developer API that became public in scikit-learn.

@glemaitre glemaitre marked this pull request as draft November 9, 2024 17:42
skrub/_dataframe/_common.py Outdated Show resolved Hide resolved
@glemaitre
Copy link
Member Author

The remaining pandas error is:

E               TypeError: Cannot use .astype to convert from timezone-naive dtype to timezone-aware dtype. Use obj.tz_localize instead or series.dt.tz_localize instead

I assume we try to cast.

Copy link
Member

@Vincent-Maladiere Vincent-Maladiere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of questions I had while reading :)

skrub/_datetime_encoder.py Outdated Show resolved Hide resolved
skrub/_dataframe/_common.py Outdated Show resolved Hide resolved
skrub/_fixes.py Show resolved Hide resolved
Copy link
Member

@jeromedockes jeromedockes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot for taking care of this @glemaitre !

benchmarks/bench_minhash_batch_number.py Outdated Show resolved Hide resolved
skrub/_dataframe/_common.py Outdated Show resolved Hide resolved
skrub/_datetime_encoder.py Outdated Show resolved Hide resolved
skrub/_similarity_encoder.py Outdated Show resolved Hide resolved
skrub/_table_vectorizer.py Outdated Show resolved Hide resolved
@jeromedockes
Copy link
Member

jeromedockes commented Nov 12, 2024

@glemaitre here is the fix for the time zone failures (pandas changes the representation of time zones in 3.0):

diff --git a/skrub/_to_datetime.py b/skrub/_to_datetime.py
index 4a08013f..a14d3c8a 100644
--- a/skrub/_to_datetime.py
+++ b/skrub/_to_datetime.py
@@ -28,6 +28,8 @@ def _get_time_zone_pandas(col):
         return None
     if hasattr(tz, "zone"):
         return tz.zone
+    if hasattr(tz, "key"):
+        return tz.key
     return tz.tzname(None)

@jeromedockes jeromedockes added this to the 0.3.2 milestone Nov 14, 2024
@glemaitre glemaitre marked this pull request as ready for review November 15, 2024 09:11
@@ -323,3 +330,17 @@ def _check_params(self):
raise ValueError(
f"'resolution' options are {allowed}, got {self.resolution!r}."
)

if sklearn_below_1_6:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeromedockes The issue was that the scikit-learn tests (that are skipped) issue an error at collection because preserve_dtype was not defined.

Somehow it was not raising in the past for some reason (but it could have raised if we run the test). So we don't need to inherit from TransformerMixin, just make a similar definition of the tags.

Copy link
Member

@jeromedockes jeromedockes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot @glemaitre ! super useful that you help us keep up with scikit-learn :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants