Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Remove toolz dependency #3426

Merged
merged 34 commits into from
Jun 19, 2024
Merged

Conversation

dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented May 26, 2024

This PR is a proof-of-concept, entirely* removing the hard dependency on toolz.
*Excluding usage in tests and twice for a deprecation warning.

I've tried to provide reasoning on each commit, but generally, it seemed to me that the behaviour altair needs is easily replicated within stdlib.

Additional Notes

  • pipe and curry were the only imports
    • pipe is trivial to replace.
    • curry seems overkill when the only non-default argument is always data
  • While this is documented under Advanced Usage, I haven't come across any more complex examples.
    • I'd imagine users could see a benefit to performance, given the complexity of curry.
    • If anyone has some sample code that heavily uses data transformers, it could be helpful to benchmark any changes.

If there was any desire to pursue this further, in b9dc070 I mentioned that this could be implemented as a decorator.
I stopped short of that for now, but would be happy to go down that route to simplify the @overload usage.

Edit

@mattijn is this worth opening an issue for discussion, or can that take place here?

AFAIK the last remaining quirks would be:

  • Handling the existing deprecation functions in bf99d72.
    • I would assume 4 years is enough for a scheduled removal.
    • Otherwise these could use proper optional imports (rather than the nesting solution I've tentatively used).
  • Updating pyproject.toml.
    • Not entirely sure on the interaction with existing dependency groups.
    • Just want to highlight it as an extra item to-do.

- Made `SupportsGeoInterface` usable for runtime checking
  - Simplifies type narrowing logic and aligns with `DataFrameLike`
- Adds `is_data_type`
  - Used to reduce code duplication, specifically where `pipe` was previously referenced.
- Avoids the `None` case introduced by `DataTransformerType | None`.
Currying a single argument function is equivalent to referencing the function itself.
This only would've added performance overhead.
NOTE: Unsure why this differs from `vegalite.v5.api._prepare_data`. Have only removed the dependency.
…sGeoInterface`

Now possible due to `@runtime_checkable`
Added missing `sys` import from earlier commit
This pattern is what I've used in most cases. It works and type checks correctly, but could be improved as a decorator.
Would have the benefit of preserving the return type, if one were to inspect the intermediate function.

Also replacing the `TypeVar` with a `Union` seems to have satisfied `mypy`.
Same pattern as `limit_rows`. Also aliased the return type, aligning with the other transformers.
- Not super happy with the added length, but was difficult to avoid given the number of arguments.
- Moved all the generic parts to new function `_to_text`.
- Replaced `os.path` with `pathlib.Path`. Can be identified (but not autofixed) with [PTH](https://docs.astral.sh/ruff/rules/#flake8-use-pathlib-pth).
  - Although this might be better handled by [urllib](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin)
The nested imports are purely for a deprecation warning introduced in [4.10.0](vega@af4148d).

4 years on and over a major version later, would these be due for removal?
- Both `@curried.curry` and `curried.pipe` no longer needed
- Originally rewrote a generic curried pipe, but given that the two functions are explicitly named here - it seemed excessive.
- Required a new alias `NonLikeDataType` to resolve the overloads.
- Merged the two default branches
- `max_rows` should probably be removed since it is never used
The original describes a function before currying, but not the defered variant.
Switching from `curry` -> `functools.partial` is trivial.
Everything else I'm not too sure on. Had to battle `mypy` to arrive at this and not really happy with the result.

The change to `plugin_type`'s default should mean the assert in `register` is somewhat more useful.
This is valid for the `assert`. An alternative would be using `callable` from builtins, or reusing `PluginType`'s bounds/constraints.
@dangotbanned dangotbanned marked this pull request as ready for review May 28, 2024 10:00
@dangotbanned dangotbanned changed the title refactor(DRAFT): Remove toolz dependency refactor: Remove toolz dependency Jun 1, 2024
@mattijn
Copy link
Contributor

mattijn commented Jun 6, 2024

Hi @dangotbanned, this looks good! Thanks for working on this. Reducing the hard dependencies where possible is much welcome.
If types and tests are happy than we are very far in the progress already.
Could you explain why this PR requires the introduction of a new ._to_text() function, see here: https://github.com/vega/altair/pull/3426/files#diff-478a9046db15a358587cf77454cd2cf9cc4723435276cad0864c268d17e07650R295. It seems to be introduced during the development process?

@dangotbanned
Copy link
Member Author

dangotbanned commented Jun 6, 2024

Hi @dangotbanned, this looks good! Thanks for working on this. Reducing the hard dependencies where possible is much welcome.

Thanks @mattijn!

Could you explain why this PR requires the introduction of a new ._to_text() function, see here: #3426 (files). It seems to be introduced during the development process?

Sure thing.

So, previously to_csv, to_json had almost identical code and required 2x TypedDict's per function:

  • to_csv
    • _CsvFormatDict
    • _ToCsvReturnUrlDict
  • to_json
    • _JsonFormatDict
    • _ToJsonReturnUrlDict

I split out the generic behaviour to _to_text and simplified the types (4 -> 2) to reduce code duplication.

This helped with managing keyword-only args for line 251/line 287 which would have previously been curried.

The addition of _to_text isn't required, but as a comparison, below is the additional code without it (not including the overloads):

def to_json(
    data: Optional[DataType] = None,
    prefix: str = "altair-data",
    extension: str = "json",
    filename: str = "{prefix}-{hash}.{extension}",
    urlpath: str = "",
) -> Union[partial, _ToFormatReturnUrlDict]:
    """
    Write the data model to a .json file and return a url based data model.
    """
    if data is None:
        return partial(
            to_json, # diff 1
            prefix=prefix,
            extension=extension,
            filename=filename,
            urlpath=urlpath,
        )
    else:
        data_str = _data_to_json_string(data) # diff 2
        data_hash = _compute_data_hash(data_str)
        filename = filename.format(prefix=prefix, hash=data_hash, extension=extension)
        Path(filename).write_text(data_str)
        url = str(Path(urlpath, filename))
        return _ToFormatReturnUrlDict({"url": url, "format": _FormatDict(type="json")}) # diff 3

def to_csv(
    data: Optional[NonGeoDataType] = None,
    prefix: str = "altair-data",
    extension: str = "csv",
    filename: str = "{prefix}-{hash}.{extension}",
    urlpath: str = "",
) -> Union[partial, _ToFormatReturnUrlDict]:
    """Write the data model to a .csv file and return a url based data model."""
    if data is None:
        return partial(
            to_csv,
            prefix=prefix,
            extension=extension,
            filename=filename,
            urlpath=urlpath,
        )
    else:
        data_str = _data_to_csv_string(data)
        data_hash = _compute_data_hash(data_str)
        filename = filename.format(prefix=prefix, hash=data_hash, extension=extension)
        Path(filename).write_text(data_str)
        url = str(Path(urlpath, filename))
        return _ToFormatReturnUrlDict({"url": url, "format": _FormatDict(type="csv")})

I am happy to revert back to this, but thought it might be more maintainable in the future - while still preserving the public interface.

This preserves the existing `AltairDeprecationWarning` but removes the final traces of `toolz` dependency in `altair`.
Forgot I replaced a literal string with regex
@mattijn
Copy link
Contributor

mattijn commented Jun 6, 2024

Thanks for the explanation! That makes sense, leave it in please.

I'm in favor of removing the functions that have been officially deprecated for such a long time.

@dangotbanned
Copy link
Member Author

dangotbanned commented Jun 6, 2024

@mattijn No rush on the review, just wanted to update that all traces of toolz are now gone 😅

@mattijn
Copy link
Contributor

mattijn commented Jun 12, 2024

Hi @dangotbanned. Since you have multiple PRs open, do you have a certain order in mind how we should review and merge these?

@dangotbanned
Copy link
Member Author

Hi @dangotbanned. Since you have multiple PRs open, do you have a certain order in mind how we should review and merge these?

#3426

Hi @mattijn this one is ready to go.

I'm in favor of removing the functions that have been officially deprecated for such a long time.

I'm not sure if you wanted me to remove pipe, curry, the new optional import and tests - or if the removal needs any more input from other maintainers.

Personally, I agree and think they should be removed.

Other PRs

#3431

  • I'm pretty happy with where it's at now, but the changes I've made in the tools package relating to Optional, Parameter, SchemaBase likely need further review.
  • They were somewhat "hacky" so as to not alter the surrounding code, and someone more familiar with the code generation may be able to improve upon it.
  • This one (I think) has addressed all the points from @binste original review

#3427

  • I've left this on draft, not ready yet, but would be good to hear your thoughts on a couple of things
  1. alt.when signature
    • To preserve compatibility with alt.condition, the first predicate arg behaves identically - but *more_predicates is more restrictive in the types allowed - as they all must implement &.
    • I've left this to bake before writing docs, as I'm not sure whether the trade-off in added complexity (for the user) is worth preserving identical semantics as alt.condition for a single argument.
  2. The new parts added for alt.expr comment which are "an" implementation, but not sure it would be the right way to go.

@dangotbanned
Copy link
Member Author

dangotbanned commented Jun 14, 2024

@mattijn

I would be I favor of removing pipe and curry as well and instead of a deprecation warning use a deprecation error.

@joelostblom

I am also in favor of removing pipe and curry if we can replicate the functionality with stlib.

Thank you both for reviewing. All of your comments should be resolved now.

I have added comments suggesting the parts to remove. This excludes the imports, which would be highlighted by any IDE anyway afterwards.

Should be ready to merge after 650ee0e finishes

@mattijn
Copy link
Contributor

mattijn commented Jun 18, 2024

I think we need something else if we want a NonGeoDataType type, since the following supports the SupportsGeoInterface and also is postive as a NonGeoDataType being:

from altair.utils.data import NonGeoDataType

from shapely import geometry
import geopandas as gpd

data_geoms = [
    {"color": "#F3C14F", "geometry": geometry.Polygon([[1.45, 3.75], [1.45, 0], [0, 0], [1.45, 3.75]])},
    {"color": "#4098D7", "geometry": geometry.Polygon([[1.45, 0], [1.45, 3.75], [2.57, 3.75], [2.57, 0], [2.33, 0], [1.45, 0]])},
    {"color": "#66B4E2", "geometry": geometry.Polygon([[2.33, 0], [2.33, 2.5], [3.47, 2.5], [3.47, 0], [3.2, 0], [2.57, 0], [2.33, 0]])},
    {"color": "#A9CDE0", "geometry": geometry.Polygon([[3.2, 0], [3.2, 1.25], [4.32, 1.25], [4.32, 0], [3.47, 0], [3.2, 0]])},
]

gdf_geoms = gpd.GeoDataFrame(data_geoms)
isinstance(gdf_geoms, SupportsGeoInterface), isinstance(gdf_geoms, NonGeoDataType)
(True, True)

Example data taken from docs here: https://altair-viz.github.io/user_guide/data.html#spatial-data-gdf

@dangotbanned
Copy link
Member Author

dangotbanned commented Jun 18, 2024

I think we need something else if we want a NonGeoDataType type, since the following supports the SupportsGeoInterface and also is postive as a NonGeoDataType being:

@mattijn I think this would be unavoidable given that GeoDataFrame is a subclass of pd.DataFrame and python's typing doesn't support discriminated unions.

So it would pass NonGeoDataType = Union[dict, pd.DataFrame, DataFrameLike]

  • DataFrameLike as it has .__dataframe__
  • pd.DataFrame as it is a subclass

I'm only finding this out now btw, in 8a9d593 I just attached a name to an existing type annotation.

For the actual usage in to_csv, SupportsGeoInterface is rejected in https://github.com/dangotbanned/altair/blob/ed13af6d4598c96d1f5e5a816b95a85ff82f81f3/altair/utils/data.py#L376.

So the impact of this would be static type checking doesn't pick up the incompatibility - but at runtime the existing code does.

@mattijn
Copy link
Contributor

mattijn commented Jun 18, 2024

Thanks for responding! Let me think about it too. Just wondering, do we really need this NonGeoDataType? Maybe this constrain within to_csv is too conservative?

Spotted while testing a solution for [comment](vega#3426 (comment)).

This isn't a solution for that, but ensures the overloads solve correctly.
@dangotbanned
Copy link
Member Author

Just wondering, do we really need this NonGeoDataType?

I've no strong feelings either way @mattijn.

The only thing I'd add to #3426 (comment) is that having an alias here at least hints the user towards something that we can't express as a type - if that makes sense?

I wouldn't expect passing a GeoDataFrame would work as a parameter labelled NonGeo....

Looking back to the original signature, there is no indication that to_csv doesn't accept SupportsGeoInterface:

altair/altair/utils/data.py

Lines 198 to 205 in 596b282

def to_csv(
data: Union[dict, pd.DataFrame, DataFrameLike],
prefix: str = "altair-data",
extension: str = "csv",
filename: str = "{prefix}-{hash}.{extension}",
urlpath: str = "",
) -> _ToCsvReturnUrlDict:
"""Write the data model to a .csv file and return a url based data model."""

@mattijn
Copy link
Contributor

mattijn commented Jun 19, 2024

I've opened #3441. Maybe we can rephrase the NotImplementedError with a link tot that github issue. Then we don't need the newly introduced NonGeoDataType type in this PR. Thanks for still working on this @dangotbanned, almost there!

@mattijn
Copy link
Contributor

mattijn commented Jun 19, 2024

Tests are happy too after synching with main. Merging! Thanks @dangotbanned!

@mattijn mattijn merged commit 76a9ce1 into vega:main Jun 19, 2024
11 checks passed
@dangotbanned dangotbanned deleted the remove-toolz-depend branch June 19, 2024 08:59
binste added a commit that referenced this pull request Jun 27, 2024
…ormance (#3431)

* ci: Add additional `ruff` rules to `pyproject.toml`

Only highly autofixable groups from [#3430](#3430).
Aiming to refine further, depending on how much manual fixing this generates.

* fix: add item to `pyproject.toml` to silence import errors for `pylance|pyright`

* ci: add subset of `SIM` to `extend-safe-fixes`

* ci: add unfixable `RUF` rules to `ignore`

[RUF002](https://docs.astral.sh/ruff/rules/ambiguous-unicode-character-docstring/) may be avoidable during schema generation, so I'm not doing any manual fixes right now.

* refactor: apply new `ruff` rules, fix and reformat

* test: Skip tests on Win that require a tz database

See apache/arrow#36996

* ci: enable `tool.ruff.lint.preview`

Testing the effect this has with existing rules, prior to adding `pylint`, `refurb`

* fix: replace [F841](https://docs.astral.sh/ruff/rules/unused-variable/) violations with dummy variable

* ci: add additional `preview` fixes to `extend-safe-fixes`

* refactor: apply `preview` fixes for existing `ruff` rules, reformat

* ci: add `preview` category `FURB` rules

Almost all have autofixes, splitting out from [pylint](https://docs.astral.sh/ruff/rules/#pylint-pl) which is much larger in scope + fewer fixes

* refactor: apply `FURB` rule fixes, manually fix `FURB101/3`

* fix: Revert newer fstring syntax, not available to `sphinx`

* ci: add fixable `pylint` rules

Across the 4 `PL.` categories, there are over 50 rules, with a mix of fixable, preview.
Tried to be picky with what is added, because adding even 1 of the groups would generate a lot of manual fixes.

* ci: add `pylint` fixes to `extend-safe-fixes`

* refactor: apply `pylint` rule fixes, add an inline optimization for `_selection`

`param_kwds` in `_selection` triggered `PLR6201`. Instead rewrote as a dictcomp.

* fix: Recover comments lost during linting

#3431 (comment)

* fix: Replace sources of `RUF002` violations

* fix: manual fix `RUF002` in `api`

* ci: add `PTH` rules

`flake8-use-pathlib` rules all require manual fixes.
> Found 70 errors.

<details>
<summary>Details</summary>

```md
20      PTH118  `os.path.join()` should be replaced by `Path` with `/` operator
12      PTH120  `os.path.dirname()` should be replaced by `Path.parent`
11      PTH100  `os.path.abspath()` should be replaced by `Path.resolve()`
11      PTH123  `open()` should be replaced by `Path.open()`
 7      PTH110  `os.path.exists()` should be replaced by `Path.exists()`
 6      PTH107  `os.remove()` should be replaced by `Path.unlink()`
 3      PTH103  `os.makedirs()` should be replaced by `Path.mkdir(parents=True)`
```

* refactor: Manually fix `PTH` rule violations

* fix: Use safer `inspect.getattr_static` in `update_init_file`

Avoids [triggering](https://docs.python.org/3/library/inspect.html#inspect.getattr_static) on objects like `datum`, `expr`

* fix: Use correct f-string `!s` modifier

* fix: Resolve `doc:build-html` error

```log
Extension error (sphinxext.altairgallery): Handler <function main at 0x7f0873471ab0> for event 'builder-inited' threw an exception (exception: unsupported operand type(s) for +: 'PosixPath' and 'str')
```

* fix: Resolve utf-8 error for [emoji example](https://altair-viz.github.io/gallery/isotype_emoji.html)

* Update sphinxext/altairgallery.py

Co-authored-by: Stefan Binder <[email protected]>

* build: bump `docbuild.yml` python `3.10` -> `3.12`

See @binste [comment](#3431 (comment))

* style: Simplify `PluginRegistry` repr

#3431 (comment)

* revert: ignore `suppressible-exception` rule in `pyproject.toml`

#3431 (comment)

* refactor: remove need for exception handling in `utils._vegafusion_data.get_inline_tables`

I'm happy to revert back to the original `try-catch`, but think it was less both less efficient and understandable.

If I've understood correctly:
- Every time this was called, all tables would be iterated over
- On the first call, tables processed by the `vegafusion_data_transformer` would popped and returned
- On all calls, other tables would be ignored but always trigger exceptions
- On any call other than the first, every table triggers an exception

This behaviour is easily reduced using [set methods](https://docs.python.org/3/library/stdtypes.html#set) directly, as an empty `set` would produce an empty `dict` during the `dictcomp`.

#3431 (comment)

* refactor: remove `python<3.3` compat code in `utils.core.use_signature`

Was replaced with `contextlib.suppress` in a3fd77a but targets long-unsupported python versions.

For future reference, there are also related `stdlib` functions that may have not been available when this was written:
- [inspect.getdoc](https://docs.python.org/3/library/inspect.html#inspect.getdoc)
- [functools.wraps](https://docs.python.org/3/library/functools.html#functools.wraps)

Wouldn't cover everything there, but may replace the need for some parts.

#3431 (comment)

* revert: replace `PLW1514` fixes with "utf-8"

The rule remains in `pyproject.toml`, but is no longer autofixed with undesirable `locale.getprefferedencoding(False)`.

#3431 (comment)

* refactor: minor simplify and sort `__all__` from `generate_schema_wrapper`

* docs: `FA100` on most of `tools/`*`

The start of a cautious introduction of `from __future__ import annotations`.
Excludes `schemaapi.schemaapi.py` for now, but I think that would be safe as well.

* docs: `FA100` on `sphinxext/*`

* docs: `FA100` on `alt.jupyter`

* docs: `FA100` on `alt.expr`

* docs: `FA100` on `alt.vegalite`, excluding `v5.api`, `v5.schema/*`

- Covers non-autogenerated code.
- `api` is being special-cased for now.
  - Will be a great candidate for these fixes, but there are many simpler modules in `alt.utils` that I'm priotizing in case they highlight issues

* docs: `FA100` on private `alt.utils` modules

* docs(typing): Add annotations for `sphinxext`

- Making use of `FA100` enabled syntax
- Eventually would like to improve the performance of this, since `sphinxext` is by far the longest build step
- Towards that end, the annotations give a contextual starting point

* docs: `FA100` on public `alt.utils` modules, excluding `schemapi`

- Also did made some improvements to annotations more broadly, now that the newer syntax is allowed

* ci: update `pyproject.toml`, add new granular `tool.ruff.lint.per-file-ignores` setting

- So far been making the changes one file at a time, followed by tests
- Not encountered any runtime issues
- `mypy` has complained occasionally, but has been resolvable

* revert: Change `write_file_or_filename` default `encoding` to `None`

* docs: `FA100` on `tools.schemapi.schemapi`

Also adds some annotations that are now possible without circular import concerns

* feat(typing): add `from __future__ import annotations` for each file in `generate_schema_wrapper`

Also:
- ensure `annotations` excluded from `__all__`
- minor standardizing import order
- adjust `ruff` rules to correct schemagen

* ci(typing): adds `generate-schema-wrapper` script to `hatch`

Rewrites and reformats all annotations in `altair/vegalite/v?/schema`
- Meaning that the original code stays the same:

```
>>> ruff check altair/vegalite/v?/schema --fix --exit-zero
Found 13006 errors (13006 fixed, 0 remaining).
>>> ruff format altair/vegalite/v?/schema
4 files reformatted
>>> ruff check altair/vegalite/v?/schema --statistics
```

* refactor(typing): Improve annotations and narrowing in `schemapi`

- Changes are mainly on `_FromDict.from_dict`
  - Avoids type errors due to multiple reassignments
  - Adds a range of overloads that preserve the `SchemaBase` subclass
    - Not possible in all cases
  - replacing `cls` with `tp` - when used in an instance method
    - The former is quite strange to see outside of a `@classmethod`
    - Especially in a class that has a `@classmethod`
- updated `_default_wrapper_classes` to match `_subclasses` return type in `generate_schema_wrapper`

* build: run `generate-schema-wrapper`

The cumulative effect of `FA100`, `UP`, `TCH` rules applied to `v5.schema`.

All tests pass!

* revert(typing): Roll back runtime evaluated types not possible on older python

`from __future__ import annotations` applies in annotation scope, but not in aliases. Until `altair` requires `python>=3.9-10`, some syntax will need to differ - like many of the changes seen in this commit.

# Note
The current `ruff` config will only upgrade syntax in annotation scope, until `tool.ruff.target-version` "unlocks" new features. These were changes I made manually, not having tested against all versions - which are being reverted.

* fix(typing): Adds overloads and reduce ignore comments relating to `spec_to_mimebundle`

The repetition in `utils.save` is unfortunate, but is required to satisfy mypy. The issues are from overloading `spec_to_mimebundle` with `tuple/dict` as each has different indexing constraints.

A longer term solution would be to adding an intermediate function to return the nested value. Due to `spec_to_mimebundle`'s use elsewhere operating directly on the full output.

* revert(typing): Roll back additional runtime evaluated types for `python=3.8`

* fix(typing): Use `...` for `DataTransformerType`

Original `pass` keyword was equivalent to a `None` return

* docs(typing): `FA100` on `alt.vegalite.v5.api` and manual annotation rewrites

### `api`
A unique, but significant, difference to the other commits is the large number of module prefixed annotations that can be shortened.
`TopLevelMixin.project` shows the greatest reduction in combined characters for annotations, but there are small improvements throughout.

I've made all of these `TYPE_CHECKING`-only imports explicit, to highlight a new path forward this opens up. There are many repeat/similar `Union` types, which can now be declared in another module - without concern for circular import issues.

### `v5.__init__`
These changes have also removed the need for both the import of `expr` in `api`, and the `# type: ignore[no-redef]` in `v5.__init__`.

* test: add pytest rules `PT` and fix `PT001` violations

https://docs.astral.sh/ruff/rules/pytest-fixture-incorrect-parentheses-style/

* test: add ignores for `PT011` on existing tests

Probably useful to enforce on new code, https://docs.astral.sh/ruff/rules/pytest-raises-too-broad/

* test: fix `PT018` violation

https://docs.astral.sh/ruff/rules/pytest-composite-assertion/

* test: fix `PT006` violations

https://docs.astral.sh/ruff/rules/pytest-parametrize-names-wrong-type/

* test: add ignore for `PT012` violation

Not sure how this could be rewritten. https://docs.astral.sh/ruff/rules/pytest-raises-with-multiple-statements/

* test: add `pytest.mark.xfail` for flaky `scatter_with_layered_histogram` test

Struggling to find the source of the failure, the `mark` is my best guess but currently can't reproduce the following error:

```
___________________________________________ test_compound_chart_examples[False-scatter_with_layered_histogram.py-all_rows18-all_cols18] ___________________________________________
[gw2] win32 -- Python 3.8.19 C:\Users\*\AppData\Local\hatch\env\virtual\altair\CXM7NV9I\hatch-test.py3.8\Scripts\python.exe

filename = 'scatter_with_layered_histogram.py', all_rows = [2, 17], all_cols = [['gender'], ['__count']], to_reconstruct = False

    @pytest.mark.skipif(vf is None, reason="vegafusion not installed")
    # fmt: off
    @pytest.mark.parametrize("filename,all_rows,all_cols", [
        ("errorbars_with_std.py", [10, 10], [["upper_yield"], ["extent_yield"]]),
        ("candlestick_chart.py", [44, 44], [["low"], ["close"]]),
        ("co2_concentration.py", [713, 7, 7], [["first_date"], ["scaled_date"], ["end"]]),
        ("falkensee.py", [2, 38, 38], [["event"], ["population"], ["population"]]),
        ("heat_lane.py", [10, 10], [["bin_count_start"], ["y2"]]),
        ("histogram_responsive.py", [20, 20], [["__count"], ["__count"]]),
        ("histogram_with_a_global_mean_overlay.py", [9, 1], [["__count"], ["mean_IMDB_Rating"]]),
        ("horizon_graph.py", [20, 20], [["x"], ["ny"]]),
        ("interactive_cross_highlight.py", [64, 64, 13], [["__count"], ["__count"], ["Major_Genre"]]),
        ("interval_selection.py", [123, 123], [["price_start"], ["date"]]),
        ("layered_chart_with_dual_axis.py", [12, 12], [["month_date"], ["average_precipitation"]]),
        ("layered_heatmap_text.py", [9, 9], [["Cylinders"], ["mean_horsepower"]]),
        ("multiline_highlight.py", [560, 560], [["price"], ["date"]]),
        ("multiline_tooltip.py", [300, 300, 300, 0, 300], [["x"], ["y"], ["y"], ["x"], ["x"]]),
        ("pie_chart_with_labels.py", [6, 6], [["category"], ["value"]]),
        ("radial_chart.py", [6, 6], [["values"], ["values_start"]]),
        ("scatter_linked_table.py", [392, 14, 14, 14], [["Year"], ["Year"], ["Year"], ["Year"]]),
        ("scatter_marginal_hist.py", [34, 150, 27], [["__count"], ["species"], ["__count"]]),
        ("scatter_with_layered_histogram.py", [2, 17], [["gender"], ["__count"]]),
        ("scatter_with_minimap.py", [1461, 1461], [["date"], ["date"]]),
        ("scatter_with_rolling_mean.py", [1461, 1461], [["date"], ["rolling_mean"]]),
        ("seattle_weather_interactive.py", [1461, 5], [["date"], ["__count"]]),
        ("select_detail.py", [20, 1000], [["id"], ["x"]]),
        ("simple_scatter_with_errorbars.py", [5, 5], [["x"], ["upper_ymin"]]),
        ("stacked_bar_chart_with_text.py", [60, 60], [["site"], ["site"]]),
        ("us_employment.py", [120, 1, 2], [["month"], ["president"], ["president"]]),
        ("us_population_pyramid_over_time.py", [19, 38, 19], [["gender"], ["year"], ["gender"]]),
    ])
    # fmt: on
    @pytest.mark.parametrize("to_reconstruct", [True, False])
    def test_compound_chart_examples(filename, all_rows, all_cols, to_reconstruct):
        source = pkgutil.get_data(examples_methods_syntax.__name__, filename)
        chart = eval_block(source)
        if to_reconstruct:
            # When reconstructing a Chart, Altair uses different classes
            # then what might have been originally used. See
            # vega/vegafusion#354 for more info.
            chart = alt.Chart.from_dict(chart.to_dict())
        dfs = chart.transformed_data()

        if not to_reconstruct:
            # Only run assert statements if the chart is not reconstructed. Reason
            # is that for some charts, the original chart contained duplicated datasets
            # which disappear when reconstructing the chart.
            assert len(dfs) == len(all_rows)
            for df, rows, cols in zip(dfs, all_rows, all_cols):
>               assert len(df) == rows
E               assert 19 == 17
E                +  where 19 = len(    bin_step_5_age  bin_step_5_age_end gender  __count\n0             45.0                50.0      M      247\n1       ...       11\n17
   30.0                35.0      F        5\n18            70.0                75.0      F        1)

tests\test_transformed_data.py:132: AssertionError
```

* ci: tidy up `ruff` section of `pyproject.toml`

- removed `SIM` rules from `extend-safe-fixes`, only 1 required it and manual fixes should be fine for new code
- Provided links for new config groups/settings

#3431 (comment)

* perf: adds config `pyproject.toml` for faster build, test runs

- `pip` -> `uv`
- `pytest` single-core -> logical max

https://hatch.pypa.io/latest/how-to/environment/select-installer/#enabling-uv
https://pytest-xdist.readthedocs.io/en/latest/distribution.html#running-tests-across-multiple-cpus

Should speed up CI, although GitHub Actions only runs on 4 cores iirc

* ci: adds `update-init-file` hatch script

* ci: adds newer-style `hatch` test config

I've attempted to recreate the existing `test` scripts, but haven't quite worked out the `coverage` parts.

This provides: matrix python version testing, parallel tests (within a python version), and slightly shorter commands:
```bash
>>> hatch run test
>>> hatch test

>>> hatch run test-coverage
>>> hatch test --cover
```

* feat(typing): Use `TYPE_CHECKING` block for `schema` modules

- Removes the need for `_Parameter` protocol
- Reduce lines of code by more than 5000, by removing prefix for `SchemaBase`, `Parameter`

* test: use a more reliable `xfail` condition for flaky test

The difficulty in reproducing came from the python debugger disabling `xdist`.

* build: Embed extra `ruff` calls in `generate-schema-wrapper` into the python script

Hopefully solves: https://github.com/vega/altair/actions/runs/9456437889/job/26048327764?pr=3431

* build: Manually rewrite certain exceptions with flaky autofix

These were previously fixed by `EM` rules, but seem to not always occur - leading to CI fail

https://github.com/vega/altair/actions/runs/9466550650/job/26078482982#step:8:60

* refactor: remove now-unneeded `PARAMETER_PROTOCOL`

* refactor: replace existing references to `typing.Optional`

* refactor(typing): rename `T` TypeVar to `TSchemaBase`

* feat(typing): Adds dedicated `Optional` alias for `Union[..., UndefinedType]`

`typing.Optional` is deprecated and is no longer needed in `altair`. The new alias improves the readability of generated signatures, by reducing noise (repeating `UndefinedType`) and clearly grouping together the options for an argument.

* refactor(typing): Remove `UndefinedType` dependency in `api`

* refactor(typing): Remove `UndefinedType` dependency in `api`

* refactor(typing): Remove annotation scope `UndefinedType` dependency in `api`

Aligns `api` with `schema` changes in fbc02e2

* ci: add `W291` to ensure trailing whitespace is autofixed

* refactor: define non-relevant attributes closer to imports in `update_init_file`

Will make it easier to keep the two in sync, and now they can be copy/pasted - rather than adding an `or attr is _` expression

* feat(typing): Adds `_TypeAliasTracer` and reorders `generate_schema_wrapper` to collect unique `Literal` types

The next commit will show the changes from build

* build: run `generate-schema-wrapper` using `_TypeAliasTracer`

This commit reduces the length of each module in `schema` by ~50-70% and provides the each alias in the new `_typing` module.

## Existing art
[`pandas`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py#L199) and [`polars`](https://github.com/pola-rs/polars/blob/faf0f061eead985a8b29e4584b619fb5d8647e31/py-polars/polars/dataframe/frame.py#L105) similarly define a large number of `TypeAlias`, `Union`, `Literal` types in a single file. These definitions can be imported anywhere in the project - without concern for circular dependencies. It helps keep signatures consistent, and in many cases, simplifies those signatures such that the user can quickly understand what they represent.

## Future config
The generated names are currently:
```py
f"{SchemaInfo.title}_T"
```
so as to not conflict with the names defined in `core`/`channels`.
I have left this configurable via:
```py
_TypeAliasTracer(fmt=...)
```

Personally don't mind what the aliases are called, so if anyone has any ideas please feel free to change.

* fix: Add missing `LiteralString` import

* test: add `pytest.mark.filterwarnings` for tests that cannot avoid them

- `pandas` warnings cannot trivially be avoided, as they need to be compatible with `pandas~=0.25` and are used in examples.
- I have been unable to track down what specific calls are triggering the `traitlets` warnings. There seems to be multiple levels of indirection, but the stacklevel reported is external to `altair`.

* refactor(typing): Replace `Literal[None]` -> `None`

Came across [Literal[None] conversion](https://typing.readthedocs.io/en/latest/spec/literal.html#legal-parameters-for-literal-at-type-check-time) during #3426 (comment)

* test: Change `skipif` -> `xfail` for `test_sanitize_pyarrow_table_columns` on `win32`

Seems to be a better fit, considering this would fail for `win32` devs not using `pyarrow` but have it installed.
This could be removed in the future if installing/confirming install `tzdata` to the correct location was handled during environment activation.

Possibly resolves: #3431 (comment)

* ci: bump `actions/setup-python`, `python-version`, use `uv` in `lint.yml`

I've made quite a few improvements to performance locally like fd20f00, but only `docbuild.yml` uses `hatch` so they haven't reduced CI run time.

This is more of a test run on the simplest workflow to identify any issues.

* ci: create venv before uv pip install

* ci: ensure venv is activated after install

* ci: use environment provided by `hatch`

* ci: testing `pytest-xdist` in `build` workflow

Would be faster if using `uv`/`hatch` instead of `pip` - but I'm not sure how to untangle `pip` from this yet.

* test(perf): adds significant parallelism for slow `test_examples`

#3431 (comment)

* refactor, perf: reduce `sys.path` usage, use dictcomp in `update_init_file`

# `sys.path`
Mainly modules in `tools`, but also `tests`, `sphinxext` repeat the same pattern of manipulating `sys.path` to find imports.
In some cases, I think this wasn't need at all - but others were resolves by adding `tools.__init__.py`.

- Aligns with many of the other sub/packages
- Removes need for ignoring `E402`
- Makes many of the imports in these modules explicit, and therefore easier to follow.

#  `update_init_file`
While rewriting the imports here, I saw an opportunity to simplify gathering relevant attributes.

- `attr = getattr(alt, attr_name)` using the keys of `__dict__`, was a slow equivalent of a dictionary comprehension.
- `str(Path(alt.__file__).parent)` is a verbose `str(Path.cwd())`
  - An earlier commit of mine used this instead of `os.path`
  - This version is much shorter and obvious in what it does

* build: adding debug message to help with build failure

The previous run worked locally due to `hatch` managing `sys.path` - but not in CI, which doesn't use hatch yet

 https://github.com/vega/altair/actions/runs/9527747043/job/26264700374

* build: add cwd to debug message

* fix: possibly fix cwd not on path

* ci: testing invoking script with `-m` for cwd

* fix: remove debug code

* fix: resolve lint, format, type conflicts following rebase

Fixes https://github.com/vega/altair/actions/runs/9581196541/job/26417448238?pr=3431

* DO NOT MERGE - TESTING DOC PERF

* revert: undo last commit

Intended to discard the changes, but accidentally pushed

* refactor: Remove commented out dead code

Fixes #3431 (comment)

* ci: Remove extra whitespace in `build.yml`

Fixes #3431 (comment)

---------

Co-authored-by: Stefan Binder <[email protected]>
dangotbanned added a commit to dangotbanned/altair that referenced this pull request Jul 11, 2024
I should have removed this in vega#3426 but better late than never
mattijn pushed a commit that referenced this pull request Jul 11, 2024
I should have removed this in #3426 but better late than never
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants