Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix opt_start_time usage and other datetime fields #535

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

charbonnierg
Copy link
Contributor

@charbonnierg charbonnierg commented Feb 19, 2024

Problem

This PR is related to issue #474.

Today it's possible to use the opt_start_time field in ConsumerConfig but a # type: ignore comment is required:

cfg = ConsumerConfig(
    deliver_policy=DeliverPolicy.BY_START_TIME,
    opt_start_time="2024-02-19T19:20:50.52Z",   # type: ignore
)

This is required because an int is expected by the ConsumerConfig dataclass.

Moreover, users must encode the opt_start_time value prior to using it.

Proposal

Annotate ConsumerConfig.opt_start_time as a datetime.datetime and automatically convert from/to string when needed.

cfg = ConsumerConfig(
    deliver_policy=DeliverPolicy.BY_START_TIME,
    opt_start_time=datetime.datetime(2024, 2, 19, 19, 20, 50, 52, tzinfo=datetime.timezone.utc)
)

By default (e.g., unless user-provided), datetimes are assumed to be UTC. So the example above should be equivalent to:

cfg = ConsumerConfig(
    deliver_policy=DeliverPolicy.BY_START_TIME,
    opt_start_time=datetime.datetime(2024, 2, 19, 19, 20, 50, 52)
)

Additional proposal

It seemed that several fields used the same date format in the JetStream API. I uncommented the commented lines with a FIXME note where it seemed appropriate. Those changes can be reverted if not desired.

Added tests

I added several test cases to make sure that:

  • when creating a config, timezone is preserved when user-provided
  • when creating a config, timezone is considered to be UTC when not provided
  • when reading a config, datetimes can be parsed up to microseconds precision
  • when reading a config, datetimes are parsed with timezone support

Side effects

I added pytz as a development dependency, and ran pipenv lock to refresh the lockfile. This dependency is not required by the nats-py package, only during development.

@charbonnierg
Copy link
Contributor Author

charbonnierg commented Feb 20, 2024

FYI, I ran an additional test which I did not commit because it used pytest:

Basically I want to make sure that the timezone information is not lost when using opt_start_time. I ran the same test for all existing timezones in pytz package using pytest.mark.parametrize and various microsecond precisions:

Test case:

    @pytest.mark.parametrize(
        "tz",
        [*pytz.all_timezones],
    )
    @pytest.mark.parametrize(
        "microseconds",
        [0, 1, 123456]
    )
    async def test_consumer_with_opt_start_time_all_timezones(self, tz: str, microseconds: int):
        nc = NATS()
        await nc.connect()
        jsm = nc.jsm()
        await jsm.add_stream(name="ctests", subjects=["a", "b", "c.>"])
        con = await jsm.add_consumer(
            "ctests",
            opt_start_time=datetime.datetime(1970, 1, 1, microsecond=microseconds, tzinfo=pytz.timezone(tz)),
            deliver_policy=api.DeliverPolicy.BY_START_TIME,
        )
        assert isinstance(con.created, datetime.datetime)
        assert con.config.opt_start_time == datetime.datetime(1970, 1, 1, microsecond=microseconds, tzinfo=pytz.timezone(tz))
        await nc.close()
(click to unfold complete test with setup with pytest)
@pytest.mark.asyncio
class SetupTest:

    @pytest.fixture(autouse=True)
    def setup(self):
        self.server_pool = []

        server = NATSD(port=4222, with_jetstream=True)
        self.server_pool.append(server)
        for natsd in self.server_pool:
            start_natsd(natsd)
        try:
            yield
        finally:
            for natsd in self.server_pool:
                natsd.stop()
                shutil.rmtree(natsd.store_dir)


class TestOptStartTime(SetupTest):

    @pytest.mark.parametrize(
        "tz",
        [*pytz.all_timezones],
    )
    @pytest.mark.parametrize(
        "microseconds",
        [0, 1, 123456]
    )
    async def test_consumer_with_opt_start_time_all_timezones(self, tz: str, microseconds: int):
        nc = NATS()
        await nc.connect()
        jsm = nc.jsm()
        await jsm.add_stream(name="ctests", subjects=["a", "b", "c.>"])
        con = await jsm.add_consumer(
            "ctests",
            opt_start_time=datetime.datetime(1970, 1, 1, microsecond=microseconds, tzinfo=pytz.timezone(tz)),
            deliver_policy=api.DeliverPolicy.BY_START_TIME,
        )
        assert isinstance(con.created, datetime.datetime)
        assert con.config.opt_start_time == datetime.datetime(1970, 1, 1, microsecond=microseconds, tzinfo=pytz.timezone(tz))
        await nc.close()
Output:
(.venv) charbonnierg@surface-dev:~/github/charbonnierg/nats.py$ pytest -k test_consumer_with_opt_start_time_all_timezones -x
================================ test session starts ================================
platform linux -- Python 3.10.12, pytest-8.0.1, pluggy-1.4.0
rootdir: /home/charbonnierg/github/charbonnierg/nats.py
plugins: cov-4.1.0, asyncio-0.23.5
asyncio: mode=strict
collected 1973 items / 185 deselected / 1788 selected                               

================= 1788 passed, 185 deselected in 257.31s (0:04:17) ==================

is assumed to be in UTC timezone.
"""
if date.tzinfo is None:
date = date.replace(tzinfo=datetime.timezone.utc)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this part. It is described in the PR body though, and there are test cases for it.

I wonder if we should use .astimezone(datetime.timezone.utc) instead of .replace(datetime.timezone.utc).

The current implementation considers that date without timezone are UTC, which is the rule that we use at work but may not be universal.

If we change to astimezone, the date is assumed to be local and is converted to UTC.

@wallyqs wallyqs changed the base branch from main to release/v2.7.2 February 26, 2024 09:58
@wallyqs wallyqs changed the base branch from release/v2.7.2 to main February 26, 2024 10:03
@arguile-
Copy link

So it appears that the response precision is variable at times, which is causing datetime.isoformat to throw Invalid isoformat string.

A response for StreamInfo from the server (2.10.5):

(partial, prettied)

  "delivered": {
    "consumer_seq": 2,
    "stream_seq": 4,
    "last_active": "2024-03-20T13:15:22.894299Z"
  },
  "ack_floor": {
    "consumer_seq": 2,
    "stream_seq": 4,
    "last_active": "2024-03-20T13:15:22.89525Z"
  }

Looking at the two timestamps we can see the precision difference:

   "last_active": "2024-03-20T13:15:22.894299Z" # delivered
   "last_active": "2024-03-20T13:15:22.89525Z"  # ack_floor

The second isn't valid for fromisoformat, additionally it ends in Z which is otherwise stripped by val[:26] on the correctly formatted one.

    def _convert_rfc3339(resp: Dict[str, Any], field: str) -> None:
        """Convert a RFC 3339 formatted string into a datetime.

        If the string is None, None is returned.
        """
        val = resp.get(field, None)
        if val is None:
            return None
        raw_date = val[:26]
        if raw_date.endswith("Z"):
            raw_date = raw_date[:-1] + "+00:00"
        resp[field] = datetime.datetime.fromisoformat(raw_date).replace(
            tzinfo=datetime.timezone.utc
        )

As it's coming in the raw response this looks like a nats-server issue. I haven't checked the code nor other versions yet.

It's also appears to be intermittent (dropping trailing 0s?).

@arguile-
Copy link

arguile- commented Mar 20, 2024

So it (nats-server) is stripping trailing zeros as per RFC3339Nano.

RFC3339Nano = "2006-01-02T15:04:05.999999999Z07:00"

The latest Python parses this correctly (code):

def _parse_hh_mm_ss_ff(tstr):
    # Parses things of the form HH[:?MM[:?SS[{.,}fff[fff]]]]

However at least as of Python 3.9 it only accepts microseconds with 3 or 6 digits.

def _parse_hh_mm_ss_ff(tstr):
    # Parses things of the form HH[:MM[:SS[.fff[fff]]]]
    len_str = len(tstr)
    
    ...
    
        if pos < len_str:
        if tstr[pos] != '.':
            raise ValueError('Invalid microsecond component')
        else:
            pos += 1

            len_remainder = len_str - pos
            if len_remainder not in (3, 6):
                raise ValueError('Invalid microsecond component')

            time_comps[3] = int(tstr[pos:])
            if len_remainder == 3:
                time_comps[3] *= 1000

@arguile-
Copy link

Not sure if an added dependency is wanted but backports-datetime-fromisoformat exists and can be used without the monkey patching. This brings the ISO 8601 parsing up to Python 3.11 in any Python 3.

from backports.datetime_fromisoformat import datetime_fromisoformat

datetime_fromisoformat("2024-03-20T13:15:22.89525Z")

@charbonnierg
Copy link
Contributor Author

Thank you for looking into this !

If I follow correctly, does it mean that the 4 commented test cases should pass but won't ? (I'm not on my computer so I can't run it)

def test_parse_consumer_info_with_created_timestamp(self):
    for created in [
        # "1970-01-01T01:02:03.4Z",        # Nok: should pass
        # "1970-01-01T01:02:03.400Z",      # Nok: should pass
        "1970-01-01T01:02:03.400Z",        # OK
        # "1970-01-01T01:02:03.4000Z",     # Nok: should pass
        # "1970-01-01T01:02:03.40000Z",    # Nok: should pass
        "1970-01-01T01:02:03.400000Z",     # OK
        "1970-01-01T01:02:03.4000000Z",    # OK
        "1970-01-01T01:02:03.40000000Z",   # OK
        "1970-01-01T01:02:03.400000000Z",  # OK
    ]:
        info = api.ConsumerInfo.from_response({
            "name": "test",
            "stream_name": "test",
            "config": {},
            "created": created
        })
        created = info.created
        assert created == datetime.datetime(
            1970, 1, 1, 1, 2, 3, 400000, tzinfo=datetime.timezone.utc
        )

The test may not be written correctly, I did not test it

@charbonnierg
Copy link
Contributor Author

I just pushed a commit to address those changes, and I added a test with more cases.

The function cost is still relatively cheap, on my computer (CPU: Intel Pentium 6500Y @ 1.10GHz)

765 ns ± 114 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

What do you think @arguile- ?

Guillaume Charbonnier and others added 5 commits March 20, 2024 19:04
nats/js/api.py Outdated
raw_date = val[:-6]
# Padd missing milliseconds
if "." not in raw_date:
raw_date += ".000000"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary for fromisoformat, invert if and leave out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right ! I removed this block and inverted the if statement.

nats/js/api.py Outdated
else:
offset = val[-6:]
raw_date = val[:-6]
# Padd missing milliseconds

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Padd missing milliseconds
# Pad missing milliseconds

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth linking to the issue that caused Python 3.11 to fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a bunch of comments. Thanks for reviewing those changes, feel free to comment again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants