
[Bug Report] check_step_determinism obscure behavior #1111

Open
1 task done
qgallouedec opened this issue Jul 4, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@qgallouedec
Contributor

qgallouedec commented Jul 4, 2024

Describe the bug

It's the kind of issue that's hard to name or explain, or even reduce to simple code. But here's what I've observed since check_step_determinism was added: when I run the check myself, it passes; when the checker runs it, it doesn't. For the moment the code depends on panda_gym, sorry about that, I'll reduce it later, but I wanted to report it as soon as possible.

Code example

import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence

env = gym.make("PandaPickAndPlace-v3").unwrapped

seed = 123
env.action_space.seed(seed)
action = env.action_space.sample()
_, _ = env.reset(seed=seed)
obs_0, _, _, _, _ = env.step(action)
_, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action)

assert data_equivalence(obs_0, obs_1)  # Passes

check_env(env, skip_render_check=True)  # But! this fails in check_step_determinism
Traceback (most recent call last):
  File "/Users/quentingallouedec/panda-gym/94.py", line 16, in <module>
    check_env(env, skip_render_check=True)  # fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/quentingallouedec/panda-gym/env/lib/python3.11/site-packages/gymnasium/utils/env_checker.py", line 402, in check_env
    check_step_determinism(env)
  File "/Users/quentingallouedec/panda-gym/env/lib/python3.11/site-packages/gymnasium/utils/env_checker.py", line 198, in check_step_determinism
    assert data_equivalence(
           ^^^^^^^^^^^^^^^^^
AssertionError: Deterministic step observations are not equivalent for the same seed and action

What's even weirder is that it only happens in two environments. I'll keep digging and let you know.

System info

Gymnasium 1.0.0a2
Panda-gym 10c4d8a

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
@qgallouedec added the bug label Jul 4, 2024
@Kallinteris-Andreas
Collaborator

The only thing I can think of is that perhaps your environment does not properly reset its internal state after the second reset.

Does this pass?:

import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence

env = gym.make("PandaPickAndPlace-v3").unwrapped

check_env(env, skip_render_check=True)  # does this fail??
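The stale-state hypothesis can be illustrated with a toy example that needs no gymnasium at all (a hypothetical `LeakyEnv`, not related to panda_gym's actual implementation): reset re-seeds the RNG but forgets to clear a cached value, so the first step after a second seeded reset sees leftovers from the previous episode.

```python
import random


class LeakyEnv:
    """Toy env whose reset() re-seeds the RNG but forgets internal state."""

    def __init__(self):
        self._cached = 0.0  # internal state that reset() should clear, but doesn't

    def reset(self, seed):
        self._rng = random.Random(seed)
        # BUG: self._cached is not reset here
        return self._rng.random()

    def step(self, action):
        obs = self._rng.random() + self._cached
        self._cached = obs  # leaks into the next episode
        return obs


env = LeakyEnv()
env.reset(seed=123)
obs_0 = env.step(0)
env.reset(seed=123)
obs_1 = env.step(0)
# obs_0 != obs_1 even though the seed and action are identical,
# because _cached carried over from the first episode
```

The same seed-plus-action sequence yields two different observations, which is exactly the failure pattern the checker reports.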

@qgallouedec
Contributor Author

qgallouedec commented Jul 8, 2024

It fails. Doesn't make any sense 😅

import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence
import traceback

env = gym.make("PandaPickAndPlace-v3").unwrapped

try:
    check_env(env, skip_render_check=True)  # Fails
except Exception as exc:
    traceback.print_exception(exc)

seed = 123
env.action_space.seed(seed)
action = env.action_space.sample()
_, _ = env.reset(seed=seed)
obs_0, _, _, _, _ = env.step(action)
_, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action)

assert data_equivalence(obs_0, obs_1)  # Passes
Traceback (most recent call last):
  File "/Users/quentingallouedec/panda-gym/94.py", line 9, in <module>
    check_env(env, skip_render_check=True)  # Fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/quentingallouedec/Gymnasium/gymnasium/utils/env_checker.py", line 412, in check_env
    check_reset_seed_determinism(env)
  File "/Users/quentingallouedec/Gymnasium/gymnasium/utils/env_checker.py", line 116, in check_reset_seed_determinism
    assert data_equivalence(
           ^^^^^^^^^^^^^^^^^
AssertionError: Using `env.reset(seed=123)` then `env.reset()` is non-deterministic as the observations are not equivalent.

@Kallinteris-Andreas
Copy link
Collaborator

I have no idea how this is failing.

check_step_determinism does the same thing:

def check_step_determinism(env: gym.Env, seed=123):
    """Check that the environment steps deterministically after reset.

    Note: This check assumes that seeded `reset()` is deterministic (it must have passed `check_reset_seed`) and that `step()` returns valid values (passed `env_step_passive_checker`).
    Note: A single step should be enough to assert that the state transition function is deterministic (at least for most environments).

    Raises:
        AssertionError: The environment cannot be step deterministically after resetting with a random seed,
            or it truncates after 1 step.
    """
    if env.spec is not None and env.spec.nondeterministic is True:
        return

    env.action_space.seed(seed)
    action = env.action_space.sample()

    env.reset(seed=seed)
    obs_0, rew_0, term_0, trunc_0, info_0 = env.step(action)
    seeded_rng: np.random.Generator = deepcopy(env.unwrapped._np_random)

    env.reset(seed=seed)
    obs_1, rew_1, term_1, trunc_1, info_1 = env.step(action)

    assert (
        env.unwrapped._np_random.bit_generator.state  # pyright: ignore [reportOptionalMemberAccess]
        == seeded_rng.bit_generator.state
    ), "The `.np_random` is not properly been updated after step."
    assert data_equivalence(
        obs_0, obs_1
    ), "Deterministic step observations are not equivalent for the same seed and action"

You could try adding a breakpoint() at line 210 and printing obs_0 and obs_1; that might reveal something.
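At that breakpoint, a small helper can report which keys and indices of the two dict observations actually differ. This is a hypothetical utility (not part of gymnasium), assuming dict observations with NumPy array values as panda_gym returns them:

```python
import numpy as np


def diff_obs(obs_a, obs_b, atol=0.0):
    """Return {key: (indices, deltas)} for entries whose absolute difference exceeds atol."""
    report = {}
    for key in obs_a:
        delta = np.asarray(obs_a[key], dtype=np.float64) - np.asarray(obs_b[key], dtype=np.float64)
        idx = np.flatnonzero(np.abs(delta) > atol)
        if idx.size:
            report[key] = (idx, delta[idx])
    return report
```

Calling `diff_obs(obs_0, obs_1)` at the breakpoint would immediately show whether the discrepancy sits in `observation`, `achieved_goal`, or `desired_goal`, and at which indices.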

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Jul 9, 2024

I can reproduce the error, but it seems to require a strange setup.

You must reset, step, reset, step for the second step to fail the equivalence check.

If I change the assertion error to print the data, I can discover that obs["observation"] is the problem. Plus, if I subtract the two data points, we can see the difference is only in the second half, which comes from the task:

AssertionError: data_1 - data_2=array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
        0.0000000e+00,  0.0000000e+00,  0.0000000e+00, -4.6938658e-07,
        4.4703484e-07, -3.2596290e-07,  1.1971366e-05, -1.0663000e-05,
        3.4053983e-06,  2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
        1.1928683e-03,  1.7360186e-03,  2.0111194e-04], dtype=float32)
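Pasting that printed diff into a quick check confirms where the nonzero entries start: the first seven entries (presumably the robot part of the observation) match exactly, and only the tail differs.

```python
import numpy as np

# The diff printed above, pasted verbatim; locate where the nonzero entries start.
diff = np.array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
                  0.0000000e+00,  0.0000000e+00,  0.0000000e+00, -4.6938658e-07,
                  4.4703484e-07, -3.2596290e-07,  1.1971366e-05, -1.0663000e-05,
                  3.4053983e-06,  2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
                  1.1928683e-03,  1.7360186e-03,  2.0111194e-04], dtype=np.float32)
nonzero = np.flatnonzero(diff)
print(nonzero)  # first differing index is 7; everything before it matches
```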

Looking at all the environments, this is a problem for most of them, except Reach and Slide:

import gymnasium as gym
from gymnasium.utils.env_checker import data_equivalence

import panda_gym
from panda_gym.envs import PandaPickAndPlaceEnv, PandaFlipEnv, PandaPushEnv, PandaReachEnv, PandaSlideEnv, PandaStackEnv

gym.register_envs(panda_gym)

for env_cls in [PandaPickAndPlaceEnv, PandaFlipEnv, PandaPushEnv, PandaReachEnv, PandaSlideEnv, PandaStackEnv]:
    env = env_cls()
    print(f'{env}')

    seed = 123
    env.action_space.seed(seed)
    action_0 = env.action_space.sample()
    action_1 = env.action_space.sample()

    obs_0, _ = env.reset(seed=seed)
    obs_1, _, _, _, _ = env.step(action_0)
    obs_2, _ = env.reset()
    obs_3, _, _, _, _ = env.step(action_1)

    obs_4, _ = env.reset(seed=seed)
    obs_5, _, _, _, _ = env.step(action_0)
    obs_6, _ = env.reset()
    obs_7, _, _, _, _ = env.step(action_1)

    data_equivalence(obs_0, obs_4)
    data_equivalence(obs_1, obs_5)
    print(f'{obs_1["observation"] - obs_5["observation"]=}')
    data_equivalence(obs_2, obs_6)
    data_equivalence(obs_3, obs_7)

All of these differences exist only in the task part of the observation (not the robot part).

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Jul 9, 2024

I've found a sort of source for the noise in the observation.
In the _sample_object function of the PickAndPlace task, if you comment out line 83, the one that adds the noise to object_position (object_position += noise), the error disappears for PickAndPlace.
However, if you print the noise value produced, it's the same in the two episodes.

noise=array([0.04380345, 0.00589226, 0.        ])
<PandaPickAndPlaceEnv instance>
noise=array([-0.09722823,  0.09362835,  0.        ])
noise=array([-0.07651062,  0.09727248,  0.        ])
noise=array([-0.09722823,  0.09362835,  0.        ])
noise=array([-0.07651062,  0.09727248,  0.        ])
obs_1["observation"] - obs_5["observation"]=array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
        0.0000000e+00,  0.0000000e+00,  0.0000000e+00, -4.6938658e-07,
        4.4703484e-07, -3.2596290e-07,  1.1971366e-05, -1.0663000e-05,
        3.4053983e-06,  2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
        1.1928683e-03,  1.7360186e-03,  2.0111194e-04], dtype=float32)

I can't figure out why adding this noise causes the output to change.
The problem is that if I change the code to

    def _sample_object(self) -> np.ndarray:
        """Randomize start position of object."""
        object_position = np.array([0.0, 0.0, self.object_size / 2])
        noise = self.np_random.uniform(self.obj_range_low, self.obj_range_high)  # sampled but no longer used
        object_position += np.array([-0.09722823, 0.09362835, 0.0])  # fixed constant in place of the noise
        return object_position

the problem persists, even though we are now adding a constant in place of the sampled noise

EDIT: The next day I can't replicate the last point

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Jul 10, 2024

Looking at this again the next day, I can't replicate the problem I noted at the end.

I tested the minimal example

seed = 123
env.action_space.seed(seed)
action_0 = env.action_space.sample()
action_1 = env.action_space.sample()

obs_0, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action_0)
obs_2, _ = env.reset()
obs_3, _, _, _, _ = env.step(action_1)  # This line is necessary

obs_4, _ = env.reset(seed=seed)
obs_5, _, _, _, _ = env.step(action_0)
obs_6, _ = env.reset()
# obs_7, _, _, _, _ = env.step(action_1)  # This line isn't necessary for the issue

print(f'{obs_1["observation"] - obs_5["observation"]=}')

Another test I made was to add a third reset case and compare the three observations.
Interestingly, the three observations are all different, meaning there is an unknown source of randomness that is nevertheless deterministic (the observation error is constant across many runs). This is a very strange combination: deterministic but unknown randomness.

seed = 123
env.action_space.seed(seed)
action_0 = env.action_space.sample()
action_1 = env.action_space.sample()

obs_0, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action_0)
obs_2, _ = env.reset()
obs_3, _, _, _, _ = env.step(action_1)  # necessary

obs_4, _ = env.reset(seed=seed)
obs_5, _, _, _, _ = env.step(action_0)
obs_6, _ = env.reset()
# obs_7, _, _, _, _ = env.step(action_1)  # unnecessary

obs_8, _ = env.reset(seed=seed)
obs_9, _, _, _, _ = env.step(action_0)
obs_10, _ = env.reset()
# obs_11, _, _, _, _ = env.step(action_1)  # unnecessary

print(f'{obs_1["observation"] - obs_5["observation"]=}')
print(f'{obs_5["observation"] - obs_9["observation"]=}')
print(f'{obs_1["observation"] - obs_9["observation"]=}')
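One way to substantiate the "constant across runs" observation is to persist the diff to disk and compare it on the next invocation of the script. A sketch (the filename `obs_diff.npy` is a hypothetical choice, not something panda_gym or gymnasium produces):

```python
import os
import numpy as np


def check_diff_stable(diff, path="obs_diff.npy"):
    """Save diff on the first run; on later runs, return whether it matches the saved one."""
    if os.path.exists(path):
        return bool(np.array_equal(np.load(path), diff))
    np.save(path, diff)
    return None  # first run: nothing to compare against yet
```

Running the reproduction script twice and calling `check_diff_stable(obs_1["observation"] - obs_5["observation"])` in each run would return True on the second run if the error really is constant.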

Checking the seeding by separating the action-space seeding from the reset seeding, only the reset seeding affects the observation; i.e., the actual action taken doesn't matter.

The last check I've made relates to the _sample_object function and the noise.
Rechecking, I couldn't replicate the constant noise still causing the issue; however, by modifying the bounds I could avoid it.
It seems that if the object position is not close to zero, there isn't an issue.
If someone could plot a graph of the errors for different object positions, that could be interesting to prove this.
