
[Bug Report] check_step_determinism obscure behavior #1111

Open
1 task done
qgallouedec opened this issue Jul 4, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@qgallouedec
Contributor

qgallouedec commented Jul 4, 2024

Describe the bug

It's the kind of issue that's hard to name or explain, or even reduce to simple code. But here's what I've observed since check_step_determinism was added: when I run the check myself, it passes; when the checker runs it, it doesn't. For the moment the code depends on panda_gym, sorry about that, I'll reduce it later, but I wanted to report it as soon as possible.

Code example

import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence

env = gym.make("PandaPickAndPlace-v3").unwrapped

seed = 123
env.action_space.seed(seed)
action = env.action_space.sample()
_, _ = env.reset(seed=seed)
obs_0, _, _, _, _ = env.step(action)
_, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action)

assert data_equivalence(obs_0, obs_1)  # Passes

check_env(env, skip_render_check=True)  # But! this fails in check_step_determinism
Traceback (most recent call last):
  File "/Users/quentingallouedec/panda-gym/94.py", line 16, in <module>
    check_env(env, skip_render_check=True)  # fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/quentingallouedec/panda-gym/env/lib/python3.11/site-packages/gymnasium/utils/env_checker.py", line 402, in check_env
    check_step_determinism(env)
  File "/Users/quentingallouedec/panda-gym/env/lib/python3.11/site-packages/gymnasium/utils/env_checker.py", line 198, in check_step_determinism
    assert data_equivalence(
           ^^^^^^^^^^^^^^^^^
AssertionError: Deterministic step observations are not equivalent for the same seed and action

What's even weirder is that it only happens in two environments. I'll keep digging and let you know.

System info

Gymnasium 1.0.0a2
Panda-gym 10c4d8a

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
@qgallouedec added the bug label Jul 4, 2024
@Kallinteris-Andreas
Collaborator

The only thing I can think of is that perhaps your environment does not properly reset its internal state after the second reset.

Does this pass?:

import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence

env = gym.make("PandaPickAndPlace-v3").unwrapped

check_env(env, skip_render_check=True)  # does this fail??
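The stale-state hypothesis can be illustrated with a toy example that needs no gymnasium at all (a hypothetical `LeakyEnv`, not related to panda_gym's actual implementation): reset re-seeds the RNG but forgets to clear a cached value, so the first step after a second seeded reset sees leftovers from the previous episode.

```python
import random


class LeakyEnv:
    """Toy env whose reset() re-seeds the RNG but forgets internal state."""

    def __init__(self):
        self._cached = 0.0  # internal state that reset() should clear, but doesn't

    def reset(self, seed):
        self._rng = random.Random(seed)
        # BUG: self._cached is not reset here
        return self._rng.random()

    def step(self, action):
        obs = self._rng.random() + self._cached
        self._cached = obs  # leaks into the next episode
        return obs


env = LeakyEnv()
env.reset(seed=123)
obs_0 = env.step(0)
env.reset(seed=123)
obs_1 = env.step(0)
# obs_0 != obs_1 even though the seed and action are identical,
# because _cached carried over from the first episode
```

The same seed-plus-action sequence yields two different observations, which is exactly the failure pattern the checker reports.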

@qgallouedec
Contributor Author

qgallouedec commented Jul 8, 2024

It fails. Doesn't make any sense 😅

import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence
import traceback

env = gym.make("PandaPickAndPlace-v3").unwrapped

try:
    check_env(env, skip_render_check=True)  # Fails
except Exception as exc:
    traceback.print_exception(exc)

seed = 123
env.action_space.seed(seed)
action = env.action_space.sample()
_, _ = env.reset(seed=seed)
obs_0, _, _, _, _ = env.step(action)
_, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action)

assert data_equivalence(obs_0, obs_1)  # Passes
Traceback (most recent call last):
  File "/Users/quentingallouedec/panda-gym/94.py", line 9, in <module>
    check_env(env, skip_render_check=True)  # Fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/quentingallouedec/Gymnasium/gymnasium/utils/env_checker.py", line 412, in check_env
    check_reset_seed_determinism(env)
  File "/Users/quentingallouedec/Gymnasium/gymnasium/utils/env_checker.py", line 116, in check_reset_seed_determinism
    assert data_equivalence(
           ^^^^^^^^^^^^^^^^^
AssertionError: Using `env.reset(seed=123)` then `env.reset()` is non-deterministic as the observations are not equivalent.

@Kallinteris-Andreas
Copy link
Collaborator

I have no idea how this is failing.

check_step_determinism does the same thing:

def check_step_determinism(env: gym.Env, seed=123):
    """Check that the environment steps deterministically after reset.

    Note: This check assumes that seeded `reset()` is deterministic (it must have passed `check_reset_seed`) and that `step()` returns valid values (passed `env_step_passive_checker`).
    Note: A single step should be enough to assert that the state transition function is deterministic (at least for most environments).

    Raises:
        AssertionError: The environment cannot be step deterministically after resetting with a random seed,
            or it truncates after 1 step.
    """
    if env.spec is not None and env.spec.nondeterministic is True:
        return

    env.action_space.seed(seed)
    action = env.action_space.sample()

    env.reset(seed=seed)
    obs_0, rew_0, term_0, trunc_0, info_0 = env.step(action)
    seeded_rng: np.random.Generator = deepcopy(env.unwrapped._np_random)

    env.reset(seed=seed)
    obs_1, rew_1, term_1, trunc_1, info_1 = env.step(action)

    assert (
        env.unwrapped._np_random.bit_generator.state  # pyright: ignore [reportOptionalMemberAccess]
        == seeded_rng.bit_generator.state
    ), "The `.np_random` is not properly been updated after step."
    assert data_equivalence(
        obs_0, obs_1
    ), "Deterministic step observations are not equivalent for the same seed and action"

You could try adding a breakpoint() at line 210 and printing obs_0 and obs_1; that might reveal something.
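At that breakpoint, a small helper can report which keys and indices of the two dict observations actually differ. This is a hypothetical utility (not part of gymnasium), assuming dict observations with NumPy array values as panda_gym returns them:

```python
import numpy as np


def diff_obs(obs_a, obs_b, atol=0.0):
    """Return {key: (indices, deltas)} for entries whose absolute difference exceeds atol."""
    report = {}
    for key in obs_a:
        delta = np.asarray(obs_a[key], dtype=np.float64) - np.asarray(obs_b[key], dtype=np.float64)
        idx = np.flatnonzero(np.abs(delta) > atol)
        if idx.size:
            report[key] = (idx, delta[idx])
    return report
```

Calling `diff_obs(obs_0, obs_1)` at the breakpoint would immediately show whether the discrepancy sits in `observation`, `achieved_goal`, or `desired_goal`, and at which indices.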

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Jul 9, 2024

I can reproduce the error, but it seems to require a strange setup.

You must reset, step, reset, step for the second step to fail the equivalence check.

If I change the assertion error to print the data, I can discover that obs["observation"] is the problem. Plus, if I subtract the two data points, we can see the difference is only in the second half, which comes from the task:

AssertionError: data_1 - data_2=array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
        0.0000000e+00,  0.0000000e+00,  0.0000000e+00, -4.6938658e-07,
        4.4703484e-07, -3.2596290e-07,  1.1971366e-05, -1.0663000e-05,
        3.4053983e-06,  2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
        1.1928683e-03,  1.7360186e-03,  2.0111194e-04], dtype=float32)
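Pasting that printed diff into a quick check confirms where the nonzero entries start: the first seven entries (presumably the robot part of the observation) match exactly, and only the tail differs.

```python
import numpy as np

# The diff printed above, pasted verbatim; locate where the nonzero entries start.
diff = np.array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
                  0.0000000e+00,  0.0000000e+00,  0.0000000e+00, -4.6938658e-07,
                  4.4703484e-07, -3.2596290e-07,  1.1971366e-05, -1.0663000e-05,
                  3.4053983e-06,  2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
                  1.1928683e-03,  1.7360186e-03,  2.0111194e-04], dtype=np.float32)
nonzero = np.flatnonzero(diff)
print(nonzero)  # first differing index is 7; everything before it matches
```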

Looking at all the environments, this is a problem for most of them, except Reach and Slide:

import gymnasium as gym
from gymnasium.utils.env_checker import data_equivalence

import panda_gym
from panda_gym.envs import PandaPickAndPlaceEnv, PandaFlipEnv, PandaPushEnv, PandaReachEnv, PandaSlideEnv, PandaStackEnv

gym.register_envs(panda_gym)

for env_cls in [PandaPickAndPlaceEnv, PandaFlipEnv, PandaPushEnv, PandaReachEnv, PandaSlideEnv, PandaStackEnv]:
    env = env_cls()
    print(f'{env}')

    seed = 123
    env.action_space.seed(seed)
    action_0 = env.action_space.sample()
    action_1 = env.action_space.sample()

    obs_0, _ = env.reset(seed=seed)
    obs_1, _, _, _, _ = env.step(action_0)
    obs_2, _ = env.reset()
    obs_3, _, _, _, _ = env.step(action_1)

    obs_4, _ = env.reset(seed=seed)
    obs_5, _, _, _, _ = env.step(action_0)
    obs_6, _ = env.reset()
    obs_7, _, _, _, _ = env.step(action_1)

    data_equivalence(obs_0, obs_4)
    data_equivalence(obs_1, obs_5)
    print(f'{obs_1["observation"] - obs_5["observation"]=}')
    data_equivalence(obs_2, obs_6)
    data_equivalence(obs_3, obs_7)

All of these differences exist only in the task part of the observation (not the robot part).

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Jul 9, 2024

I've found a sort of source for the noise in the observation.
In the _sample_object function of the PickAndPlace task, if you comment out line 83, the one that adds the noise to object_position (object_position += noise), the error disappears for PickAndPlace.
However, if you print the noise value produced, it's the same in the two episodes.

noise=array([0.04380345, 0.00589226, 0.        ])
<PandaPickAndPlaceEnv instance>
noise=array([-0.09722823,  0.09362835,  0.        ])
noise=array([-0.07651062,  0.09727248,  0.        ])
noise=array([-0.09722823,  0.09362835,  0.        ])
noise=array([-0.07651062,  0.09727248,  0.        ])
obs_1["observation"] - obs_5["observation"]=array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
        0.0000000e+00,  0.0000000e+00,  0.0000000e+00, -4.6938658e-07,
        4.4703484e-07, -3.2596290e-07,  1.1971366e-05, -1.0663000e-05,
        3.4053983e-06,  2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
        1.1928683e-03,  1.7360186e-03,  2.0111194e-04], dtype=float32)

I can't figure out why adding this noise causes the output to change.
The problem is that if I change the code to

    def _sample_object(self) -> np.ndarray:
        """Randomize start position of object."""
        object_position = np.array([0.0, 0.0, self.object_size / 2])
        noise = self.np_random.uniform(self.obj_range_low, self.obj_range_high)  # sampled but no longer used
        object_position += np.array([-0.09722823, 0.09362835, 0.0])  # fixed constant in place of the noise
        return object_position

the problem persists, even though we are now adding a constant in place of the sampled noise

EDIT: The next day I can't replicate the last point

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Jul 10, 2024

Looking at this again the next day, I can't replicate the problem I noted at the end.

I tested the minimal example

seed = 123
env.action_space.seed(seed)
action_0 = env.action_space.sample()
action_1 = env.action_space.sample()

obs_0, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action_0)
obs_2, _ = env.reset()
obs_3, _, _, _, _ = env.step(action_1)  # This line is necessary

obs_4, _ = env.reset(seed=seed)
obs_5, _, _, _, _ = env.step(action_0)
obs_6, _ = env.reset()
# obs_7, _, _, _, _ = env.step(action_1)  # This line isn't necessary for the issue

print(f'{obs_1["observation"] - obs_5["observation"]=}')

Another test I made was to add a third reset case and compare the three observations.
Interestingly, the three observations are all different, meaning there is an unknown source of randomness that is nevertheless deterministic (the observation error is constant across many runs). This is a very strange combination: deterministic but unknown randomness.

seed = 123
env.action_space.seed(seed)
action_0 = env.action_space.sample()
action_1 = env.action_space.sample()

obs_0, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action_0)
obs_2, _ = env.reset()
obs_3, _, _, _, _ = env.step(action_1)  # necessary

obs_4, _ = env.reset(seed=seed)
obs_5, _, _, _, _ = env.step(action_0)
obs_6, _ = env.reset()
# obs_7, _, _, _, _ = env.step(action_1)  # unnecessary

obs_8, _ = env.reset(seed=seed)
obs_9, _, _, _, _ = env.step(action_0)
obs_10, _ = env.reset()
# obs_11, _, _, _, _ = env.step(action_1)  # unnecessary

print(f'{obs_1["observation"] - obs_5["observation"]=}')
print(f'{obs_5["observation"] - obs_9["observation"]=}')
print(f'{obs_1["observation"] - obs_9["observation"]=}')
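One way to substantiate the "constant across runs" observation is to persist the diff to disk and compare it on the next invocation of the script. A sketch (the filename `obs_diff.npy` is a hypothetical choice, not something panda_gym or gymnasium produces):

```python
import os
import numpy as np


def check_diff_stable(diff, path="obs_diff.npy"):
    """Save diff on the first run; on later runs, return whether it matches the saved one."""
    if os.path.exists(path):
        return bool(np.array_equal(np.load(path), diff))
    np.save(path, diff)
    return None  # first run: nothing to compare against yet
```

Running the reproduction script twice and calling `check_diff_stable(obs_1["observation"] - obs_5["observation"])` in each run would return True on the second run if the error really is constant.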

Checking the seeding by separating the action-space seeding from the reset seeding, only the reset seeding affects the observation; i.e., the actual action taken doesn't matter.

The last check I've made relates to the _sample_object function and the noise.
Rechecking, I couldn't replicate the constant noise still causing the issue; however, by modifying the bounds I could avoid it.
It seems that if the object position is not close to zero, there isn't an issue.
If someone could plot a graph of the errors for different object positions, that could be interesting to prove this.
