Deep Mimic example in Webots? #18

rohit-kumar-j · 2021-01-17T21:15:10Z

Hi! Could you integrate or translate a Deep Mimic environment example from HERE?

tsampazk · 2021-01-19T08:52:53Z

Maintainer

Hello @rohit-kumar-j!

Could you please elaborate on how you imagine the Deep Mimic example being implemented in Webots?

rohit-kumar-j · 2021-01-22T04:28:08Z

Author

Hi @tsampazk! Thank you for your reply!
For example, in PyBullet, Deep Mimic has been implemented like this:

(on page 67 of the PyBullet Quick Start Guide)

Running pip install pybullet should be enough to get set up. After that, open a terminal in the following directory:

bullet3/examples/pybullet/gym/pybullet_data/args/

...and run the following:

python3 -m pybullet_envs.deep_mimic.testrl --arg_file run_humanoid3d_walk_args.txt
(to run the pre-trained model), and
python3 mpi_run.py --arg_file train_humanoid3d_walk_args.txt --num_workers 16
(to train the model).

I was hoping that this example could be implemented in a less cumbersome way (not necessarily, though) within Webots, compared to the original Deep Mimic code (in MuJoCo) and the example provided in PyBullet.

As far as I can tell, the PyBullet example defines the env, action space, reward function, etc. all by hand. Perhaps there could be a predefined 'general' environment, based on the Deep Mimic code (but upgraded and implemented for Webots), that can be used for many other purposes, such as reusing the same environment for various other robots.

This could also serve as a common library for dynamic robot simulations built on the environment library. Of course, since Webots is primarily a GUI-based simulator, much of the functionality would live in the GUI itself and the code would be relatively simpler than the PyBullet implementation.

As an example implementation, we could translate the Deep Mimic example above, similar to the cart-pole example.

The development could perhaps be supplemented by support from Webots on their Discord server or other community forums.

Warm Regards,

Rohit

tsampazk · 2021-01-22T11:08:21Z

Maintainer

Thanks for the detailed response!

The generic environment that can support multiple robots using Deep Mimic sounds like a great idea and a nice addition to deepbots examples. I will list some thoughts regarding the implementation:

In all RL examples, it is required for the user to implement some methods regarding the robot. For example, a method that translates discrete action values to robot actions; the agent outputs the discrete action 0, the method translates 0 to forward motion for a certain time. Generally, this is specific to the robot used or at least the specific actuators used (e.g. robots that all use left and right motors attached to wheels), so swapping out to another robot could mean a different action set and different method of translating actions to actuator movement. This can be problematic for a generic Deep Mimic environment where we can swap out robots.
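To make the point concrete, here is a minimal sketch of such a translation method, assuming a hypothetical two-wheeled robot. The action table, the `MAX_SPEED` value and the function name are illustrative and not part of deepbots or Webots.

```python
from typing import Tuple

# Assumed motor speed limit in rad/s; a real robot would define its own.
MAX_SPEED = 4.0

# Hypothetical discrete action set: index -> (left, right) speed factors.
ACTION_TABLE = {
    0: (1.0, 1.0),    # forward
    1: (-1.0, -1.0),  # backward
    2: (-0.5, 0.5),   # turn left
    3: (0.5, -0.5),   # turn right
}


def action_to_wheel_speeds(action: int) -> Tuple[float, float]:
    """Translate a discrete action index into (left, right) motor velocities."""
    if action not in ACTION_TABLE:
        raise ValueError(f"unknown action {action}")
    left, right = ACTION_TABLE[action]
    return left * MAX_SPEED, right * MAX_SPEED
```

A robot with legs or arms instead of wheels would need a different table and a different return shape, which is exactly why this part is hard to make generic.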

I think that the first step would be to implement a specific example with a specific robot and extend it from there with more robots that include the required controllers.
Deep Mimic seems to target humanoid animation/movement. Unfortunately, humanoid models and animation in Webots are beyond the scope of deepbots and our knowledge of it. At most, we could try to implement Deep Mimic with an example such as the Boston Dynamics Spot included in Webots. This example has an implemented controller that performs some basic hardcoded actions that could be used. The most basic example that could work is to adapt the Cartpole example with a teacher cartpole using a PID controller and a student cartpole trying to mimic it using its existing reward function plus the teacher reward. This could be a really interesting and useful baseline example to introduce the concept and the way to implement it, but it is a far cry from the complex humanoid models of the original work.
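A rough sketch of the cartpole idea, under the assumption that the teacher is driven by a textbook PID controller and the student's reward is its usual task reward plus a term that grows as its pole angle approaches the teacher's. The class, the function and the `weight`/`scale` parameters are hypothetical, not existing deepbots code.

```python
import math


class PID:
    """Textbook PID controller used here as the hypothetical teacher."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float, dt: float) -> float:
        """Return the control output for the current error and time step."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def student_reward(task_reward: float, student_angle: float,
                   teacher_angle: float, weight: float = 1.0,
                   scale: float = 10.0) -> float:
    """Existing task reward plus an imitation bonus in the range (0, weight]."""
    imitation = math.exp(-scale * (student_angle - teacher_angle) ** 2)
    return task_reward + weight * imitation
```

The `weight` value controls how much imitation matters relative to the original Cartpole reward; it would need tuning.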
Correct me if I am wrong, but I think that we need:
- a robot, let's call it the teacher, with a predefined controller executing some action
- another robot of the same kind, let's call it the student, controlled by the RL agent
- a way for the student to get information about the action being executed by the teacher, so it can train based on reward from the teacher in addition to some regular reward defined by the user, regarding the target action.

Generally, to exchange information between robot controllers we require an emitter/receiver setup, which can easily be implemented and used for small payloads such as short vectors containing action values. This would need some additional code to implement the sending/receiving of such information. The student robot can use the RobotSupervisor deepbots class to implement the gym-style environment, which does not introduce any overhead.
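As an illustration of the "additional code" for sending small vectors, here is a sketch that only covers serialization, assuming the teacher's action values are floats and that the resulting bytes are handed to the emitter on one side and read back from the receiver on the other. The helper names are made up, and the actual Webots emitter/receiver calls are deliberately left out.

```python
import struct


def pack_vector(values):
    """Encode floats as bytes: a little-endian uint32 count, then float64 values."""
    return struct.pack(f"<I{len(values)}d", len(values), *values)


def unpack_vector(payload):
    """Decode bytes produced by pack_vector back into a list of floats."""
    (count,) = struct.unpack_from("<I", payload, 0)
    # The count occupies the first 4 bytes, so the values start at offset 4.
    return list(struct.unpack_from(f"<{count}d", payload, 4))
```

For example, `unpack_vector(pack_vector([0.1, -2.5, 3.0]))` gives back the same three values on the student side.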

Feel free to share what you think about all these and/or any additional thoughts that you might have.

rohit-kumar-j · 2021-01-26T07:47:38Z

Author

Thank you for your reply! I now have some more insight into the implementation of Deep Mimic after some research. I might be wrong in my understanding, but here goes:

I did find this video of the implementation of Deep Mimic in PyBullet. Here, the ghost of the robot (the lighter-shaded robot) is merely a representation of the reward function (lines 42-79). It is not actually in a physics environment and does not face any forces or collisions. The darker-shaded robot is supposed to follow the 'ghost' and thereby earn the reward. The ghost itself does not have a controller. It uses the methods listed in humanoid_pose_interpolator.py for the spherical linear interpolation. I think that both the actual robot and the ghost share the same group of methods to interpolate joints, base_links, etc.
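For reference, spherical linear interpolation between two unit quaternions can be sketched as below. This is a generic textbook version, not the code from humanoid_pose_interpolator.py; it treats quaternions as plain 4-element lists, so the component order does not matter as long as both inputs use the same one.

```python
import math


def slerp(q0, q1, t):
    """Interpolate between unit quaternions q0 and q1 for t in [0, 1]."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip one to take the shorter arc.
        q1 = [-c for c in q1]
        dot = -dot
    if dot > 0.9995:
        # Nearly identical rotations: linear interpolation avoids dividing by ~0.
        out = [a + t * (b - a) for a, b in zip(q0, q1)]
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
        s1 = math.sin(t * theta) / math.sin(theta)
        out = [s0 * a + s1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in out))
    return [c / norm for c in out]
```

Halfway between the identity and a 90 degree rotation about one axis, this returns the 45 degree rotation about that axis.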

> In all RL examples, it is required for the user to implement some methods regarding the robot. For example, a method that translates discrete action values to robot actions; the agent outputs the discrete action 0, the method translates 0 to forward motion for a certain time. Generally, this is specific to the robot used or at least the specific actuators used (e.g. robots that all use left and right motors attached to wheels), so swapping out to another robot could mean a different action set and different method of translating actions to actuator movement. This can be problematic for a generic Deep Mimic environment where we can swap out robots.

There is a custom method for interpolating motion to the joints that are referenced from the mocap files. These mocap files follow a pattern of having 43 indices. The first 3 are for the base_link position and the next 4 are for the base_link orientation. The ones that follow are quaternions (and a few Euler joint representations) that represent the joint state at the particular keyframe (cool video for keyframe overview).
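A small sketch of how one such frame could be split, following only the layout described above (3 position values, 4 orientation values, joint values after that). The function name and dictionary keys are made up, and the real mocap files may carry extra fields that this does not handle.

```python
def split_frame(frame):
    """Split one flat mocap frame into base position, base orientation and joints."""
    if len(frame) < 7:
        raise ValueError("a frame needs at least 3 position and 4 orientation values")
    return {
        "base_position": list(frame[0:3]),     # x, y, z of the base link
        "base_orientation": list(frame[3:7]),  # quaternion of the base link
        "joint_values": list(frame[7:]),       # remaining per-joint entries
    }
```

With a 43-value frame this leaves 36 values for the joints.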
Swapping out robots is indeed a problem when there are custom robots and the joint indices might not map to the correct joints of the robot. In addition, there is a problem with robots having fewer or more joints.
(However, I think this can be left to future discussions, once we have implemented a basic example. We could implement a class that reads the joint info from the URDF, builds a dictionary, and maps each joint name to its joint index, similar to what is done in minitaur.py.)
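Something along these lines, as a rough sketch: it parses the URDF text with the standard library and numbers the non-fixed joints in file order. That ordering is an assumption; the indices a simulator assigns may differ, and this is not how minitaur.py does it.

```python
import xml.etree.ElementTree as ET

# Tiny made-up URDF fragment, just enough to exercise the function.
EXAMPLE_URDF = """
<robot name="example">
  <joint name="hip" type="revolute"/>
  <joint name="sensor_mount" type="fixed"/>
  <joint name="knee" type="revolute"/>
</robot>
"""


def joint_index_map(urdf_text):
    """Map each movable joint name to an index based on its order in the URDF."""
    root = ET.fromstring(urdf_text)
    movable = [j.get("name") for j in root.findall("joint")
               if j.get("type") != "fixed"]
    return {name: index for index, name in enumerate(movable)}
```

Calling `joint_index_map(EXAMPLE_URDF)` yields a dictionary with `hip` and `knee` only, since the fixed joint is skipped.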

> I think that the first step would be to implement a specific example with a specific robot and extend it from there with more robots that include the required controllers.

Indeed, implementing a specific example first and then exporting a new URDF model to swap out with the humanoid is a better option. I think we can use the given humanoid.urdf at first, as the methods are already predefined in the original code within PyBullet's implementation of Deep Mimic.

> Deep Mimic seems to target humanoid animation/movement. Unfortunately, humanoid models and animation in Webots are beyond the scope of deepbots and our knowledge of it. At most, we could try to implement Deep Mimic with an example such as the Boston Dynamics Spot included in Webots. This example has an implemented controller that performs some basic hardcoded actions that could be used. The most basic example that could work is to adapt the Cartpole example with a teacher cartpole using a PID controller and a student cartpole trying to mimic it using its existing reward function plus the teacher reward. This could be a really interesting and useful baseline example to introduce the concept and the way to implement it, but it is a far cry from the complex humanoid models of the original work.

If we can find a way to make the 'ghost' of the humanoid robot transform in space through the keyframes, within the specified keyframe durations, then we could, in theory, make the RL agent perform using the RobotSupervisor class after adding some additional functionality. The reward function in PyBullet's implementation is given in the getReward() method. This would not require a robot controller (teacher controller), and robots with similar physical structures could be trained.
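To sketch the timing part, assuming each keyframe comes with a duration in seconds and the clip loops, the ghost's current keyframe and how far it is into that keyframe could be found like this. The function name is hypothetical and this is not taken from the PyBullet code.

```python
def locate_keyframe(durations, t):
    """Return (index, fraction) of the keyframe active at time t in a looping clip.

    durations[i] is how long keyframe i lasts; fraction is in [0, 1) and says
    how far the ghost is between keyframe i and the next one.
    """
    total = sum(durations)
    t = t % total  # wrap around so the motion repeats
    elapsed = 0.0
    for index, duration in enumerate(durations):
        if t < elapsed + duration:
            return index, (t - elapsed) / duration
        elapsed += duration
    # Only reachable through floating-point round-off at the very end of the clip.
    return len(durations) - 1, 1.0
```

The returned fraction is what an interpolation routine would take as its blend parameter between the two neighbouring keyframes.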

> Correct me if I am wrong, I think that we need:
>
> a robot, let's call it a teacher, with a predefined controller executing some action

I think we do not need a predefined teacher controller, but a reward function 'ghost' transforming through space.
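As a simplified stand-in for such a reward (not the getReward() method from PyBullet), the student could be scored on how closely its joint values match the ghost's at the same instant. The function name and the `scale` parameter are assumptions.

```python
import math


def pose_reward(student_joints, ghost_joints, scale=2.0):
    """Return a reward in (0, 1] that is 1 when the student matches the ghost."""
    if len(student_joints) != len(ghost_joints):
        raise ValueError("student and ghost must have the same number of joints")
    squared_error = sum((s - g) ** 2
                        for s, g in zip(student_joints, ghost_joints))
    # Exponential falloff keeps the reward bounded and smooth.
    return math.exp(-scale * squared_error)
```

Because the ghost only supplies target values, nothing here needs a second controller or any message passing between robots.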

> another robot of the same kind, let's call it student, controlled by the RL agent

Yes, we need an RL agent that works on this or a similar reward function.

> The student needs a way to get information about the action being executed by the teacher, so it can train based on reward from the teacher in addition to some regular reward defined by the user, regarding the target action.

Yes, the student needs methods to gather information from the action executed by the teacher (ghost) and generate an action based on the action weights.

> Generally, to exchange information between robot controllers we require an emitter/receiver setup which can easily be implemented and used for low data sizes such as small vectors containing action values. This would need some additional code to implement the sending/receiving of such information. The student robot can use the RobotSupervisor deepbots class to implement the gym-style environment that does not introduce any overhead.

I agree. This needs an efficient implementation of the originally provided deep_mimic code. Since Webots combines GUI-based and code-based development, it might be inherently less cumbersome.

Warm Regards,

Rohit

PS: This was a long reply. I will try to shorten them from now on or reply in multiple comments.
