
Neuro-symbolic Robot Manipulation (NSRM)

Learning Neuro-symbolic Programs for Language Guided Robot Manipulation
Namasivayam Kalithasan*, Himanshu Singh*, Vishal Bindal*, Arnav Tuli, Vishwajeet Agrawal, Rahul Jain, Parag Singla, Rohan Paul
ICRA 2023

For the latest updates, see: nsrmp.github.io

Given a natural language instruction and an input and output scene, our goal is to train a neuro-symbolic model that outputs a manipulation program which, when executed by the robot on the input scene, produces the desired output scene. Our approach is neuro-symbolic: it handles linguistic as well as perceptual variations, is end-to-end differentiable with no intermediate supervision, and uses symbolic reasoning constructs that operate on a latent neural object-centric representation, allowing for deeper reasoning over the input scene. Experiments in a simulated environment with a 7-DOF manipulator, covering instructions with varying numbers of steps, scenes with different numbers of objects, and objects with unseen attribute combinations, demonstrate that our model is robust to such variations and significantly outperforms existing baselines, particularly in generalization settings.

Index

  • Setup
  • Quickstart
  • Downloads
  • Training
  • Data Generation
  • Object Detector Integration
  • Acknowledgements
  • Citation

Setup

  • Install Jacinle: clone the repository and add its bin directory to your PATH environment variable.

      git clone https://github.com/vacancy/Jacinle --recursive
      export PATH=<path_to_jacinle>/bin:$PATH
    
  • Clone the NSRM repository

      git clone https://github.com/dair-iitd/nsrmp.git
    
  • Add the root directory to PATH and PYTHONPATH

     cd nsrmp
     export PYTHONPATH=$(pwd):$PYTHONPATH
     export PATH=$(pwd):$PATH
    
  • Create a conda environment from nsrmp_conda_environment.yaml. (Prerequisite: conda)

      conda env create -f nsrmp_conda_environment.yaml
      conda activate nsrm
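
  • [Optional] Verify the installation. The quick checks below assume only that the conda environment provides PyTorch, which the training and evaluation scripts rely on.

      # Jacinle's launchers should be on PATH after the exports above
      which jac-run jac-crun
      # PyTorch should be importable inside the nsrm environment (assumed dependency)
      python -c "import torch; print(torch.__version__)"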
    

Quickstart

This section describes how to evaluate trained model checkpoints on pre-generated data and how to simulate examples. See Training for training models, and Data Generation for generating new data.

Setting up downloads

See Downloads for links to the model checkpoints and the dataset. These paths are referred to below as:

  • <path_to_nsrm_checkpoint> for trained NSRM checkpoint

  • <path_to_baseline_checkpoint> for trained baseline checkpoint

  • <path_to_dataset> for the dataset directory. Note that after unzipping the download, this directory should contain the following:

      <path_to_dataset>
      └── instructions-*.json
      └── scenes-*.json
      └── train
      └── test
      └── val
      └── vocab.json
    
  • <path_to_dataset_general> for the dataset used in the generalization experiments. Note that after unzipping the download, this directory should contain the following:

      <path_to_dataset_general>
      └── color_comb
      └── multi-objects
      └── multi-step
      └── type-combinatorial
    

Evaluate NSRM

Run the command below to evaluate the trained NSRM model on the given dataset:

jac-crun 0 scripts/eval.py --dataset roboclevr --datadir <path_to_dataset> --vocab_json <path_to_dataset>/vocab.json --instruction_transform basic --use_cuda True --batch_size 32 --load_model_from_file <path_to_nsrm_checkpoint>
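
The generalization splits from Downloads can be evaluated in the same way by pointing --datadir at the relevant subdirectory. The command below is only a sketch: it assumes each split under <path_to_dataset_general> mirrors the main dataset layout and that the main vocabulary file applies to it.

# Hypothetical: evaluate on the multi-step generalization split (layout and vocabulary reuse are assumptions)
jac-crun 0 scripts/eval.py --dataset roboclevr --datadir <path_to_dataset_general>/multi-step --vocab_json <path_to_dataset>/vocab.json --instruction_transform basic --use_cuda True --batch_size 32 --load_model_from_file <path_to_nsrm_checkpoint>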

Evaluate baseline

Run the command below to evaluate the performance of the baseline model on the given dataset:

  • <baseline_type>: Type of baseline used ('single': only supports single-step instructions, 'multi': supports instructions of arbitrary length)
  • <path_to_model>: Absolute path to the baseline checkpoint

python3 scripts/test_baseline.py --datadir <path_to_dataset> --type <baseline_type> --load_model <path_to_model>

[OPTIONAL] Arguments:
  • --use_cuda: Use CUDA or not (default: True)
  • --num_steps: Range of instruction lengths. E.g., --num_steps a b selects examples where the number of steps in the execution is > a and <= b (default: 0 2)
  • --num_objects: Range of the number of objects in the scene. E.g., --num_objects a b selects examples where the number of objects in the scene is > a and <= b (default: 0 5)
  • --language_complexity: Filter by the complexity of the natural language instructions, e.g. simple or complex (default: None)
  • --remove_relational: Filter out examples involving spatial reasoning (default: False)
  • --only_relational: Keep only examples involving spatial reasoning (default: False)
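
These filters can be combined. For example, a hypothetical run restricted to two-step examples that involve spatial reasoning:

# Hypothetical filter combination: executions with >1 and <=2 steps, spatial-reasoning examples only
python3 scripts/test_baseline.py --datadir <path_to_dataset> --type multi --load_model <path_to_baseline_checkpoint> --num_steps 1 2 --only_relational True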

Simulate an example

To render a PyBullet simulation for any example, run the command below (note: <path_to_dataset>/test/00297 contains an example with a 2-step instruction):

cd nsrmp
jac-run scripts/simulate.py --model_path <path_to_nsrm_checkpoint> \
                            --example_path <path_to_dataset>/test/00297 \
                            --predicted True

For other options, see simulate.py
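
The same example can also be replayed without the model's prediction. The sketch below assumes --predicted is a boolean option and that passing False renders the ground-truth execution instead.

# Hypothetical: render the ground-truth execution for the same example (flag value is an assumption)
jac-run scripts/simulate.py --model_path <path_to_nsrm_checkpoint> \
                            --example_path <path_to_dataset>/test/00297 \
                            --predicted False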

Visualise plan for an example

TODO

Reconstruct final scene for an example

TODO

Downloads

Hardware requirements

We have trained and tested the code on

  • GPU - NVIDIA Quadro RTX 5000
  • CPU - Intel(R) Xeon(R) Gold 6226R
  • RAM - 16GB
  • OS - Ubuntu 20.04

Training

Training NSRM End-to-End

jac-crun 0 scripts/train.py --dataset roboclevr --datadir <path_to_dataset>  --vocab_json <path_to_dataset>/vocab.json --instruction_transform program_parser_candidates --use_cuda True --batch_size 32 --num_epochs 300  --model_save_interval 1 --training_target all --eval_interval 10   

Ablations

  • The contribution of the Language Reasoner can be removed by using the ground-truth symbolic program. Set the --training_target flag to concept_embeddings to train the Visual and Action Modules using ground-truth symbolic programs. That is,
jac-crun 0 scripts/train_single_step.py --dataset roboclevr --datadir <path_to_dataset>  --vocab_json <path_to_dataset>/vocab.json --instruction_transform program_parser_candidates --use_cuda True --batch_size 32 --num_epochs 300  --model_save_interval 1 --training_target concept_embeddings --eval_interval 10   
  • Similarly, if the visual modules are pre-trained, the Language and Action Modules alone can be trained by setting the --training_target flag to non-visual, as sketched below. Refer to model_new.py for more details about training targets.
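
The sketch below mirrors the end-to-end training command, changing only the training target; how the pre-trained visual modules are loaded is not shown and depends on your checkpointing setup. The flag value non-visual is taken from the description above.

# Train only the Language and Action Modules, assuming the visual modules are already pre-trained
jac-crun 0 scripts/train.py --dataset roboclevr --datadir <path_to_dataset> --vocab_json <path_to_dataset>/vocab.json --instruction_transform program_parser_candidates --use_cuda True --batch_size 32 --num_epochs 300 --model_save_interval 1 --training_target non-visual --eval_interval 10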

Training baseline

Run the command below to train the baseline model on the given dataset:

  • <baseline_type>: Type of baseline used ('single': only supports single-step instructions, 'multi': supports instructions of arbitrary length)

python3 scripts/train_baseline.py --datadir <path_to_dataset> --type <baseline_type>

[OPTIONAL] Arguments:
  • --use_cuda: Use CUDA or not (default: True)
  • --batch_size: Size of batch during training (default: 32)
  • --num_epochs: Number of training epochs (default: 200)
  • --save_model: Name with which to store the model dict (in the model_saves directory) (default: None)
  • --load_model: Name from which to load the model dict (in the model_saves directory) (default: None)
  • --save_splitter: (multi type only) Whether to store the model's splitter dict (default: False)
  • --load_splitter: (multi type only) Whether to load the model's splitter dict (default: False)
  • --save_interval: Model save interval, in epochs (default: 5)
  • --use_iou_loss: Use IoU loss during training or not (default: False)
  • --train_splitter: (multi type only) Whether to train the model's splitter (default: False)
  • --num_steps: Range of instruction lengths. E.g., --num_steps a b selects examples where the number of steps in the execution is > a and <= b (default: 0 2)
  • --num_objects: Range of the number of objects in the scene. E.g., --num_objects a b selects examples where the number of objects in the scene is > a and <= b (default: 0 5)
  • --language_complexity: Filter by the complexity of the natural language instructions, e.g. simple or complex (default: None)
  • --remove_relational: Filter out examples involving spatial reasoning (default: False)
  • --only_relational: Keep only examples involving spatial reasoning (default: False)
  • --wandb: Use WandB for logging training losses and model parameters (default: False)

The baseline model is trained on the given dataset using the following pipeline:

  • Train the baseline on examples consisting of single-step commands and up to 5 objects, without IoU loss
python3 scripts/train_baseline.py --datadir <path_to_dataset> --type single --save_model <single_step_path> --num_steps 0 1
  • Fine-tune the trained model on the same dataset with IoU loss
python3 scripts/train_baseline.py --datadir <path_to_dataset> --type single --save_model <single_step_path> --num_steps 0 1 --load_model <single_step_path> --use_iou_loss True
  • Using the model trained on single-step commands, train the model (with splitter) on examples consisting of up to two-step commands and up to 5 objects, with IoU loss
python3 scripts/train_baseline.py --datadir <path_to_dataset> --type multi --save_model <multi_step_path> --load_model <single_step_path> --use_iou_loss True --train_splitter True
  • Freeze the model's splitter and fine-tune the model further
python3 scripts/train_baseline.py --datadir <path_to_dataset> --type multi --save_model <multi_step_path> --load_model <multi_step_path> --use_iou_loss True
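
After the final fine-tuning step, the multi-step baseline can be evaluated with the test script described in Quickstart. In the sketch below, --load_model refers to the checkpoint saved as <multi_step_path> above; pass its absolute path if test_baseline.py expects one.

# Evaluate the fine-tuned multi-step baseline
python3 scripts/test_baseline.py --datadir <path_to_dataset> --type multi --load_model <multi_step_path>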

Training image-reconstructor

TODO

Data Generation

The composition of the dataset is defined in curriculum.json, which can be modified to generate examples of any particular kind.

Each entry in categories has the following parameters (a hypothetical entry is sketched after this list):

  • type: any (cube/lego/dice) or cube (only cube)
  • num_objects: number of objects in scene
  • steps: number of steps in instruction
  • relational: true or false, whether the instruction uses relational attributes to refer to objects (e.g. the block which is to the left of the yellow cube)
  • language: simple or complex
  • count: number of examples to be generated for this category. Note that count/train_count_downscale examples are generated for the train set, and similarly for the val and test sets
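
As an illustration, the snippet below writes a minimal single-category curriculum. The field names follow the parameter list above, but the exact top-level schema is an assumption, so treat it as a sketch and back up the shipped curriculum.json before overwriting it.

# Hypothetical minimal curriculum.json with one category; the surrounding schema is assumed
cat > curriculum.json <<'EOF'
{
  "categories": [
    {
      "type": "any",
      "num_objects": 5,
      "steps": 2,
      "relational": true,
      "language": "complex",
      "count": 1000
    }
  ]
}
EOF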

After setting up curriculum.json, run

cd data_generation
./construct_dataset.sh

Object Detector Integration

TODO

Acknowledgements

This work uses and adapts code from the following open-source projects:

NSCL

Repo: https://github.com/vacancy/NSCL-PyTorch-Release
License: MIT

CLIPort (adapted for the baseline)

Repo: https://github.com/cliport/cliport
License: Apache

Citation

@inproceedings{Kalithasan2023NSRM,
	title={{Learning Neuro-symbolic Programs for Language Guided Robot Manipulation}},
	author={Kalithasan, Namasivayam and Singh, Himanshu and Bindal, Vishal and Tuli, Arnav and Agrawal, Vishwajeet and Jain, Rahul and Singla, Parag and Paul, Rohan},
	booktitle={IEEE International Conference on Robotics and Automation},
	year={2023}
}
