Organizing the training data is simple: place all videos under a single directory, at any level of nesting. The directory is read recursively, which makes it convenient to train on multiple datasets at once.
Training Dataset
|——sub_dataset1
    |——sub_sub_dataset1
        |——video1.mp4
        |——video2.mp4
        ......
    |——sub_sub_dataset2
        |——video3.mp4
        |——video4.mp4
        ......
|——sub_dataset2
    |——video5.mp4
    |——video6.mp4
    ......
|——video7.mp4
|——video8.mp4
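Before training, it can help to check how many videos a recursive search finds under the dataset root. The following is a minimal sketch, not part of the repository: `DATASET_ROOT` and `count_videos` are hypothetical names, and matching only `.mp4` is an assumption based on the layout shown above.

```shell
# Hypothetical helper, not part of the repository scripts.
# Assumption: DATASET_ROOT stands for the directory passed as --video_path.
DATASET_ROOT="${DATASET_ROOT:-/path/to/dataset}"

# Count .mp4 files at any depth below the given directory.
# Assumption: only .mp4 is matched, as in the layout above.
count_videos() {
  find "$1" -type f -name '*.mp4' | wc -l | tr -d ' '
}

if [ -d "$DATASET_ROOT" ]; then
  echo "found $(count_videos "$DATASET_ROOT") videos under $DATASET_ROOT"
else
  echo "directory not found: $DATASET_ROOT"
fi
```

Because `find` descends into every subdirectory, videos in `sub_dataset1/sub_sub_dataset1` and videos placed directly in the root are counted alike.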
To start training, run:
bash scripts/causalvae/train.sh
The important training arguments are listed below.
Argparse | Usage |
---|---|
Training size | |
--num_frames | The number of frames used from each training video |
--resolution | The resolution of the input to the VAE |
--batch_size | The local batch size on each GPU |
--sample_rate | The frame interval used when loading training videos |
Data processing | |
--video_path | /path/to/dataset |
Load weights | |
--model_name | CausalVAE or WFVAE |
--model_config | /path/to/config.json. The model config of the VAE. Use this parameter to train from scratch. |
--pretrained_model_name_or_path | A directory containing a model checkpoint and its config. This loads only the model weights, not the optimizer state. |
--resume_from_checkpoint | /path/to/checkpoint. Resumes training from the checkpoint, restoring both the model weights and the optimizer state. |
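To see how `--num_frames` and `--sample_rate` interact, the sketch below lists the source-frame indices a clip would cover if frames are taken at a fixed interval. This is an illustration under stated assumptions, not the repository's loader: the start index of 0 and the absence of any boundary handling are assumptions, and the values are chosen only to keep the output short.

```shell
# Illustration only: the real data loader's start offset and boundary
# handling are not described here and may differ.
num_frames=5     # stands in for --num_frames
sample_rate=2    # stands in for --sample_rate
start=0          # assumed index of the first sampled frame

indices=""
i=0
while [ "$i" -lt "$num_frames" ]; do
  # Each sampled frame lies sample_rate source frames after the previous one.
  indices="$indices $((start + i * sample_rate))"
  i=$((i + 1))
done
indices="${indices# }"   # drop the leading space

# Number of source frames between the first and last sampled frame, inclusive.
span=$(( (num_frames - 1) * sample_rate + 1 ))

echo "frame indices: $indices"          # frame indices: 0 2 4 6 8
echo "source frames spanned: $span"     # source frames spanned: 9
```

Under the same assumption, a larger `--sample_rate` requires a longer source video for the same `--num_frames`.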
To run inference (video reconstruction), run:
bash scripts/causalvae/rec_video.sh
The important inference arguments are listed below.
Argparse | Usage |
---|---|
Output video size | |
--num_frames | The number of frames in the generated videos |
--height | The height of the generated videos |
--width | The width of the generated videos |
Data processing | |
--video_path | The path to the original video |
--rec_path | The path to the generated video |
Load weights | |
--ae_path | /path/to/model_dir. A directory containing the VAE checkpoint used for inference and its model config.json |
Other | |
--enable_tiling | Use tiling to handle videos of high resolution and long duration |
--save_memory | Reduce memory use during inference at a slight cost in quality |
The evaluation process consists of two steps:
1. Reconstruct videos in batches: bash scripts/causalvae/prepare_eval.sh
2. Evaluate video metrics: bash scripts/causalvae/eval.sh
To simplify evaluation, both steps are controlled through environment variables. For step 1 (scripts/causalvae/prepare_eval.sh):
# Experiment name
EXP_NAME=wfvae
# Video parameters
SAMPLE_RATE=1
NUM_FRAMES=33
RESOLUTION=256
# Model weights
CKPT=ckpt
# Select subset size (0 for full set)
SUBSET_SIZE=0
# Dataset directory
DATASET_DIR=test_video
For step 2 (scripts/causalvae/eval.sh):
# Experiment name
EXP_NAME=wfvae-4dim
# Video parameters
SAMPLE_RATE=1
NUM_FRAMES=33
RESOLUTION=256
# Evaluation metric
METRIC=lpips
# Select subset size (0 for full set)
SUBSET_SIZE=0
# Path to the ground truth videos, which can be saved during video reconstruction by setting `--output_origin`
ORIGIN_DIR=video_gen/${EXP_NAME}_sr${SAMPLE_RATE}_nf${NUM_FRAMES}_res${RESOLUTION}_subset${SUBSET_SIZE}/origin
# Path to the reconstructed videos
RECON_DIR=video_gen/${EXP_NAME}_sr${SAMPLE_RATE}_nf${NUM_FRAMES}_res${RESOLUTION}_subset${SUBSET_SIZE}
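The two directory variables are built from the other settings, so it can be useful to see what they expand to. The sketch below reuses the step 2 values shown above. Writing `ORIGIN_DIR` in terms of `RECON_DIR` is a shorthand introduced here and yields the same path as the original definition.

```shell
# Values copied from the step 2 example.
EXP_NAME=wfvae-4dim
SAMPLE_RATE=1
NUM_FRAMES=33
RESOLUTION=256
SUBSET_SIZE=0

# Same pattern as in the step 2 settings.
RECON_DIR=video_gen/${EXP_NAME}_sr${SAMPLE_RATE}_nf${NUM_FRAMES}_res${RESOLUTION}_subset${SUBSET_SIZE}
# Shorthand for this sketch: the ground-truth videos sit in an origin/ subdirectory.
ORIGIN_DIR=${RECON_DIR}/origin

echo "$RECON_DIR"    # video_gen/wfvae-4dim_sr1_nf33_res256_subset0
echo "$ORIGIN_DIR"   # video_gen/wfvae-4dim_sr1_nf33_res256_subset0/origin
```

Since every setting is embedded in the directory name, changing any of them in step 2 points the evaluation at a different directory. Assuming step 1 writes its output using the same pattern, the values in the two steps likely need to match for step 2 to find the reconstructed videos.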