Recommended papers for Introduction to Multimodal Generative Models
Year | Title | Venue | Paper | Code |
---|---|---|---|---|
2023 | Multimodal Image Synthesis and Editing: The Generative AI Era | TPAMI 2023 | https://arxiv.org/abs/2112.13592 | |
2023 | Text-to-image Diffusion Models in Generative AI: A Survey | | https://arxiv.org/abs/2303.07909 | |
2023 | Vision + Language Applications: A Survey | GCV@CVPR2023 | https://arxiv.org/abs/2305.14598 | |

Title | Venue | Code link | Paper link | Year |
---|---|---|---|---|
High Resolution Image Synthesis with Latent Diffusion Models | CVPR | https://github.com/CompVis/latent-diffusion | https://arxiv.org/abs/2112.10752 | 2022 |
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing | | https://github.com/salesforce/LAVIS/blob/59273f651b9bffb193d1b12a235e909e9f826dda/projects/blip-diffusion/README.md | https://arxiv.org/abs/2305.14720 | 2023 |
3D-LDM: Neural Implicit 3D Shape Generation with Latent Diffusion Models | | | https://arxiv.org/abs/2212.00842 | 2022 |
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models | | https://github.com/JiauZhang/hyperdreambooth | https://arxiv.org/abs/2307.06949 | 2023 |
Magic3D: High-Resolution Text-to-3D Content Creation | CVPR | | https://arxiv.org/abs/2211.10440v2 | 2023 |
DreamBooth3D: Subject-Driven Text-to-3D Generation | ICCV | | https://arxiv.org/abs/2303.13508 | 2023 |
Conditional Text Image Generation With Diffusion Models | | | https://arxiv.org/abs/2306.10804 | 2023 |
DCFace: Synthetic Face Generation with Dual Condition Diffusion Model | CVPR | https://github.com/mk-minchul/dcface | https://arxiv.org/abs/2304.07060 | 2023 |
3D Neural Field Generation using Triplane Diffusion | CVPR | https://github.com/JRyanShue/NFD | https://arxiv.org/abs/2211.16677v1 | 2023 |
DiffCollage: Parallel Generation of Large Content With Diffusion Models | CVPR | https://github.com/sbyebss/DiffCollage | https://arxiv.org/abs/2303.17076v1 | 2023 |
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models | CVPR | | https://arxiv.org/abs/2212.14704 | 2023 |
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation | CVPR | https://github.com/google/dreambooth | https://arxiv.org/abs/2208.12242 | 2023 |
LayoutDiffusion: Controllable Diffusion Model for Layout-to-Image Generation | CVPR | https://github.com/zgctroy/layoutdiffusion | https://arxiv.org/abs/2303.17189 | 2023 |
LayoutDM: Transformer-Based Diffusion Model for Layout Generation | CVPR | | https://arxiv.org/abs/2305.02567 | 2023 |
NeuralField-LDM: Scene Generation With Hierarchical Latent Diffusion Models | CVPR | | https://arxiv.org/abs/2304.09787 | 2023 |
Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation | CVPR | https://github.com/pals-ttic/sjc/ | https://arxiv.org/abs/2212.00774 | 2023 |
SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation | CVPR | https://github.com/yccyenchicheng/SDFusion | https://arxiv.org/abs/2212.04493 | 2023 |
Shifted Diffusion for Text-to-Image Generation | CVPR | https://github.com/drboog/Shifted_Diffusion | https://arxiv.org/abs/2211.15388 | 2023 |
SINE: SINgle Image Editing with Text-to-Image Diffusion Models | CVPR | https://github.com/zhang-zx/sine | https://arxiv.org/abs/2212.04489 | 2023 |
SparseFusion: Distilling View-Conditioned Diffusion for 3D Reconstruction | ICCV | https://github.com/yichen928/sparsefusion | https://arxiv.org/abs/2304.14340 | 2023 |
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models | CVPR | https://github.com/ucsb-nlp-chang/diffusiondisentanglement | https://arxiv.org/abs/2212.08698 | 2023 |
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation | CVPR | | https://arxiv.org/abs/2303.08320 | 2023 |
Multi-Concept Customization of Text-to-Image Diffusion | CVPR | https://github.com/adobe-research/custom-diffusion | https://arxiv.org/abs/2212.04488 | 2023 |
RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation | CVPR | https://github.com/Anciukevicius/RenderDiffusion | https://arxiv.org/abs/2211.09869 | 2023 |

Year | Title | Venue | Paper | Code |
---|---|---|---|---|
2021 | FENeRF: Face Editing in Neural Radiance Fields | CVPR | https://arxiv.org/abs/2111.15490 | https://github.com/MrTornado24/FENeRF |
2021 | StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis | ICLR | https://arxiv.org/abs/2110.08985 | https://github.com/facebookresearch/StyleNeRF |
2022 | 3DMM-RF: Convolutional Radiance Fields for 3D Face Modeling | | https://arxiv.org/abs/2209.07366 | |
2021 | MoFaNeRF: Morphable Facial Neural Radiance Field | ECCV | https://arxiv.org/abs/2112.02308 | https://github.com/zhuhao-nju/mofanerf |
2023 | ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field | CVPR | https://arxiv.org/abs/2303.13817 | https://github.com/TangZJ/able-nerf |
2022 | CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields | CVPR | https://arxiv.org/abs/2112.05139 | https://github.com/cassiePython/CLIPNeRF |
2023 | Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields | ICCV | https://arxiv.org/abs/2306.12760 | https://github.com/orig333/Blended-NeRF |
2023 | 3D-Aware Multi-Class Image-to-Image Translation with NeRFs | CVPR | https://arxiv.org/abs/2303.15012 | https://github.com/sen-mao/3di2i-translation |

Year | Title | Venue | arXiv link | GitHub link |
---|---|---|---|---|
2022 | Zero-Shot Text-Guided Object Generation with Dream Fields | CVPR | https://arxiv.org/abs/2112.01455 | https://github.com/ashawkey/dreamfields-torch |
2022 | CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields | CVPR | https://arxiv.org/abs/2112.05139 | https://github.com/cassiePython/CLIPNeRF |
2022 | DreamFusion: Text-to-3D using 2D Diffusion | arXiv | https://arxiv.org/abs/2209.14988 | |
2023 | NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views | CVPR | https://arxiv.org/abs/2211.16431 | https://github.com/VITA-Group/NeuralLift-360 |
2023 | Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures | CVPR | https://arxiv.org/abs/2211.07600 | https://github.com/eladrich/latent-nerf |
2023 | SKED: Sketch-guided Text-based 3D Editing | ICCV | https://arxiv.org/abs/2303.10735 | https://github.com/aryanmikaeili/SKED |
2023 | 3D-CLFusion: Fast Text-to-3D Rendering with Contrastive Latent Diffusion | arXiv | https://arxiv.org/abs/2303.11938 | |
2023 | Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions | ICCV | https://arxiv.org/abs/2303.12789 | https://github.com/ayaanzhaque/instruct-nerf2nerf |
2023 | CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout | arXiv | https://arxiv.org/abs/2303.13843 | |
2023 | DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models | arXiv | https://arxiv.org/abs/2304.00916 | |
2023 | DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model | arXiv | https://arxiv.org/abs/2304.02827 | https://github.com/janeyeon/ditto-nerf-code |
2023 | Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields | arXiv | https://arxiv.org/abs/2305.11588 | https://github.com/eckertzhang/Text2NeRF |
2023 | Towards Language-guided Interactive 3D Generation: LLMs as Layout Interpreter with Generative Feedback | arXiv | https://arxiv.org/abs/2305.15808 | |