@Charlie Cheng-Jie Ji

Hi, I’m Charlie Cheng-Jie Ji 季诚杰 (you can call me Charlie Ji for short), a rising third-year undergrad at UC Berkeley, majoring in Computer Science and Data Science. More information about me is on my personal website and blog.

I’m the author of this article. If you have any questions or feedback, please send an email to [email protected]. If this article helped you, or if you want additional advice or would like to discuss opportunities to collaborate, feel free to let me know and connect with me here 👇 (I love making new personal connections! You will certainly make my day if you do so 😀)

Instagram, WeChat, Email, LinkedIn, Discord

(I currently use Instagram and WeChat the most, Email and LinkedIn second most)

1. Introduction

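From DALL-E in early 2021 to Stable Diffusion, built on an optimized Latent Diffusion Model, in August 2022, a global wave of AI image generation has taken off. Many industries have developed use cases on top of image generation models, such as marketing and brand promotion https://www.zmo.ai/ https://hexiangteng.github.io/papers/CVPR 2023.pdf, gaming (for example, various kinds of asset generation), art production, entertainment (for example, face generation and editing), and 3D model rendering. Many venture capital firms have made predictions about how AIGC will shape different industries, especially gaming; a16z published a detailed report as early as November 2022 (https://a16z.com/2022/11/17/the-generative-ai-revolution-in-games). Lower barriers to entry and shorter development cycles, more risk-taking and creative exploration in games, and new game genres are among the changes this wave will bring. I am very excited about how image generation is transforming the game industry.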

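This survey article gives a detailed overview of generative AI before diffusion models (such as GANs and their variants, and LLM-assisted generation), diffusion models, controllable diffusion models, and the new diffusion model innovations of 2023.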

Credit: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/

2. Generative AI Before Diffusion Models (2014 - 2020)

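The details in this section are based on the course materials of Stanford CS 236G Generative Adversarial Network. I give some background here because the latest Latent Diffusion Model uses a pre-diffusion model architecture for data compression, which I will cover in more detail later.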

<aside> 💡 GAN specialization

Generative Adversarial Networks (GANs)

</aside>

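GANs were invented in 2014 by Ian Goodfellow and his colleagues. A GAN learns its objective through a game between two networks, a Generative Network and a Discriminative Model. The Discriminative Model's goal is to tell whether an example it is shown is a real sample from the dataset or a generated image. The Generative Model's goal is to generate images from a D-dimensional noise vector that fool the discriminator. This is formulated as a min-max optimization problem. By iteratively alternating training between the two, we obtain a Generative Network good enough to use in downstream applications.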

Credit: https://zhuanlan.zhihu.com/p/42606381
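
To make the min-max formulation a bit more concrete, here is a minimal sketch in Python. It assumes the standard GAN value function V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))], which this article does not write out. The function names and the probability lists are hypothetical and only for illustration: there are no real networks here, just the probabilities a discriminator might output for real and generated samples.

```python
import math


def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def discriminator_loss(d_real, d_fake):
    # The discriminator maximizes V, so it minimizes -V.
    return -value_function(d_real, d_fake)


def generator_loss(d_fake):
    # The generator minimizes V; only the second term depends on it.
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs, for illustration only.
confident_real, confident_fake = [0.9, 0.8], [0.1, 0.2]  # D separates real from fake
fooled_real, fooled_fake = [0.5, 0.5], [0.5, 0.5]        # D cannot tell them apart

print(discriminator_loss(confident_real, confident_fake))
print(discriminator_loss(fooled_real, fooled_fake))
print(value_function(fooled_real, fooled_fake))  # equals -log 4 when D outputs 0.5 everywhere
```

In an actual training loop, one would alternate gradient steps: update the discriminator to lower `discriminator_loss`, then update the generator to lower `generator_loss`. Many implementations swap in a different generator loss in practice, so treat this as the textbook form only.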


AE (Autoencoder) and VAE (Variational Autoencoder) are other members of the generative family.

AE and VAE are mainly used for data compression. Their training mainly works as follows:

[Figure: AE architecture]

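As the figure shows, an AE passes an input image x through the Encoder to obtain a low-dimensional latent vector z, then decodes z back into a high-dimensional image x_hat. An AE is trained mainly by minimizing the reconstruction loss, that is, the difference between the input and the reconstructed input. This trains the Encoder to compress high-dimensional images and the matching Decoder to reconstruct images from the latent vector z.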
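
The encode, decode, and reconstruction-loss steps above can be sketched with a toy example. This is only an illustration under strong assumptions: instead of images and neural networks, it uses 2-D points that lie on a line, a linear Encoder that maps each point to a 1-D latent z, and a linear Decoder that maps z back to 2-D. The data, the initial weights, the learning rate, and all function names are hypothetical choices of mine.

```python
def encode(w, x):
    # Encoder: compress a 2-D input x into a 1-D latent value z.
    return w[0] * x[0] + w[1] * x[1]


def decode(v, z):
    # Decoder: expand the latent value z back into a 2-D reconstruction x_hat.
    return [v[0] * z, v[1] * z]


def reconstruction_loss(w, v, data):
    # Mean squared difference between each input and its reconstruction.
    total = 0.0
    for x in data:
        x_hat = decode(v, encode(w, x))
        total += sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    return total / len(data)


def train_autoencoder(data, steps=500, lr=0.05):
    # Fixed initial weights so the run is deterministic (an arbitrary choice).
    w = [0.5, 0.5]
    v = [0.5, 0.5]
    n = len(data)
    for _ in range(steps):
        grad_w = [0.0, 0.0]
        grad_v = [0.0, 0.0]
        for x in data:
            z = encode(w, x)
            x_hat = decode(v, z)
            err = [x_hat[i] - x[i] for i in range(2)]
            # Gradient of the squared error with respect to z (chain rule).
            dz = 2.0 * (err[0] * v[0] + err[1] * v[1])
            for i in range(2):
                grad_v[i] += 2.0 * err[i] * z / n
                grad_w[i] += dz * x[i] / n
        # Full-batch gradient descent step on Encoder and Decoder together.
        w = [w[i] - lr * grad_w[i] for i in range(2)]
        v = [v[i] - lr * grad_v[i] for i in range(2)]
    return w, v


# Toy "high-dimensional" data: points on the line x2 = 2 * x1.
data = [[t, 2.0 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]

print("loss before training:", reconstruction_loss([0.5, 0.5], [0.5, 0.5], data))
w, v = train_autoencoder(data)
print("loss after training:", reconstruction_loss(w, v, data))
```

Because the toy data is one-dimensional in disguise, a 1-D latent is enough to reconstruct it almost perfectly. Real images need nonlinear Encoders and Decoders and a much larger latent vector, but the training signal is the same reconstruction loss.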

However, AE has an obvious drawback: its latent space is not regularized. Similar images can end up far apart in the latent space, and not every latent vector decodes into a meaningful reconstructed input.