GLIDE is a text-guided diffusion model for photorealistic image generation and editing. It applies guided diffusion to text-conditional image synthesis and handles free-form prompts: a text encoder conditions the diffusion model on natural language descriptions. Beyond zero-shot generation, the model supports editing. It is fine-tuned to perform image inpainting, which allows model samples to be improved iteratively until they match more complex prompts.
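
The description above does not say which guidance scheme is used. The GLIDE paper evaluates both CLIP guidance and classifier-free guidance, so the following is only a minimal sketch that assumes the classifier-free variant. In that scheme, the noise prediction made without the text prompt is extrapolated toward the prediction made with it. The function name `guided_noise` and the toy values are hypothetical; in a real model both inputs would be outputs of the denoising network.

```python
from typing import List


def guided_noise(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """Combine two noise predictions using classifier-free guidance.

    eps_uncond: prediction without the text prompt (hypothetical toy input)
    eps_cond:   prediction conditioned on the text prompt
    scale:      guidance scale; 1.0 returns eps_cond unchanged, larger
                values push further in the direction of the prompt
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy values standing in for the network's predicted noise.
eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [1.0, 1.0, 0.0]

print(guided_noise(eps_uncond, eps_cond, 1.0))  # same as eps_cond
print(guided_noise(eps_uncond, eps_cond, 3.0))  # amplified toward the prompt
```

A larger scale generally trades sample diversity for closer adherence to the prompt, which is why the scale is usually exposed as a tunable parameter.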

GLIDE

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

Source	GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Year	2021
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software

What is: Guided Language to Image Diffusion for Generation and Editing?
