Stability AI Unveils the Advanced DeepFloyd IF AI Image Generator

Stability AI has released DeepFloyd IF, a new AI image generator designed to address a specific shortcoming of the current generation of such tools: rendering legible text. AI image generators are capable of impressive results, but they often struggle to render lettering within an image, such as “bar and restaurant” or “hotel, no vacancy” on a building. DeepFloyd IF is designed to produce photorealistic images with legible lettering, which makes it suitable for graphics tasks such as logo design.

DeepFloyd, the AI research lab behind IF, is backed by Stability AI, the software company known for the image generator Stable Diffusion. IF takes its name from the Pink Floyd song “If”, and a modified version of its lyrics hints at the model’s ambitions: “If I was a model, I’d be open source.” DeepFloyd has also developed ruDALL-E, a Russian counterpart to the similarly named image generator DALL-E.

IF is based on Imagen, Google’s unreleased AI image generator, and its architecture differs fundamentally from that of Stable Diffusion. It couples a large language model (LLM) with a cascaded pixel diffusion model, using T5-XXL-1.1 as the language model. Google Research released this English-language LLM as open source, and it is intended to help IF interpret the prompt more accurately. Other image generators often rely on the multimodal model CLIP instead.

IF first generates a 64 x 64 pixel image from the prompt. Three base models are available for this step: IF-I 400M, IF-I 900M, and IF-I 4.3B, which differ in their number of parameters. Two super-resolution stages follow, each adding detail until the image reaches the final resolution of 1024 x 1024 pixels. Two models are available for the first super-resolution stage (IF-II 400M and IF-II 1.2B) and one for the second (IF-III 700M).
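
To make the cascade easier to picture, here is a minimal Python sketch that models the three stages as plain data. It does not run the actual model. The model names and sizes come from the list above; the class StageModel, the helper functions, and the assumption that each super-resolution stage enlarges each side by a factor of 4 (which is what takes 64 pixels to 1024 pixels in two steps) are illustrative choices, not part of the DeepFloyd software.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class StageModel:
    """Hypothetical record for one model variant in the cascade."""
    name: str
    params_millions: int


# Model names and parameter counts as listed in the article.
BASE_MODELS = [
    StageModel("IF-I 400M", 400),
    StageModel("IF-I 900M", 900),
    StageModel("IF-I 4.3B", 4300),
]
SR1_MODELS = [
    StageModel("IF-II 400M", 400),
    StageModel("IF-II 1.2B", 1200),
]
SR2_MODELS = [StageModel("IF-III 700M", 700)]

BASE_RESOLUTION = 64
# Assumption: each super-resolution stage enlarges each side by 4x.
UPSCALE_FACTOR = 4


def resolution_schedule(base=BASE_RESOLUTION, factor=UPSCALE_FACTOR, sr_stages=2):
    """Return the side length in pixels after the base stage and each SR stage."""
    sizes = [base]
    for _ in range(sr_stages):
        sizes.append(sizes[-1] * factor)
    return sizes


def cascade_configurations():
    """Return every (base, first SR, second SR) combination of the listed models."""
    return list(product(BASE_MODELS, SR1_MODELS, SR2_MODELS))


def total_params_millions(config):
    """Sum the diffusion-model parameters of one configuration, in millions.

    The T5 text encoder is not included in this sum.
    """
    return sum(model.params_millions for model in config)


if __name__ == "__main__":
    print("Resolution per stage:", resolution_schedule())
    for config in cascade_configurations():
        names = " -> ".join(model.name for model in config)
        print(f"{names}: {total_params_millions(config)}M parameters")
```

Under these assumptions, the listed models can be combined into six different cascades, and the largest one totals 6.2 billion diffusion parameters, not counting the language model.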

The image generator was trained on the LAION-A dataset, and its largest base model has 4.3 billion parameters. For comparison, Midjourney V5 is said to have 5 billion parameters and Stable Diffusion XL 2.1 billion. DeepFloyd IF is currently available for research purposes only, and commercial use is not permitted. The software can, however, be downloaded from GitHub for further exploration.
