What is DALL-E and what is it capable of?

DALL-E, DALL-E 2, and DALL-E 3 are neural network models developed by OpenAI that generate images from text descriptions supplied by users. Each successive version improved on its predecessor, producing more detailed and realistic images.

Capabilities

DALL-E can generate images in a variety of styles, including photorealistic images, paintings, and cartoons. It can also add appropriate details, such as shadows, without being explicitly instructed to.

From Wikipedia: An image generated by DALL-E 3 based on the text prompt "An illustration of an avocado sitting in a therapist's chair, saying 'I just feel so empty inside' with a pit-sized hole in its center. The therapist, a spoon, scribbles notes."

A notable feature of DALL-E and other image generation models is “inpainting”, which uses an image’s context to “fill in” the missing areas. The result usually has characteristics (art style, textures, tones) consistent with the original.

Example of inpainting from OpenAI’s DALL-E 2 video.

DALL-E can also produce new images with similar subjects and styles based on an input image.

DALL-E 2 variations on “The Son of Man” by René Magritte

Comparison

Compared to other image generation models, DALL-E is easy to access: developers can request images programmatically through OpenAI’s API.
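To make the API route concrete, here is a minimal sketch of what such a request might look like using only Python's standard library. The endpoint URL, the "dall-e-3" model name, and the request and response field names reflect OpenAI's public Images API as commonly documented, but they are assumptions here and may change; the helper names build_image_request and generate_image_url are hypothetical and used only for illustration.

```python
import json
import urllib.request

# Assumed endpoint for OpenAI's image generation API; check the current docs.
API_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt, api_key, model="dall-e-3", size="1024x1024"):
    """Build (but do not send) an HTTP request asking for one generated image.

    The model name and size are assumed defaults, not guaranteed to stay valid.
    """
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image_url(prompt, api_key):
    """Send the request and return the URL of the generated image.

    Assumes the response body has the shape {"data": [{"url": ...}]}.
    Requires network access and a valid API key.
    """
    request = build_image_request(prompt, api_key)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["data"][0]["url"]
```

With a valid key, calling generate_image_url("an avocado sitting in a therapist's chair", api_key) would be expected to return a link to the generated image. In practice, most users would likely reach for OpenAI's official client library rather than raw HTTP; the standard-library version is shown only to keep the sketch self-contained.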

Unlike models such as Stable Diffusion, DALL-E is not open-source. This allows OpenAI to restrict the content the model generates and returns to clients. It also means that users cannot fine-tune DALL-E themselves.

Concerns

DALL-E and image-generating models in general have raised ethical concerns, such as the risk of exacerbating gender and racial biases. These biases result from imbalanced representations in the public datasets used to train DALL-E. OpenAI’s own risks and limitations document gives examples: words like “assistant” and “flight attendant” generate images of women, while words like “CEO” and “builder” almost exclusively generate images of white men.

Another concern is that these models could cause technological unemployment for artists, photographers, and graphic designers. Clients may choose AI-generated content over commissioning an artist because of its lower price, convenience, or popularity.

There is also an ongoing debate about how DALL-E models obtain and use their training data. The models’ datasets were scraped mostly from public image-sharing websites such as Twitter or Pixiv, and some artists have voiced concerns about the collection and use of their art without explicit permission. Recently, OpenAI introduced an “opt-out” mechanism for artists who do not want their creations used to train future image generation models.

Trivia

The name “DALL-E” is a combination of the Pixar robot WALL-E and surrealist artist Salvador Dalí.
