DALL-E 2 takes the spotlight. In April of this year, OpenAI released DALL-E 2, a tool that has gained widespread attention thanks to the striking visuals it creates (see Figure 1). The waitlist for access was removed in September, and over 1.5 million people now use the tool, creating more than 2 million images a day.3
A host of additional tools rush in. DALL-E 2’s appearance was quickly followed by Google’s Imagen and Parti, Meta’s Make-A-Scene, Midjourney, and Stable Diffusion, an open-source system that the startup Stability AI launched in August (see Figure 2). Craiyon (previously called DALL-E Mini), a free, open-source tool inspired by DALL-E, was created as part of a coding competition.4 Even TikTok joined in, rolling out a basic in-app text-to-image AI generator that lets users type a prompt and produce an image to use as the background in their videos.5
Applications continue to flourish. An upcoming Figma plug-in uses Stable Diffusion to create design ideas: sketch a few simple shapes, describe your idea, and get a generated image.6 Design strategist and filmmaker Sarah Drummond has been experimenting with AI image generators for storyboarding.7 NovelAI is experimenting with Stable Diffusion to produce art that can accompany the AI-generated stories created by users on its platform. In September, IT professional Kevin Hess released a 706-page graphic novel adaptation of his favorite book, Star Maker.8
Even for AI analysts, the pace of change in this space has been remarkable. With technology improving and the user population expanding, we can expect to see:
New paths to monetizing art. Already, a search on the stock photography website Shutterstock for “AI generated” returns over 22,000 images for purchase. A new startup, PromptBase, which launched in June, lets “prompt engineers” cash in through an online marketplace that sells finely tuned phrases.9 PromptBase currently hosts phrases tested on DALL-E 2, GPT-3, Midjourney, and Stable Diffusion.
Updated business models. A reconstructive surgeon is using DALL-E to help his patients visualize results.10 Stitch Fix is experimenting with DALL-E 2 to visualize its products based on specific characteristics like color, fabric, and style. For example, if a Stitch Fix customer asked for a “high-rise, red, stretchy, skinny jean”, DALL-E 2 would generate images of that item, which a stylist could use to match with a similar product in Stitch Fix’s inventory.11
Text-to-video AI systems. Next up: text to video. Meta recently unveiled a new AI, Make-A-Video, which turns text prompts into 5-second videos.12 Google quickly followed with the announcement of its own text-to-video AI system, Imagen Video. RunwayML is researching text-to-video editing enabled by Stable Diffusion.13 A recent tweet shows the background changing as someone plays tennis, driven by changing text.14 Israeli AI company D-ID is launching a platform, Creative Reality Studio, where users can upload a single image plus text and generate a video.15 Some digital artists are experimenting with DALL-E in conjunction with other tools to produce videos of generated fashion, brainstorming costume and fashion design ideas.16 German tech entrepreneur Fabian Stelzer is creating a ’70s-style sci-fi film, “Salt,” using AI image generators along with AI voice generators.17 In August, Alphabet’s DeepMind unveiled Transframer, which can generate a 30-second video from a single image input.18 Expect to see many more tools – and more improvements to existing tools – enabling both text-to-video and image-to-video.
Why It Matters
It’s still early days for AI image generation, and multiple challenges remain. In July, users of DALL-E 2 were granted the right to use their generations for commercial projects, raising questions about the legal implications (the model is trained on public images from the web).19 Human artists have expressed concerns after artwork generated with Midjourney won first place in the digital art category of the Colorado State Fair’s fine art competition.20 And, as with other AI, there’s the issue of bias.21 While OpenAI is working to reduce sexism, racism, and misinformation in the visuals its app generates, Stability AI is taking more of an “anything goes” approach to image generation.22
However, even with these challenges, the quick advances of this technology demonstrate that we need to experiment now. Sophisticated AI image generation will mean large changes to how companies think about:
Marketing and design. Magazine covers are already being generated with AI.23 In fact, in our own FCAT research, we used DALL-E 2 to create images for profiles of people with diversified income sources (see Figure 3). The founder of Stability AI said that the company is already working on using this technology as an automatic PowerPoint generator.24 While the images generated today aren’t perfect, they can be very useful for someone without design skills, or even for designers doing quick prototyping and brainstorming of ideas.
The workplace. A Brooklyn-based artist is using DALL-E to reimagine roadways to be friendlier to pedestrians and cyclists.25 Some architects have begun to use these tools for visualizing early-stage concepts and even for giving their non-designer clients a way to play a more active role in the design process.26 Global architecture firm HDR is using Midjourney and DALL-E 2 on actual projects, including a community center in Toronto. As companies imagine how the workplace will evolve with more flexible working patterns, these tools might give us new ways to collect and share novel designs from employees as well as designers.
Their future business. Many aspects of a business can be reimagined with this technology. Image generation tools can help with suggestions for website and app design, whether for brainstorming internally or directly with customers. ABtesting.ai uses GPT-3 to automate text suggestions for landing pages and to help users create entirely new designs for A/B testing; the company is currently working on expanding this to include image generation as well. In addition, as companies explore the metaverse, this technology can be a helpful tool (see Figure 4). In September, Google researchers announced DreamFusion, which uses a pre-trained 2D text-to-image diffusion model to perform text-to-3D synthesis.27 Perhaps future enhancements to these tools, especially as text-to-3D becomes more mainstream, will allow companies to rethink how they visualize data, customer journeys, or even education, such as financial literacy efforts.
21. A new AI draws delightful and not-so-delightful images. https://www.vox.com/future-perfect/23023538/ai-dalle-2-openai-bias-gpt-3-incentives