The image shown above was created by DALL·E from OpenAI using the text prompt 'An illustration of a baby penguin with headphones riding a car'.
As you can see, the results are truly amazing! DALL·E is a version of GPT-3 that is trained to generate images from text prompts.
Unfortunately, DALL·E is a walled garden from OpenAI and available only to a select few. Even if you can get access, you might not want to build a business around something that is controlled entirely by a single entity.
However, the DALL·E research paper has shown us how it is done and what is possible. Fortunately, open source projects like DALL·E Mini and DALL·E Flow are available now.
In this post we will take a brief look at DALL·E Mini and DALL·E Flow and their capabilities, and hopefully generate some product ideas. Over the next few weeks I will experiment more and write a more detailed post.
DALL·E Mini is available on GitHub under the permissive Apache-2.0 open source license. Even though the DALL·E Mini model is still in training, you can run your own service today with the partially trained model that is already available.
Images produced by DALL·E Mini are good but low in resolution. Here is an output from DALL·E Mini using the same prompt - 'An illustration of a baby penguin with headphones riding a car'
You can try out a demo of DALL·E Mini here.
DALL·E Flow is also available on GitHub under the same Apache-2.0 open source license. DALL·E Flow is based on DALL·E Mini but introduces a 'human in the loop' concept: the user gets to choose from the available images or styles during the generation process, which makes it interactive.
Images produced by DALL·E Flow are of high quality because it diffuses and upscales the images during the process.
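To make the 'human in the loop' idea concrete, here is a minimal Python sketch of that kind of workflow: generate candidates, let a person pick one, diffuse it into variations, let them pick again, then upscale. Everything in it is an assumption for illustration. The functions `generate_candidates`, `diffuse` and `upscale` are hypothetical stand-ins that only build text labels, not the DALL·E Flow API, and the exact order of steps is my guess at a reasonable flow.

```python
from typing import Callable, List

# Hypothetical stand-ins: these are NOT the DALL·E Flow API.
# An "image" is just a text label here so the sketch runs without a model.
Image = str


def generate_candidates(prompt: str, n: int) -> List[Image]:
    """Pretend to generate n rough images from a text prompt."""
    return [f"{prompt} / candidate {i}" for i in range(n)]


def diffuse(image: Image, n: int) -> List[Image]:
    """Pretend to produce n refined variations of the chosen image."""
    return [f"{image} / variation {i}" for i in range(n)]


def upscale(image: Image) -> Image:
    """Pretend to upscale the final image to a higher resolution."""
    return f"{image} / upscaled"


# The human's decision: given a list of options, return the index they picked.
Chooser = Callable[[List[Image]], int]


def human_in_the_loop(prompt: str, choose: Chooser, n: int = 4) -> Image:
    candidates = generate_candidates(prompt, n)
    favourite = candidates[choose(candidates)]  # first human choice
    variations = diffuse(favourite, n)
    refined = variations[choose(variations)]    # second human choice
    return upscale(refined)


if __name__ == "__main__":
    # A real service would show the options in a UI; here we always pick the first.
    print(human_in_the_loop("a baby penguin riding a car", choose=lambda options: 0))
```

Passing the choice in as a callback keeps the pipeline separate from the user interface, so the same flow could sit behind a web page, a chat bot or a command line tool.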
Both DALL·E Mini and DALL·E Flow have their place. If we are looking to build something that inspires creativity or helps remove a mental block, DALL·E Mini would suffice and would be much cheaper to run as a service. If you are looking to build a service like Canva, DALL·E Flow is the way to go.
For any product to succeed in this space, it has to be focussed on solving a specific problem; otherwise it will become just another playground.
Services that come to mind:
- A logo creator: point it at a website or give it some text, and it generates a finished logo
- A landing page creator: given some text with emotional context, it generates a landing page design
- A service for illustrators to generate cartoon strips, caricatures, or other illustrations like those on undraw.co
I think logo generation might seem small or gimmicky, but it has the biggest potential.
In terms of infrastructure cost, with the advent of services like Google's GPU / TPU clouds and Amazon GPU instances, you no longer have to spend money on buying expensive hardware. Once a model is trained, inference (generating images in this case) is relatively cheap to run.
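If you want a rough feel for what inference might cost, a back-of-the-envelope calculation is enough: the hourly price of a rented GPU instance multiplied by the fraction of an hour one image takes. The Python sketch below does just that. The numbers in the example are placeholders I made up, not real cloud prices or measured generation times, so plug in your own.

```python
def cost_per_image(hourly_rate_usd: float, seconds_per_image: float) -> float:
    """Rough cost of one generated image on a rented GPU instance.

    Assumes the instance is fully busy and handles one image at a time.
    """
    return hourly_rate_usd * seconds_per_image / 3600


if __name__ == "__main__":
    # Placeholder inputs: replace with your provider's price and your own timings.
    rate = 3.60     # hypothetical USD per hour
    seconds = 10.0  # hypothetical seconds to generate one image
    print(f"~${cost_per_image(rate, seconds):.4f} per image")
```

An instance that sits idle between requests still bills by the hour, so the real cost per image will be higher unless traffic is steady.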
I hope this post has inspired you in some way.