The models are comparable to the DALL-E 2 system but have been called more complex and creative.
Researchers from MIT have developed a new way to combine multiple models to generate more complex AI images based on a better understanding of the text description.
The internet has had a field day using the DALL-E artificial intelligence-based software to create pictures.
DALL-E 2, whose name combines the adorable WALL-E robot and the artist Salvador Dalí, has become an online sensation for its ability to turn a natural-language description into a picture of whatever a person describes. Users only need to provide a brief description of what they would like the final picture to contain, and the software generates vivid images that match the description.
It uses a diffusion model that attempts to encode the entire text of a user’s description at once in order to generate a picture. However, as the text becomes more detailed, it becomes hard for a single encoding to capture all the desired components. Moreover, while diffusion models are quite flexible, they can struggle to understand how certain concepts are composed: for example, they may mix up attributes or confuse the relationships between different objects in the description.
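To illustrate why a single encoding can lose compositional detail, here is a deliberately crude Python sketch. It assumes a hypothetical bag-of-words encoder (`bag_of_words_encoding`) that counts words and ignores their order; DALL-E 2's actual text encoder is a learned model and is far richer, so this only caricatures the attribute-mixing problem.

```python
from collections import Counter


def bag_of_words_encoding(description: str) -> Counter:
    # Hypothetical, crude single encoding: counts each word and ignores order.
    # Real text-to-image systems use learned text encoders instead.
    return Counter(description.lower().split())


a = bag_of_words_encoding("a red cube on a blue sphere")
b = bag_of_words_encoding("a blue cube on a red sphere")

# Both descriptions collapse to the same word counts.
print(a == b)  # → True
```

Because both descriptions reduce to identical counts, nothing in this encoding records which object is red and which is blue.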
MIT’s image generation system is designed to create more complex pictures by better understanding how the parts of a description fit together.
Scientists at the Computer Science and Artificial Intelligence Laboratory (CSAIL) set out to generate more complex images by structuring the typical model in a different way.
They combined a series of models that work together to generate the requested image, with each model capturing a different aspect of the description, supplied as input text or labels. For example, to form a picture from a two-sentence description with two components, each model handles a separate component of the final picture.
The system essentially begins by creating a “bad” image and then gradually refines it until it matches the description. Because several models are combined, they refine the appearance of the picture together at each step, so the final result brings together all the components of the description.
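The combined refinement can be sketched as a toy loop in Python. This is an illustration under stated assumptions, not CSAIL's code: each hypothetical model (built by `make_conditional_eps`) only knows target values for its own pixels, the predictions are merged with the rule eps = eps_uncond + sum_i w_i * (eps_i - eps_uncond), a form used in published work on composing diffusion models that we assume here, and the update is a simple deterministic step rather than a full diffusion sampler.

```python
import random

# Toy "image": a flat list of pixel values (real systems use large tensors).
Image = list[float]


def unconditional_eps(x: Image) -> Image:
    # Hypothetical unconditional model: treats every pixel as noise around 0.
    return list(x)


def make_conditional_eps(target: dict[int, float]):
    # Hypothetical conditional model for one component of the description.
    # It knows desired values only for the pixels in `target`; elsewhere
    # its prediction equals the unconditional model's.
    def eps(x: Image) -> Image:
        return [xi - target.get(i, 0.0) for i, xi in enumerate(x)]
    return eps


def composed_eps(x: Image, models, weights) -> Image:
    # Assumed composition rule: eps_uncond + sum_i w_i * (eps_i - eps_uncond).
    base = unconditional_eps(x)
    out = list(base)
    for model, w in zip(models, weights):
        cond = model(x)
        for j in range(len(x)):
            out[j] += w * (cond[j] - base[j])
    return out


def generate(models, weights, size, steps=50, step_size=0.2, seed=0) -> Image:
    rng = random.Random(seed)
    # Start from pure random noise: the "bad" first image.
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for _ in range(steps):
        # All models contribute to the same refinement step.
        eps = composed_eps(x, models, weights)
        x = [xi - step_size * ei for xi, ei in zip(x, eps)]
    return x


# Two hypothetical components, e.g. "a bright sky" and "dark grass".
sky = make_conditional_eps({0: 1.0, 1: 1.0})
grass = make_conditional_eps({2: -1.0, 3: -1.0})
img = generate([sky, grass], [1.0, 1.0], size=4)
print([round(v, 2) for v in img])  # → [1.0, 1.0, -1.0, -1.0]
```

Starting from random noise, each step applies both models' corrections at once, so the result satisfies the "sky" and "grass" components together rather than either one alone.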