Meta, the parent company of Instagram and Facebook, has unveiled a new AI tool called CM3leon, which the company touts as a “cutting-edge generative model for text and images.”
The company announced CM3leon (pronounced “chameleon”) in a blog post, simultaneously publishing a white paper on the tool’s underlying technology. Meta did not reveal whether or when CM3leon would be released to the public.
Even so, Meta’s research marks a significant breakthrough in the development of multimodal models, which can generate both text and images.
Currently, a gap separates AI image generators from AI text generators like OpenAI’s ChatGPT. Merging the two has proven difficult: although OpenAI released its multimodal GPT-4 in March, AI developers have had limited success combining the capabilities.
Meta’s tool breaks down this division with a model that accepts both text and image input and output, allowing for the creation of captions (image-to-text generation) as well as “super-resolution” images.
Most AI image generators on the market, such as Stability AI’s Stable Diffusion and OpenAI’s DALL-E, use diffusion models to generate images, a process in which Gaussian noise is progressively added to training images and a network is trained to remove it.
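The “adding Gaussian noise” half of that process can be sketched in a few lines. The following is a minimal illustration of the standard forward-diffusion corruption step with a linear noise schedule; the parameter values are illustrative textbook defaults, not the actual settings of any of the models mentioned above.

```python
import math
import random

def add_gaussian_noise(pixels, t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Forward diffusion: corrupt pixel values with Gaussian noise at step t.

    Implements the closed-form q(x_t | x_0) with a linear beta schedule.
    All parameter values are generic illustrative defaults.
    """
    # Linear noise schedule; alpha_bar is the cumulative fraction of the
    # original signal retained after t corruption steps.
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bar = 1.0
    for beta in betas[: t + 1]:
        alpha_bar *= 1.0 - beta
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * p + noise_scale * random.gauss(0.0, 1.0)
            for p in pixels]

clean = [0.5] * 16                        # toy flattened "image"
noisy = add_gaussian_noise(clean, t=999)  # heavily corrupted, near pure noise
```

A diffusion model is trained to reverse this corruption: starting from pure noise, it predicts and removes the noise step by step until a clean image emerges.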
The company’s process instead uses a technique called supervised fine-tuning to train a text-based transformer model on a dataset of images and captions licensed from Shutterstock, enabling it to analyze complex text and objects and better follow a user’s inputs.
“Supervised fine-tuning is essential in training large language models like ChatGPT. Despite this, its application in multimodal contexts remains largely unexplored,” the Meta researchers wrote in their paper.
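The core loop of supervised fine-tuning, showing a model labeled input–target pairs and nudging its parameters toward the correct outputs, can be illustrated with a deliberately tiny stand-in. A real system like CM3leon adjusts billions of transformer weights on licensed image–caption pairs; this one-parameter sketch only demonstrates the loop itself, and every value in it is hypothetical.

```python
def supervised_fine_tune(model_weight, pairs, lr=0.1, epochs=50):
    """Toy supervised fine-tuning: adjust a 'pretrained' parameter using
    labeled (input, target) pairs via gradient descent on squared error.
    """
    w = model_weight                         # pretrained starting point
    for _ in range(epochs):
        for x, target in pairs:
            pred = w * x                     # forward pass
            grad = 2.0 * (pred - target) * x # gradient of (pred - target)^2
            w -= lr * grad                   # update toward the labeled target
    return w

# Labeled examples, standing in for (image, caption) supervision.
pairs = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
tuned = supervised_fine_tune(model_weight=0.5, pairs=pairs)
# tuned converges toward 3.0, the weight that maps every input to its target
```

The same pattern scales up: replace the single weight with a transformer, the scalar targets with caption tokens, and the gradient step with a full optimizer update.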
The result is text-to-image generation that yields “more consistent images that follow input prompts better,” according to Meta. In its announcement, the company included highly compositional examples that the generator produced from prompts such as “a small cactus wearing a straw hat and neon sunglasses in the Sahara desert.”
Most notably, the model was able to generate a fairly realistic human hand, barring a few flaws, something AI generators have historically struggled with.
Meta’s CM3leon also outperforms previous models such as InstructPix2Pix at text-guided image editing, i.e. using text prompts to direct what the tool should add to or remove from an image, thanks to its ability to process both text and visual content.
In the blog post, Meta demonstrated the tool’s text-guided image editing capabilities using Vermeer’s Girl with a Pearl Earring (circa 1665) as the initial input, then generating images from prompts such as “put on a pair of sunglasses” and “she should look 100 years old.”
Meanwhile, CM3leon achieves this with a massive reduction in computing power compared with earlier transformer models, requiring five times less compute than similar systems. That efficiency could create greater equity in the AI space, addressing one of the criticisms frequently leveled at artificial intelligence.
Meta has received praise online for fully licensing its dataset from Shutterstock, a move the company says “demonstrates that strong performance is possible with a data distribution very different from that used by all previous models.”
“By making our work transparent,” the blog continues, “we hope to encourage collaboration and innovation in the field of generative AI.”
Over the past few months, Meta has introduced a number of generative AI features to its platforms, including AI-generated stickers for Messenger, an AI sandbox for Facebook advertisers, and an AI-powered video generation system.