Diffusion models are trained on images that have been completely distorted with random pixels. They learn to convert these images back into their original form. In DALL-E 2, there are no existing images to restore. So the diffusion model takes random pixels and, guided by CLIP, transforms them into a brand-new image, created from scratch, that matches the text prompt.
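The overall shape of that process can be sketched in a few lines. This is only a toy illustration, not how DALL-E 2 is actually implemented: a real diffusion model learns its denoising step with a neural network trained on billions of images, and the `guidance` function here is a hypothetical stand-in for CLIP's steering signal.

```python
import random

def denoise(image, guidance, steps=50, lr=0.2):
    # Toy sketch of iterative denoising: start from noise and
    # repeatedly nudge each pixel toward what the guidance prefers.
    # A real diffusion model learns this step from data; this loop
    # only illustrates the shape of the process.
    for _ in range(steps):
        image = [p + lr * (guidance(i) - p) for i, p in enumerate(image)]
    return image

# Begin with pure random noise, as DALL-E 2 does for a new image.
random.seed(0)
noise = [random.random() for _ in range(4)]

# Hypothetical stand-in for CLIP guidance: prefers a fixed pattern.
target = [0.1, 0.9, 0.5, 0.3]
result = denoise(noise, lambda i: target[i])
```

After enough steps, the "image" converges on whatever the guidance signal rewards, which is why the same machinery can be steered by any text prompt CLIP understands.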
The diffusion model lets DALL-E 2 produce high-resolution images more quickly than the original DALL-E could. “This makes it much more practical and enjoyable to use,” says OpenAI’s Aditya Ramesh.
In a demo, Ramesh and his colleagues showed me pictures of a hedgehog using a calculator, a corgi and a panda playing chess, and a cat dressed as Napoleon holding a piece of cheese. When I remark on the strange cast of subjects, Ramesh says: “It’s easy to spend an entire workday thinking up prompts.”
![](https://wp.technologyreview.com/wp-content/uploads/2022/04/otter-ibis.jpg)
DALL-E 2 still fails, though. For example, it can struggle with a prompt that asks it to combine two or more objects with two or more attributes, such as “a red cube on top of a blue cube.” OpenAI thinks this is because CLIP does not always bind attributes to objects correctly.
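A toy example makes the binding problem concrete. CLIP's actual text encoder is far more sophisticated than this, but any representation that ignores how words pair up cannot tell which attribute belongs to which object; the `bag_of_words` function below is a deliberately crude stand-in used only to illustrate the failure mode.

```python
from collections import Counter

def bag_of_words(prompt):
    # Crude, order-insensitive text representation: just count words.
    # This is NOT how CLIP works; it only shows why discarding
    # word-pairing information makes attribute binding impossible.
    return Counter(prompt.lower().split())

a = bag_of_words("a red cube on top of a blue cube")
b = bag_of_words("a blue cube on top of a red cube")
print(a == b)  # True: the two prompts are indistinguishable
```

Since the two prompts describe different scenes but map to the same representation, no downstream image generator could reliably tell them apart.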
Beyond responding to text prompts, DALL-E 2 can also generate variations of existing images. Ramesh feeds it a photo of street art taken outside his apartment, and the AI immediately starts producing alternate versions of the scene with different artwork on the wall. Each of these new images can in turn be used to start its own series of variations. “This feedback loop could be really useful for designers and artists,” says Ramesh.
User attention
DALL-E 2 looks like a much more polished product than its predecessor. That wasn’t the aim, says Ramesh. But OpenAI does plan to release DALL-E 2 to the public after an initial rollout to a small group of trusted users, much as it did with GPT-3.
GPT-3 is known to generate toxic text. But OpenAI says it has used feedback from GPT-3’s users to train a safer version, called InstructGPT. The company hopes to follow a similar path with DALL-E 2, which will also be shaped by user feedback. OpenAI will encourage early users to try to break the AI by tricking it into generating offensive or harmful images. As it works through these problems, OpenAI will begin to make DALL-E 2 available to a wider group of people.
OpenAI is also issuing a user policy for DALL-E that prohibits asking the AI to generate offensive images (no violence or pornography) and political images. To prevent deepfakes, users will not be allowed to ask DALL-E to generate images of real people.
Alongside the user policy, OpenAI has removed certain types of images from DALL-E 2’s training data, including those showing graphic violence. OpenAI also says it will eventually pay human moderators to review every image created on its platform.
“Our main goal here is to get a lot of feedback for the system before we start sharing it more broadly,” says OpenAI’s Prafulla Dhariwal. “Hopefully it gets rolled out eventually so developers can build apps on it.”
Original genius
Multiskilled AIs that can see the world and work with concepts across multiple modalities, such as language and vision, are a step toward more general-purpose intelligence. DALL-E 2 is one of the best examples yet.
But while Oren Etzioni, CEO of the Allen Institute for AI, is impressed with the images that DALL-E 2 produces, he is cautious about what they mean for the overall progress of AI. “This kind of improvement doesn’t bring us any closer to AGI,” he says. “We already know that AI is remarkably capable of solving narrow tasks using deep learning. But it is humans who formulate those tasks and give deep learning its marching orders.”
For Mark Riedl, an artificial-intelligence researcher at Georgia Tech in Atlanta, creativity is a good way to measure intelligence. Unlike the Turing test, which requires a machine to fool a human through conversation, Riedl’s Lovelace 2.0 test judges a machine’s intelligence by how well it responds to requests to create something, such as “a picture of a penguin in a spacesuit on Mars.”
DALL-E 2 scores well on this test. But intelligence is a sliding scale. As we build better and better machines, our tests for intelligence must adapt. Many chatbots are now very good at mimicking human conversation, passing the Turing test in a narrow sense. They are still mindless, however.