It’s no secret that large models such as DALL-E 2 and Imagen, trained on vast numbers of documents and images scraped from the web, absorb the worst as well as the best of that data. OpenAI and Google openly acknowledge this.
Scroll down the Imagen website — past the dragon fruit wearing a karate belt and the small cactus wearing a hat and sunglasses — to the social impact section, and you get this: “[W]e also relied on [the] LAION-400M dataset, which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes. Imagen relies on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”
This is the same admission OpenAI made when it released GPT-3: “Internet-trained models have Internet-scale biases.” And, as Mike Cook, a researcher in AI creativity at Queen Mary University of London, points out, the ethics statements accompanying Google’s large language model PaLM and OpenAI’s DALL-E 2 amount to saying that these systems are capable of producing terrible content and that their creators have no idea how to fix it.