On the proliferation of images generated by machine-learning models and what they mean for the future of art
By K Allado-McDowell
A new era of art is dawning, characterized by intelligent machines capable of extending human creativity in ways difficult even to imagine. In the past two years alone, we have witnessed the emergence of a revolutionary new class of generative AI systems, named DALL-E, Midjourney and Stable Diffusion. These tools combine online image archives and deep neural networks that learn from massive collections of data, enabling anyone to render powerfully unexpected images from text descriptions and thereby become an instant self-styled digital artist or art director.
It is not an exaggeration to say that we have entered the age of AI art. The possibility has glimmered on the edge of culture for almost a decade (Artists + Machine Intelligence, the program I established at Google AI in 2016, is one of several art and technology programs exploring artistic applications of AI), but never before have we found ourselves so awash in AI images. Text-to-image engines are uniquely 21st-century tools, setting the texture and tone of an art to come. Here, I catalogue four of their side effects: hallucination, hybridization, mutated language and possession.
The systems accept text prompts and output pictures, translating word into image. They are trained on massive datasets, licensed or unlicensed, often scraped from the web. Engineers call the images they produce hallucinations because they are akin to the hallucinations generated by the human brain, in which a neural system is likewise activated to produce internal images. In this sense, these images are real yet not-real. They are statistical: in other words, likely to occur while not being based on any one actual image. Theirs is the fluid malleability of dreams.
Early AI image hallucinators like DeepDream were famous for their psychedelic flavor. But the new tools offer more realism, control, variety and specificity. The first wave of images made with these tools feels like a purge of the surface subconscious of internet culture. When given fine-grained control of AI imagery, users have conjured fantasies of well-known figures in provocative and surreal combinations—“Nosferatu in RuPaul’s Drag Race,” “Minions fighting in the Vietnam War,” “blobfish bubble tea,” “cast-iron skillets in a dishwasher.” The images veer toward online-adjacent pop surrealism and they are vaguely reminiscent of outsider painting and the culture-jamming art of the 1990s.
For a revolutionary technology capable of visualizing nearly anything, the aforementioned kinds of prompts are low-hanging fruit. This may be an effect of the web: Pop icons in subversive scenarios generate likes and shares through a dopaminergic combination of novelty and familiarity. It may also reflect the way AI tools take the hand out of art: Without craft and embodied practice, concepts and descriptions are all that remain. That the new images are largely figurative makes sense: The technical and material interventions that make for compelling abstraction are lacking; there is no brushstroke or squeegee, no chisel, no dodge and burn. AI image-making right now seems more about the what than the how.
And yet, there is a how. It is a how of language and reference. The crudest references are phrased as X in the style of Y, where X is a subject and Y is an artist or historical style (think Kanye West painted by a Dutch master or Donald Duck drawn by Tom of Finland). Specific methods of image-production can be used as style tags. These terms may be painterly: pastel, oil, spray paint. They may be photographic: film grade, f-stop and ISO will result in a photo with celluloid grain. The terms can also be digital: words like “octane,” “blender” and “raytrace” produce slick images associated with 3D software. The more complex the description, the more controlled and intentional the image.
At a deeper level, vocabulary unlocks a model’s particular style. Midjourney is mostly trained on art; it does well with elaborate prompts that mix materials, concepts and adjectives. Architects and designers have begun using it to visualize buildings made of trees or interiors made completely of fabric, for example. In the model’s nooks and crannies hide keys to visual phenomena, even complex ones like fractals and high-dimensional space. DALL-E 2, by contrast, is strong with technical concepts applied to diagrams. Could a schematic drawn by AI inspire new approaches to energy storage, physics, ecology?
At the very least, we should expect the dawning of new aesthetics and artistic movements as artists discover linguistic keys to these models. Conventional artistic materials touch the body but exist outside of it—artwork is imagination externalized. As artists explore the black box of machine imagination, AI image-making inverts this phenomenon. Painting and sculpture put artists in feedback loops with physical materials. Conversely, AI image-making relies on abstract forces of language and number, entraining artists to the latent topology that forms the neural net. In much the same way that the physical properties of a chosen medium necessarily shape an artist’s movements and decisions, artistic entrainment reconditions imagination. Wet clay conditions the ceramicist’s gestures; AI systems sculpt the mind through subconscious ingestion of word/image maps. The inner world of the neural net is excavated and mimicked in the artist’s inner world model. I have elsewhere described this relation with AI in terms of the Greek concept pharmakon, the poison that is also a cure. Artists imbibe a toxin when engaging with AI tools. The effects vary depending on dosage and constitution.
For this reason, it’s important to think critically about how AI generators augment, amplify and ultimately colonize human imagination. Variable reward mechanisms in user interfaces (the kind found in every generative AI model) are known to be addictive, and AI outputs are stunning. Yet after continued exposure one begins to notice patterns; each model has its own style that can overpower the individual using it. It starts to feel a little too easy, which can be depressing. One artist I knew recounted how her initial enthusiasm for DALL-E 2 was followed by feelings of disappointment, even a sense of her own redundancy. As these systems mature, the question inevitably arises: Will humans be needed to make the art? And also: Who will own and control the work made with these systems?
The 3D artist David OReilly posted on Instagram:
Of course, every tool is permitted and AI is happening one way or another, but this species of it derives its entire value from the creative work of uncredited and unwilling participants. To highlight the obvious exchange going on; almost everyone who contributed to the value of AI image-generation is now being exploited by it. . . . If [OpenAI] had any respect for their sources they would credit them in proportion to their contribution, and would never declare ownership over the resulting works. As long as their business model is selling tickets to take weird photos of big data, they are no better than grave robbers.
OpenAI has since retracted its claim of ownership over artworks created by DALL-E 2. But OReilly’s point stands: The artists whose work constitutes DALL-E’s training set have yet to be compensated for the value created on top of their works. When concept art and illustration can be had in seconds via text prompt, does the economic incentive for people to create original works (or to share them freely online) disappear? Without novel human artworks to populate new datasets, AI systems will, over time, lose touch with a kind of ground truth. Might the next version of DALL-E be forced to cannibalize its predecessor?
To adapt, artists must imagine new approaches that subvert, advance or corrupt these new systems. In the 21st century, art will not be the exclusive domain of humans or machines but a practice of weaving together different forms of intelligence. Like any relational practice, communication produces ideas unthinkable by single individuals. When humans develop deeply responsive practices with AI, we are able to think beyond our own scope. Layered human-machine collaboration produces outputs that are always both human and posthuman. The path forward will be one of mutual exchange, making hybrids of both artists and machines. Hybridization reveals that we have always been relatives and participants in ecosystems of material and meaning.
In this vision, interdependence becomes a kind of two-way possession. The artist enters into the zone-like inner world of AI in search of hidden treasure. Through vicarious hallucination, the artist develops a mutated language. By becoming hybrid, the artist overcomes redundancy but becomes possessed by the model’s structure, adopting it as an internal map and tool for creation. Simultaneously, the AI model requires human intent to activate its lifeless virtual neurons and itself becomes possessed by the artist’s agency.
Researchers Giannis Daras and Alexandros G. Dimakis of the University of Texas at Austin have suggested that DALL-E 2 has its own internal language. In the first tweets of a thread announcing their findings, Daras declared:
DALLE-2 has a secret language. “Apoploe vesrreaitais” means birds. “Contarra ccetnxniams luryca tanniounons” means bugs or pests. The prompt: “Apoploe vesrreaitais eating Contarra ccetnxniams luryca tanniounons” gives images of birds eating bugs. (1/n)
A known limitation of DALLE-2 is that it struggles with text. For example, the prompt: “Two farmers talking about vegetables, with subtitles” gives an image that appears to have gibberish text on it. However, the text is not as random as it initially appears. . . (2/n)
We feed the text “Vicootes” from the previous image to DALLE-2. Surprisingly, we get (dishes with) vegetables! We then feed the words: “Apoploe vesrreaitars” and we get birds. It seems that the farmers are talking about birds, messing with their vegetables! (3/n)
If Daras and Dimakis are correct (which has been widely debated), this represents the movement of concepts from the model’s latent space into the minds of researchers via language, specifically the language folded and refolded into the model’s structure, language that lies dormant in the training dataset. It has been argued that the above terms bear some resemblance to binomial Latin names found in botanical and biological encyclopedias. These labels are blended with associated images to produce marks akin to written language. Yet, when fed back in as prompts, the hallucinated labels produce pictures that the model has previously labeled with these exact terms; the model has abstracted the text from the image and combined it in a novel way. Here, language shimmers above and around both prompt and generated image. As we feed our words and pictures into high-dimensional neural net space, we should not be surprised to find them possessed: Language is alive; it is the first layer of symbol, which we constantly weave back into reality, blending cause, effect and representation. AI’s statistical engines further this uncanny aspect of language.
What else lies hiding in neural-net space? Recently, there has been some discussion on Twitter about a gruesome demon revealed by rendering certain text prompts with negative weights (an advanced technique used with the model Midjourney). In the interest of protecting the reader from demons real and imaginary, I will refrain from naming this imagined entity. Suffice it to say that the creators of the meme seem to have cherry-picked their examples, and a certain amount of pareidolia appears to be at play. What is important is that we humans (with our yen for pattern-matching) desire such possession. Whether this is an error-prone evolutionary adaptation or a needed skill in an animated world is beyond the scope of this writing. But given our predilection for possession, we will certainly encounter other such entities in neural-net space in the future.
At its most powerful, art transmutes new technologies and cultural shadows. Through art we have processed war, famine, family, religion, desire, revolution, perspective, photography and digital media, among countless other aspects of existence. As the 21st century blossoms, the poisons of AI are cast into the air; we begin to feel the first symptoms of our posthuman state. Yet we are not helpless in treating hallucination, mutated language, hybridization and possession; the pharmakon always contains a cure. Like the microplastics that drift through even Antarctic waters, AI is now all around us. As artists and supporters of art, we must focus our creativity on uncovering the gems within our neural prosthesis—and, if we cannot find healing there, summon other, deeper sources to understand our new condition.
K Allado-McDowell is a writer, speaker and musician. They are the author, with GPT-3, of the books Pharmako-AI (2021) and Amor Cringe (2022) and co-editor of The Atlas of Anomalous AI (2021). They record and release music under the name Qenric. In 2016, Allado-McDowell established the Artists + Machine Intelligence program at Google AI. They are a conference speaker, educator and consultant to think tanks and institutions seeking to align their work with deeper traditions of human understanding.