Can AI do Establishing Shots?
Using Midjourney for establishing shots - architecture, cityscapes, and environment
In this issue of The Brief, we embark on an artistic and evaluative journey: using AI to capture the essence of Aegis and its majestic heart, the Lunaris Spire, for the Shepard’s Tone film. This issue presents the results first. Then, at the bottom (scroll until you see SKIP TO HERE), I’ll go through a recap of using AI for establishing shots.
The Lunaris Spire
A decade after Serak's revelation, the "Pinnacle of Aegis", the Lunaris Spire, was built. Designed by world-class architects, it's reminiscent of the Tower of Babel. Made from Luminalis Mountains' Lumina stone, it took fifty years to complete. Standing at 1,200 feet, it's both storm-resistant and engraved with ancient tales and Serak's lessons. The Spire is encircled by a maze of stone favelas housing Seraphim workers. However, only the elite enter the Spire. Beside it lies the "Abyss of Serak", contrasting its grandeur. The Spire's bronze doors signify the sacred knowledge within.
The Abyss of Serak
The Abyss of Serak, often whispered about in hushed tones, is said to be a portal to a bygone era. As one approaches its edge, the sheer depth gives an illusion of infinity, as if it tunnels straight into the core of the world. Staring into its expanse, one sees swirling mists of deep blues and purples, hiding the mysteries that lie beneath. It emanates a peculiar scent, a mixture of damp moss and ozone, reminiscent of a storm about to break. The cool air that rises from the abyss carries with it whispers of the past, making the skin prickle with a blend of dread and curiosity. The legend goes that the abyss was formed when a celestial entity, angry with the hubris of ancient civilizations, thrust its arm into the land, creating this endless pit. Over time, the anger of the entity faded, but the abyss remained, serving as a constant reminder of the powers beyond human comprehension. Every visitor to the Abyss of Serak walks away with an unsettling feeling of insignificance in the vast timeline of the universe and a heightened reverence for the forces that sculpted such a mysterious chasm.
The Lunaris Stairs
The Lunaris Stairs, a marvel in themselves, serve as the grand approach to the towering Lunaris Spire. Crafted from the same shimmering Lumina stone as the Spire, each step glows with a muted radiance, lending an ethereal quality to the ascent. The steps, broad and shallow, stretch in a gentle, sweeping arc across the landscape, and their polished surfaces reflect the skies and the city of Aegis around them. Intricately carved patterns, similar to the engravings on the Spire, run along the edges of each stair, narrating the legends of Aegis in delicate relief. But it's not just their beauty that captures the heart of every Aegisian; it's their historical significance. It was upon these very steps that the "Pact of Unity" was declared, uniting the fragmented tribes of Aegis into one harmonious city. Every year, a grand procession commemorates this pivotal moment, ascending the Lunaris Stairs to pay homage to the foresight and unity of their ancestors. This ritual reinforces the importance of the Lunaris Stairs, not just as a physical path to the Spire, but as a symbolic bridge to Aegis's past and the promises of its future.
The Entrance to the Lunaris Spire
The entrance to the Lunaris Spire is nothing short of majestic. Dominating the threshold is the "Circle of Celestials," a grand carving embedded into the very heart of the stone floor. According to ancient lore, this intricate design was engraved by Serak himself, using a single Lumina crystal under the guidance of a celestial vision. The carving portrays intertwining constellations, each representing a fundamental value of Aegis, reminding all entrants of the virtues that uphold the city. Encircling this celestial mosaic are tall, imposing stone monoliths, each alight with a roaring, eternal fire. These Flames of Perpetuity not only illuminate the entrance but are believed to be a gift from the gods, providing warmth and a guiding light.
As one approaches the entrance, the aromatic scent of burning cedarwood and a hint of frankincense from the fires envelops them, evoking feelings of reverence and mystique. The gentle hum of whispered prayers, echoing from within the Spire, reaches the ears of visitors, accompanied by the distant, melodious chimes of wind-bells. The combined ambiance — the Circle of Celestials underfoot, the protective ring of fire-lit monoliths, and the rich tapestry of sounds and scents — instills a profound sense of awe and sanctity, making every step into the Spire feel like a passage into a world steeped in both history and divinity.
The Celestial Hall
The Celestial Hall, nestled within the Lunaris Spire, is the spiritual and political epicenter of Aegis, renowned for hosting pivotal events and meetings of the 12 disciples. Influenced by ancient basilica designs, the hall is adorned with luminescent marble walls that recount Aegis's creation and the Seraphim's divine role. Its dome is graced with celestial paintings by Elandrial, narrating the universe's creation and Aegis's birth, while at its core sits a massive sapphire table, symbolic of unity. Opened to the public annually on the Day of Luminance, the hall becomes a hub of celebration, reflecting the deep bond between the Seraphim and Aegis's inhabitants.
The Crypt of Serak and The Dark Nexus
Tucked beneath the shadow of the Lunaris Spire, the Crypt of Serak is a chilling descent into history and horror. Its arched entryway, framed in tarnished silver, gives way to a vast expanse of cold stone floors and walls dripping with moss and condensation. As you step over the threshold, a metallic taste, akin to blood, invades your mouth, while the acrid scent of mold and the sterile coldness of long-sealed chambers assail your nostrils. The barely audible hum of The Dark Nexus resonates in the distance, a low-frequency vibration that one can feel deep in their bones, setting nerves on edge.
On either side of the entrance, the formidable stone Dire Wolves stand guard, their genetically-engineered eyes scanning for intruders with a ferocity that's palpable. Along the winding passageways, ghastly visages of failed genetic undertakings peer out from behind thick glass enclosures, a grim testament to the lengths the Seraphim went in their quest for advancement. The heart of the crypt houses the ornate sarcophagus of Serak, set on a raised dais, its surroundings lit only by the unsettling, pulsating glow from The Dark Nexus. This AI, contained within a crystalline prism, casts shadows that seem to dance and writhe on the walls, each movement stirring feelings of trepidation and awe. The atmosphere in the Crypt of Serak is one of reverence and dread, a stark reminder of the Seraphim's pursuit of power and the costs of their ambition.
The Shepard’s Tone Bell
Note this is a rewrite: The Lunaris Spire, majestic in its stance, is more than just an architectural marvel; it embodies the essence of the Shepard's Tone Bell. Made from advanced polymers and resonant alloys, a testament to the Seraphim's relentless pursuit of innovation, the Spire gleams with iridescent colors, a beacon in the heart of Aegis. When the Spire sounds its call, the ever-ascending, infinite note of the Shepard's Tone resonates, reaching every corner of the land. This sound, both haunting and divine, deeply moves all who hear it, guiding them into a state of profound introspection. Alongside this celestial symphony, the stone labyrinth of favelas that surrounds the Spire awakens, glowing a mesmerizing iridescent orange, making the experience even more enchanting. This illuminating phenomenon not only unites the people of Aegis in shared prayer and reflection but also stands as a vigilant sentinel, its resonating alarm alerting of potential threats. In the Lunaris Spire, the duality of faith and innovation converge, revealing the delicate dance between time-honored traditions and the ever-evolving needs of a dynamic society.
SKIP TO HERE — Recap and Assessment
So for those of you who are following along, I wanted to share my take on the experience of using Midjourney to do the concept art for Shepard’s Tone. Overall, it’s a game changer, but its pros and cons come about in curious ways you might not expect. Its strengths were a little different than I imagined, some of my assumed weaknesses were false, and many of my assumed positives were false. In short, I learned a lot, and I’m glad I went through this process.
A THEME TO REMEMBER. If anyone tries to give you advice about using AI for concept art and they haven’t done it for a real production, do one thing: ignore what they have to say. The reality of doing it produces much different conclusions than imagining the doing of it.
Here are the positives:
Could I manifest my vision? In short, yes, but not without hurdles. Initially, I'd hoped for 50% alignment with my concept, expecting Midjourney's unpredictability for the rest. At the start, I struggled with Midjourney's over-the-top creations. However, after numerous trials and studying AI workflows on YouTube, I saw vast improvement. Impressively, I captured 99% of my imagined details. This breakthrough suggests generative AI art can enhance communication between film teams, reducing back-and-forths. Having photorealistic art early on also helps identify potential production challenges, like my depicted clouds and dust, which may be tough to replicate in Unreal Engine.
Could I nail the scene setup and camera angle? Yes, but it was challenging. Many believe scene blocking and camera angles with generative AI are impractical, but I discovered a solution for extreme wide shots and close-ups. In short, you choose a "foundation image" with the desired blocking, then use prompts to detail the scene. For distant or close-up shots, this was 99% effective. However, it was time-consuming to find the right foundation image. I foresee databases, perhaps from shot list services, simplifying this. Two concerns persist: 1) mid-range shots aren't effective with this method, and 2) does using another's image for blocking infringe on their intellectual property?
Does it look natural? Does it appear authentic or reminiscent of an HR Giger Robot Model? I'm genuinely impressed. Observing many Midjourney creations, there's a distinct AI-generated "look". I aimed for personal vision, not just appealing randomness. Could I direct Midjourney to match my vision rather than generating serendipitous outputs? The answer is "Yes". It demands effort, but I firmly believe one can achieve their exact vision, provided they're prepared for countless attempts.
Did I need to steal someone else’s style? No way! And I'm genuinely relieved. I entered this process expecting to steer the AI with prompts like "in the style of...[artist/movie]". Yet, the reality was different. After navigating through thousands of trials, errors, and experimenting with various workflows, I've decided against using such prompts in the future. They either added nothing, mirrored the style too closely making it seem like a knockoff, or diverged from my vision. While this addresses the "prompting as copying" concern, the "training as copying" issue persists.
Could I achieve consistency? Yes-ish… and… Depends. To those tuned into generative art, the cry of consistency being "unattainable" is all too familiar. Initially, achieving a consistent vibe felt like a Herculean task. However, with time and aided by evolving tech, my pieces began to exhibit impressive uniformity. It's crucial, though, to tread lightly on assumptions here. Throughout my journey, both Midjourney and I were on simultaneous growth trajectories. Given the rapid advancements in this field, I won't be shocked if consistency woes become obsolete within a year. And just a heads-up, as of now, Adobe has rolled out a new scene blocking tech for Firefly, its answer to Midjourney but without drawing from non-sanctioned artist IP.
Could I upscale to 4K and beyond? When I started, Midjourney's maximum image size was only 1080p, so I didn't expect to be excited by the fidelity. But as I was doing this exercise I found Topaz Labs AI, which can upscale to 4K easily and to 8K with my super powerful PC. Then, within the last week, Midjourney added upscaling to 4K for pro users. It's not as good as Topaz Labs AI, but it's good enough for concept art work.
Workload and Satisfaction: I thought that at the end of this exercise (meaning creating the concept art for the Lunaris Spire) I would be exhausted because of the annoyingness of prompting. And while it was annoying to do what ended up being 1,000s of iterations inside Midjourney, being able to see the end results, and to use those end results to communicate my idea, outweighed any shit work that was necessary to get there.
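For anyone who wants the upscaling jump mentioned above in numbers, here is a minimal Python sketch. It assumes the standard 16:9 frame sizes for 1080p, 4K UHD, and 8K UHD, which the post itself doesn't spell out, and the helper name `upscale_factors` is purely illustrative. It is not part of Midjourney or Topaz Labs.

```python
# Standard 16:9 frame sizes (an assumption: the text only says "1080p", "4K", "8K").
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}


def upscale_factors(source, target):
    """Return (per-side factor, pixel-count factor) for source -> target."""
    source_w, source_h = RESOLUTIONS[source]
    target_w, target_h = RESOLUTIONS[target]
    per_side = target_w / source_w
    pixel_count = (target_w * target_h) / (source_w * source_h)
    return per_side, pixel_count


for target in ("4K", "8K"):
    per_side, pixel_count = upscale_factors("1080p", target)
    print(f"1080p -> {target}: {per_side:.0f}x per side, {pixel_count:.0f}x the pixels")
```

Under those assumptions, going from 1080p to 4K means the upscaler has to produce four times as many pixels as the source image contains, and 8K means sixteen times as many.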
And the negatives:
Overall, I think that Midjourney is a very effective tool for a writer/director to communicate the imagery that’s in their head, and that alone is of huge value. But, on the flip side, I would be cautious about anyone who advises you that AI is a simple fix with revolutionary overnight changes. Functionally, it’s a different animal than the workflows of traditional concept art. It therefore needs its own workflows, it will need solutions to its unique sets of challenges, and it will require a workforce with different skills.
Number of Iterations: When I began, I optimistically anticipated that I would need only 10 to 20 attempts to produce an image suitable for sharing on Substack. However, reality painted a different picture. On average, it demanded between 100 and 500 iterations to achieve the desired image quality. There were several reasons for this discrepancy. While part of it was attributed to my learning curve, even when that was accounted for, the challenges remained. Utilizing Midjourney to create the envisaged image can be likened to guiding a teenager through their homework. Achieving the right balance between scene blocking, capturing the ideal shot, coordinating subjects, and ensuring impeccable cinematography necessitated an extensive process of experimentation and refinement. What this means is that if you are using generative AI for a real film, where precision matters and has direct impacts on budgets, timelines, and resource requirements, it’s not as easy as building a cool prompt and then you are done. It requires potentially 500 iterations for each rough draft. This means one of two things: either the writer/director themselves will do this (which is unlikely amongst the current generation of writers/directors), or we are back to the back and forth that is required in traditional concept artistry. That being said, I think that it’s highly likely that the next generation of AI-native writers/directors will do this themselves so that they only need to do this 500 try > draft workflow once.
Camera Shots and Foundation Images: Getting the camera shot (e.g. the angle, the blocking, etc) that I wanted was... well different than I thought. The key to getting the shot you want, the way you want it, is finding a "foundation image" that has the shot angle you want. That means finding an image on the Internet that has the blocking and shot you need, and then you've got to get your subjects into that blocked scene. So when you see that first image of the Lunaris Spire, I had to find an image of a tall building to use as the blocking, and then I used prompt descriptions (see the technique I described in my last issue). In the case of the Lunaris Spire, my foundation images included the Tower of Babel, the Empire State Building, and the Devil's Tower. And it's important to state that finding just the right blocking and shot for the foundation images took me the better part of 3 hours for each image. Also, notice that I tried to use real objects for the establishing foundation images. I felt that using real buildings, or real historical buildings, wasn't stealing, but... I would have to say that someone took those shots, so even though I'm not using their shot of the Empire State Building itself, I am using their blocking and shot composition. So is that ok? The rest, which included the labyrinth of stone favelas, had no foundation image; that was accomplished with prompt descriptions. So what’s my take on all of this? For films, something super cool can happen. A director can put together a showcase of camera angles, shot types, and scene blocking examples, either by grabbing them from a shot deck service, hand drawing them, or finding them on the Internet with a Google search, then use them to quickly emulate the shots they want.
Once I got one good image, could I scale quickly? I thought that once I had one good image for something, creating more images from different angles or from different distances would be significantly easier. For example, once I had the establishing image of the Lunaris Spire, I expected that seeing that spire from multiple directions and getting shots of the same subject from different distances would be 100 times faster. It wasn't. I would say that once I had my first good image of a building or scene, creating additional images of the original from different angles, distances, or blocking was more like two times faster.
Enjoyment and Artistic Satisfaction: I thought I would enjoy this much more than I did. As a musician and artist, I found the process of prompting, the process of finding foundation images, and the trial and error about as inspiring as taking an accounting class. It isn't art, it's AI. I absolutely cannot imagine a concept artist enjoying this process. I was able to do this because this exercise is paramount for me, and creating Shepard's Tone gives me enough of an incentive to do anything necessary. But for artists who have a career in creating concept art, I call bullshit on anyone who says "well just learn how to prompt and become a promptologist", because I think that artists simply want to paint, not prompt.
Resource Optimization: I thought that I would likely be reporting that we will see a 100x reduction in the resources, time, and budget required to create concept art. What I now think is the likely case is that the time and budget will stay the same, but the expectations for the number of images, the quality of images, and the breadth of image coverage will be 1,000x what they were before AI.
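To make the iteration math above concrete, here is a rough Python sketch. The 500 iterations for a first good image and the "100 times faster" versus "two times faster" speedups come from my experience described above. The four extra shots, the function name `iterations_for_shot_set`, and the assumption that speed scales directly with iteration count are all hypothetical, chosen only for illustration.

```python
def iterations_for_shot_set(first_image_iters, extra_shots, speedup):
    """Estimate total iterations for one subject.

    The first good image costs first_image_iters. Each extra shot of the
    same subject is assumed to cost first_image_iters / speedup.
    """
    return first_image_iters + extra_shots * (first_image_iters / speedup)


FIRST_IMAGE_ITERS = 500  # upper end of the 100 to 500 range reported above
EXTRA_SHOTS = 4          # hypothetical number of additional angles or distances

hoped_for = iterations_for_shot_set(FIRST_IMAGE_ITERS, EXTRA_SHOTS, speedup=100)
observed = iterations_for_shot_set(FIRST_IMAGE_ITERS, EXTRA_SHOTS, speedup=2)

print(f"Hoped for (100x faster follow-ups): {hoped_for:.0f} iterations")
print(f"Observed (2x faster follow-ups): {observed:.0f} iterations")
```

With these illustrative inputs, the follow-up shots go from being a rounding error to being two thirds of the total work, which is why the two-times speedup matters so much when planning a full set of shots for one location.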
That concludes our current segment. Up next, we'll delve into the interior perspectives of the Lunaris Spire, taking you through distant, mid-range, and intimate views, and the conceptual representations of the Seraphim and the Ostinatum. For each, I’ll be sharing the “how I did it…” part and my reactions to how effective AI is for concept art in the filmmaking process. Additionally, once we've wrapped up the concept art phase, I'll present a comprehensive breakdown covering all imagery, costs, time, and resources, simulating a movie budgeting scenario and comparing that to traditional concept art budgets and results.
Quick preview: there are some truly revealing insights that I'm eager to share.
What is The Brief and Who should read it?
I release a weekly digest every Friday, tailored for professionals ranging from executives to writers, directors, cinematographers, editors, and anyone actively involved in the film and television domain. This briefing offers a comprehensive yet accessible perspective on the convergence of technology and its implications for the movie and TV industry. It serves as an efficient gateway to understanding the nexus between Hollywood and Silicon Valley.
Who am I?
I'm Steve Newcomb. Functionally, I’m a recovering Silicon Valley founder who is finally old enough to have a bit of care. I’m perhaps most recognized for founding Powerset, which was the largest AI and machine learning project in the world when I founded it. It was later acquired by Microsoft and transformed into something you might recognize today: Microsoft Bing. Beyond Bing, I had the privilege of being on the pioneering team that witnessed the inaugural email sent via a mobile device. My journey also led me to SRI (Stanford Research Institute), where we laid the groundwork for contemporary speech recognition technology. Additionally, I was a co-founder of the debut company to introduce a 3D physics engine in JavaScript. I've held positions on the board of directors and contributed funding to massive open source initiatives like Node.js and even the largest such project, jQuery. My experience extends to academia, having been a senior fellow at the University of California, Berkeley's engineering and business faculties. Recently, I ventured into Layer 2 internet protocols and assisted a company named Matter Labs in securing $440 million in funding to bolster their endeavors.
What am I doing besides writing these posts?
Typically, I allocate a year between groundbreaking ventures. My exploration for the upcoming project commenced in May 2023, and the sole certainty is its nexus with the film, television, SMURF, and AI domains. Sharing insights on my research endeavors helps me discern between feasible prospects and mere illusions. My hope is that for this venture, I appropriately consider the ethical and sociological repercussions.
If you are interested in contacting me, being interviewed, being helped, or yelling at me, my email is steve.e.newcomb@gmail.com.