A whole new way to create films
How AI is helping me fully develop, test, and iterate scenes on a daily basis
For everyone who is following along with the Shepard’s Tone project - THANK YOU! Your support and kind words are truly helping to fuel my efforts. I’d like to take a time out from talking about Shepard’s Tone as a film and instead share with you three insights I’ve had as a result of conducting this experiment.
For each, I’d like to share how I’m taking technologies, workflows, and processes that already emerged in Silicon Valley and applying them to filmmaking.
The concept of CI/CD as it applies to filmmaking
The concept of Lossy Transmission as it applies to filmmaking
Defining the unicycle scene of filmmaking
When Silicon Valley moved from Waterfall to CI/CD
The Waterfall development process, conceptualized in 1970 by Winston W. Royce, is a sequential design framework often used in software development. It is characterized by a linear and orderly approach, where each phase (such as conception, initiation, analysis, design, construction, testing, and maintenance) must be completed before the next begins. This method was dominant in software engineering throughout the 1970s and 1980s, and its structured nature made it suitable for projects with well-defined requirements and deliverables.
The Waterfall approach, while seemingly structured and straightforward, has several pitfalls that can lead to failures in project management, particularly in industries where flexibility and adaptability are crucial. Specifically, it was found that the Waterfall approach created massive inefficiencies when it came to creative thinking.
Here are some of its key shortcomings, along with anecdotal examples:
Rigidity in Planning and Execution - Waterfall's linear nature means that once a phase is completed, it’s difficult to go back and make changes. For instance, in software development, if a flaw is found in the design phase after the development phase has begun, it can be costly and time-consuming to return to the design phase.
Poor Response to Changing Requirements - Waterfall struggles with adapting to changes in project requirements. An example is seen in construction projects, where client requirements can change during the build. The Waterfall model's inflexibility can lead to significant delays and budget overruns in such scenarios.
Delayed Testing and Integration - In Waterfall, testing only occurs after the build phase, which can lead to the late discovery of issues. A notable example is in the video game industry, where games developed under Waterfall have encountered critical bugs late in the development cycle, leading to rushed fixes, poor quality releases, or significant delays.
Limited User Feedback Integration - Since Waterfall doesn’t easily accommodate changes once a phase is completed, it's challenging to integrate user feedback until the later stages. This was evident in some early software products, where user feedback received late in the development process led to products that didn’t fully meet user needs.
Overemphasis on Documentation - Waterfall often requires extensive documentation before any actual work begins, which can be time-consuming and may not always contribute directly to the end product. In various government IT projects, this has led to lengthy planning phases with no tangible progress in development.
Inefficient Use of Resources - The sequential nature can lead to resource idleness. For example, in a Waterfall film production, the editing team remains idle until filming is completed, leading to inefficient use of resources and time.
Difficulty in Estimating Time and Costs Accurately - Waterfall projects, such as large-scale engineering projects, have often suffered from inaccurate time and cost estimations due to their inability to adapt to unforeseen challenges that arise during the project.
Sound familiar?
In filmmaking, the traditional methods most often used highly resemble the Waterfall approach. The process is generally linear, starting from scriptwriting, followed by pre-production (including storyboarding and shot listing), production (shooting), and post-production (editing and distribution). Each stage is distinct and usually completed before moving to the next. An example is the making of classic films where scripts were finalized before shooting, and editing only commenced in post-production, allowing little room for iteration based on feedback during the process.
But we can learn from Silicon Valley. Beginning in the early ’90s, a group of software engineers who had seen the problems with sequential, siloed production got together and published a new proposed process - then the world of software development changed forever, and boom! the Internet revolution → remember that?
Let’s start with a bit of historical context. Continuous Integration was introduced by Grady Booch in his 1991 work on object-oriented design, and CI/CD (Continuous Integration/Continuous Deployment) gained prominence with the rise of Agile methodologies in the early 2000s.
Agile methodology, created in 2001 by 17 software developers including Ken Schwaber, Jeff Sutherland, and Jim Highsmith, is a flexible, iterative approach that emphasizes adaptive planning, early delivery of working products, and continual improvement through collaborative, cross-functional teams. Waterfall, by contrast, completes each sequential step one by one, with no iteration on a finished product until the last step in the sequence is done.
CI/CD emphasizes incremental changes to finished products, frequent testing, and quick releases, allowing for more flexibility and adaptability in the software development process. It facilitates ongoing automation and continuous monitoring throughout the lifecycle of an application, from integration and testing through delivery and deployment.
The CI/CD methodology in software engineering can be metaphorically compared to the concept of “building a unicycle first, then testing and improving in an iterative cycle,” eventually leading to the creation of a bicycle, tricycle, and ultimately a Ferrari - as opposed to Waterfall, which designs, plans, codes, and then builds a space shuttle.
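To make the contrast concrete, here is a toy sketch in Python. It is not anyone’s real pipeline, and the phase names and lengths are invented; the only point it illustrates is when a complete, reviewable result first exists in each approach.

```python
# Toy illustration only: phase names and lengths are made up, not a real production.

def waterfall_schedule(phases):
    """Each phase must finish before the next begins; a reviewable whole exists only at the end."""
    day = 0
    for name, length in phases:
        day += length
        print(f"day {day:3d}: {name} finished")
    print(f"day {day:3d}: first time anyone can critique a complete result")

def cicd_schedule(days):
    """A rough but complete 'unicycle' exists on day one and is reviewed and improved daily."""
    for day in range(1, days + 1):
        print(f"day {day:3d}: complete (if rough) scene reviewed by every stakeholder")

waterfall_schedule([("script", 30), ("storyboard", 15), ("shoot", 20), ("edit", 25)])
cicd_schedule(days=5)
```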
So I thought: what if AI is what enables us to apply CI/CD methods to filmmaking?
Rather than treating the final film as a space shuttle - trying to write the perfect script, then laboriously create the perfect storyboard, then the perfect shot list, then shoot it perfectly, and then hand a pile of stuff to an editor and hope for the best - why not use the CI/CD process and start with a unicycle?
Imagine a world in which we create entire filmed scenes on a daily basis, get all stakeholders in a room and then critique those iterations.
There are soooooooo many advantages to this approach. Here are some big ones.
In the existing filmmaking process, you only get your location and your actors for a scene once - you have to do your best to get coverage and then hand what you got to an editor. But what if you could shoot with digital characters and digital scenes every day at no extra charge?
How do you know if your scene works? In the existing filmmaking process it’s a function of “trust me,” but in an AI world of CI/CD you can test it, get data, and know it works.
Everyone sits at the table together looking at a finished product - i.e., everyone has the same thing in their head that they are critiquing - more on this one next up.
Do not confuse CI/CD with the process of seeing and critiquing “dailies”: dailies iterate on only one part of the sequence, not on the whole.
Lossy Transmission
Another relevant concept to discuss is that of lossy transmission. Lossy transmission, in a general sense, refers to the degradation or alteration of information as it passes through various stages or mediums. The more stages, the more mediums, the bigger the team - the worse the lossy transmission.
In traditional filmmaking, this lossy transmission occurs at nearly every step.
Is a script an accurate means to convey what’s in the mind’s eye of the writer?
Ditto for a storyboard and a director
Ditto for a concept artist and a director
Ditto for a shot list and a cinematographer
And we have nothing for editors until they edit
But do we need these stages, mediums, and workflows in the world of AI? What if the participants in making a film go from hundreds, to ten, or even to one? And equally important, what if AI, new workflows, and other technologies enable CI/CD instead of Waterfall methodologies? Could we collapse the stages, the mediums, and the workflows from many to one - thus eliminating lossy transmission?
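One way to see why collapsing stages matters: if each hand-off preserves only some fraction of the original intent, the losses compound. Here is a back-of-the-envelope sketch in Python; the 90% retention figure is purely illustrative, not a measurement.

```python
# Back-of-the-envelope only: 0.9 is an arbitrary, illustrative retention rate per hand-off.

def surviving_intent(retention_per_handoff: float, handoffs: int) -> float:
    """Fraction of the original idea that survives a chain of lossy hand-offs."""
    return retention_per_handoff ** handoffs

# Traditional chain: script -> storyboard -> concept art -> shot list -> edit (5 hand-offs).
print(f"traditional chain: {surviving_intent(0.9, 5):.0%} of the intent survives")  # ~59%

# Critiquing an iteration of the scene itself: effectively a single hand-off.
print(f"scene as medium:   {surviving_intent(0.9, 1):.0%} of the intent survives")  # 90%
```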
KEY INSIGHT: The perfect medium to minimize lossy transmission is an iteration of the SCENE ITSELF; everything else is an ersatz representation of that scene and will cause lossy transmission of ideas. AI, new workflows, and other technologies enable using an iteration of a completed scene as the atomic element of discussion.
In short, we can, using AI, create a “unicycle” of a completed scene every day. Then, bring together the stakeholders, writers, directors, actors, cinematographers, and editors in one room and review and critique the scene with zero loss in transmission.
Then, using the principles of CI/CD, iterate until that scene becomes a bicycle, a tricycle, and ultimately the space shuttle we want.
The Unicycle for Filmmaking
In the film Shepard’s Tone, I encompass the roles of writer, actor(s), director, cinematographer, and editor, while AI acts as a multifaceted assistant, VFX team, and producer. So I set out to create a new workflow that could support all of that.
My goal was to define the minimum viable pre-visualization that would result in the least lossy transmission.
And that’s when things got weird…
I started with “well… what’s actually possible, and where can I move super fast?” The idea: get to an MVP cheap and fast, find product-market fit, then expend time, energy, and capital to expand.
And the answer there was interesting. I started with the auditory side first.
A movie, if you think about it, is half visual and half auditory. What if I created the things we hear first - the dialogue, the sound effects, the foley, and the music - and then used AI to back into the visuals? I know it sounds weird, but why not start with what we hear, especially if it is something I could create daily that was sufficient to eliminate lossy transmission, evaluate, and iterate?
But I had one constraint - voice AI sucks… And that’s exactly when Eleven Labs announced its new voice-to-voice AI, where a voice actor can act any scene and AI will translate it into any other voice with the same intonations.
BOOM!
So I set out to create a new filmmaking process that utilized CI/CD, eliminated lossy transmission, and went from unicycle, to bicycle, and eventually to space shuttle. (A rough sketch of how the whole pipeline could be tracked follows the steps below.)
Step 1. Get my unicycle built.
Write a scene (know that it will suck at first, but iterate, iterate, iterate)
Record the dialogue for the characters of the scene and have AI translate that into the voices of the characters (special note: I took some voice acting classes and I’m horrible, but perhaps good enough to get the point of the scene across. It’s a big stretch to have me act as MOTHER the high priestess - but you can judge for yourself soon enough)
Get the foley perfect for the scene (this was fun)
Get any sound effects blending in appropriately (this was fun)
Write the music for the scene (this was challenging but very fulfilling)
Then, iterate daily on my unicycle. Once the auditory part of the scene was just right… I went on to adding photo stills created in Midjourney for each shot in the shot list.
I write the directing and cinematography directions into the script, not abiding by any of the screenplay formats that were taught to me.
I create a storyboard/shot list that exactly matches the screenplay using Midjourney
I insert the still image for each shot in the shot list into a video edit with the dialogue, foley, sound effects, and music.
Then, iterate daily on my bicycle. Once I got stills + audio done, I moved on to replacing the still images with animated images using Pika Labs.
For scenes that are short - 5 seconds or less - I upload the Midjourney shot into Pika Labs and direct the camera movement for the scene. I ignore bad lip-syncing for now.
For scenes longer than 5 seconds, I go to Unreal Engine and block out the scene and cinematography using white boxes. Then I take the video recorded with the white-box composition, inject it into Pika Labs, and ask Pika to skin the scene with the materials, lighting, etc. that I want.
Then I insert the moving images into the shot list with the audio tracks.
Then, I iterate daily on my tricycle. Once I have the animatic where I want it, I plan to expend more capital, time, and money and move on to actually shooting the scenes with real actors on a volume.
For CGI scenes, send the roughs over to a VFX partner and have them build them out for finals. Over time I think this evolves into AI auto-generating the scenes for us, but for now this will likely be required.
Hire real actors and put them on a volume, build out real sets for the volume, and shoot scenes with real actors. I think that real people should always be the actors in a movie - I have trouble rooting for an AI character. AI is perfect for drafts, but I personally want the actor to bring their own mojo to the scene.
Hire a compositing team to complete any clean-up and final compositing. Again, I think that in the long run this will perhaps be completely AI, but for the mid-term - i.e., the next five years - I think it will be required for top-quality feature films.
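For what it’s worth, here is the rough sketch promised above: a hypothetical way to track this pipeline the way a software team tracks a backlog. The stage and task names mirror the steps above; the structure itself is just illustrative bookkeeping, not a tool I’m claiming to use.

```python
# Hypothetical bookkeeping sketch, not a real tool: stage and task names mirror the steps above.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    tasks: list[str]
    approved: bool = False   # flipped after a daily stakeholder review signs off

pipeline = [
    Stage("unicycle (audio)",  ["scene script", "voice-to-voice dialogue", "foley", "sound effects", "music"]),
    Stage("bicycle (stills)",  ["Midjourney stills per shot", "cut stills against the audio"]),
    Stage("tricycle (motion)", ["Pika Labs animated shots", "Unreal Engine white-box blocking for longer shots"]),
    Stage("shuttle (finals)",  ["VFX partner builds roughs to final", "real actors on a volume", "compositing"]),
]

def todays_iteration(stages: list[Stage]) -> str:
    """Work on the earliest unapproved stage; every stage before it stays reviewable and 'shippable'."""
    for stage in stages:
        if not stage.approved:
            return f"today: iterate on {stage.name} -> {', '.join(stage.tasks)}"
    return "all stages approved - ready to pitch"

print(todays_iteration(pipeline))
```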
Conclusion
This process is far from mature, and far from perfect, but what I like about it is that it starts out humble and improves quickly - and does so using methods that I’m deeply familiar with and know produce quantifiable and predictable results. I can even imagine a bug burn-down report and a GitHub repo to manage pull requests - blah blah blah technical stuff, but yeah - basically using all of the well-tuned, well-tested methods of software development on filmmaking.
I also like this process because it enables me to know cheaply and quickly whether a scene is working. It only requires significant capital, time, and effort once you know you have something that’s working. Before you know it - boom! - we have a group of scenes in a sequence that represents a 10-minute short to pitch the film.
—
I can’t wait to show you what I’ve done - it’s fascinating, addictive, and prevents sleep - but it’s a ton of fun.
A THEME TO REMEMBER. No amount of technology will ensure that a film has meaning, is done well, and causes humanity to stop for a moment and think about something.
Only humans can do this, and for me it requires three things.
First, conscientiousness - Conscientiousness, in the context of filmmaking and music composing, refers to the meticulous attention to detail, dedication to quality, and a disciplined approach to the creative process, ensuring that every aspect of the work aligns with the intended artistic vision.
Second, tenacity - Tenacity in filmmaking and music composing refers to the persistent and determined pursuit of creative vision and goals, often in the face of challenges and setbacks.
And lastly, heart - Heart in filmmaking and music is all about being a person with depth and meaning in their life and a desire to translate those experiences into the auditory and visual medium we call filmmaking.
That’s all for now - talk soon.
What is The Brief and who should read it?
I release a weekly digest every Friday, tailored for professionals ranging from executives to writers, directors, cinematographers, editors, and anyone actively involved in the film and television domain. This briefing offers a comprehensive yet accessible perspective on the convergence of technology and its implications for the movie and TV industry. It serves as an efficient gateway to understanding the nexus between Hollywood and Silicon Valley.
Who am I?
I'm Steve Newcomb. Functionally, I’m a recovering Silicon Valley founder who is finally old enough to have a bit of care. I’m perhaps most recognized for founding Powerset - it was the largest AI and machine learning project in the world when I founded it. It was later acquired by Microsoft and transformed into something you might recognize today - Microsoft Bing. Beyond Bing, I had the privilege of being on the pioneering team that witnessed the inaugural email sent via a mobile device. My journey also led me to SRI (Stanford Research Institute), where we laid the groundwork for contemporary speech recognition technology. Additionally, I was a co-founder of the first company to introduce a 3D physics engine in JavaScript. I've held positions on the board of directors and contributed funding to massive open source initiatives like NodeJS and even the largest such project, jQuery. My experience extends to academia, having been a senior fellow at the University of California, Berkeley's engineering and business faculties. Recently, I ventured into Layer 2 internet protocols and assisted a company named Matter Labs in securing $440 million in funding to bolster their endeavors.
What am I doing besides writing these posts?
Typically, I allocate a year between groundbreaking ventures. My exploration for the upcoming project commenced in May 2023, and the sole certainty is its nexus with the film, television, SMURF, and AI domains. Sharing insights on my research endeavors helps me discern between feasible prospects and mere illusions. My hope is that for this venture, I appropriately consider the ethical and sociological repercussions.
If you are interested in contacting me, being interviewed, being helped, or yelling at me, my email is steve.e.newcomb@gmail.com.