At the close of Metal Gear Solid 4, just after Snake pulverises Liquid Ocelot, there is a series of cutscenes that never ends. Well, that’s not strictly true. It does end – after 71 minutes – it’s just that I’ve never watched that far. I understand that the game’s director Hideo Kojima is a committed cinephile who has drawn much of his inspiration from movies, but I don’t care. Those are minutes of my life I’ll never get back.
I also don’t care for the 20-minute cinematic sequences dotted through Xenoblade Chronicles or Final Fantasy, or the seemingly hundreds of non-interactive scenes detailing every single plot point in the Assassin’s Creed adventures. It’s needlessly aggressive to rob the player of agency, then bully them into paying attention for prolonged periods. I think it’s time we retired the whole convention.
The origins of the video game cutscene are both technical and situational: in the 90s, games simply couldn’t render scenes in real time, and besides that, a lot of gaming’s narrative talent came from film, and they were using the tools they knew. This interestingly mirrors the evolution of cinema: from the 1920s to the early 1930s, narrative cinema was heavily inspired by the theatre. This made sense because the early film industry drew most of its talent – actors, directors, writers, technical crew – from the stage, and these people brought their techniques with them.
Cameras tended to be static with long takes between cuts, viewing the action like an audience member; films were shot on purpose-built sets rather than on location; acting was somewhat mannered and histrionic, because performers were used to exaggerating their movements and emotions so that the people 18 rows back could see them. Early movie audiences were familiar with stage conventions, too, so using them helped ease viewers into the cinematic experience.
But as cinema advanced as its own medium, new and intimate methods of storytelling emerged. Thanks in part to the invention of the dolly and crane, the camera transformed from an audience member to a moving observer within the world. Actors realised they could communicate in tiny gestures and facial expressions. From German expressionism to French New Wave to the American auteur cinema of the 1970s, wild new narrative techniques emerged, and a whole wealth of film-specific lighting, directing, design and special effects conventions developed alongside. The medium came into its own.
This process is happening in games too – we see it in the increasingly sophisticated disciplines of environmental storytelling, UX/UI and narrative design. Yet for a medium that’s all about interactivity and immersion, we sure are clinging on hard to the cutscene. If we look at some of the biggest, most moving narrative games over the past five years – The Last of Us, God of War, Marvel’s Spider-Man – most of the emotional moments are happening in non-interactive filmic sequences, the controls taken from us. Like children, we cannot be trusted to take part. We’re required to just sit and watch the show.
The argument is that, sometimes, the emotional arc of a scene needs to be precisely timed and manufactured in order to deliver its emotional payload. In that case, we’re making the wrong sort of scenes. If a mature interactive medium can only tell emotional stories through non-interactive sequences, something is wrong. It’s frustrating because Valve made great strides on this issue 25 years ago: the narrative sci-fi shooter Half-Life contained no cutscenes or cinematic sequences at all. Characters (the scientists and security guards of the Black Mesa facility) delivered exposition in-game while you explored, and at the same time the increasingly unstable environment told its own story of destruction and suspense. Valve did it again a decade later with the Portal games, combining an entertainingly talkative robot antagonist with a world in which signs, symbols and audio announcements communicated every rule and background detail the player needed to know to become intellectually and emotionally engaged.
Game designer Fumito Ueda made very sparing use of cutscenes in his classic adventures Ico and Shadow of the Colossus, instead drawing us into oblique, mysterious worlds where the very lack of information inspired players to create their own mythologies. The 2012 masterpiece Journey by indie studio thatgamecompany gave us mute characters in a desert wasteland, but still moved many thousands of players to tears. Campo Santo’s game Firewatch crafted a rich mystery from the Wyoming wilderness and a disembodied voice on a walkie-talkie.
In our era of near photographic in-game realism, the reliance on cutscenes for dramatic and cathartic effect feels even more discordant and alienating. We get to explore and exist in worlds of great clarity, surrounded by characters able to communicate a range of emotions through a combination of performance capture, state-of-the-art AI and physics – that’s more than enough. These are dynamic, immersive worlds: if we as players can control weapons, vehicles and progression systems of great sophistication, we can participate in stories.
Or we can simply allow narrative to exist in the background as something we live through or experience vicariously: the interactive version of direct cinema. The works of FromSoftware provide an excellent example of this. There are cutscenes, but they’re short and usually serve to introduce a new enemy or show the player a moment in which the world has reacted to them. Otherwise, the narrative is conjured simply by moving through these wild gothic landscapes. As the writer and historian Holly Nielsen expressed on X recently, “I’ve put about 300 hours into Elden Ring. I could not tell you a single thing about the world, characters, or story other than vague vibes.”
A few years ago I interviewed Todd Howard, the head of Bethesda Game Studios, and asked him what he thought was the most important part of telling a story in a video game. “You must look for tone,” he said after a long pause. “We look at old John Ford movies a lot – we study how to capture a space. Ford’s shots put you in a certain mood. There is a tone. As a designer, you have to know how you want the player to feel. Look for things outside of games that have that tone and just stare at them.” Yes, this is an example drawn from film yet again, but Howard isn’t talking about the story of The Searchers or Rio Grande, he’s talking about the feel of the spaces Ford created.
Tone. Vibes. Feel. These are different words for the same concept, and they are perhaps the foundations of a post-cinematic theory of mainstream game narrative. In an immersive environment, story is something the player walks into rather than watches, a space of discovery rather than performance, a playground not a theatre. It should be widely – and wildly – interpretative, and maybe even entirely optional or subliminal. If taking control from the player happens at all, it should be a radical moment sparingly employed, like turning the camera away, or plunging the stage into darkness.
The cinematic cutscene is a tyrannical imposter. The time has come to eject it.