When we talk about closed captioning–what it is, why it matters, the audiences it serves–we tend to assume a straightforward process of transcribing sounds for viewers who are deaf or hard of hearing. This process begins after the video is completed. The captioner converts speech sounds into written form. Meaning is assumed to be transparent, inherent in the sounds themselves. Transcription is likewise assumed to be a self-evident practice. In the college classroom, students are instructed to caption their own multimodal video assignments using basic tools such as Amara.org. Beyond the classroom and the do-it-yourself (DIY) context, third-party vendors provide captioning and transcription services to institutions within the context and economics of disability accommodation. For both the DIY captioner and the third-party vendor, captioning is an add-on, removed from the creative and intellectual work of video production.
For the professional captioner, complexity enters this process in the form of style guidelines about accuracy, timing and reading speeds, line breaks and caption shape, visual design (type, size, color, case, contrast), speaker identification, screen positioning, and the formatting of nonspeech captions (see, e.g., the Captioning Key). Quality is typically understood in terms of decontextualized rules, with a heavy emphasis on surface-level styling and formatting. These rules, and the “practices and processes associated with CC” more broadly, “have not changed within the past 30 years” (Udo and Fels 2010, 211). Case in point: uppercase styling for prerecorded television captioning has persisted and even flourished despite advances in captioning technology and strong arguments for mixed-case captions (211).
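To make the flavor of these rules concrete, consider timing: guidelines typically cap presentation rate so that captions remain readable in the time they stay on screen. Here is a minimal sketch of such a check in Python; the 160-words-per-minute threshold is an illustrative figure of my own, not a number quoted from the Captioning Key or any other guideline:

```python
# A minimal sketch of the presentation-rate checks that style guides
# prescribe. The 160 wpm limit is illustrative only, not a figure
# quoted from the Captioning Key or any other guideline.

def words_per_minute(text: str, duration_seconds: float) -> float:
    """Presentation rate of a caption: word count divided by on-screen time."""
    return len(text.split()) / (duration_seconds / 60.0)

def too_fast(text: str, duration_seconds: float, limit_wpm: float = 160.0) -> bool:
    """Flag a caption whose rate exceeds the (illustrative) reading-speed limit."""
    return words_per_minute(text, duration_seconds) > limit_wpm

caption = "We thought of the subtitles as another character in the film."
print(round(words_per_minute(caption, 4.0)))  # 12 words in 4 seconds -> 180 wpm
print(too_fast(caption, 4.0))                 # True at the 160 wpm limit
```

Even this trivial check hints at the judgment calls hiding inside “accuracy” and “timing”: whether to condense, split, or extend a caption to meet the limit is an interpretive decision, not a mechanical one.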
Outside of professional contexts, captioning is understood to be a simple but time-consuming act, extraneous to the creative process, of writing down what’s being said. It is equated with unreflective, mindless transcription, so easy a machine can do it. Can we open up closed captioning to greater complexity? Can we upend practices and assumptions that haven’t changed in thirty years? What would it mean to bake captioning into our video productions and pedagogies instead of treating it only as an add-on, afterthought, simple legal requirement, or technical problem? Can we narrate a different coming-out story for captioning, one that is not premised solely on the claim that captioning deserves greater attention because nondisabled audiences have (finally) discovered its value?
Captioning has been closed off and closeted for a long time. When closed captioning debuted in 1980, it addressed the problem of how to make television accessible to deaf audiences without making it “unpalatable to the hearing” (Downey 2008, 55). Captioning has been hidden away, at least in the mainstream, and marginalized as a stripped-down copy intended for a minority audience. In the humanities, scholars haven’t paused to treat captioning as a source of meaning in its own right, a rich text to be interpreted and critiqued.
My own research (see Reading Sounds) has sought to disrupt the taken-for-granted simplicity and transparency of captioning by presenting it as a rich and contested site of meaning-making. In keeping with the subject of this blog carnival, I want to consider–very tentatively–whether crip theory’s radical edge can further unsettle some of our entrenched assumptions. According to Robert McRuer (2006, 35), “Crip theory questions–or takes a sledgehammer to–that which has been concretized.” McRuer (2006, 30) defines a severe critique as similar to “the critically queer work of fabulous”:
Severe, though less common than fabulous, has a similar queer history: a severe critique is a fierce critique, a defiant critique, one that thoroughly and carefully reads a situation–and I mean reading in the street sense of loudly calling out the inadequacies of a given situation, person, text, or ideology.
Disability studies and queer theory generate what Carrie Sandahl (2003, 25) calls a “productive reciprocity.” Crip theory is located at the intersection of these disciplines. In particular, both share a “radical stance toward concepts of normalcy; both argue adamantly against the compulsion to observe norms of all kinds (corporeal, mental, sexual, social, cultural, subcultural, etc.)” (Sandahl 2003, 26). “Both ‘cripping’ and ‘queering,’ as interpretative strategies, spin mainstream representations or practices to reveal dominant assumptions and exclusionary effects” (Lewis 2015, 47).
How can we dance on crip theory’s radical edge, chipping away at the concretized assumptions, values, and norms of captioning?
Experiments with animated captions
Visually speaking, letterforms in captioning tend to be uninspiring and aesthetically lifeless. They are often dumped onto the screen in bottom-centered alignments. Captioning guidelines tend to be simplistic and decontextualized. Despite the technical constraints that currently limit visual design options to simple text, a very small number of studies have explored the potential for closed captioning to animate meaning and create a richer, more accessible experience. In these studies, captions begin to come alive with movement, effects, typography, dimensionality, color, and even avatars and icons to accompany speaker identifiers (see Rashid et al. 2006; Rashid et al. 2008; Vy 2012; Vy and Fels 2009; Vy et al. 2008). Animated captions offer an alternative in which the dynamic presentation of meaning–a fusion of form and content–can potentially enhance the experience without either sacrificing clarity or giving way to over-produced and over-designed caption tracks that intrude more than inform. Put another way, animated captioning and kinetic typography hold the promise of embodying meaning and “‘giving life’ to texts” (Strapparava et al. 2007, 1724).
My ongoing experiments using Adobe After Effects to create dynamic, open-captioned alternatives to standard captions suggest the potential for kinetic captions to become integral and baked-in rather than incidental. Captioning becomes folded into the creative process. I’m especially interested in using the resources of visual design to distinguish multiple speakers in the same scene, to signal sonic dimensionality along two main axes (near/far sounds and loud/quiet sounds), and to reinforce the meaning of sound effects, ambient sounds, and music.
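Standard delivery formats can already encode some of these distinctions, even if players render them unevenly. As a rough sketch of the idea, the following Python script writes a WebVTT file in which each cue carries a voice tag for speaker identification plus classes marking the two sonic axes; the class names ("far", "quiet"), speaker names, dialogue, and timings are all invented for illustration, not a standard:

```python
# A minimal sketch, assuming WebVTT as the delivery format: each cue gets
# a voice tag for speaker identification plus invented classes ("far",
# "quiet") marking the near/far and loud/quiet axes described above.

def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def cue(start: float, end: float, speaker: str, text: str, classes=()) -> str:
    """Build one WebVTT cue; classes attach to the voice tag (<v.far.quiet Name>)."""
    tag = "v" + "".join(f".{c}" for c in classes)
    return (f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}\n"
            f"<{tag} {speaker}>{text}</v>\n")

# Invented dialogue for illustration only.
cues = [
    cue(1.0, 2.5, "Ada", "Did you hear that?"),
    cue(2.8, 4.2, "Ben", "It came from the basement.", classes=("far", "quiet")),
]
print("WEBVTT\n\n" + "\n".join(cues))
```

A player’s stylesheet could then target those classes (dimming “far” cues, for instance), though support for cue styling varies widely across browsers and players, which is one reason my own experiments use open captions rendered directly into the video.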
This work is experimental, controversial, and disruptive. I hope it spurs a larger conversation among caption readers and producers about potential futures. I hope it also calls attention to and challenges norms, especially institutional arrangements that place captioning outside of the creative process. My own experiments were initially inspired by the artistic subtitles in Night Watch (2004), a Russian horror movie. In an unusual move, director Timur Bekmambetov “insisted on subtitling [Night Watch] and took charge of the design process himself,” as opposed to having the Russian speech dubbed into English or leaving the subtitling process to an outside company (Rawsthorn 2007). He adopted an innovative approach: “We thought of the subtitles as another character in the film, another way to tell the story” (Rosenberg 2007).
Experiment 1: Using typefaces to signal identity
The 100 is a post-apocalyptic television drama that takes place ninety-seven years after a nuclear holocaust has wiped out human civilization. Four thousand survivors live on an orbiting space station (the “Ark”). When the series opens, 100 of those survivors are sent down to Earth to begin the process of re-inhabiting the planet. At the end of season two (“Blood Must Have Blood, Part 2,” 2.16), Thelonious Jaha (Isaiah Washington), formerly the Chancellor of the Ark, encounters a woman named Alie (Erica Cerra) who turns out to be an android. Thelonious realizes something is wrong when her body flickers and glitches. It is at this moment, for both Thelonious and the viewing audience, that her identity switches from human to machine.
Source: The 100, “Blood Must Have Blood, Part 2,” 2015. Hulu. Original captions.
It’s easy to miss Alie’s flickering body while reading the captions at the same time. The flicker effect is quick and subtle. Initially, I wondered whether the captions could do a better job of signaling this shift visually. Might the captions and the android’s body flicker together? Captioning advocates haven’t explored how typefaces can be linked to meaning and identity (including how different typefaces can aid speaker identification). For this experiment, I used typefaces to embody the android’s transformation, starting with a standard yellow typeface for both Thelonious and the woman, and then mirroring her flickering body in a flickering typeface that resolves into a futuristic robot typeface for the rest of the scene. Thelonious’ speech continues to be represented in the standard Hulu yellow, sans serif typeface.
Source: The 100, “Blood Must Have Blood, Part 2,” 2015. Hulu. Open captions created in Adobe After Effects by the author.
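The flicker mechanics themselves are simple to model. The captions above were keyframed in After Effects, but a minimal Python sketch of the per-frame logic captures the idea; the timings, typeface names, and flicker probability below are invented, not the settings from my project:

```python
import random

# A minimal sketch of Experiment 1's flicker logic, not the author's
# actual After Effects project. During a transition window the caption
# flickers at random between the standard face and a robot face, then
# locks to the robot face. Timings and typeface names are invented.

STANDARD, ROBOT = "standard-sans", "robot-face"

def typeface_at(t: float, flicker_start: float, flicker_end: float,
                fps: float = 23.976, seed: int = 7) -> str:
    """Return the typeface for the android's caption at time t (seconds)."""
    if t < flicker_start:
        return STANDARD
    if t >= flicker_end:
        return ROBOT
    # Seed per frame so the flicker is deterministic across renders,
    # much like seedRandom() in an After Effects expression.
    frame = int(t * fps)
    return ROBOT if random.Random(seed + frame).random() < 0.5 else STANDARD

# Sample the transition window frame by frame.
for f in range(8):
    t = 2.0 + f / 23.976
    print(f"{t:6.3f}s  {typeface_at(t, flicker_start=2.1, flicker_end=2.3)}")
```

The key design choice is determinism: a per-frame seed keeps the flicker identical on every render, so the caption and the android’s glitching body can be synchronized by hand and stay synchronized.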
Experiment 2: Using icons to compress meaning and remove needless repetition
The chase scene in Blade Runner (1982), when Deckard (Harrison Ford) is hunting a replicant through the crowded, dystopian streets of downtown Los Angeles in the near future, is a cacophony of sights and sounds: people yelling in a foreign language (what Deckard calls “cityspeak”), machines hissing, an automated crosswalk announcement, and pouring rain. Only the crosswalk sounds are captioned on Netflix.
Source: Blade Runner, 1982. Netflix. Original captions.
In this scene, “cross now” is repeated ten times by an automated voice, and “don’t walk” is repeated seven times. All seventeen crosswalk announcements are captioned, and these are the only sounds captioned in this scene. The “cross now” sequence includes five almost identical captions:
AUTOMATED VOICE:
Cross now. Cross now. Cross now.
Cross now. Cross now.
Cross now. Cross now.
Cross now. Cross now.
Cross now.
Given the sonic richness of the scene, I found this sequence of captions (as well as the “don’t walk” sequence that follows) to be reductive, unnecessarily repetitive, and distracting. The soundscape (as interpreted by the captioner) is more than the crosswalk announcements. Because the three middle captions are identical (“Cross now. Cross now.”), viewers can also easily miss the moment when one caption is replaced on the screen by an identical caption. I wondered whether the crosswalk announcements might be captioned differently. Is there a better way than simply repeating the same words over and over? (The larger question here has to do with the challenges of captioning public address or PA announcements, a topic I take up in Chapter 4 of Reading Sounds.)
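One modest possibility, short of redesigning the captions entirely, is to collapse runs of identical captions into a single cue that spans the whole run. A minimal Python sketch of that merging step follows; the cue times are invented, and the “(repeats x N)” wording is one hypothetical style, not a captioning standard:

```python
from itertools import groupby

# A minimal sketch: merge runs of identical consecutive cues into one
# cue spanning the run. Times are invented; the "(repeats x N)" label
# is one hypothetical wording, not a captioning standard.

cues = [
    (10.0, 11.0, "Cross now."),
    (11.2, 12.2, "Cross now."),
    (12.4, 13.4, "Cross now."),
    (13.6, 14.6, "Cross now."),
]

def merge_repeats(cues):
    """Collapse consecutive cues with identical text into a single cue."""
    merged = []
    for text, run in groupby(cues, key=lambda c: c[2]):
        run = list(run)
        start, end = run[0][0], run[-1][1]
        label = text if len(run) == 1 else f"{text} (repeats x{len(run)})"
        merged.append((start, end, label))
    return merged

for start, end, text in merge_repeats(cues):
    print(f"{start:5.1f} --> {end:5.1f}  {text}")
```

Merging removes the invisible caption-for-caption swaps, but it still privileges the announcements over the rest of the soundscape, which is why I wanted to go further.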
What if instead of repeatedly captioning the words spoken by the automated voice, we used the flashing crosswalk symbols that appear in the movie, bringing them down from the top of the screen where they are only momentarily visible? These symbols (as captioned icons) could then be made to flash in unison with the automated announcements. It’s possible that these icons could convey the meaning of the scene while reinforcing its visual aesthetic too.
Source: Blade Runner, 1982. Netflix. Open captions created in Adobe After Effects by the author.
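The synchronization logic behind the flashing icons is straightforward to sketch. In my After Effects version the icon’s opacity is keyframed by hand; the Python sketch below models the same behavior, with invented announcement times and an invented blink rate (nothing here is taken from the film’s actual timing):

```python
# A rough sketch of Experiment 2's "flashing icon" logic, not the
# author's After Effects project. Each announcement interval drives a
# crosswalk icon that blinks in unison with the automated voice.
# All times and rates below are invented for illustration.

# Hypothetical (start, end) times in seconds for each "cross now" cue.
CROSS_NOW_INTERVALS = [(0.0, 0.8), (1.6, 2.4), (3.2, 4.0)]

def icon_opacity(t: float, intervals, blink_hz: float = 4.0) -> float:
    """Blink at blink_hz while an announcement plays; hide it otherwise."""
    for start, end in intervals:
        if start <= t < end:
            # Square-wave blink: visible for half of each blink cycle.
            phase = (t - start) * blink_hz
            return 100.0 if int(phase * 2) % 2 == 0 else 0.0
    return 0.0

for i in range(9):
    t = i * 0.1
    print(f"{t:4.1f}s  opacity={icon_opacity(t, CROSS_NOW_INTERVALS):5.1f}")
```

Driving the icon from the same interval list that times the announcements is what keeps sound and symbol locked together, so the icon functions as a caption rather than a decoration.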
Experiment 3: Using color and icons to distinguish multiple speakers in a scene
When multiple speakers are taking turns in the same scene, it can be hard to distinguish them on the caption layer. For captions that are limited to bottom-center placement, preceding hyphens can be used to distinguish multiple speakers’ lines of dialogue in a single caption. But preceding hyphens are less effective than positioning captions underneath their respective speakers. Positioning captions in a two-shot, for example, involves placing lines of dialogue for the left speaker on the left side of the screen, and lines of dialogue for the right speaker on the right side. In this way, “positioning carries meaning” (Clark 2003). But as the number of speakers in a scene grows, along with the amount of overlapping speech, it becomes increasingly difficult to follow the back-and-forth. Differentiating speakers by caption color is one solution. In the United States, however, closed captioning has typically been limited to a single foreground color. Users can change this color to suit their preferences, but all captions will be displayed in the same foreground color.
The Goonies (1985) provides an instructive case study of the problems of speaker identification in multi-speaker and overlapping contexts. Just about every scene (or so it seems) of this beloved ’80s classic throws together multiple child actors and asks them to engage in a vigorous back-and-forth about the treasure hunt they are on. Trying to follow the dialogue and match each line with each speaker is exceedingly difficult when the captions are not effectively distinguished by speaker. The bitmap caption track on the official DVD, for example, uses preceding hyphens to distinguish multiple speakers in the same caption–a highly ineffective solution in fast-paced environments populated by multiple speakers.
Source: Goonies, 1985. DVD. Original bitmap caption track.
This scene begs for color-coded captions. It’s no accident that each kid in this scene is wearing a different colored windbreaker. Why not key the color of each windbreaker to the color of the captions for that speaker? Red for Chunk, black for Data, yellow for Mikey, and white for Mouth. I’ve incorporated some animated speaker identifiers at the opening of the scene, as the kids are climbing the hill and before they begin speaking, to set the stage and provide a legend of sorts. While this style of speaker identification may be deemed distracting, it also seems to fit the spirit of a children’s movie by helping younger viewers learn each character’s name and follow the often fast-paced dialogue.
Source: Goonies, 1985. DVD. Open captions created in Adobe After Effects by the author.
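Outside the broadcast context, this kind of per-speaker color keying can already be expressed in web caption formats. Here is a minimal sketch, assuming WebVTT with an embedded STYLE block; the cue text and timings are invented, and only the color mapping mirrors the windbreaker scheme above:

```python
# A minimal sketch: expressing Experiment 3's windbreaker-to-caption
# color mapping in WebVTT, using voice tags plus an embedded STYLE
# block. The author's version was built in After Effects; cue text and
# timings here are invented for illustration.

SPEAKER_COLORS = {"Chunk": "red", "Data": "black",
                  "Mikey": "yellow", "Mouth": "white"}

style_block = "STYLE\n" + "\n".join(
    f'::cue(v[voice="{name}"]) {{ color: {color}; }}'
    for name, color in SPEAKER_COLORS.items()
)

cues = """\
00:00:01.000 --> 00:00:02.200
<v Mikey>Guys, come on, this way!</v>

00:00:02.300 --> 00:00:03.500
<v Chunk>I can't keep up!</v>
"""

print(f"WEBVTT\n\n{style_block}\n\n{cues}")
```

Player support for voice-based cue styling is uneven, which again pushed me toward open captions, but the pattern shows that per-speaker color is a delivery problem, not a conceptual one.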
These experiments offer radical alternatives to the taken-for-granted landscape of captioning. They disrupt norms and ask us to imagine different futures for captioning. By blending form and meaning, these experiments make captioning an integral component of the creative work instead of an add-on or afterthought. At the least, they force us to reflect on the problematic relationship between programs and captions, producers and captioners, and how we might bring them closer together.
Works cited
Clark, Joe. 2003. “Reading the Tube: Illustrations.” Accessed June 8, 2016. http://joeclark.org/design/print/readingthetube-illos.html.
Downey, Gregory John. 2008. Closed Captioning: Subtitling, Stenography, and the Digital Convergence of Text with Television. Baltimore: Johns Hopkins University Press.
Lewis, Victoria Ann. 2015. “Crip.” In R. Adams, B. Reiss, and D. Serlin (eds), Keywords for Disability Studies (pp. 46-47). New York: NYU Press.
McRuer, Robert. 2006. Crip Theory: Cultural Signs of Queerness and Disability. New York: NYU Press.
Rashid, Raisa, Jonathan Aitken, and Deborah I. Fels. 2006. “Expressing Emotions Using Animated Text Captions.” In K. Miesenberger et al. (eds), International Conference on Computers Helping People with Special Needs, LNCS 4061, pp. 24-31.
Rashid, Raisa, Quoc Vy, Richard Hunt, and Deborah I. Fels. 2008. “Dancing with Words: Using Animated Text for Captioning.” International Journal of Human-Computer Interaction 24 (5): 505-519.
Rawsthorn, Alice. 2007. “The Director Timur Bekmambetov Turns Film Subtitling into an Art.” New York Times, May 27. Accessed June 9, 2016. http://www.nytimes.com/2007/05/25/style/25iht-design28.1.5866427.html?pagewanted=all&_r=0.
Rosenberg, Grant. 2007. “Rethinking the Art of Subtitles.” Time, May 15. Accessed June 9, 2016. http://content.time.com/time/arts/article/0,8599,1621155,00.html.
Sandahl, Carrie. 2003. “Queering the Crip or Cripping the Queer? Intersections of Queer and Crip Identities in Solo Autobiographical Performance.” GLQ: A Journal of Lesbian and Gay Studies 9 (1-2): 25-56.
Strapparava, Carlo, Alessandro Valitutti, and Oliviero Stock. 2007. “Dances with Words.” Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI’07), 1719-1724. Accessed June 6, 2016. https://www.aaai.org/Papers/IJCAI/2007/IJCAI07-278.pdf.
Udo, J. P., and D. I. Fels. 2010. “The Rogue Poster-Children of Universal Design: Closed Captioning and Audio Description.” Journal of Engineering Design 21 (2-3): 207-221.
Vy, Quoc V. 2012. “Enhanced Captioning: Speaker Identification Using Graphical And Text-Based Identifiers.” Master’s thesis, Ryerson University. Paper 1702. Accessed June 6, 2016. http://digitalcommons.ryerson.ca/dissertations/1702/.
Vy, Quoc V., and Deborah I. Fels. 2009. “Using Avatars for Improving Speaker Identification in Captioning.” In T. Gross et al. (eds) INTERACT 2009, Part II, LNCS 5727, pp. 916–19. Berlin: International Federation for Information Processing.
Vy, Quoc V., Jorge A. Mori, David W. Fourney, and Deborah I. Fels. 2008. “EnACT: A Software Tool for Creating Animated Text Captions.” In K. Miesenberger et al. (eds.), International Conference on Computers Helping People with Special Needs, LNCS 5105, pp. 609–616.
Zdenek, Sean. 2015. Reading Sounds: Closed-Captioned Media and Popular Culture. Chicago: University of Chicago Press. Supplemental website: http://ReadingSounds.net.