
Race in Children’s Books

Dr. Anjali Adukia on measuring representation and inclusion

Emily Oster

9 min Read

I have known Dr. Anjali Adukia for many years — we overlapped briefly at the University of Chicago, where she is on the faculty at the Harris School of Public Policy, and also, all economists know each other. This fall, she was out at Brown giving a talk about her work on the representation of race and gender in children’s books. The paper (which you can see here, and which has a number of co-authors) is fascinating both for its methods and its content. So I thought you’d like to hear about it. Our conversation by email is below. Enjoy!


Emily:

Anjali! Thank you so much for doing this. I’m really excited to get to talk about the paper.

I want to start by asking you to give an “elevator pitch” for the work. (This ask makes me think back to the academic job market in economics, in which all the interviews were in hotels and we all prepped our pitches in case somehow we ended up in the elevator with Guido Imbens or Amy Finkelstein and had 30 seconds to impress them. I’m no Amy Finkelstein, but I’ll take the 30-second version anyway!)

Anjali:

Hi, Emily, thank you for reaching out! It’s fun to be in conversation with you. With so much having gone virtual, it makes me wonder what the Zoom equivalent of the elevator pitch is nowadays 🙂

Caregivers deeply want to do the best they can for the children in their worlds, and there is a desire to expose children to a range of ideas and experiences. Who is represented — and how they are represented — in children’s books transmits not only the values of society but also whose space it is. The presence and absence of characters of various identities send implicit and explicit messages that contribute to how children see their own potential and the potential of others, which can, in turn, shape subconscious defaults. But while there is some consensus that representation matters, how do you measure it systematically? Even in a world where parents (and publishers) are tuned into this issue, they often don’t know how to address it or they lack the practical tools to systematically identify inclusionary materials.

In this initial study (joint with Alex Eble, Emileigh Harrison, Hakizumwami Birali Runesha, and Teodora Szasz), we make two main contributions:

(1) We develop and apply tools to systematically measure representation in children’s books. Specifically, we advance machine-based content analysis by developing new tools that convert images into data and apply them alongside established text analysis tools.

(2) We then use these tools to measure the representation of race, gender, and age in children’s books that have won prestigious awards in the U.S. over the past century. These books are important because they are a central part of the “children’s canon” of literature. In particular, books that won Newbery or Caldecott awards (we call these “mainstream” awards) are more than twice as likely to be checked out or purchased as other children’s books. It makes sense — these books have been sanctioned as having “high literary value” and are prominently showcased in school classrooms and libraries. The conspicuous medal on their covers signals an implicit seal of approval for parents.

What do we find? In research, we draw information from many sources, but we really are leaving data on the table by not analyzing images. We see that computers not only enable cost-effective content analysis, but they also mitigate human bias when trying to classify the skin color of characters that are shown. Importantly, we find inequality in the representation of race, gender, and age on many dimensions. Highly popular mainstream books (those that won Newbery or Caldecott awards) are more likely to show characters with lighter skin, even within the same racial group, than other books that explicitly center underrepresented identities. Children are more likely to be shown with lighter skin than adults, which is concerning in a society where Black children are treated as older than they are and where youth is equated with innocence; following that train of logic, innocence is then implicitly equated with lightness, or whiteness.

We see that mainstream books are more male-centered than books that are recognized for highlighting females or are female-centered. Moreover, females are more likely to be represented in images than in text, consistent with the maxim that women should “be seen more than heard.” This suggests there may be symbolic inclusion in pictures without substantive inclusion in the actual story. Males, especially white males, are more likely to be present, regardless of data source: pronouns, gendered words, character names, famous people, geography, or images.

I didn’t see myself in the world around me growing up, and I was surprised to see how things have — and have not — changed in the books read by my kids.

Emily:

I feel like the Zoom elevator pitch equivalent is finding someone in that Gather app and randomly talking to them (or, wait, is my job the only one that uses Gather for parties? anyway).

This is super-interesting; I want to unpack a bit. To give people some context, there is an existing literature on text analysis, which basically takes the text of anything — a book, a congressional speech, etc. — and uses it as data. This is a space my husband has worked in, actually. You can think about questions like “How do Republican and Democratic congresspeople speak differently?” Or, more relevant to your space, questions like “How does the language around race in kids’ books change over time?”

I think what you do is really unusual in using not only the text but also the images. That’s especially salient in kids’ books, where pictures can be a big part of the experience. So, as I understand it, you’re effectively trying to code the images and make them into some kind of data. This seems hard. With text, we can count words. With images — do you count pixels? Can you give us a specific example of what you try to measure and how you actually implement this?

Anjali:

Gather is awesome! In the pandemic, my younger child (who had just turned 4 when the pandemic began) thought part of my job was “playing Gather,” and she thought it was so cool that she asked for a hide-and-seek party for her birthday using the platform.

The field of computer vision has been trying to figure out this very problem: How can you make a computer draw meaningful information from images? In other words, how can you make a computer “see” what a human sees when viewing images? In our case, what can computers do better than what individual humans can do?

For one, they can systematically classify skin color! What one’s skin color is is a philosophical question. Is it what’s in the shadows, or what’s in the light? Or is it some sort of weighted average of all of the different colors that make up one’s skin? When we asked a set of humans to manually code skin color, there was little consistency across the raters (remember the kerfuffle about the black and blue dress? or was it white and gold?). But a computer can take each detected face, classify the perceptual tint of the RGB values of every pixel, cluster them into nearby colors using a process called k-means clustering, and then take a weighted average to give an average perceptual tint of the skin color of the face. We can then either use this continuous measure of skin color or we can create bins such as those used in emoji skin color groups (darker skin, medium skin, lighter skin).
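To make that procedure concrete, here is a minimal sketch of the idea in Python (my illustration, not the team’s code): cluster a detected face’s pixels with k-means, take a pixel-share-weighted average of the cluster centers, and bin the result into coarse tone groups. The luminance-style tint formula and the bin cutoffs are placeholder assumptions, not the paper’s actual definitions.

```python
# Sketch of the described approach: k-means over a face's pixels, then a
# weighted average tint, then coarse bins. Tint formula and cutoffs are
# illustrative assumptions only.
import numpy as np
from sklearn.cluster import KMeans

def average_skin_tint(face_pixels: np.ndarray, n_clusters: int = 5) -> float:
    """face_pixels: (N, 3) array of RGB values from a detected face region."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(face_pixels)
    # Weight each cluster center by the share of pixels assigned to it.
    weights = np.bincount(km.labels_, minlength=n_clusters) / len(face_pixels)
    mean_rgb = weights @ km.cluster_centers_
    # Placeholder "perceptual tint": luminance-style weighting of RGB, in [0, 1].
    return float(mean_rgb @ np.array([0.299, 0.587, 0.114]) / 255.0)

def tint_bin(tint: float) -> str:
    """Map the continuous tint to coarse bins akin to emoji skin-tone groups.
    Cutoffs are placeholders, not the paper's thresholds."""
    if tint < 0.45:
        return "darker skin"
    if tint < 0.65:
        return "medium skin"
    return "lighter skin"
```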

I should note: This is just the beginning of how we can start to draw information from images. We have developed one way, but, just as in the matching literature, there are many ways one could go about measuring attributes such as skin color. The world of image analysis in the social sciences is wide open for researchers. There are so many opportunities for innovation and exploration; I’m excited to see others enter this space and to see all of the creative directions people take it!

Emily:

Fascinating (and the graphic is very helpful!). So you do this procedure for how many books? For every image in each book? (And is it very slow?)

Anjali:

In this initial study, we have 1,130 books. We run the face detection model across every scanned page, so every image is processed through each model, but models perform differently based on how they were trained. The state-of-the-art face detection model (the one from Google Vision) was trained on photographs, so we had to train our own model, which performed much better. To give you a sense, the attached graphic shows the faces detected using Google Vision (FDGV) vs. the faces detected using our model trained on illustrations (FDAI) for a subset of Caldecott books. Once the models were created, the slowest part was digitizing the actual books. Once they were digitized, it took approximately 20 hours on a supercomputer to run all the scans through the face detection, skin color segmentation, skin color classification, and feature classification algorithms.
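For a sense of how those stages fit together, here is a schematic sketch of the pipeline in Python (hypothetical function names, not the authors’ code): detect faces on each scanned page with an illustration-trained model, segment the skin region, classify its tint, and classify other character features.

```python
# Schematic of the described processing order; the four callables are
# placeholders for the models in the pipeline (this is not the authors' code).
from dataclasses import dataclass

@dataclass
class FaceRecord:
    book_id: str
    page: int
    skin_tint: float
    features: dict

def process_book(book_id, pages, detect_faces, segment_skin,
                 classify_tint, classify_features):
    """pages: iterable of scanned page images."""
    records = []
    for page_num, image in enumerate(pages):
        for face in detect_faces(image):          # illustration-trained face detector
            skin_pixels = segment_skin(face)      # keep only skin-region pixels
            records.append(FaceRecord(
                book_id=book_id,
                page=page_num,
                skin_tint=classify_tint(skin_pixels),   # e.g. average_skin_tint above
                features=classify_features(face),       # e.g. predicted age group, gender
            ))
    return records
```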

Emily:

I do like a good supercomputer. It’s amazing to think about how far computing has come since we were kids. But that is not today’s topic!

As I understand it, you classify images from both a standard set of award-winning mainstream books (winners of Newbery or Caldecott medals) and also from a set of books more explicitly focused on better representation of a broad range of groups (I think you call this the “diversity sample” in the paper). One of the findings is that characters of color in mainstream books are more likely to be shown with lighter skin. Can you unpack this? Does it mean that (say) Martin Luther King Jr. appears more white in the mainstream sample?

Anjali:

That’s correct. Characters in the mainstream collection are more likely to be shown with lighter skin than in the diversity collection, even after holding race constant. Meaning, characters that are classified as a given race (e.g. Asian) in the diversity collection are more likely to have darker skin than characters that are classified as that given race (e.g. Asian) in the mainstream collection (I’ve attached a relevant figure that shows this, if it’s helpful). You can see in these examples an image from a book in the diversity collection compared to an image from a book in the mainstream collection: the characters are all classified as being Black, but they are depicted with lighter skin in the image from the mainstream book.

Emily:

What do you see in terms of text? Does the language they use around varying groups differ across the books, or is it primarily in the pictures?

Anjali:

In separate work using methods from natural language processing (e.g. word embeddings, word co-occurrence), we show that Black females are most likely to be associated with struggle and performance arts, Black males are most likely to be associated with sports and struggle, white females with family and performance arts, and white males with power and business (and really, white males are associated with everything). Females are also more likely to be associated with appearance words (sigh). (This is from separate work with Callista Christ, Anjali Das, Alex Eble, Emileigh Harrison, and Hakizumwami Birali Runesha.)
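As a rough illustration of the co-occurrence side of that analysis (my own sketch, not the study’s code), one could count how often words from a category list appear within a fixed window of identity terms and then compare those rates across groups. The term lists and window size below are placeholder assumptions, and the word-embedding part of the method is not reproduced here.

```python
# Toy word co-occurrence counter: how often do category words appear near
# identity terms? Term lists and window size are placeholders.
from collections import Counter

def cooccurrence_counts(tokens, identity_terms, category_terms, window=10):
    """tokens: list of lowercased words from a book's text."""
    identity_terms, category_terms = set(identity_terms), set(category_terms)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok in identity_terms:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for neighbor in tokens[lo:hi]:
                if neighbor in category_terms:
                    counts[(tok, neighbor)] += 1
    return counts

# Illustrative usage with made-up term lists:
# appearance = {"pretty", "beautiful", "lovely"}
# counts = cooccurrence_counts(tokens, {"she", "her", "girl"}, appearance)
```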

Emily:

Sigh. I wish this were surprising.

I read from this whole conversation that in these celebrated books for kids, we still see an arguably problematic type of representation, with some consistent stereotypes around gender and race, and appearance alterations that, most notably, seem to lighten the skin of characters of color. The comparison set, at least for the latter fact, is a set of diversity-oriented books.

The question is then: What to take from this? I wonder about your reflections on that both from the standpoint of a parent and maybe more broadly from the standpoint of, say, a book publisher. Why do we want to change this and who should, I guess, is the core of the question. A small one!

Anjali:

What is “optimal” representation is a normative question beyond the scope of my role as a scholar and depends on individual goals. I can say that as a parent, I want to offer content to my children that exposes them to a wide world of perspectives; I want diversity to be mainstream — I don’t want their subconscious defaults to prioritize people who happen to have power or voice. When we include, or exclude, narratives of people with different backgrounds, we create a narrow set of windows through which our children view the world and an even narrower set of mirrors for how they might view themselves. This can shape beliefs that then affect the actions that people take, which then can determine outcomes. I want their books to reflect a world of possibilities — both for themselves and what they assume of others.

And I try so hard to influence what makes it onto my kids’ bookshelves and into their video lists (who knew that slime videos were all the rage for the preschool demographic?), but I can only do so much of this curation. They get exposed to content in all parts of their lives — school, libraries, media, friends, family — so it really has to start from the source: the creators and purveyors of this content (in this case, the book publishers) play the role of social architects, intentionally or not, shaping the representation in the materials they create and promote. Increasing the availability of books with greater representation not only matters intrinsically but also is smart business: people want to see themselves in the world around them.

I do think that many publishers want to increase the representation in their books; it’s just that they don’t know how to systematically assess representation in a cost-effective manner. That’s where these AI tools can be so useful. Publishers have the digitized content that they can then feed into these easy-to-use pipelines that can provide baseline measures of representation in each book (which can complement existing manual content analysis methods that are more labor-intensive but allow for more nuanced assessments of books). To be clear, we are not proposing a scorecard system but rather an awareness system. Information is power, and this is just another tool in the toolbox that provides information that we haven’t previously had at our fingertips and that can inform decisions if people so choose.

Emily:

That seems like as good a note as any to end on. Thank you so much for doing this! It was wonderful to chat with you, and the paper is important and impressive, which is a rare combination.
