Learning to Speak

One of my children’s favorite books when they were little was Knuffle Bunny. It follows a girl, Trixie, who loses her favorite stuffed animal, Knuffle Bunny, at the laundromat. She continually yells at her dad, “Aggle flaggle klabble!,” which he does not understand, and he’s left dragging her home as she screams. Only when her mom realizes Knuffle Bunny is not with them do they run back to the laundromat and find the toy, at which point Trixie yells, “Knuffle Bunny!”

This perfectly captures the process of your child learning to speak. It’s frustrating and harrowing, and also incredibly neat when they get it.

Today’s conversation on the podcast is with Michael Frank, a professor at Stanford who works on this very question of how children learn to speak. We talk about how language works, the difference between receptive and expressive language, and which language is the hardest to learn to speak (spoiler alert: it’s Danish).

To spark your interest, here are three highlights from the conversation:

How do kids learn how to talk?

Emily Oster:

Without asking you to teach me five semesters’ worth of PhD classes, can I just ask you to talk a little bit about this very basic question of how does my kid learn to talk, because it seems like magic.

Michael Frank:

I think it seems like magic to a lot of us, actually. It’s really amazing. So, you have this trajectory over not that many years where you go from just not being able to talk to being able to do pretty well at making your way through the world in conversation.

Back in the day, the starting hypothesis was: I say things to my kid, they repeat it back — they imitate. And that’s in some sense not a terrible way to start thinking about language development, because in different languages around the world, there’s a different word for dog. And so you’ve got to learn that word by hearing it and hearing it matched with something in the world, ideally a dog, or a stuffed dog or a picture of a dog in a book or so forth. But language is more than imitation. So you’re hearing these things, but you’re also hearing how they go together in sentences. And it’s actually very, very quick to see kids, starting at maybe even 18 months, two years, to combine words in ways that they haven’t heard before — starting to generalize and produce new forms.

And when that gets super-obvious to parents is when they start to make errors, especially when they say, like, “Oh, I goed to the store like this.” That’s them combining that “-ed” from other verbs that they know with a verb that it doesn’t go with — so, basically some combination of learning from the environment from the people around them to get those word forms and then starting to combine them into novel expressions, trying to make new sentences and get their message across in whatever way they can.

At what age do babies start to comprehend language?

Emily:

My mother-in-law insists that my husband said the word “fishy” when he was six months old and that that was his first word. I mean, I’ve sort of always thought about that as, just, I think she completely fabricated it, which is almost certainly true but maybe not.

Michael:

My grandparents say that my first word was “rhinoceros,” which is hilarious for so many reasons. One of which is that the sounds in “rhinoceros” are actually extremely hard to say.

Emily:

And also, how would you have been exposed to many rhinoceroses?

Michael:

First words are really notoriously difficult. It is very interesting, though. There’s a bunch of work with more sensitive measurements that has shown that six-month-olds, seven-month-olds, nine-month-olds do know a teeny little bit about language. There are detectable traces of understanding a little bit of words under the surface. So, what that means is if you gave them a choice between “ball” and “dog,” when you say “dog,” they’re going to look at the dog like 53% of the time.

Emily:

This is these eye-tracking experiments where you watch where they’re looking to see whether they understand what you’re talking about.

Michael:

Exactly. And you can find the first traces of success in that early period, but it’s not really until about the first birthday where language comprehension really leaps above the surface and you start to see, “Oh yeah, you get this. You’re understanding what I’m saying,” in some kind of more consistent way.

Should you talk to your baby frequently?

Emily:

One of the things parents hear a lot is, you have to talk to your kid all the time — that the way to get your kid to learn language is to narrate your entire life beginning from birth. “Mommy is taking your diaper off now. Oh, you had such a big poopy in your diaper. Now mommy is putting your diaper back on,” et cetera. Your whole life needs to be narrated, and that is how you get your kid to learn to talk. Is that true? Please tell me no.

Michael:

That’s the most extreme version of it. Where that comes from is that there have been studies looking at kids’ language input — and that could be how much they hear at home, both in terms of the quantity and also the quality of the input — and some outcome measure, like how much they speak or how big their vocabulary is or maybe even how they do eventually when they get to school.

And those correlations do exist. They’re modest in size, in part because any correlation between parent and child has a lot going on. There are [also] some studies where you can intervene on the parents to try to get them to talk more to the kids. Not very many, not very big, but there are some, and you do find gains in the child’s vocabulary.

Now, the next question is: can we then estimate how big an effect that is, how it lines up? There’s a lot of complicated research questions about the causal inference there that are not well worked out.

So, that’s all to say there is some effect. Does that mean that you then should go and intervene in your own house in this kind of uncontrolled way? I would say no. What you should do maybe is think from first principles like, okay, what’s a good, high-quality, fun, language-rich thing you can do with your kid. Reading a book is one, or having a conversation or in-depth play session. A lot of the ways that we think of relating in a high-quality way to our kids actually provide really good, what we call “grounded,” language, where we’re talking about the here and now in some way that they understand.

Emily:

Okay, so I’m hearing that it’s good to talk to your kid, but you don’t need to narrate the diaper change. Or you could, if it was enjoyable.

Michael:

Yeah, if that’s the way you relate to your kid. But certainly, there are probably some ceiling effects there. You don’t need to narrate everything. And also, the kind of language that’s most important for the kid is the kind that’s about what they care about embedded in a fun and interesting interaction. So, that’s the kind of language that’s going to be most impactful if you’re really aiming in that direction.

Full transcript

This transcript was automatically generated and may contain small errors.

Emily Oster:

This is ParentData. I’m Emily Oster.

When I was a baby, my mother was at a party and she met a woman named Katherine Nelson, who was an expert in child language development. And as my mom used to tell it, she explained to Katherine that I liked to talk to myself in the crib, more or less nonstop. And Katherine asked her, “Well, would you be willing to record that?” And so my mom recorded me, with a tape recorder under my bed, for two years, almost every night when I talked to myself. And she turned all of these tape recordings over to Katherine, and a bunch of researchers used them to try to understand how kids develop language.

And I remember when I was nine getting the book that resulted from this research and looking through it and thinking, “Who would read this? It’s incredibly boring.” But as an adult, as a parent myself, I have gone back to read that book and I no longer find it boring. Not because I’m so interested in my own babbling, although who isn’t, but because, as parents, the question of how our kids develop language is so fascinating, because it’s so hard to understand. At one moment your kid is just babbling, and then all of a sudden it feels like you wake up two weeks later and they’re talking. And those first moments of talking in a way you can understand are so powerful because now instead of screaming and throwing things, they can say “Milk,” and you can understand, “Ah, milk. I should have guessed.”

So child language development is exciting, is interesting, it can also be fraught. It is one of the first things that gives us a moment to say, “Well, is my kid behind? Do they have enough words? What is enough words? How do I understand whether that even is a word?”

My guest today, Michael Frank, researches language development. He’s a professor at Stanford and he studies how kids learn to speak and how we can understand how kids learn to speak. And we’re going to talk today about what’s cool about studying this, what we know about it, how we should think about the difference between understanding and speaking. And think about when you should be worried, and should you be worried, and how to think about the stresses that come with this. After the break, Professor Michael Frank.

Emily:

So, I’m delighted to welcome Stanford psychology professor Michael Frank to the podcast. Mike, thanks for joining me.

Michael Frank:

Absolutely. Thanks for having me.

Emily:

So, to introduce people to you, I actually want to read language from your website because I think it’s the simplest way to explain why I’m excited about this conversation. So, here’s from your website: “How do we learn to communicate using language? I, that’s you, study children’s language learning and how it interacts with their developing understanding of the social world.

“I am interested in bringing larger datasets to bear on these questions and use a wide variety of methods including eye-tracking, tablet experiments, and computational models. Recent work in my lab has focused on data-oriented approaches to development.”

So, for ParentData, you’ve pretty much just put together our favorite keywords. There’s data, big data, children, methods, development. It’s the best. And so, I was struggling with where to start and we’re going to try to touch on a bunch of those pieces, but I think we’re going to start with language development. So, without asking you to teach me five semesters’ worth of PhD classes, can I just ask you to talk a little bit about this very basic question of how does my kid learn to talk, because it seems like magic.

Michael:

Yeah, I think it seems like magic to a lot of us actually. No, it’s really amazing. So, you have this trajectory over not that many years where you go from just not being able to talk to being able to do pretty well at making your way through the world in conversation. So, back in the day, the starting hypothesis was, I say things to my kid, they repeat it back, they imitate. And that’s in some sense not a terrible way to start thinking about language development because in different languages around the world, there’s a different word for dog. And so, you’ve got to learn that word by hearing it and hearing it matched with something in the world, ideally like a dog, whatever, or a stuffed dog or a picture of a dog in a book or so forth. But language is more than imitation. So, you’re hearing these things, but you’re also hearing how they go together in sentences. And it’s actually very, very quick to see kids starting maybe even 18 months, two years, to combine words in ways that they haven’t heard before. So, starting to generalize and produce new forms.

And when that gets super obvious to parents is when they start to make errors, especially when they say like, “Oh, I goed to the store like this.” That’s them combining that -ed from other verbs that they know with a verb that it doesn’t go with. So, basically some combination of learning from the environment from the people around them to get those word forms and then starting to combine them into novel expressions, trying to make new sentences and get their message across in whatever way they can.

Emily:

If we sort of dialed into the brain part of this, how much do we understand about, I don’t know, the neuronal connections of how that happens? Do we actually know what part of the brain is turning on or off when we’re doing that? Or is that still kind of a black box?

Michael:

I would say basically it’s still a black box in terms of kids. So, when we’re studying adults, there’ve been a bunch of really nice studies recently that have really uncovered some of the neural structures that are supporting language. I mean, there’s a long history here, of course, dating back to people looking at folks with brain injuries. But there’s been some exciting-

Emily:

Phineas Gage. My younger kid is named Phineas, so sometimes people would say, “Oh, like Phineas Gage.”

Michael:

Well, he didn’t lose language. He lost, according to the [inaudible 00:03:38] version, self-control.

Emily:

That’s true. That’s true.

Michael:

But Broca and Wernicke, these neurologists back in the 19th century, were looking at what happened to language when you had brain injuries. But it’s really modern neuroscience with decoding methods and even computational models that’s starting to give us insight into how language is working in the adult brain. But there’s lots of challenges when you try to scale that to the kid’s brain.

Most notably, our best technique for this is functional MRI, which requires you to sit still for many minutes at a time, which is not a strength of toddlers learning language. So, we’re actually really, I think, in the next couple years, going to learn a lot, as folks like some of my colleagues here at Stanford persuade toddlers to sit still for just a little bit longer and get a little bit faster at functional MRI scanning, and maybe we start to apply some of those insights from the adults to the kids. But right now, the brain data on kids learning language is actually much more limited than you might expect.

Emily:

So, when my kid is little, I mean, something I think parents wonder is: when does something become language? So, your kid starts, they’re making the noises, ba-ba, the noises, and then slowly it turns into something, which maybe seems to them like a word, but certainly isn’t something that hears … Should we view those as words or is this all sort of a vague … Is there a moment when it’s like, now they understand the idea of the mapping, or is that just a very slow process?

Michael:

Yeah, part of the amazing thing about language is it does have this gradual emergence at the beginning. So, my daughter, Madeline, around nine months, she said something like bru-bru, which I think meant brown bear from that classic Brown Bear, Brown Bear, What Do You See book. And I was like, “Oh, my god, it’s the first word. We’re here. We’re in communication. It’s language, it’s great.” And then it went away and we didn’t hear anything else until about 13 months. So, that was kind of a long dry spell.

And first words are kind of frustrating like that. They come and they go. They’re sometimes communicative. Sometimes they just feel like you’re trying things out. One way that my father-in-law, who’s actually a speech pathologist, describes this process is that kids are playing the instrument. They’re just trying stuff out with their voices. And you’ll actually see this with signing kids, too. They’ll babble with their hands. They’ll try out different gestures.

But I think that the reason that this is a complicated question is because there’s no language/not-language distinction. It’s all communication all the way down. My viewpoint is that even very, very early on, kids are really communicating to other people.

And you see that actually well before language in most kids because they start to point and that emergence of pointing is a huge milestone. It’s really, “I want to share this external thing with you.” And a lot of people have argued that that’s pretty unique to humans. That big declarative like, “Let’s share this thing that’s outside of the two of us, but let’s share it between the two of us.” That triadic idea. That’s human unique and really predictive of later language. It’s really a powerful moment.

So, I actually remember that moment with both of my kids: that finger’s out, we’re looking at something else, and you want to tell me about it. And that’s really special. I was walking down the street in New York City with my son Jonah when he was about 10, 11 months old, at Thanksgiving, visiting my parents, and he was like, “Da.” I was like, “Yeah, that’s the moment.”

Emily:

Yeah. My mother-in-law insists that my husband said the word “fishy” when he was six months old and that that was his first word. I mean, I’ve sort of always thought about that as, just, I think she completely fabricated it, which is almost certainly true, but maybe not.

Michael:

My grandparents say that my first word was “rhinoceros,” which is hilarious for so many reasons. One of which is that the sounds in “rhinoceros” are actually extremely hard to say.

Emily:

And also, how would you have been exposed to many rhinoceroses?

Michael:

So, first words are really notoriously difficult. It is very interesting, though. There’s a bunch of work with more sensitive measurements that has shown that six-month-olds, seven-month-olds, nine-month-olds do know a teeny little bit about language. So, there are detectable traces of understanding a little bit of words under the surface. So, what that means is if you gave them a choice between “ball” and “dog,” when you say “dog,” they’re going to look at the dog like 53% of the time.

Emily:

This is these eye-tracking experiments where you watch where they’re looking to see whether they understand what you’re talking about.

Michael:

Exactly. And you can find the first traces of success in that early period, but it’s not really until about the first birthday where language comprehension really leaps above the surface and you start to see, “Oh, yeah, you get this. You’re understanding what I’m saying,” in some kind of more consistent way.

Emily:

So, that’s actually a good segue. I think there’s the part of this that’s so exciting, that’s just so cool, but then there’s the part of it that’s really scary, because your kid isn’t doing the thing that you thought; they’re not pointing, they’re not making the words, they’re not developing in the way that we expect. And I get a lot of questions about this, and it is just a source of tremendous stress for parents. And so, I’m not even sure I have a question here as much as, can you comment on that topic?

Michael:

Yeah, it’s totally stressful. So, I think there’s a couple things to say here. The first is that language is really variable. Early language is scary and frustrating and complicated for parents in part because the variability is very high. So, most kids in the US and actually around the world walk within a month or two of their first birthday. There’s some variation, but it’s not so big.

But if you take a milestone for early language, like saying 50 words or 100 words or combining words, it varies by a lot of months. And that’s within typically developing healthy kids who are going to have good language outcomes. So, if there’s anything that I’ve discovered from digging into data with lots and lots of kids, that was I think not really appreciated or known, it’s just how much the variation between kids is actually consistent or universal across the samples we look at. No matter where you’re looking in the world, kids are all over the place with respect to language.

Emily:

So, we’re talking about when we look at typically developing 18-month-olds, you can have these ranges between a few words, maybe almost no words, up to kids who kind of seem … I mean, I had a friend whose kid at that age, we showed up at a restaurant and she was like, “Look, there are umbrellas.” I was like, “What? Are you kidding me?”

Michael:

No, exactly. It’s really, really remarkable, when you walk into a preschool class, how different these language abilities are across the class. So, that’s within the typically developing range, and that’s very anxiety-provoking for parents who always want to be at the very top of the range, or at the very least in the middle.

Emily:

Or who worry that it is predictive, right? I mean, I think the worry is in some ways not so much that people are worried that their kid won’t learn to talk, but that there’s something that this is denoting that is going to be true in the long term.

Michael:

So, one really important thing about that is that when you talk as a little kid, it’s not just related to how much you understand, but also how easy it is for you to move your mouth in a particular way and move your entire articulatory apparatus. Your vocal cords need to hum and buzz the right ways; you need to push air through your entire system. So, there’s a lot of coordination that has to happen in order for talking and especially for combining words into complex sentences to happen.

Evolutionarily speaking, there’s a lot of mechanics that human beings have that other primates don’t for using your vocal apparatus and your breathing and your vocal cords and your articulators so flexibly to be able to do this incredibly fast dance that we do when we talk. For example, my son, Jonah, actually was by many standards what’s called a late talker. His productive vocabulary was relatively low at age two. I wasn’t worried at all about him. Well, I wasn’t worried very much about him. How can you be a parent-

Emily:

You are a parent so, yeah.

Michael:

Yeah, got to worry. But I wasn’t really very worried from a rational perspective because I knew he understood a tremendous amount. Language was working, but moving his mouth in this particular way was pretty hard. And his early vocabulary was hilarious. He had some numbers, some abstract colors, a bunch of kind of abstract complicated stuff that made me feel like, “Okay, your cognition is fine,” but this whole moving the mouth thing and lining that up with the vocal cords, that was really pretty tough for him.

Emily:

It’s interesting because I think another perhaps simpler way to say this is just that we think of productive language as a metric of cognition, as a sort of pure, it is your brain that is doing this. And in fact, it is a combination of cognition and something that’s a more basic physical milestone, which is the production, in the way that we don’t think about walking as something that is about your cognitive development. We just think about it as a physical thing, but there’s this physical component to this as well.

Michael:

That’s exactly right. And so, when you go to the pediatrician or the speech pathologist when your child at age two, two and a half isn’t saying that many words, one of the first things they look into is going to be, “Well, are they having trouble understanding language as well or is their understanding relatively intact, but there’s something speech motor that’s giving them difficulty?”

So, there is huge variation. That’s not to say that you shouldn’t sometimes worry. There are places where we do need to go to the pediatrician, ask for a referral, maybe go to a speech language specialist, try to figure out what’s going on. And that’s typically when a child is in that third year, they’re two, two and a half and they’re still not saying much and especially when they’re having trouble understanding or even earlier if they’re really not communicating, so they’re not doing those communicative gestures like pointing, they’re not responding to their own name consistently.

These are the kinds of indicators that we look at where it’s good to chat with a professional to try to figure out what the next steps are.

I think what you want to look for for that reassurance is that comprehension, is the understanding and the desire to communicate. Those are going to be your big, kind of easy to spot markers. If there’s comprehension and desire to communicate, it’s probably a speech motor thing. If there’s not, you might be looking at either a general language issue or something like an autism spectrum disorder.

Emily:

And there are, of course, all of these things on top of the variation that are predictable. If your kid is learning multiple languages, the productive vocabulary in any single language is smaller because there are many languages, and it can be hard for people to perceive their kid expressing as much because of those differences.

Michael:

That’s exactly right. And this is one of the biggest challenges that our field is dealing with from a practical perspective now, is that when you have kids growing up in very varied multilingual environments, maybe they’re learning one language at school and the other at home, maybe they have one caregiver speaking one language, maybe they’ve got a whole distribution of languages in different circumstances. How do you assess whether what looks like a smaller vocabulary or smaller comprehension ability, even in one circumstance, is predictive of an actual language delay?

So, doing good assessment in that kind of case is something that we’re working on and a bunch of other groups around the world are working on because it’s so critical to find out is this child experiencing a real language delay where there might be an intervention that we could do? Or is this just a natural consequence of, “Hey, there’s a lot of language input from different sources coming in. There are different languages. I need to differentiate them. I need to learn enough vocabulary to work with each independently.”

Emily:

So, let me ask one other practical question before we talk about data, which is one of the things parents hear a lot is you have to talk to your kid all the time, that the way to get your kid to learn language is to narrate your entire life beginning from birth. “Mommy is taking your diaper off now. Oh, you had such a big poopy in your diaper. Now mommy is putting your diaper back on,” et cetera. Your whole life needs to be narrated, and that is how you get your kid to learn to talk. Is that true? Please tell me no.

Michael:

That’s the most extreme version of it. So, where that comes from, this is sort of maybe a good data story. Where it comes from is that there have been studies looking at relations between kids’ language input, and that could be how much they hear at home, both in terms of the quantity and also the quality of the input. And then, there’s a correlation with some contemporaneous, some outcome measure like how much they speak or maybe some later outcome measure like how big their vocabulary is or maybe even how they do eventually when they get to school.

And so, those correlations do exist. They’re modest in size in part because any correlation between parent and child has a lot going on. There’s measurement error, so forth. Then there’s also the question of how much those are forced by-

Emily:

Are they necessarily causal, or not necessarily?

Michael:

Right. Are they forced by genetics? Are there other aspects that are causally endogenous there? So, there’s a lot going on. So, there are some studies where you can intervene on the parents to try to get them to talk more to the kids. Not very many, not very big, but there are some, and you do find gains in the child’s vocabulary. Now, the next question is can we then estimate how big an effect that is, how it lines up? There’s a lot of complicated research questions about the causal inference there that are not well worked out.

So, that’s all to say there is some effect. Does that mean that you then should go and intervene in your own house in this kind of uncontrolled way? I would say no. What you should do maybe is think from first principles like, okay, what’s a good, high-quality, fun, language-rich thing you can do with your kid. Reading a book is one, or having a conversation or in-depth play session. So, a lot of the ways that we think of relating in a high-quality way to our kids actually provide really good, what we call “grounded,” language, where we’re talking about the here and now in some way that they understand.

And maybe those are typically the moments that are most predictive also in these studies. So, it’s not about constantly talking and jamming the Wall Street Journal in their ear. It’s about creating moments that are high quality and fun and interactive and motivating. And that’s often what we encourage parents to do, even when we’re not really thinking about language per se.

Emily:

Okay, so I’m hearing that it’s good to talk to your kid, but you don’t need to narrate the diaper change. That’s what I got. Or you could, if it was enjoyable.

Michael:

Yeah, yeah. If that’s the way you relate to your kid. For me, it’s kind of more fun to chat with a baby or a kid even if they’re not chatting back. But certainly, there are probably some ceiling effects there. You don’t need to narrate everything. And also, the kind of language that’s most important for the kid is the kind that’s about what they care about embedded in a fun and interesting interaction. So, that’s the kind of language that’s going to be most impactful if you’re really aiming in that direction.

Emily:

So, a lot of people have talked about the language learning loss during COVID, whether it’s because of lack of exposure to other people at a young age, whether it’s because of masks. How much do we know about even whether there was a COVID effect on language development for little kids?

Michael:

Not so much, I would say. What we have are some very incomplete observational data sets. I was part of a big cross-national consortium that jumped into action and tried to study language effects during lockdown, and maybe there were some tiny little effects, and maybe they were related a bit to things that the parents said they did or didn’t do with the kids. Maybe they said there’s more screen time and then they reported less language. Methodological fans of your show or other critics would know immediately that (and we said this in the paper; I think we’re very upfront about it) if your kid is more on screens during the pandemic lockdown because you’ve got to work, and then you report they have less language, you both know they were on screens and you were also not hanging out with them, seeing what they knew about language.

So, it’s very, very tough to figure this out and we don’t have a lot of direct assessment. It was the pandemic. We weren’t going into people’s homes and showing the ball in the box. So, it’s really tricky.

I would say from first principles, I am not worried about masking as a key source of issues in language comprehension and language learning. I think there are momentary issues. It’s hard to understand somebody when a mask is on, and that’s going to be true. Multimodal language input, meaning when you can see the face and the lips moving, that’s obvious: it really does help in lots of experiments. So, understanding what you’re seeing in the moment with the mask on is harder than with the mask off.

But nobody, or almost nobody was growing up for a long time with everyone around them with a mask on. They were just growing up with sometimes some people with masks on. My kid was probably the most masked of anybody in the Bay Area, was super intense about this and they had masks on outdoors for a very long time. You could shake your head at that. They’ll still have masks on if there’s a whiff of COVID in the next county. Sorry. But they’re still very, very intense about these things.

And even so, what was he getting? Okay, six, seven hours a day of masked input and then he comes home and we’re talking to him on the weekend. So, in terms of hours a day, it’s not so much.

Emily:

All right.

All right, so I want to talk now about the big data, about the value of big data in this space. And I actually want to tell you that I am personally small data. So, when I was a baby, my mom put a tape recorder under my crib at the behest of a language development specialist named Katherine Nelson, and she recorded me for many years talking to myself, and a bunch of people wrote this entire book about my individual personal language development as a child. Obviously, your approach is very different because you have more than one child in many of your studies.

And so, I’m curious how you think about the value. First, it might be useful to hear a little bit about what one means here by big data, but then also how you think about the value of that versus the kind of intensive study of, say, one individual kid.

Michael:

So, I think I actually have that book.

Emily:

Narratives from the Crib?

Michael:

Yup.

Emily:

Yeah, there it is.

Michael:

I remember looking at it and then somebody at one point told me that that was you in the crib. So, yeah, I’m also good friends with my colleague, Eve Clark and I also am friends with her son, who’s the subject of her monitoring.

Emily:

Another N of one.

Michael:

Yeah, I only once brought that up in a social situation where he was present.

Emily:

Nice.

Michael:

So, yeah, I think the first thing I’d say is there’s been this custom in psychology of doing experimental studies where you get really tight experimental control over something in the lab. And a lot of those studies historically have had 16 kids and each kid has contributed just a handful of data points. And I think of that as really small data. The kinds of things you learn from that can be very interesting. But typically, the estimates that you get, the measurements you get are very noisy. They don’t tell you much about any one kid and they don’t tell you that much about exactly what this experiment does.

So, I think you could either go deeper on one kid or broader with many, many kids. Both of those are actually very useful and appealing from my perspective. So, when I got started, one of the things that I was doing was collaborating with Deb Roy at the MIT Media Lab, who studied his kid by recording about 70% of his child’s waking hours, and then his collaborator was named Brandon Roy, unfortunately, because they weren’t related at all. It wasn’t his son that was the collaborator. It was just another guy that happened to have the name Roy.

But anyway, Brandon led like 60 transcribers to transcribe millions of words of this one child’s daily life. Everything he said, everything he heard, and that’s an incredibly fascinating data source that we’ve worked on a lot. So, you learn a lot from that kind of big data, though not big N, as in the number of participants.

Emily:

Like big T. We would say big T for many time periods in economics. Yeah.

Michael:

Yeah, exactly. So, I think that can be fascinating. But like I said, language is so variable that learning about one kid’s development really tells you a lot about that kid’s development and a little bit about other kids’ development. So, the opposite approach has been to look at very, very large samples. And so, that’s what we’ve done in the Wordbank project, where we gathered together surveys of parents about their kids’ language. These are quite detailed surveys where we get them to check off what words their kid says or what words they think their kid understands.

And then we compile those in a big database that represents not all of the world’s languages, of course, nor even all of the world’s regions, but maybe a broader and more diverse set than what typically gets studied in lab studies.

And so, that gives us a window into that big-N variability, how different kids are from one another, and also gives us a really important window into the acquisition of different languages in different cultural contexts, because there’s so much about language and culture that we’ve talked about here implicitly: how we think we’re supposed to talk to kids, what we think the goal is, that we want to get our kids to succeed at school and we want to talk to them a lot. That’s all very culturally conditioned. So, that approach has really been to look at variability and to try to understand the sources of variability in kids’ language and how those relate both within individual kids and across cultures.

Emily:

Are there some languages that are more difficult to learn or take longer? I mean, I know there’s a lot of variability across kids, but is there something systematic across languages?

Michael:

This is one of these questions that people always ask linguists and the linguists are always like, “No, it’s universal, the language acquisition device …” It turns out that it’s not completely. There is one language that we know empirically is really horrible for kids to learn and it’s Danish. Danish has a lot of sounds in it and the sounds are all kind of slurred together and not well articulated, and there’s all these ways that they drop the sounds.

Emily:

Danish listeners, we’re sorry. He’s not being critical.

Michael:

I know this because of a bunch of really fabulous Danish researchers who have this long-running study called The Puzzle of Danish, and The Puzzle of Danish is like, “Why is it so hard to learn our poor old language? We speak it just fine, but man, these kids are struggling.”

Emily:

That’s so interesting.

Michael:

Yeah, so a lot of the things that can make a language hard for second language learners, well, some of the things, especially the sounds of a language, when you reduce them or they’re not as well articulated or they’re slurred together or changed when they’re next to each other, that really can make a language harder to learn. Other things, it turns out, kids are remarkably good at: learning all those word endings and what we call the morphology, the little bits and pieces that aren’t themselves words. All of that stuff that we struggle with, especially as English speakers, kids don’t have as hard a time with.

Emily:

Yeah, I mean, that’s such an interesting aspect of learning to talk and learning to read in English, both of which seemed a little magical to me, is this idea that in English, there are so many weird endings. There are so many “Why is the K silent?” moments when you learn to read. It’s ridiculous. And yet somehow, people sort of pick it up. I don’t know why.

Michael:

Well here, English is actually the hard one or a hard one. So, if you’re in an English school, kids learning to read, there’re going to be a lot of kids that are having trouble with all of those complexities of English orthography. That’s not English language per se, it’s how we write the darn thing down. In contrast-

Emily:

Whereas in Spanish, it’s completely phonetic. What you see is what you get. You read the sounds and then the sound, everything is totally phonetic.

Michael:

Yeah. So, you can go into an English speaking elementary school and find out how good a reader is, how well a kid is doing at reading by having them just read out words and if they can actually pronounce a bunch of the words, that’s a really good measure of their overall reading success. In a Spanish-speaking school, that measure is going to top out at six or seven or eight because kids are just like, “Yeah, you say the sounds, what’s the big deal? Because my language makes sense.” And English-speaking kids are like, “My language doesn’t make sense. I have to read hundreds of books and get explicit phonics training to third or fourth grade in order to just figure this thing out.”

Emily:

No, I think the low when I was teaching my daughter to read it was the word “know” because it was just like, “Why are there these extra letters? There’s two extra letters.” Not no like yes-no, but know like I know it. It’s just like why are there these extra letters? Who knows?

Michael:

My son, Jonah, loves to get everything explained. He’s been in the why phase for two and a half years now. And he’s right at the point where he is interested in letters and we’re playing letter games in the car, like an animal that starts with, and we’re just like, let’s just skip C, we’re going to get stuck on C. And he’s just going to be like, “Why C make K? What’s the deal here? You got to explain this.” I’m like, “Okay. It starts with the Norman invasion.”

Emily:

It’s like, how far back do we need to go? All right, so I want to end by asking you about this extremely interesting new paper you wrote about AI. So, your abstract starts by saying the following, large language models show intriguing emergent behaviors, yet they receive about four to five orders of magnitude more language data than human children. What accounts for this vast difference in sample efficiency?

My read of this in lay terms is that these large language AI models, like the ones ChatGPT is built on top of, are able to do these incredibly impressive things. Like you ask them to write a poem in the style of William Shakespeare about farts and they do a great job. My kids like to ask that question. But in order to get to that, they have to be fed an unbelievable amount of data, say like 10,000 times more than a kid will hear, to get to the same ability to write, not a very good poem about farts, but some folk poem, original generated poem about farts. So, why is that?
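For readers who want the arithmetic behind "four to five orders of magnitude": one order of magnitude is a factor of ten, so the range in the abstract runs from 10,000 to 100,000 times more language data, and the "10,000 times" figure above sits at the low end. A minimal sketch (the function names are illustrative, not from the paper):

```python
import math

def ratio_from_orders(orders):
    """An order of magnitude is a factor of 10, so n orders is a factor of 10**n."""
    return 10 ** orders

def orders_from_ratio(ratio):
    """Inverse direction: how many orders of magnitude a given ratio represents."""
    return math.log10(ratio)

# The range quoted from the abstract: four to five orders of magnitude.
for orders in (4, 5):
    print(f"{orders} orders of magnitude = {ratio_from_orders(orders):,} times more data")

# The lay figure of 10,000 times, converted back to orders of magnitude.
print(orders_from_ratio(10_000))
```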

Michael:

Yeah, that’s the big question.

Emily:

Unfortunately, the paper doesn’t answer that. It just says this is very hard. But I’m curious. You had some ideas, so I was curious.

Michael:

Yeah, yeah. There are a couple ideas there. And let me clarify what I mean by intriguing emergent property. The fart Shakespeare is something that it might take somebody a lot of reading and writing and practice to get good at. But my example that I think is hopefully an easier one and maybe more appropriate for the various content-

Emily:

No, no, this is an adult podcast, so it’s fine to say farts.

Michael:

Well, so the thought experiment I do a lot is when a kid comes home from kindergarten and it’s a rainy day and they’re bored, you can be like, “Hey, let’s play Uno. Here’s the rules of Uno. I’m going to explain them and now we’re going to play Uno.” And for a lot of kids, that works totally fine. You can just explain the rules of a game, give them two examples, and then you’re off to the races and they’re immediately starting to argue with you about wildcards or whatever.

For AI, that’s kind of what I meant by intriguing emergent capacities is that you could give instructions and provide examples and they could learn from that. And that’s the thing that popped out after gazillions of examples. They have to read a third of the entire internet in order to be able to figure that out.

So, one big difference is the kind of data that the AI is getting. It’s getting decontextualized language. It’s basically imagine just printing out all of Wikipedia in a single stream with no formatting and shoving that into a learning machine. It’s very hard to figure out what’s going on there. In contrast, what kids get is this very social language that’s built up in conversation in context they understand. It’s accompanied often by other cues. You might be looking at the Uno cards. It’s within a set of conventions and practices that they understand, “Now we’re going to play.”

So, there’s all these other structures that allow the child to make much more of the data they get. And I think it’s an open question exactly which of those matters. Is it the social stuff? Is it the fact that there’s grounded visual input or other kinds of input? Is it the context? That’s the kind of thing that we’re as a scientific community working on figuring out.

People are trying to train models on videos instead of written text. They’re trying to train models in more social ways and so forth. But that’s the general puzzle is like, okay, what would you do if you had much more enriched data like a kid? How much does that make a difference and why.

Emily:

Fascinating. I mean, I feel like the AI models, maybe they’re good enough now, so maybe we don’t need to make them better, but…

Michael:

It depends whether you need fart Shakespeare or reliable good.

Emily:

Correct answers to your questions.

Michael:

Yeah, I think we’ve got a little ways to go before we get … In some sense, it’s like they’re the dancing bear. It’s not such a good dancer per se. It’s just amazing that it is dancing.

Emily:

Right. Yes. I think that’s about where we are now. All right. This is amazing. Thank you so much. I really appreciate your time.

Michael:

Yeah, thank you for having me. Big fan of your work.

Emily:

Thank you.

Speaker 2:

The first many weeks of nursing my son were extremely painful. My husband joked that his first word was going to be (beep) because I said it so often when he was an infant.

Speaker 3:

My daughter is currently 15 months.

Speaker 4:

My daughter is 14 months old.

Speaker 5:

And his first word was when he was around 20 months.

Speaker 6:

And she said her first word, eventually, when she was 20 months old.

Speaker 3:

And the first word that she said, and has been saying repeatedly, has been doggy.

Speaker 8:

Her first word wasn’t mama or dada.

Speaker 9:

That word was not mama or dada.

Speaker 8:

But it was our cat’s name, which is Olive.

Speaker 9:

That word was Stan, and Stan is the dog. Not even her dog, my mother-in-law’s dog.

Speaker 4:

I could not find one of his shoes for the life of me. And then he brings me the shoe and he goes, “Shoe”. I looked at him and I was just like, “You can speak”.

Speaker 7:

My husband grew up in New York City. His first word was taxi. Loved shouting it as the taxis went by.

Speaker 3:

And so now every dog she sees, she yells out doggy. But that also means that every other animal she sees, she thinks is a dog also, including seals at the aquarium.

Speaker 6:

When she sees a clock, she goes. But every time we try to get her to say, mama, she looks at me straight in the face, pauses, and then says, “Dada”.

Emily:

ParentData is produced by Tamar Avishai with support from the ParentData team and PRX. Also, special thanks to our house violinist, my daughter Penelope.

Penelope:

No problem, mom.

Emily:

If you have thoughts on this episode, please join the conversation on my Instagram @profemilyoster. And if you want to support the show, become a subscriber to the ParentData newsletter at parentdata.org, where I write weekly posts on everything to do with parents and data to help you make better, more informed parenting decisions. You can subscribe for free or sign up for a paid subscription, which comes with great benefits, including an ad-free version of this podcast, and full access to literally hundreds of my posts at parentdata.org. If you like what you hear, please leave the show a positive review on Apple Podcasts. It really helps people find out about us. Right, Penelope?