Thursday, February 08, 2024 Michael Frank

New languages!

We're pleased to announce a new release of the Wordbank database, bringing us to more than 100,000 CDI administrations! This release includes data from Japanese, Estonian, Saudi Arabic, and Catalan, as well as new data from Korean, Finnish, French, and French-English bilinguals.

Tuesday, March 07, 2023 Michael Frank

A meta-analysis of outcomes for late talkers

The extreme variability in the outcomes of late talkers raises challenges both for theoretical work on the sources and development of individual differences in early language development, and for clinical and educational practice. Research on long-term outcomes is sparse but growing, though it has focused primarily on English. There is a need to expand the research database, and especially to integrate research across languages. We are undertaking a cross-linguistic meta-analysis of outcomes for late talkers. In addition to published and unpublished papers specifically focused on this issue, we believe there are projects that have another primary focus but that include an early (18-36 month) assessment of expressive vocabulary, and language and/or literacy outcomes at age 5 years or beyond. The target studies can include broadly representative samples, or late talkers specifically. If you are interested in the possibility of collaborating on this project, please contact either of us, listed below. You may also find the PROSPERO registration of this project helpful in understanding our plans (registration CRD42023394687).

Emma Hayiou-Thomas

Philip Dale

Sunday, January 29, 2023 Michael Frank

Educational materials

Mike just taught a course at the University of Amsterdam LOT winter school on "Language Learning: A Data-Driven Approach." Here's the course description:

In this course, we will examine early language learning through the lens of new data resources that facilitate quantitative studies. Our framework will be the "Standard Model" of Kachergis, Marchman, and Frank (2022) that links language input to processing and learning outcomes, and we will consider the strengths and weaknesses of this model for describing vocabulary learning as well as the learning of some morphology and syntax. Our hands-on approach will involve learning the use of CHILDES and childes-db for studying language input, Wordbank for studying language outcomes, and Peekbank for studying processing.

Materials from Days 1 and 2 focus on Wordbank and reproduce several patterns from the Wordbank Book! All of the code and slides are available online.
Thursday, August 11, 2022 Michael Frank

Wordbank updates!


We are very pleased to announce a major update to Wordbank, including some significant changes to the database structure. We are also adding more than 10 new languages and data from many thousands of children. 

We now include data from multilingual children and children with diagnosed developmental disorders (as well as functionality for identifying these children via the Shiny apps and the wordbankr API).

The prior version of the Wordbank database will remain up and available for queries via the wordbankr 0.3.0 API (or earlier) for a period of at least 6 months, but if you upgrade wordbankr you will begin accessing the new Wordbank data. 

Wordbank data will also be versioned going forward, so that older versions of the database will remain available as S3 snapshots (see the Documentation page).

Wednesday, April 07, 2021 Michael Frank

Wordbank Book!

We are very pleased to announce the publication of the Wordbank Book, "Variability and Consistency in Early Language Learning," now available from MIT Press (2021). The book brings together many different ways of looking at data in the Wordbank database, in service of characterizing how children vary as well as shared patterns of learning. The book is also available free online, and all of the code necessary to generate it from the Wordbank data is publicly available as well.

Wednesday, March 07, 2018 Michael Frank

More languages (and some naming changes)

We are pleased to announce several pieces of news:

First, several more languages and datasets have arrived, including French (European) and more Korean data, with more Hebrew data and Spanish (European) in the works.

Second, we have a new licensing standard such that some datasets can be licensed Creative Commons for Non-Commercial use. These datasets are marked on the contributors page.

Finally, because of the new data, we have some new naming conventions for languages. "English" is now "English (American)"; in general, names will follow the convention "Language (Country/Region)." Unfortunately, these are breaking changes. We apologize for the inconvenience and are working on making past database images available for purposes of reproducibility.
Wednesday, July 26, 2017 Michael Frank

New frontpage and languages

Since our last update, we've reorganized the frontpage and revamped the contributors and citation policy. We've also added Latvian, Slovak, and Korean data!
Friday, September 30, 2016 Michael Frank

New languages

We're pleased to announce a number of new languages in Wordbank, bringing us to more than 20 languages. The new data are from ASL, Cypriot Greek, Kiswahili, and Kigiriama. We have also added British English short-form data from the Twins Early Development Study (TEDS). Take a look at these new additions!
Wednesday, April 20, 2016 Michael Frank

New changes

We now have a FAQ!

Also, check out the semantic networks report and the new scoring tool.

Stay tuned for some new languages in the next few months.
Saturday, March 19, 2016 Michael Frank

Paper in press

We are pleased to announce that our paper on the Wordbank site is in press at the Journal of Child Language:

Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (in press). Wordbank: An open repository for developmental vocabulary data. Journal of Child Language.
Monday, November 02, 2015 Dan Yurovsky

wordbankr on CRAN

Our R package for accessing Wordbank is now on CRAN! You can now install and use wordbankr even more easily.
Wednesday, October 28, 2015 Michael Frank


We're pleased to announce that all derivatives from Wordbank – including downloaded data, tables, and graphs – are licensed CC-BY 4.0. To attribute derivatives, please cite:

Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (under revision). Wordbank: An open repository for developmental vocabulary data. Journal of Child Language.

Going forward, we will update the preferred citation for the site on the Publications page.
Wednesday, September 02, 2015 Michael Frank

Paper under review

We are happy to announce that we have a new paper on Wordbank and the wordbankr package now under review. For now, this is the appropriate citation for the Wordbank site:

Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (under review). Wordbank: An open repository for developmental vocabulary data.
Monday, August 03, 2015 Mika Braginsky

wordbankr package

We've been working on improving how you can access the Wordbank data from R, and the result is the wordbankr package. It's still a work in progress, so a lot of it may change, but using it you can now pull in data by-administration, by-item, and administration-by-item, with minimal hassle. Check out the package vignette for a tutorial!
Friday, May 22, 2015 Michael Frank

Reorganized reports

We've just pushed an update to our analysis pages so that they include much more content. In addition to interactive visualizations (like the norms and items tools), we now have a number of non-interactive reports that we've put together on topics such as gender differences and SES effects. You can also see analyses from our CogSci paper this year on grammar-lexicon relationships. Let us know if you have comments (or analyses you'd like to add)!
Thursday, April 16, 2015 Mika Braginsky

CLEX data

We're excited to announce we've imported all of the CDI norming data from CLEX. Thank you to Rune Nørgaard Jørgensen! Wordbank now has almost 40,000 CDI administrations, across 10 different languages (Croatian, Danish, English, German, Italian, Norwegian, Russian, Spanish, Swedish, and Turkish).
Sunday, March 15, 2015 Mika Braginsky

Data downloading

Check out our new reports that let you explore our data, filter and sort it to suit your needs, and download it as a CSV file. There's one report for by-administration data, with overall vocabulary sizes and demographic information, and one report for the full administration-by-item data. Also, all of the analysis reports now let you download the plot and the data being plotted!
Thursday, March 05, 2015 Mika Braginsky

Cross-linguistic data

We've spent the last few months overhauling Wordbank's data import to support adding in many datasets in different languages. Yesterday these updates went live, and Wordbank now has data in English, Spanish, Danish, and Norwegian, including both Words & Sentences and Words & Gestures. It's a total of almost 30,000 CDI administrations! Check out our reports to play around with all of these datasets, or look at this tutorial on accessing Wordbank from R if you're interested in running your own analyses.
Tuesday, November 18, 2014 Michael Frank

Welcome to Wordbank's blog!

This is the place where we will be announcing new features and datasets for the Wordbank site. Stay tuned!