by Alexandra Petrulevich
Alma, Freja, Alice, Olivia and Elsa, and Noa, Valter, William, Lukas and Hugo: these were the five most popular female and male first names, respectively, given to Swedish babies in 2024. Following trends in the naming of newborns has been a distinct branch of onomastics in the Nordic countries for decades, in Sweden at least since 1998, when Statistics Sweden started publishing name statistics (which it no longer does).[1] What can these names tell us about language and, more importantly, about linguistic change?
Three-Generation Rule
Following trends in the frequency distributions of top names over time means that we are already observing linguistic change in name stocks. These fluctuations in the onomasticon reflect parents’ preferences for certain names, and such preferences are themselves a cultural habit susceptible to change. An established mechanism for this type of change is the so-called three-generation rule: it is the names of the third generation back, the babies’ great-grandparents, that are “recycled” and chosen for newborns, rather than any of the names associated with the parents’ or grandparents’ generation.
Baby Names and Cultural and Linguistic Change
However, in the social sciences, more specifically in statistics, and in biology, baby names exemplify more than just changes in personal name stocks. Baby names constitute large, authentic, population-wide linguistic datasets that allow researchers to follow cultural, including linguistic, change in real time. This makes baby names ideal for data-driven modelling of such change. The main explanatory framework applied to baby-name models by statisticians, mathematicians and biologists is the so-called neutral theory of evolution. Within this framework, evolutionary change can follow one of two paths: a deterministic path or a random (drift) path. In the former case, evolution, or cultural and linguistic change, favours specific variants, e.g. novel ones; in the latter, drift, i.e. random selection, underpins and explains such change. Several studies show that authentic baby name datasets exhibit a power-law distribution of variants. See e.g. the frequency distributions of the 1,000 most popular American (a) male and (b) female names for three decades of the twentieth century in Figure 1, and the frequency distributions of Norwegian boys’ names 1880–2010 in Figure 2, where a threshold value of 4 was applied, i.e. names with fewer than four bearers were not considered.
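To make the drift path more concrete, here is a minimal sketch of a random-copying model in Python. It is an illustration in the spirit of neutral models, not a reimplementation of any of the surveyed studies: the function name `simulate_random_copying` and all parameter values (population size, number of generations, innovation rate) are assumptions chosen for readability.

```python
import random
from collections import Counter


def simulate_random_copying(pop_size=1000, generations=200,
                            innovation_rate=0.01, seed=0):
    """Neutral (drift) sketch: every baby either copies the name of a
    randomly chosen member of the previous generation or, with probability
    innovation_rate, receives a brand-new name.
    All default values are illustrative assumptions."""
    rng = random.Random(seed)
    names = list(range(pop_size))   # start: every bearer has a distinct name
    next_new_name = pop_size        # integer id for the next invented name
    for _ in range(generations):
        new_generation = []
        for _ in range(pop_size):
            if rng.random() < innovation_rate:
                new_generation.append(next_new_name)
                next_new_name += 1
            else:
                new_generation.append(rng.choice(names))
        names = new_generation
    return Counter(names)           # name id -> number of bearers


counts = simulate_random_copying()

# Frequency of frequencies: how many names have exactly k bearers.
bearers_per_name = Counter(counts.values())
for k in sorted(bearers_per_name)[:5]:
    print(k, bearers_per_name[k])
```

No name is favoured in this sketch, so any skew in the resulting distribution comes from random copying alone. A deterministic path would correspond to replacing the uniform `rng.choice` with a choice that is weighted for or against certain variants.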


Interestingly, although the explanatory framework and the distributional features of the underlying data are to a large degree identical across the surveyed studies, their conclusions are strikingly different. One of the cornerstone papers in the field (1) postulates drift as the main mechanism of cultural change. Almost a decade later, this conclusion is questioned by (2), which advocates a deterministic explanation. Why is this so?
Data-driven Modelling Has a Data Issue
One likely explanation for these incompatible conclusions has to do with the data the studies are based on, more precisely with the issue of data completeness. As (3) shows, rare variants play an important role in the overall distribution of baby names and thus in how it is modelled. This means that results differ depending on whether rare variants are included in the modelling dataset or not. Both of the studies mentioned in the previous section build their models on incomplete data. In (3), working with the complete baby name dataset from South Australia, the researchers find an anti-novelty bias, i.e. a deterministic picture, rather than random selection.
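A small Python sketch can show why completeness matters. The counts below are invented toy numbers, not real statistics, and the helper names `truncate` and `summarize` are hypothetical; the threshold of 4 simply mirrors the one mentioned for the Norwegian data above.

```python
def truncate(name_counts, min_bearers):
    """Keep only names with at least min_bearers bearers."""
    return {name: n for name, n in name_counts.items() if n >= min_bearers}


def summarize(name_counts):
    """Number of distinct names and total number of bearers."""
    return {"distinct_names": len(name_counts),
            "bearers": sum(name_counts.values())}


# Invented toy counts for illustration only (NOT real name statistics).
toy_counts = {"Alma": 50, "Freja": 40, "Noa": 30, "Valter": 5,
              "RareA": 3, "RareB": 2, "RareC": 1, "RareD": 1}

complete = summarize(toy_counts)
thresholded = summarize(truncate(toy_counts, min_bearers=4))

print("complete:   ", complete)
print("thresholded:", thresholded)
```

In this toy example the threshold removes half of the distinct names but only 7 of 132 bearers. The rare end of the distribution, which is where novel names first show up, is exactly the part that disappears.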
However, the data issue in studies of cultural and linguistic change seems to be even more substantial than data completeness alone. The surveyed studies do not seem to take any data quality or linguistic parameters into consideration. It is, for instance, impossible to tell whether name statistics taken from a third-party source represent so-called raw data or whether they have been manipulated, enriched or clustered in any way by the responsible authority, e.g. social security services, tax agencies or the like. Given the importance of completeness for the modelling outcome, it is paramount to use only complete and carefully curated datasets. Another related issue concerns so-called unique variants. What is a unique name or a unique variant? Does a unique spelling variant of a very common name constitute a unique variant? Will the results be different if several levels of linguistic resolution, such as name formation and spelling, are introduced?
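The question of linguistic resolution can be illustrated with a short Python sketch. Everything here is hypothetical: the records are an invented toy sample, and the hand-made `VARIANT_TO_LEMMA` table merely stands in for whatever onomastic normalisation a curated dataset would actually require.

```python
from collections import Counter

# Invented toy sample of registered spellings (NOT real data).
records = (["William"] * 5
           + ["Villiam", "Wiliam"]
           + ["Olivia"] * 3
           + ["Olivija", "Tuvalisa"])

# Hypothetical hand-made mapping from spelling variant to a base form.
VARIANT_TO_LEMMA = {"Villiam": "William",
                    "Wiliam": "William",
                    "Olivija": "Olivia"}


def unique_variants(names):
    """Return the names that occur exactly once, sorted alphabetically."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n == 1)


spelling_level = unique_variants(records)
lemma_level = unique_variants(VARIANT_TO_LEMMA.get(n, n) for n in records)

print("unique at spelling level:", spelling_level)
print("unique at lemma level:   ", lemma_level)
```

At the spelling level this toy sample contains four unique variants; once spellings are mapped to a base form, only one remains. Which of the two counts enters a model is an analytical decision that ought to be stated explicitly.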
New Project at Uppsala University
These are all important questions to answer before data-driven modelling of cultural and linguistic change based on baby name datasets can provide reliable results. They are also some of the questions that the project Swedish Baby Names for Data-driven Modelling of Cultural Change, recently funded by Uppsala University, will address in its pilot study. One of the project’s innovations is its ambition to deep-dive into the onomastic and linguistic aspects of the underlying datasets. For the first time, onomastics and (historical) linguistics can be seen as the motor of a data-driven study of cultural and linguistic change. More on the results next time!
Key References
(1) Hahn MW, Bentley RA. Drift as a mechanism for cultural change: an example from baby names. Proc. R. Soc. Lond. B 270, S120–S123 (2003)
(2) Kessler DA, Maruvka YE, Ouren J, Shnerb NM. You name it – how memory and delay govern first name dynamics. PLoS ONE 7, e38790 (2012). doi:10.1371/journal.pone.0038790
(3) O’Dwyer JP, Kandler A. Inferring processes of cultural transmission: the critical role of rare variants in distinguishing neutrality from novelty biases. Phil. Trans. R. Soc. B 372, 20160426 (2017). http://dx.doi.org/10.1098/rstb.2016.0426
[1] The Swedish Tax Agency took over name statistics from 2024.