An annotated bibliography of surname mapping. Research by James Cheshire and his collaborators underlies this ardagraphic study. Dr. Cheshire has a blog in addition to his university site linked above.
Oliver O’Brien, Suprageography
O’Brien’s data visualization blog post got this project started. The public-access web portal provided the qualitative data for classification of family names. If you don’t have an English name, the latter site hosts a world-wide version (at a much lower resolution).
Cheshire, James A., Paul A. Longley, and Alex D. Singleton. “The surname regions of Great Britain.” Journal of Maps 6.1 (2010): 401-409.
The map of surname regions in Great Britain shows that distributions of names track well with the administrative regions. The map itself, available for download at the link (17 MB), is gorgeous.
Longley, Paul A., James A. Cheshire, and Pablo Mateos. “Creating a regional geography of Britain through the spatial analysis of surnames.” Geoforum 42.4 (2011): 506-516.
Mapping names in the 21st century is valid for this practice because Longley, Cheshire, and Mateos’s techniques make it possible to identify “combinations of location specific surnames that date back 700 or more years”. Figure 5 shows that the “Lasker distances” between Census Area Statistics Wards in a region cluster into a tight grouping, and each region is unlike other regions in England. In fact, some geographic resemblances are visible through the multidimensional clustering: wards in the West Midlands look like they feel a gravitational pull from Wales, as names originating in the Welsh language diffuse across the English border.
The Lasker Distance is elegantly simple. If we write the fraction of people in a small area i who have the name n as p(i,n), then the distance between areas i and j is -ln(Σ p(i,n)×p(j,n)) where the summation is over all names. Names that don’t exist in one of the areas don’t contribute to the sum.
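To make the formula concrete, here is a minimal Python sketch of the distance exactly as written in the paragraph above. The function names and the toy surname lists are made up for illustration and are not taken from the papers; some published definitions also include a constant factor inside the logarithm, which this sketch leaves out to match the text.

```python
import math
from collections import Counter

def name_fractions(surnames):
    """Return p(i, n): the fraction of people in one area who bear each name."""
    counts = Counter(surnames)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

def lasker_distance(p_i, p_j):
    """-ln( sum over names of p(i, n) * p(j, n) ), following the text above."""
    # Names missing from either area contribute nothing, so only shared names matter.
    shared = p_i.keys() & p_j.keys()
    overlap = sum(p_i[name] * p_j[name] for name in shared)
    if overlap == 0:
        return math.inf  # no surnames in common at all
    return -math.log(overlap)

# Hypothetical toy areas, invented for this example.
area_a = name_fractions(["Jones", "Jones", "Evans", "Smith"])
area_b = name_fractions(["Jones", "Smith", "Smith", "Taylor"])
area_c = name_fractions(["Macdonald", "Campbell"])

d_ab = lasker_distance(area_a, area_b)  # ln(4), about 1.386
d_ac = lasker_distance(area_a, area_c)  # infinite: no shared names
```

Two areas with the same mix of names end up close together, and areas with nothing in common are infinitely far apart, which is why real analyses work with areas large enough to share at least a few surnames.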
Once the distance in “name-space” between population points is established, the next step is to cluster the points in that space, and set the cluster sizes so that the result is interpretable in geographic terms. The method used here is “k-means” clustering, and I hope I’m not being uncharitable if I describe it as “try every possibility and keep those that work”. That’s unfair, of course — independent consistency checks are applied at each step; the choice isn’t arbitrary.
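For readers who haven’t met k-means, the loop is short enough to sketch. This is a plain-Python illustration, not the authors’ pipeline: it assumes each area has already been given coordinates in name-space (how that is done is not covered here), and the points, the seed, and the helper names are all invented for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iterations=100, seed=0):
    """Alternate between assigning points and moving centroids until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the points themselves
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # An empty cluster keeps its old centroid.
            updated.append(mean_point(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def inertia(points, centroids, labels):
    """Total squared distance from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))

# Hypothetical coordinates: two obvious groups of three areas each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
spread_by_k = {k: inertia(points, *kmeans(points, k)) for k in (1, 2, 3)}
```

Comparing the spread across several values of k is one common way to choose the number of clusters; the consistency checks mentioned above play that role in the paper.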
Cheshire, James, Pablo Mateos, and Paul A. Longley. “Delineating Europe’s cultural regions: Population structure and surname clustering.” Human Biology 83.5 (2011): 573-598.
Figure 7 in this paper shows the relationship between physical distance and Lasker distance for the countries they studied in Europe. Culturally homogeneous places like Poland and Luxembourg show a tight cluster of points, lying on a line that’s almost horizontal. That is, you find the same names, no matter where in the country you go.
Countries unified by language, such as France, Italy, and Germany, show a slanted line (on a log-log plot), with a moderate upward slope. The further apart two villages are, the more likely you are to find different names in them. (France has a small Alsatian tail.) Norway and Denmark are fascinating exceptions: the line slopes downward! I’m just guessing here, but it could be due to the fact that until recently you didn’t get from one place to another by land. By sea, travel times depend on wind and currents as well, so genetics and patronymics can have a more complicated relationship with distance. (There might be a follow-on project, there, if I could only find family names in the Sagas.)
Spain has two distinct parts: one for the mainland and one for the islands. They’re identical with respect to names. The mainland isn’t a long, thin shape; it’s an incoherent blob, caused by mixed Catalan, Spanish, Arabic, and possibly a Basque scattering off to the side.
The United Kingdom is a dense horizontal sprawl of English, with oddly-shaped protuberances of Welsh, Scottish, and Irish that make drawing a best-fit line through the points an exercise in graphical uniformity, not statistical rigor.
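The “slope” in these descriptions of Figure 7 is the slope of a straight line fitted through the points on log-log axes. A rough sketch of that fit, using ordinary least squares in plain Python, is below. The numbers are synthetic, chosen only to show a flat, a rising, and a falling line; they are not read off the paper’s figure, and the function name is my own.

```python
import math

def loglog_slope(distances_km, lasker_distances):
    """Least-squares slope of ln(Lasker distance) against ln(physical distance).

    Both inputs must be strictly positive, since we take logarithms.
    """
    xs = [math.log(d) for d in distances_km]
    ys = [math.log(v) for v in lasker_distances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic distances between pairs of places, in kilometres.
km = [10, 100, 1000]

flat = [8.0, 8.0, 8.0]                   # same names everywhere
rising = [5 * d ** 0.1 for d in km]      # names drift apart with distance
falling = [12 * d ** -0.05 for d in km]  # the puzzling downward case

slope_flat = loglog_slope(km, flat)
slope_rising = loglog_slope(km, rising)
slope_falling = loglog_slope(km, falling)
```

A single slope summarizes a country only when the points actually fall along a line; for a blob like mainland Spain or the sprawl of the United Kingdom, the fitted number says very little.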