Custom typological maps with R – Kilu von Prince

A nice tutorial on creating typological maps in R by Kilu von Prince.

A couple more links that might be of interest, via the article:

6 Likes

Nice! Thanks for this.
One thing that came up recently with mapping for the Oxford Australian volume was the output of many off the shelf mapping tools that linguists used either don’t have the resolution for printed volumes, or have default colour schemes that don’t transfer well to black and white/greyscale printing. Moreover, they redefine variables in a way that makes using the R options for symbols not straightforward (e.g. they introduce a default sort ordering as part of the mapping function that can’t be overridden, or if you use symbols instead of their colour defaults, you can no longer also sort by frequency). We ended up redrawing a bunch of these maps in QGIS.

1 Like

Thanks for sharing! Last map I made I used this Python package (which I can barely remember at this point, except that it didn’t take too long to figure out): LingTypology: Documentation — LingTypology 0.8.6 documentation

I haven’t tried this yet, but @fmatter has a more complex mapping code that links geography with language family trees: GitHub - fmatter/lingtreemaps: Plot data on linguistic trees and maps.

3 Likes

Disclaimer about lingtreemaps: whether you can make a good-looking map really depends on the shape of the tree and the geographic distribution of the languages in your sample. There’s a couple more examples in the documentation (lingtreemaps.readthedocs.io/). It is best suited for smaller families, obviously.

If you have trouble using it, best way is to open a github issue (alternatively, contact me in private).

If you use it to make a cool map, please let me know and I’ll add it to the example gallery :slight_smile:

3 Likes

Nice tutorial. It might be worth mentioning that the toy dataset used in it is very close to a “metadata free” CLDF StructureDataset. In fact, starting with just

Language,Word order
Movima,0
Arapaho,OV
Alta,OV
Savosavo,OV
Teop,VO
Sumi,VO
Yali,OV
Beja,VO
Vera'a,OV
Cabecar,VO

we could add a column Parameter_ID

csvstack -n Parameter_ID -g Word_order typology.csv

and pull in glottocodes from glottolog-cldf:

csvstack -n Parameter_ID -g Word_order typology.csv | csvjoin -c Language,Name - glottolog-cldf/cldf/languages.csv | csvcut -c"Language,Word order,ID,Parameter_ID"
Language,Word order,ID,Parameter_ID
Movima,0,movi1243,Word_order
Arapaho,OV,arap1274,Word_order
Savosavo,OV,savo1255,Word_order
Teop,VO,teop1238,Word_order
Beja,VO,beja1238,Word_order

then do a bit of renaming:

ID,Value,Language_ID,Parameter_ID
Movima,0,movi1243,Word_order
Arapaho,OV,arap1274,Word_order

save to a file values.csv et voilà:

$ cldf validate values.csv 
WARNING values.csv:7:1 ID: invalid lexical value for string: Vera'a

(almost).

After fixing this, we could plot a very similar map using cldfviz.map:

cldfbench cldfviz.map values.csv --parameters Word_order --glottolog glottolog --markersize  15 --output map.png --format png

1 Like

Regarding the R packages lingtypR and lingtypology: I understand the desire to make the user’s life easy by bundling data with the packages. (In fact, I’ve been asked to do that multiple times for a couple of packages I maintain.) But I think it is counterproductive.

  • It separates users of datasets from maintainers of datasets (if you use Glottolog via lingtypR you are less likely to report problems with the data to Glottolog).
  • It hides the fact that many datasets have multiple versions (and makes it difficult to use different versions of data and package).

From my perspective, CLDF could be a big part of the solution to this issue:

To be honest, I’m a bit puzzled that there is not more uptake of this idea :slight_smile: Ideally, it shouldn’t be the case that each update of a dataset must trigger updates of packages using the data (e.g. updat Glottolog to 4/7e · ropensci/lingtypology@7777c73 · GitHub).

2 Likes

I had some fun today playing with the Python package that @joeylovestrand suggested! I was able to make some pretty informative graphs detailing the endangerment levels, family affliations, and number of mentions of about 59 languages spoken at a single high school in Daru, Papua New Guinea.

3 Likes

Nice maps! I’ve been playing around with this again myself, trying to figure out the best way to show both sub-classifications of a family and their value for a particular feature.

I started with making each combination a separate category and manually coding for colors that differ in saturation. This looks great but it’s not very clear for many people.

So I’ve also tried using overlays of black and white dots on color, which I don’t think is as pretty, but may be easier to read.

That’s why I added “shapes as pseudo-colormaps” in cldfviz.

So, a full example with cldfviz would look as follows. The main part is creating a suitable CLDF dataset - which (as in @joeylovestrand 's example) codes subgroup as second parameter:

import collections

from csvw.dsv import UnicodeWriter
from pycldf import Dataset

# Glottolog as CLDF dataset from https://doi.org/10.5281/zenodo.7398887
glottolog = Dataset.from_metadata('../glottolog/glottolog-cldf/cldf/cldf-metadata.json')
# World Atlas of Classifier Languages from https://doi.org/10.5281/zenodo.7322688
wacl = Dataset.from_metadata('../wacl/wacl-cldf/cldf/StructureDataset-metadata.json')

# We collect Chadic languages grouped into the 4 chadic subgroups:
chadic_by_subfamily = collections.defaultdict(list)
for lang in glottolog['ValueTable']:
    if lang['Parameter_ID'] == 'classification' and 'chad1250' in lang['Value']:
        # A Chadic languoid.
        # The Value column is coded as specified in `parameters.csv`:
        # > Path from root of family to the languoid as slash-separated list of Glottocodes.
        lineage = lang['Value'].split('/')
        findex = lineage.index('chad1250')
        if len(lineage) > findex + 1:
            chadic_by_subfamily[lineage[findex + 1]].append(lang['Language_ID'])

wacl_values = {v['Language_ID']: v['Value'] for v in wacl['ValueTable']}

i = 0
# We write a "metadata free" CLDF dataset (see https://github.com/cldf/cldf#metadata-free-conformance)
with UnicodeWriter('values.csv') as w:
    w.writerow(['ID', 'Language_ID', 'Parameter_ID', 'Value'])
    for sf, langs in chadic_by_subfamily.items():
        for gc in langs:
            if gc in wacl_values:  # A Chadic language in WACL.
                i += 1
                w.writerow([str(i), gc, 'subgroup', sf])
                i += 1
                w.writerow([str(i), gc, 'wacl', wacl_values[gc]])

(While this may look overly lengthy, I hope it becomes clear that being able to access data in different datasets uniformly helps a lot.)

Then we can use shapes to display different feature values and colors for subgroups:

cldfbench cldfviz.map values.csv --parameters wacl,subgroup --colormaps '{"TRUE": "circle", "FALSE": "triangle_up"},tol' --glottolog ~/projects/glottolog/glottolog --markersize 15

This will create an HTML map (see
map.html (41.7 KB)
) looking roughly like the PNG map created with the added option --format png:

2 Likes

This is great, thanks!

As far as I could tell, Lingtypology (for Python) only allowed shapes in black & white.

My database is in a CLDF-ish format, so it may be worth polishing it up and figuring out cldfviz to get this flexibility with the maps

If you go for a full CLDF dataset (i.e. including language metadata, etc.), you’d also have control over the coordinates and language labels. In the example above, coordinates are taken from Glottolog (as is done in lingtypology, I think).

1 Like

Not really a “cool map”, but a way to use lingtreemaps if you already have CLDF data is provided now in cldfviz.treemap.

I’ve tried to recreate your minimal example of cldfviz.map but am getting a vague syntax error. I can only assume it’s because I’m not linking to the Glottolog data correctly, but have tried a few ways and not been successful

(btw seems to be a type in the title here)

Could you send the full error output my way? I just tried in a fresh virtualenv:

pip install cldfviz
echo "ID,Language_ID,Parameter_ID,Value
1,stan1295,romance,false
2,stan1290,romance,true
3,ital1282,romance,true" > values.csv
cldfbench cldfviz.map values.csv --parameters romance --glottolog-cldf https://raw.githubusercontent.com/glottolog/glottolog-cldf/v4.7/cldf/cldf-metadata.json

and all went well.

@xrotwng Can I use geojson shapes in these maps? I have some JSON shapes for legal jurisdictions that I am trying to match with language activities. I’m wondering how useful it would be to use CLLD tools to build this. I do want interactive features. I’ve only ever seen these python maps and R maps which get published by linguists to have point based cartographic overlays. I want boxes or shapes. My preference is to use geojson.

Any thoughts or pointers?

The HTML maps created by cldfviz.map can easily be modified (by hand or programmatically) to add GeoJSON layers. E.g. to add EcoRegions, as GeoJSON overlay, you’d

  • add var ecoregions = at the beginning of your GeoJSON file
  • load the layer in the HTML map. Here’s the diff showing how to change the HTML resulting from cldfviz.map:
--- map_orig.html	2023-02-04 09:18:20.409048859 +0100
+++ map.html	2023-02-04 09:51:33.844685145 +0100
@@ -14,6 +14,7 @@
     <script src='https://api.mapbox.com/mapbox.js/plugins/leaflet-fullscreen/v1.0.1/Leaflet.fullscreen.min.js'></script>
     <link href='https://api.mapbox.com/mapbox.js/plugins/leaflet-fullscreen/v1.0.1/leaflet.fullscreen.css'
           rel='stylesheet'/>
+    <script src='ecoregions.json'></script>
     <style>
         body {
             font-family: Verdana, Arial, Helvetica, sans-serif;
@@ -120,6 +121,29 @@
             markers[i].openTooltip();
         }
     }
+    L.geoJSON(
+        ecoregions["features"],
+        {
+            style: function(feature) {
+                switch (feature.properties.BIOME | 0) {
+                    case 1: return {fillColor: '#008001', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 2: return {fillColor: '#557715', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 3: return {fillColor: '#ffffff'};
+                    case 4: return {fillColor: '#ffffff', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 5: return {fillColor: '#ffffff', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 6: return {fillColor: '#ffffff', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 7: return {fillColor: '#98ff66', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 8: return {fillColor: '#ffffff', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 9: return {fillColor: '#0265fe', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 10: return {fillColor: '#cdffcc', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 11: return {fillColor: '#ffffff', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 12: return {fillColor: '#cc9900', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 13: return {fillColor: '#feff99', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                    case 14: return {fillColor: '#870083', weight: 1, opacity: 0.8, fillOpacity: 0.7};
+                }
+            }
+        }
+    ).addTo(map);
 </script>
 </body>
 </html>

resulting in something like

1 Like

Since the result looks so nice, I made a note to add such functionality to cldfviz.map Allow specification of custom GeoJSON overlays for HTML maps · Issue #45 · cldf/cldfviz · GitHub :slight_smile: