CLDF as JSON

xrotwng · June 30, 2022, 3:50pm

I heard that some people around here like JSON (a lot), while others are fans of tabular formats like CLDF. Since CLDF is based on CSVW, there was always the promise of a canonical transformation of CLDF to JSON.

With version 3.0 this functionality made it into the csvw package - which is doing most of the heavy lifting for my pycldf package.

So, having installed csvw, you can now use the csvw2json command to convert any CLDF dataset to JSON. We can try that on the CLDF dataset for the Leipzig Glossing Rules. csvw2json expects the filename or URL of a CSVW metadata file as its argument, so

csvw2json https://raw.githubusercontent.com/cldf-datasets/lgr/main/cldf/Generic-metadata.json

will do the trick, i.e. print JSON to the screen that looks like

{
    "url": "https://raw.githubusercontent.com/cldf-datasets/lgr/main/cldf/examples.csv#row=3",
    "rownum": 2,
    "describes": [
        {
            "http://cldf.clld.org/v1.0/terms.rdf#id": "2",
            "http://cldf.clld.org/v1.0/terms.rdf#languageReference": "lezg1247",
            "http://cldf.clld.org/v1.0/terms.rdf#primaryText": "Gila aburun ferma hami\u0161alu\u01e7 g\u00fc\u01e7\u00fcna amuq\u2019da\u010d.",
            "http://cldf.clld.org/v1.0/terms.rdf#analyzedWord": [
                "Gila",
                "abur-u-n",
                "ferma",
                "hami\u0161alu\u01e7",
                "g\u00fc\u01e7\u00fcna",
                "amuq\u2019-da-\u010d."
            ],
            "http://cldf.clld.org/v1.0/terms.rdf#gloss": [
                "now",
                "they-OBL-GEN",
                "farm",
                "forever",
                "behind",
                "stay-FUT-NEG"
            ],
            "http://cldf.clld.org/v1.0/terms.rdf#translatedText": "Now their farm will not stay behind forever.",
            "http://cldf.clld.org/v1.0/terms.rdf#metaLanguageReference": "stan1293",
            "http://cldf.clld.org/v1.0/terms.rdf#source": [
                "Haspelmath1993[207]"
            ]
        }
    ]
},

for IGT examples, or

{
    "url": "https://raw.githubusercontent.com/cldf-datasets/lgr/main/cldf/languages.csv#row=4",
    "rownum": 3,
    "describes": [
        {
            "http://cldf.clld.org/v1.0/terms.rdf#id": "lezg1247",
            "http://cldf.clld.org/v1.0/terms.rdf#name": "Lezgian",
            "http://cldf.clld.org/v1.0/terms.rdf#glottocode": "lezg1247"
        }
    ]
},

for the languages that the examples are taken from.
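The two kinds of records fit together: the languageReference value of an example matches the id of a language record, and the analyzedWord and gloss lists line up item by item. Here is a minimal sketch of how one might use that in Python. The two records are abridged copies of the output above, and the names CLDF, languages_by_id and describe are just made up for this illustration, they are not part of csvw or pycldf.

```python
# Prefix shared by all CLDF ontology terms in the output shown above.
CLDF = "http://cldf.clld.org/v1.0/terms.rdf#"

# Two records, abridged from the csvw2json output above.
example = {
    CLDF + "id": "2",
    CLDF + "languageReference": "lezg1247",
    CLDF + "analyzedWord": ["Gila", "abur-u-n", "ferma"],
    CLDF + "gloss": ["now", "they-OBL-GEN", "farm"],
}
language = {
    CLDF + "id": "lezg1247",
    CLDF + "name": "Lezgian",
    CLDF + "glottocode": "lezg1247",
}

# Index languages by ID so that languageReference values can be resolved.
languages_by_id = {language[CLDF + "id"]: language}


def describe(example, languages_by_id):
    """Return the language name and the (word, gloss) pairs of an example."""
    lang = languages_by_id[example[CLDF + "languageReference"]]
    pairs = list(zip(example[CLDF + "analyzedWord"], example[CLDF + "gloss"]))
    return lang[CLDF + "name"], pairs


name, pairs = describe(example, languages_by_id)
print(name)
for word, gloss in pairs:
    print(f"{word}\t{gloss}")
```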

While the keys in this JSON data might look somewhat unwieldy, they are actually very useful, because you can look them up in the CLDF ontology and thus unambiguously infer the semantics of the fields.
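If the long keys get in the way while you work with the data, you can split each one into the ontology URL and the local term name, and keep the URL around for looking things up. A small sketch using only the Python standard library; the record is copied from the languages output above, and split_term is a hypothetical helper name, not something csvw provides.

```python
from urllib.parse import urldefrag

# Language record copied from the csvw2json output above.
record = {
    "http://cldf.clld.org/v1.0/terms.rdf#id": "lezg1247",
    "http://cldf.clld.org/v1.0/terms.rdf#name": "Lezgian",
    "http://cldf.clld.org/v1.0/terms.rdf#glottocode": "lezg1247",
}


def split_term(key):
    """Split a property URL into (ontology URL, local term name)."""
    url, fragment = urldefrag(key)
    return url, fragment


# Keep only the local term names as keys.
short = {split_term(key)[1]: value for key, value in record.items()}
print(short)
# {'id': 'lezg1247', 'name': 'Lezgian', 'glottocode': 'lezg1247'}
```

The short names are only unambiguous as long as all keys come from the same ontology, so it is probably best to shorten them for display only and keep the full URLs in the data.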

Try this out on any of the datasets from cldf-datasets or lexibank or dictionaria.
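For larger datasets you will probably want to process the JSON with a script rather than read it on the screen. The sketch below walks over all records of a converted dataset. It rests on an assumption: that the complete output has the layout described in the W3C recommendation for generating JSON from tabular data, i.e. a top-level "tables" list in which each table has a "url" and a "row" list of objects like the ones shown above. Please check this against the actual output before relying on it. The sample document and the name iter_records are invented for illustration.

```python
import json


def iter_records(doc):
    """Yield (table URL, record) for every record in a converted dataset.

    Assumes the layout tables -> row -> describes (an assumption based on
    the W3C csv2json recommendation, not verified against csvw2json here).
    """
    for table in doc.get("tables", []):
        for row in table.get("row", []):
            for record in row.get("describes", []):
                yield table.get("url"), record


# Invented miniature document in the assumed layout.
sample = json.loads("""
{
  "tables": [
    {
      "url": "languages.csv",
      "row": [
        {
          "url": "languages.csv#row=2",
          "rownum": 1,
          "describes": [
            {"http://cldf.clld.org/v1.0/terms.rdf#id": "lezg1247"}
          ]
        }
      ]
    }
  ]
}
""")

# Count the records per table.
counts = {}
for table_url, record in iter_records(sample):
    counts[table_url] = counts.get(table_url, 0) + 1
print(counts)
```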