Makes sense to me. I have been experimenting with a somewhat more extreme approach: put the metadata in the data. So for a structured lexicon I’d do:
{
  "metadata": {
    "title": "Education in Jaro",
    "language": "Hiligaynon",
    "source": "https://www.youtube.com/watch?v=cUqMWG4QJMk",
    "media": "education_in_jaro.webm",
    "fileName": "education_in_jaro-text.json",
    "lastModified": "2019-07-17T16:01:06.046Z",
    "notes": [
      "transcribed with Joshua De Leon as part of the 2014 Fieldmethods class at UCSB, instructor Marianne Mithun.",
      "original YouTube title ‘School Memories’"
    ],
    "speakers": [
      "Juan Lee"
    ],
    "linguists": [
      "Patrick Hall",
      "Joshua De Leon"
    ],
    "links": [
      {
        "type": "audio",
        "file": "education_in_jaro.wav"
      },
      {
        "type": "notes",
        "url": "http://localhost/Languages/hiligaynon/ucsb-fieldmethods/Notes/hil111_2013-02-25_JDL_PH_IlonggoBoyEducationInJaro.notes.txt"
      }
    ]
  },
  "sentences": [
    {
      "transcription": "Hello, akó si Juan Lee.",
      "translation": "Hello, I’m Juan Lee.",
      "words": [
        {
          "form": "hello",
          "gloss": "hello",
          "lang": "en"
        },
        {
          "form": "akó",
          "gloss": "1S.ABS"
        },
        {
          "form": "si",
          "gloss": "PERS"
        },
        {
          "form": "Juan",
          "gloss": "Juan",
          "tags": [
            "name"
          ]
        },
        {
          "form": "Lee",
          "gloss": "Lee",
          "tags": [
            "name"
          ]
        }
      ],
      "tags": [],
      "metadata": {
        "links": [
          {
            "type": "timestamp",
            "start": 8.58,
            "end": 9.65
          }
        ]
      },
      "note": ""
    },
    {
      "transcription": "matopic na akó subóng",
      "translation": "I’ll start the topic now",
      "words": [
        {
          "form": "ma-topic",
          "gloss": "IRR-topic",
          "tags": [
            "english"
          ],
          "metadata": {
            "wordClass": "verb"
          }
        },
        {
          "form": "na",
          "gloss": "already"
        },
        {
          "form": "akó",
          "gloss": "1S.ABS"
        },
        {
          "form": "subóng",
          "gloss": "now"
        }
      ],
      "tags": [],
      "metadata": {
        "links": [
          {
            "type": "timestamp",
            "start": 9.704,
            "end": 10.714
          }
        ]
      },
      "note": "As if he's going to start a new topic; Of ma-, J says: “Most of the time you use it before an action verb: the process of changing the topic is itself a verb: the process of changing the"
    },
    {
      "transcription": "parte sa mga eskwélahan.",
      "translation": "about the schools",
      "words": [
        {
          "form": "parte",
          "gloss": "part",
          "tags": [
            "spanish"
          ]
        },
        {
          "form": "sa",
          "gloss": "to"
        },
        {
          "form": "mga",
          "gloss": "PL"
        },
        {
          "form": "eskwélahan",
          "gloss": "school",
          "tags": [
            "spanish",
            "spanish:escuela"
          ]
        }
      ],
      "tags": [],
      "metadata": {
        "links": [
          {
            "type": "timestamp",
            "start": 10.753,
            "end": 11.818
          }
        ]
      },
      "note": ""
    }
  ]
}
I have found that keeping metadata in the data file like that does a good job of preventing the metadata from getting separated from the data. It does have the problem that there is no (default) UI enforcing the same fields across objects; even a spreadsheet interface does that pretty effectively. So doing this in practice at scale would require either some discipline or else a custom interface.
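For the "discipline" option, a small script can stand in for the missing UI by reporting which fields are used unevenly. This is only a rough sketch: `key_usage` and `inconsistent_keys` are names I made up for illustration, and the `words` list is a cut-down stand-in for one sentence from a file like the one above.

```python
from collections import Counter


def key_usage(objects):
    """Count how many of the given dicts contain each key."""
    counts = Counter()
    for obj in objects:
        counts.update(obj.keys())
    return counts


def inconsistent_keys(objects):
    """Return the keys that appear in some, but not all, of the objects."""
    objects = list(objects)
    counts = key_usage(objects)
    return sorted(key for key, n in counts.items() if n < len(objects))


# Cut-down stand-in for the "words" array of one sentence (illustrative only).
words = [
    {"form": "hello", "gloss": "hello", "lang": "en"},
    {"form": "akó", "gloss": "1S.ABS"},
    {"form": "Juan", "gloss": "Juan", "tags": ["name"]},
]

print(inconsistent_keys(words))  # ['lang', 'tags']
```

A report like this does not enforce anything. It just makes the optional fields visible so you can decide whether the variation is intentional.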
To be honest, I don’t even worry about consistency from one file to the next. The crucial fields (language, speaker, title, etc.) are always going to get included, and everything else is useful or at least informative down the road.
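If you wanted to make that policy explicit, one possible sketch is to check only the crucial fields and ignore everything else. The field list here is my assumption, based on the key names in the sample file (`title`, `language`, `speakers`), and `missing_crucial` is a hypothetical helper, not part of any existing tool.

```python
import json

# Assumed set of crucial fields, using the key names from the sample file.
CRUCIAL_FIELDS = ("title", "language", "speakers")


def missing_crucial(document, crucial=CRUCIAL_FIELDS):
    """Return the crucial metadata fields that are absent or empty."""
    metadata = document.get("metadata", {})
    return [field for field in crucial if not metadata.get(field)]


# Trimmed-down version of the file above; extra fields are simply ignored.
sample = json.loads("""
{
  "metadata": {
    "title": "Education in Jaro",
    "language": "Hiligaynon",
    "speakers": ["Juan Lee"],
    "notes": []
  },
  "sentences": []
}
""")

print(missing_crucial(sample))  # []
```

Anything outside `CRUCIAL_FIELDS` is left alone, which matches the idea that extra fields are welcome but not required.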