What if we created a repository in our GitHub organization containing sample documentary data that is consistently structured (and thus comparable) and available for reuse?
Some criteria:
- The data must be open-access and suitable for reuse
- They don’t need to be formatted in any particular way (we can convert them if it’s not too much work; something digital is better than a PDF or a .jpg that has to be re-transcribed!)
- They shouldn’t be gigantic; any or all of the following would suffice:
  - a few short-to-medium-length texts
  - a wordlist or small dictionary
  - some metadata
  - grammatical abbreviations
  - phonological inventories
- Citation required, of course
The content could come from existing sources (yesterday’s post on the CoCoON Archive, for example, might be a starting point). The point would not be to create an archive per se, but to create some data for testing various kinds of user interfaces for documentation.
I have some materials of my own that I could contribute, but I need to sort through them and make sure all the citations are in order.
(There’s no data there yet; updates soon!)