☕ Coffee Hour for Metadata (Monday Feb 8th, 2021)

msatokotsubi · January 29, 2021, 7:15pm

Hi All,

The next docling coffee hour will be next week on Monday, Feb 8 at 9:00 PST (12:00 EST).

The focus theme will be “metadata”, so we can all discuss our favorite metadata tools and workflows (any Arbil hacks, anyone??). If your metadata workflow is ???!, that is okay! We can all learn from each other, and I also find it fun and interesting to see what works for others.

Here is the Gather link:

https://gather.town/app/klWlH33a7bPX3oT0/uglydoclingoct272020 (open the link in Chrome or Firefox Desktop).

Looking forward to seeing you all there!

rgriscom · February 8, 2021, 7:51am

Thanks, Marti, for this new coffee hour topic! I just realized that this is actually at the same time as the writing group today. I think I probably need to stick with the writing group because I’ve got a couple of deadlines this week, but I’ve also got lots to say about metadata. Here are some quick comments:

Here is the latest metadata editing tool being promoted by ELAR: Lameta

IMO Lameta is much easier to use than Arbil and has some added features which make it more compelling as a general project management app rather than simply the app you use to prepare your data for archiving. Neither Arbil nor Lameta can easily import metadata from a spreadsheet/CSV format, though, which is why so many linguists waste so many hours re-entering metadata when it comes time to archive. If you use Lameta from the beginning, this is not a problem. So if you don’t want to learn how to program, I would suggest switching to Lameta for metadata creation as soon as possible in order to minimize the amount of metadata you have to manually re-create.

In terms of spreadsheet metadata, which is how many linguists store their metadata, it is a good idea to follow the principles of tidy data. This makes it possible to easily manipulate the data later and automatically convert it to a format used by other software such as Lameta.
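As a rough illustration of one tidy-data principle (one value per cell, one observation per row), here is a minimal Python sketch. The column names (`session_id`, `date`, `participants`), the `;` separator, and the sample rows are all made up for the example and are not taken from any particular tool's export format.

```python
import csv
import io

# Hypothetical untidy export: several participants packed into one cell.
UNTIDY = """session_id,date,participants
S001,2021-01-12,Amina; Juma
S002,2021-01-13,Amina
"""


def tidy_sessions(csv_text, sep=";"):
    """Return one row per (session, participant) pair."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Split the multi-valued cell and drop empty or padded entries.
        for name in row["participants"].split(sep):
            name = name.strip()
            if name:
                rows.append(
                    {
                        "session_id": row["session_id"],
                        "date": row["date"],
                        "participant": name,
                    }
                )
    return rows


tidy = tidy_sessions(UNTIDY)
for r in tidy:
    print(r)
```

With one participant per row, the table can be filtered, counted, or joined to a participant sheet without parsing cell contents by hand.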

Mobile metadata entry systems such as ODK or KoBoToolbox allow you to easily work with teams of data collectors in areas without an internet connection and track data collection progress remotely. I’m hoping to host a workshop sometime this year on how to set up a KoBoToolbox metadata system, so if anyone is interested please let me know. These platforms can export metadata in CSV/XML. Here is a GitHub repo I made with some form templates you can use with KoBoToolbox/ODK, and here is a script I made to convert the CSV output of KoBoToolbox to the XML format used by Lameta.

Also, some advice after having gone through multiple projects’ worth of metadata this past week: the two primary categories of metadata produced by linguists when creating new data in a field-like setting are:

Session/recording metadata
Participant metadata

This is reflected e.g. in Lameta’s two tabs “Sessions” and “People”. If you are working with spreadsheets, then that means that you will often have two different spreadsheets for these two categories of information. This allows you to enter the participant info only once, rather than repeating it every time they participate in the creation of a resource. From a data perspective this means that we are actually creating something like a very basic relational database, whereby the session data is linked to the participant metadata by the name of the participant. For this reason, it is very important that the participant names in the session metadata are 100% the same as the participant names used in the participant metadata, especially if you want to automate the conversion from one format to another rather than enter all of the data manually a second time.
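The name-matching point can be checked automatically before any conversion. Below is a minimal sketch, assuming two CSV sheets with hypothetical column names (`name` in the participant sheet, `participant` in the session sheet) and invented sample data. It uses exact string comparison on purpose, so a typo or a difference in capitalization is reported rather than silently tolerated.

```python
import csv
import io

# Hypothetical participant sheet: one row per person.
PEOPLE = """name,birth_year
Amina Hassan,1980
Juma Ali,1975
"""

# Hypothetical session sheet: refers to people by name.
SESSIONS = """session_id,participant
S001,Amina Hassan
S002,Juma Aly
S003,amina hassan
"""


def unmatched_participants(sessions_csv, people_csv):
    """List (session_id, name) pairs whose name is missing from the participant sheet."""
    known = {row["name"] for row in csv.DictReader(io.StringIO(people_csv))}
    problems = []
    for row in csv.DictReader(io.StringIO(sessions_csv)):
        # Exact match only: "Juma Aly" and "amina hassan" both fail here.
        if row["participant"] not in known:
            problems.append((row["session_id"], row["participant"]))
    return problems


problems = unmatched_participants(SESSIONS, PEOPLE)
for session_id, name in problems:
    print(f"{session_id}: no participant record for {name!r}")
```

Running a check like this before converting lets you fix the names in the spreadsheet itself, which is usually easier than tracking down broken links after the conversion.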

pathall · February 8, 2021, 3:41pm

Sorry for the overlap guys, I should be helping with meta-scheduling!

There is a calendar plugin for this forum:

I think it could help us to manage our meetings more simply. But I’m going to have to learn how to install plugins, which is a bit of a project, so I’m going to consult with an expert to get some help on this. Thank you so much for your patience and continued contributions, @msatokotsubi and @rgriscom!

Andrew_Harvey · February 8, 2021, 4:58pm

Ah! I didn’t realize this was a clash either. Sorry Marti - I’m going to have to stick with the writing as well. Good luck, and do let me know what comes up!

msatokotsubi · February 8, 2021, 4:59pm

Ah! Oh no I also didn’t realize it was at the same time! Apologies!

msatokotsubi · February 9, 2021, 12:16am

Thanks to everyone who joined today! Some links from things we discussed:

Arbil metadata tool: Arbil information, manuals & download - Arbil - The Language Archive Forums

saymore (SIL) metadata tool: SayMore

cmdi maker metadata tool: http://cmdi-maker.uni-koeln.de/

OLAC: Open Language Archives Community

Archiving for the Future: Simple Steps for Archiving Language Documentation Collections: https://archivingforthefuture.teachable.com/

inigmendoza · February 26, 2021, 3:54pm

Great! See yall there!

pathall · February 26, 2021, 5:42pm

Looking forward to it! Wait, when is it?

inigmendoza · February 26, 2021, 6:16pm

I am sorry LOL! I thought we were in January. Well, I hope to see yall at the next coffee hour.