Hi everyone!
I was recently asked to give a lecture on “data management” for an introductory course on language description and documentation, and I thought this might be a great place to get some ideas and identify some best practices.
I realize that all of us are in different situations and are often documenting for different aims. I’d like to make this a space for sharing and learning the tools, workflows, and tricks that we use, ranging from the very basic to the more advanced.
Some questions we might want to discuss are:
What does your project workflow look like?
What tools or processes do you use when moving your data from one stage to another?
What has worked really well for you?
What hasn’t worked well for you?
…but do feel free to stray from these signposts - there’s lots more we can discuss and learn.
I’ll go first!
I’ve adapted @cbowern’s basic workflow, which I find works well for me:
-BEFORE SESSION: Plan the session (check equipment, compile goals)
-DURING SESSION: Conduct the session (monitor recording, take notes, ask questions and listen to the speaker)
-AFTER SESSION: File data (transfer audio / video from equipment to a new folder (bundle), assign all new bundles a unique identifying number, transfer all metadata from notebook to spreadsheet and add a bundle description); send all new bundles to the archive; transcribe/translate data (set up ELAN project for bundle, transcribe material into working orthography); annotate data (export ELAN project to FLEx, gloss line-by-line, add notes where necessary, re-export project from FLEx to ELAN); send all annotated files to the archive
-Begin again
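For anyone who would like to script the "file data" step, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of how anyone in this thread actually works: the function names, the `demo` prefix, and the CSV columns (`bundle_id`, `files`, `description`) are all hypothetical, and a plain CSV file stands in for the metadata spreadsheet.

```python
import csv
import shutil
from pathlib import Path


def next_bundle_id(archive_root: Path, prefix: str) -> str:
    """Return the next unused ID (e.g. 'demo003'), based on existing folder names."""
    numbers = [
        int(p.name[len(prefix):])
        for p in archive_root.iterdir()
        if p.is_dir() and p.name.startswith(prefix) and p.name[len(prefix):].isdigit()
    ]
    return f"{prefix}{max(numbers, default=0) + 1:03d}"


def file_bundle(recordings, archive_root, metadata_csv, prefix, description):
    """Copy recordings into a new bundle folder and log one metadata row."""
    recordings = [Path(r) for r in recordings]
    archive_root = Path(archive_root)
    archive_root.mkdir(parents=True, exist_ok=True)

    bundle_id = next_bundle_id(archive_root, prefix)
    bundle_dir = archive_root / bundle_id
    bundle_dir.mkdir()

    for src in recordings:
        # Copy rather than move, so the originals on the recorder stay untouched.
        shutil.copy2(src, bundle_dir / src.name)

    metadata_csv = Path(metadata_csv)
    write_header = not metadata_csv.exists()
    with metadata_csv.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            # Hypothetical column names; adapt to your own spreadsheet.
            writer.writerow(["bundle_id", "files", "description"])
        writer.writerow([bundle_id, ";".join(r.name for r in recordings), description])
    return bundle_id
```

A call such as `file_bundle(["card/rec1.wav"], "archive", "metadata.csv", "demo", "Elicitation of kinship terms")` would create the next free bundle folder and add one row to the CSV. Deriving the number from the folders already on disk is one simple way to keep IDs unique; a shared project would probably need something sturdier.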
My tools and processes are very analogue: questions written in thick notebooks, with different colour inks sometimes to transcribe i) speaker responses, ii) my initial notes and questions, and iii) later notes and questions. Metadata for speakers and bundles is recorded in a (now massive) Excel sheet. All of this is eventually archived with ELAR (I use Arbil for this). Moving data between ELAN and FLEx is relatively easy, as long as the parameters are properly set beforehand.
When I’m in the field, I try to be very regimented about my data: I try to bundle everything I collected during the day before I go to bed, and I make sure that the metadata is all recorded in the Excel sheet as well - this can take up to 3 hours an evening, depending on how many recordings I’ve made, but I find it helps to do this while everything is fresh in my mind.
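A small check can help with that evening routine. The sketch below compares the bundle folders on disk against the metadata sheet and reports anything that is in one place but not the other. It assumes the Excel sheet has been exported to CSV and has a column named `bundle_id`; both the function name and the column name are hypothetical.

```python
import csv
from pathlib import Path


def compare_bundles_to_metadata(archive_root, metadata_csv, id_column="bundle_id"):
    """Return (folders with no metadata row, metadata rows with no folder)."""
    with open(metadata_csv, newline="", encoding="utf-8") as f:
        logged = {
            (row.get(id_column) or "").strip()
            for row in csv.DictReader(f)
        }
    logged.discard("")  # ignore blank rows

    # Every sub-folder of the archive root is treated as one bundle.
    on_disk = {p.name for p in Path(archive_root).iterdir() if p.is_dir()}

    missing_metadata = sorted(on_disk - logged)
    missing_folders = sorted(logged - on_disk)
    return missing_metadata, missing_folders
```

If both lists come back empty, every bundle collected that day has a metadata row and every row points at a real folder.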
I’m really not a fan of Arbil for making metadata files to be ingested in ELAR, but it’s software I at least know how to use.
Now it’s your turn: share your own practices and tools, comment on mine, ask questions… So much experience around here - I’m really excited to learn!
I also use an adapted version of @cbowern’s workflow:
-BEFORE SESSION: Plan the session (check equipment, pack equipment, compile goals (mini & stretch))
-DURING SESSION: Conduct the session (set up video, monitor audio recording, take notes, ask questions & listen)
-AFTER SESSION: Do a “wrap up” of my session notes where I summarize what I’ve (or think I’ve) got. File data (transfer audio / video from equipment to a new session folder (bundle), take photos of my field notes and add to session bundle, rename all the files (e.g. shimaguchi001), transfer all metadata from notebook to CMDI Maker & Arbil). If it was the first session with that particular person, I also scan and file the consent form into my “consent forms” folder. IDEALLY start transcribing right away, at least a little bit.
-Back everything up: I have an external hard drive and I also use Dropbox.
-Rinse & repeat
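For the backup step, here is a minimal Python sketch that copies one bundle folder to a backup location and then verifies each copied file by checksum. It is an illustration only: the function names are hypothetical, `backup_root` could be an external drive or a locally synced Dropbox folder (an assumption about the setup), and `dirs_exist_ok` requires Python 3.8 or later.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large recordings do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_bundle(bundle_dir, backup_root):
    """Copy a bundle into backup_root and return files whose copy does not match."""
    bundle_dir = Path(bundle_dir)
    target = Path(backup_root) / bundle_dir.name
    # dirs_exist_ok lets the same bundle be backed up again after new files are added.
    shutil.copytree(bundle_dir, target, dirs_exist_ok=True)

    mismatched = []
    for src in sorted(bundle_dir.rglob("*")):
        if src.is_file():
            relative = src.relative_to(bundle_dir)
            copy = target / relative
            if not copy.is_file() or sha256_of(src) != sha256_of(copy):
                mismatched.append(str(relative))
    return mismatched
```

An empty list from `back_up_bundle` means every file in the bundle has a byte-identical copy in the backup location. Checking hashes is slower than a plain copy, but it catches silent corruption on a flaky drive or cable.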
*I’m waiting for ELAR to finish migrating to archive, RIP LAT.
Thank you both so much for contributing to this topic! I hope we hear from more folks.
I have been thinking about this topic, but I wanted to refresh my memory on @cbowern’s advice on workflow, so I dug around and found this interesting flowchart, which I presume is what you are referring to in part, @Andrew_Harvey and @msatokotsubi (Bowern 2015:48).
Bowern, Claire. 2015. Linguistic fieldwork: A practical guide. Springer.
There’s a huge amount of experience and expertise embedded in that diagram, I think. Naturally being a programming nerd (more than I have really been a fieldworker over the past few years, alas), I find myself thinking about how this kind of workflow diagram interacts with specific pieces of software, and what the precise shape of the data going into and out of those applications is. I suspect that that level of detail is not appropriate for this topic, @Andrew_Harvey, since you’re teaching an introductory class, so I’ll hold my own gruesome details about database structure and so forth for other topics.
Ah! I never thought about including Arbil as part of my workflow!
I usually just add all of my metadata to a big Excel sheet, and only really start moving metadata to Arbil when I need to upload it all to the archive. I could see incorporating Arbil as a nice additional step though - especially since I already use it as part of the upload process.
…for the students I’ll be teaching in a couple weeks’ time, I’m not exactly sure what the best metadata solution is. They’ll most likely be familiar with Excel, but I don’t know if that’s the best solution here.
Also, most of them will not be in a position to upload their material to a language archive, but I want to encourage them to archive their material in some way all the same. I’m currently looking at Zenodo for this - does anyone have any experience using Zenodo as a free archiving platform? I’m also open to other ideas.
What about just using CMDI Maker alone? Pretty simple as far as installation goes. Although of course there’s the question of whether it handles the data you need.
CMDI Maker does everything I need except adding keywords and topics, which I do in Arbil as the very last steps. If you’re interested, you can see my whole metadata workflow here on YouTube.
@Andrew_Harvey Let us know what you decide to go for! I’ve never used Zenodo… you could also use CMDI Maker only, or SayMoreX (SIL).
I think the workflow is also based in a particular type of fieldwork. This sort of thing is complicated but it works well when I’m in the field not doing much else. I found it much harder to follow in field methods classes or working remotely. I could imagine other workflows that would also work (e.g. this one is for a single person; a collective fieldwork project would look different). Thank you all for sharing your workflow variations!
But it would be interesting to think about what tools and structures we could create to make this simpler for beginners!
And to think about how our ideal remote fieldwork app would interact with the remote fieldwork workflow.