Writing Out Loud: Abkhaz Converbs and Getting Data to Do Stuff

SarahDopierala · January 4, 2022, 11:17am

Thoughts for January 4, 2022

Today I remembered that every single action in trying to put together a feature spreadsheet brings more questions. During the first year of my PhD (late 2021), developing the feature spreadsheet was always in the back of my mind, but I hadn’t started it yet because, I think, it was exciting but overwhelming. This year it is still a big undertaking, but I’m feeling more excited than overwhelmed. (I’m feeling a little bit behind in my goals, since I feel like I should have plunged right into it last year, but I’m not indulging the negativity.)

Anyway, today I experimented with putting a non-NWC language example (of a converb) into my spreadsheet. I also tried to add a few more “features” that I have on a Big List of Stuff to Consider When Analyzing Converbs. This brought up a ton of practical and theoretical questions, which I have written down on, basically, a digital piece of paper. (Surprisingly useful though.)

Some thoughts for today:

I want the feature spreadsheet to be useful for studying converbs cross-linguistically, but in terms of the scope of my project, it might be more practical to make as detailed a spreadsheet as possible for my own purposes (examining Abkhaz converbs from written material and, eventually, data collected during fieldwork). A bunch of questions came up when I tried to put in an example from outside this scope, like “What do with different transcription orthographies (including IPA but also idiosyncratic ones)?”, “What do with people’s glossing conventions?”, “What do when people use the same terms with slightly different meanings in different works?” Not that I won’t come across this for sources on Abkhaz, but like, maybe one area at a time is good xD

This ties into the overarching question:

What is my base – the thing I’m looking at? What data do I want to collect and what do I want to learn from it? This is something I’m still working out but at least I know the question is there.

And then there’s all these practical things that have to do with the spreadsheet design:

If I use more than one “feature” in the spreadsheet (i.e. one spreadsheet – all things), what naming conventions should I use with the values? So far, I’m doing “feature 1” then “value1_1” (so value 1 of feature 1), “value1_2” (value 2 of feature 1). I’ve done this on the assumption that every column needs a unique name: I have a vague sense this is important for the code to be able to distinguish the second value of feature 1 from the second value of feature 4.
If I am trying to break features down into values that can have a unique, discrete answer (like “yes” or “affix” when the two possibilities are “affix” versus “no affix” or something), how do I distinguish between feature values that are possible/impossible versus features that are mandatory/optional? For example, today I worked with the feature “coreference.” In some situations, Same Subject coreference might be optional, while in other situations it may be impossible, which looks similar to optional but is not the same.
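
To make the naming idea concrete, here is a minimal Python sketch of the “value1_1” scheme. The function name `value_columns`, the input format (feature number mapped to how many values it has), and the header spelling “feature1” without a space are all assumptions for illustration, not something the spreadsheet already does.

```python
def value_columns(feature_counts):
    """Build column names like 'value1_2' (value 2 of feature 1).

    feature_counts maps a feature number to its number of values
    (a hypothetical input format, chosen only for this sketch).
    """
    columns = []
    for feature_number, n_values in feature_counts.items():
        columns.append(f"feature{feature_number}")
        for value_number in range(1, n_values + 1):
            columns.append(f"value{feature_number}_{value_number}")
    return columns


# Feature 1 and feature 4 each get two value columns.
columns = value_columns({1: 2, 4: 2})

# Every name is unique, so "value1_2" and "value4_2" cannot be confused.
assert len(columns) == len(set(columns))
print(columns)
```

Because the feature number is part of every value name, the second value of feature 1 and the second value of feature 4 end up in differently named columns, which is what code reading the spreadsheet by column name would rely on.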

It would be really neat if I could make a program go through a set of options like on those flowcharts titled “Are You a Cat?”, where the first question could be “You purposely knock objects off tables” and then, depending on whether the answer is “yes” or “no”, it takes you to another question or set of questions. I think this would be a useful way for a program to go through something like “Is SS coreference possible? Yes. Is it mandatory? No.” Or something along those lines.
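
A rough Python sketch of that kind of question flow, just to see what it might look like. The names `TREE` and `walk` and the three outcome labels (“impossible”, “optional”, “mandatory”) are assumptions made up for this example; treating possible/impossible and mandatory/optional as one three-way outcome is only one possible way to keep “impossible” and “optional” apart.

```python
# Each node is either a dict holding a question with "yes"/"no" branches,
# or a plain string, which is the final label for that path.
TREE = {
    "question": "Is SS coreference possible?",
    "yes": {
        "question": "Is it mandatory?",
        "yes": "mandatory",
        "no": "optional",
    },
    "no": "impossible",
}


def walk(node, answers):
    """Follow yes/no answers through the tree until a label is reached."""
    while isinstance(node, dict):
        answer = answers[node["question"]]  # "yes" or "no"
        node = node[answer]
    return node


result = walk(
    TREE,
    {"Is SS coreference possible?": "yes", "Is it mandatory?": "no"},
)
print(result)
```

In this sketch the second question is only ever asked when the first answer is “yes”, so an example where SS coreference is impossible never gets a mandatory/optional value at all.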

I’ll finish off this already long post with converb specific things:

Certain features, like coreference, assume the converb has its own clause. However, morphological converbs are also found in complex predicates and don’t necessarily have their own clause. How do I deal with the different possible functions of converbs in the dataset?
How am I going to define “converb”? Good question.

I just want to mention that I watched Matt Carroll’s video series “Computational Methods for Linguistic Typology”, which inspired a lot of what I thought about today. You can find the series on YouTube here: Computational Methods for Linguistic Typology - YouTube. This series also has its own thread on the forum: Computational Methods for Linguistic Typology - CoEDL 2022 Masterclass - #4 by pathall