I have a question that arose from reading this article:
Thieberger, N. 2004. Documentation in practice: Developing a linked media corpus of South Efate. Language documentation and description 2. 169–178.
It’s a bit old, and we don’t usually digitize from tapes any more.
Nonetheless, I did not realize that there was such a thing as a “variable bit rate encoding” for MP3s that could affect the accuracy of timestamps:
On my return from fieldwork I began digitising my analogue tapes using the built-in soundcard of a desktop computer. This was a mistake! I ended up with digital audio files that I then used to align with the transcript. However, these were not good quality audio files because the computer’s soundcard is simply not adequate to the task. Thus, when the opportunity arose to have the analogue tapes digitised at a higher, and archival, resolution it resulted in my having two versions of the digital data. These two digital versions of the same tape did not correspond in length due both to stretching of the audio tape, and to being played on different cassette players, with slightly different playback speeds. There was no simple correlation between the timecodes in the old and the archival version. While I linked all subsequent transcripts to the archival version of the audio file, due to the time constraints of dissertation writing I have kept the non-archival versions for presentation of the thesis data. Archival versions have been lodged with PARADISEC.
The crucial lesson from this experience is to digitise field tapes at the best (archival) resolution possible and then use those files (or a down-sampled version, such as MP3 if the original is too large), as the basis for linking to transcripts. To produce the best quality digitisation, it is recommended to use external soundcards that avoid computer noise, that is, if you don’t have a friendly digitisation project at a campus near you. It is also worth keeping up with the technology to find out about new methods and media for doing this kind of work.
[footnote 10] MP3 files can be indexed by timecodes as long as the mp3 files are not encoded with a variable bitrate. Note that MP3 is absolutely not a suitable format for recording or archiving.
Does anyone know if this is still a thing? Is there a way to determine whether a given MP3 file is of this type? We’re always warned not to use MP3s, and I use WAVs, but the idea of corrupted timestamps really compounds the sense of danger!
I’m not sure I fully get the passage’s reasoning. If I’m understanding right, it’s saying that if an MP3 is encoded with VBR, a timecode like 0:10 (10 seconds) will not be reliable. This doesn’t seem true: if you use a media player to play a VBR MP3 and just let it run, it’ll hit the same point at 0:10 every time. There are other reasons to dislike MP3 as an archival format (for one thing, it mangles higher frequencies that might be critical for phonetic analysis), but as far as I know MP3s present no difficulty for timestamp indexing, whether they’re encoded with a constant or variable bitrate.
Is there a way to determine whether a given MP3 file is of this type?
MediaInfo can do this. Running its command-line interface on a file, for instance, reports Bit rate mode: Variable:
» mediainfo 01.\ Who\ Loves\ The\ Sun.mp3
General
Complete name : 01. Who Loves The Sun.mp3
Format : MPEG Audio
File size : 4.97 MiB
Duration : 2 min 46 s
Overall bit rate mode : Variable
Overall bit rate : 248 kb/s
Album replay gain : -0.07 dB
Album replay gain peak : 0.937244
Album : Loaded
Album/Performer : The Velvet Underground
Track name : Who Loves The Sun
Track name/Position : 01
Performer : The Velvet Underground
Publisher : Warner Strat. Mkt.
Genre : Alternative
Recorded date : 1970
Writing library : LAME3.97b
Cover : Yes
Cover type : Cover (front)
Cover MIME : image/jpeg
REPLAYGAIN_ALBUM_GAIN : -0.07 dB
REPLAYGAIN_ALBUM_PEAK : 0.937244
Audio
Format : MPEG Audio
Format version : Version 1
Format profile : Layer 3
Format settings : Joint stereo / MS Stereo
Duration : 2 min 46 s
Bit rate mode : Variable
Bit rate : 248 kb/s
Minimum bit rate : 192 kb/s
Channel(s) : 2 channels
Sampling rate : 44.1 kHz
Frame rate : 38.281 FPS (1152 SPF)
Compression mode : Lossy
Replay gain : 0.62 dB
Replay gain peak : 0.647723
Stream size : 4.94 MiB (99%)
Writing library : LAME3.97b
Encoding settings : -m j -V 0 -q 2 -lowpass 19.5 --vbr-new -b 192
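For anyone who wants to check without MediaInfo, here is a rough Python sketch of one way to do it: walk the MPEG audio frame headers and see whether the bitrate field ever changes. This is not a claim about how MediaInfo itself decides; the function names (`skip_id3v2`, `scan_bitrates`, `is_vbr`) are made up for illustration, and the sketch only handles MPEG-1 Layer III (the variant shown in the output above) with a leading ID3v2 tag at most.

```python
import struct

# MPEG-1 Layer III bitrate table in kb/s (index 0 = "free", 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
# MPEG-1 sampling rates in Hz (index 3 is reserved).
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]


def skip_id3v2(data):
    """Return the offset just past a leading ID3v2 tag, or 0 if there is none."""
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:          # tag size is stored as a 28-bit "synchsafe" integer
            size = (size << 7) | (b & 0x7F)
        footer = 10 if data[5] & 0x10 else 0
        return 10 + size + footer
    return 0


def scan_bitrates(data):
    """Return the bitrate (kb/s) of every MPEG-1 Layer III frame found in data."""
    bitrates = []
    pos = skip_id3v2(data)
    while pos + 4 <= len(data):
        (header,) = struct.unpack(">I", data[pos:pos + 4])
        # Top 15 bits: 11-bit sync, version = MPEG-1 (11), layer = III (01).
        if (header & 0xFFFE0000) != 0xFFFA0000:
            pos += 1                  # not a frame header here; resync byte by byte
            continue
        bitrate = BITRATES_KBPS[(header >> 12) & 0xF]
        sample_rate = SAMPLE_RATES_HZ[(header >> 10) & 0x3]
        padding = (header >> 9) & 0x1
        if bitrate is None or sample_rate is None:
            pos += 1                  # "free" or reserved values: skip in this sketch
            continue
        bitrates.append(bitrate)
        # Frame length in bytes for MPEG-1 Layer III.
        pos += 144 * bitrate * 1000 // sample_rate + padding
    return bitrates


def is_vbr(data):
    """True if the frames in data use more than one bitrate."""
    return len(set(scan_bitrates(data))) > 1
```

Usage would be something like `is_vbr(open("some.mp3", "rb").read())`. Treat the answer as a hint rather than a verdict: stray bytes that happen to look like a frame header can fool a scanner this simple.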
Anyway, I think the general advice (to archive in a lossless format like WAV and make lossy copies for non-archival purposes) is still right.
Very interesting, thanks. It occurred to me that this info must be in the mp3 as text somewhere, presumably MediaInfo reads that.
This post explores how to get at the ID3 info with JavaScript, but it’s a bit old and I don’t think the jDataView polyfill is necessary anymore. Gonna see if I can get it to work without it, just for fun.
But yeah, this whole question is irrelevant since using a compressed format like mp3 is a bad idea anyway. I do wonder if the Web Audio API can handle seeking in one of these files correctly.
Many moons later, I asked… ehem… the oracle…
A variable bit rate (VBR) in an MP3 file refers to a method of encoding the audio data in which the bit rate (i.e. the amount of data used to represent the audio per unit of time) is allowed to vary throughout the file. This is in contrast to a constant bit rate (CBR) encoding, in which the bit rate is fixed throughout the file. VBR encodings can result in smaller file sizes and better quality audio, as more data can be allocated to the parts of the audio that require it, while less data is allocated to the parts that don’t.
Which makes sense: I read this (if it’s correct!!) just to be a description of compression. Seems like a media encoding that corrupted timestamps would be… corrupt!
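To put some numbers on that, here is a small Python sketch (my own illustration, not anything from the article). It relies on two facts about MPEG-1 Layer III: every frame holds 1152 samples, whatever its bitrate, and a frame's size in bytes grows with its bitrate. So the time of a frame depends only on its index, while its byte position depends on the bitrates of everything before it. One plausible reading of the footnote, and it is only a guess, is that some software seeks by estimating a byte offset from an average bitrate, which would go wrong on VBR files. The helper names and the 128/320 kb/s mix below are hypothetical.

```python
SAMPLES_PER_FRAME = 1152   # MPEG-1 Layer III, matches "1152 SPF" in the MediaInfo output
SAMPLE_RATE_HZ = 44100


def frame_bytes(bitrate_kbps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Size in bytes of one unpadded MPEG-1 Layer III frame."""
    return 144 * bitrate_kbps * 1000 // sample_rate_hz


def frame_start_time(frame_index, sample_rate_hz=SAMPLE_RATE_HZ):
    """Start time in seconds of a frame; note that bitrate does not appear."""
    return frame_index * SAMPLES_PER_FRAME / sample_rate_hz


def byte_offset_of_frame(bitrates_kbps, frame_index):
    """Byte offset where a frame starts, given the bitrate of each frame."""
    return sum(frame_bytes(b) for b in bitrates_kbps[:frame_index])


def naive_time_from_offset(offset, total_bytes, total_seconds):
    """Time estimate that assumes bytes are spread evenly over the duration."""
    return offset / total_bytes * total_seconds


# Hypothetical VBR file: 100 quiet frames at 128 kb/s, then 100 busy frames at 320 kb/s.
bitrates = [128] * 100 + [320] * 100
total_bytes = byte_offset_of_frame(bitrates, len(bitrates))
total_seconds = frame_start_time(len(bitrates))

true_time = frame_start_time(100)
guessed_time = naive_time_from_offset(
    byte_offset_of_frame(bitrates, 100), total_bytes, total_seconds
)
print(f"frame 100 really starts at {true_time:.2f} s")
print(f"a byte-proportional guess puts it at {guessed_time:.2f} s")
```

In this made-up file the byte-proportional guess lands more than a second early, even though the audio itself is not stretched at all. A player that decodes from the start, or that uses a seek table, has no such problem, which fits the observation that 0:10 is always the same point on playback.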
You asked the Oracle? Don’t you know that ChatGPT is not a reliable source?
j/k
My understanding, from the content of these posts alone and without looking at the original text, is that the author errs in attributing the differences between the two digitized outputs to the VBR issue with MP3s. As I understand it, VBR in MP3 is a variation in the encoding compression (how much assumed non-signal noise is thrown away), not in the speed of the timeline. I would attribute the misalignment between the two digitizations to two different playback machines with two different capture processes. This argues strongly for capture workflow metadata. It also argues strongly for manifestation-based references in citation and referencing practices.
In library science there is a model called WEMI (work, expression, manifestation, item). Multiple manifestations can be made of the same expression. The speaker told a traditional story that has been told for generations (the work); that particular telling was the expression; and we recorded the expression and made copies of it in various formats (manifestations). Exact copies of the same manifestation are items: if I copy a file to your computer and to mine, we have two items.