Creating a web page with audio tags

Guillem · April 5, 2022, 12:33am

Hello everyone!

I’ve been part of a project here in Santa Maria, California, where several Mixtec interpreters have met to discuss and prepare a glossary of terms related to COVID-19 vaccination, in different Mixtec varieties. It’s all very cool! I am now struggling with how best to present and share it!

So, I have a list of terms, and I have the corresponding audio for each term in each variety. I would like to create a web page where we can see the terms written, and click on them to listen to the audio! Ideally this would be set up in a way that it is easy for us to add more info, as we want to open it up for other varieties as well.

I’d appreciate any help! I’m pretty “basic level” with computational stuff, but I’m sure I can learn!

joeylovestrand · April 5, 2022, 12:46pm

Not exactly a website but I remember a PDF with embedded audio being used effectively for a conference presentation by Christian Brickhouse and @katelynnlindsey

pathall · April 5, 2022, 1:38pm

In this post I’ll walk you through the process of building a simple web glossary with audio. At the bottom there is a .zip file that contains all the example code and some silly sample recordings of some words in Esperanto as a toy example.

A basic almost-empty page

Here’s a very simple basic HTML page:

<!doctype html>
<html lang="en">
  <head>
    <title>Glossary</title>
    <meta charset=utf-8>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
  </head>
  <body>
    <h1>Glossary</h1>
  </body>
</html>

As you probably know, HTML is made up of tags, which are written like this: <tag></tag>. There are a bunch of tags defined in HTML (142 at last count), but far fewer of these are in common use. For your project you’ll probably need no more than 20 distinct tags (although some of them will be used many times).

If you’ve never looked at HTML before, it might be worth going through a few tutorials. You can search this site, but I’d say this Introduction to HTML on MDN is as good a start as any. Also, here’s a catalog of all the HTML elements if you’re curious.

So, from the top:

tag	description
`<!doctype html>`	tells the browser that it should expect `HTML`. The exclamation point only appears in this tag for complicated historical reasons! This tag is referred to as the “doc type” or “document type”. Every `HTML` page needs it.
`<html lang="en">`	This wraps the whole page and sets the default language via a language code. Suffice it to say that the `lang` attribute is a headache (you can’t use glottocodes, for instance).
`<head>`	This is kind of the “metadata prolog” to the page: it’s stuff you don’t actually see in the body of the page.
`<title>`	This is what shows up in the tab in a browser.
`<meta>`	There are two here. The first one makes sure that Unicode will work right, the second one makes sure that the display will have sensible defaults on phones.
`<body>`	Finally, the content of the page. This is where we will do most of our work. Stuff inside the body is what you see in the browser window.
`<h1>`	The only element in our page right now, this is a level-1 heading, so it gets a big bold font.

So I would suggest that you try cutting-and-pasting that content into a plain text file, saving it as glossary.html or something like that, and then opening it up in a browser by using File > Open File…. (Alternatively, you could just double-click the 01.html from the zip file linked at the end of this post.)

You should see something like this:

Very beautiful.

Let’s add a few words

I’ll just use some silly Esperanto examples:

form	gloss
akvo	water
amo	love
birdo	bird
egalas	equal
forton	strong
kaj	and
kiam	when
komune	together
moviĝas	move
montras	show
rivero	river

Okay so, we want to write some HTML tags, or “markup”, as it’s called, to display these words. The most important thing about markup is, everything meaningful should get its own tag. How to define “meaningful” is sometimes tricky, but here it’s pretty simple. We have:

Words
Forms
Glosses

(We’ll add audio in a sec.)

So here’s some HTML to represent these. I’ll use a <p> tag to represent a “word”, and to keep things very simple, I’ll use a <strong> for the form, and an <em> for the gloss. Like this:

<p>
  <strong>akvo</strong>
  <em>water</em>
</p>

Note that the <p> tag “contains” or “wraps” the form and the gloss. This is important, because linguists think about all three of those levels, and hence we might want to modify our page “in terms of” those kinds of information later… this may sound abstract for now, hopefully it will make sense as we keep working.

Finally, a glossary

Okay, we have some content. We just repeat that markup for every word. (Yes, a lot of typing if your list is long — we can talk about automatically generating the markup later.) Here’s the whole list added to the empty page above:

<!doctype html>
<html lang="en">
<head>
  <title>Glossary</title>
  <meta charset=utf-8>
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>
<body>
  <h1>Glossary</h1>
  <p>
    <strong>akvo</strong>
    <em>water</em>
  </p>
  <p>
    <strong>amo</strong>
    <em>love</em>
  </p>
  <p>
    <strong>ankoraŭ</strong>
    <em>still</em>
  </p>
  <p>
    <strong>birdo</strong>
    <em>bird</em>
  </p>
  <p>
    <strong>egalas</strong>
    <em>equal</em>
  </p>
  <p>
    <strong>estas</strong>
    <em>to be</em>
  </p>
  <p>
    <strong>forton</strong>
    <em>strong</em>
  </p>
  <p>
    <strong>en</strong>
    <em>in</em>
  </p>
  <p>
    <strong>kaj</strong>
    <em>and</em>
  </p>
  <p>
    <strong>kiam</strong>
    <em>when</em>
  </p>
  <p>
    <strong>manĝota</strong>
    <em>eat</em>
  </p>
  <p>
    <strong>komune</strong>
    <em>together</em>
  </p>
  <p>
    <strong>puno</strong>
    <em>punishment</em>
  </p>
  <p>
    <strong>malfeliĉo</strong>
    <em>misfortune</em>
  </p>
  <p>
    <strong>instruu</strong>
    <em>instruct</em>
  </p>
  <p>
    <strong>moviĝas</strong>
    <em>move</em>
  </p>
  <p>
    <strong>montras</strong>
    <em>show</em>
  </p>
  <p>
    <strong>loĝas</strong>
    <em>reside</em>
  </p>
  <p>
    <strong>povas</strong>
    <em>be able</em>
  </p>
  <p>
    <strong>rivero</strong>
    <em>river</em>
  </p>
</body>
</html>

So that’s a start.
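
For anyone who would rather not type all that by hand, here is a rough sketch of how the repeated markup could be generated with a little JavaScript. It assumes the word list is kept as tab-separated "form, gloss" lines, like the table further up. The function names (escapeHtml, parseWordList, wordToHtml, glossaryToHtml) are made up for this illustration and are not part of the zip file.

```javascript
// Escape characters that have a special meaning in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn tab-separated "form<TAB>gloss" lines into a list of word objects.
// Blank lines are skipped; a missing gloss becomes an empty string.
function parseWordList(tsv) {
  return tsv
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .map((line) => {
      const [form, gloss = ""] = line.split("\t");
      return { form, gloss };
    });
}

// Build the same <p> / <strong> / <em> markup used in the page above.
function wordToHtml({ form, gloss }) {
  return [
    "  <p>",
    `    <strong>${escapeHtml(form)}</strong>`,
    `    <em>${escapeHtml(gloss)}</em>`,
    "  </p>",
  ].join("\n");
}

// Convert the whole list into markup that can be pasted inside <body>.
function glossaryToHtml(tsv) {
  return parseWordList(tsv).map(wordToHtml).join("\n");
}

const sample = "akvo\twater\namo\tlove\nbirdo\tbird";
console.log(glossaryToHtml(sample));
```

Running this with Node (or pasting it into the browser console) prints one <p> block per word, which can then be copied into the page.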

Adding audio

There is a specific tag for adding audio to a page, and it’s really easy to use. It looks like this:

<audio src="something.mp3" controls></audio>

There are two attributes in this tag. The first is src, which stands for “source”: that’s the name of the file that the audio tag will be responsible for playing. The other one, controls, is a bit odd: basically you only see the <audio> tag if it has the controls attribute in place. (We’ll see later that sometimes it makes sense to have an audio tag that you can’t see, that is, one that doesn’t have the controls attribute present.)

Here are the docs for the <audio> tag:

<audio>: The Embed Audio element - HTML: HyperText Markup Language | MDN

So now we enter into the rather odd world of “file paths” as they are called in the web world. We’ll keep it super simple and assume that we are putting all our stuff into a single directory on our computer, like this:

my-glossary
┗ my-glossary.html (the HTML page we just saw)
┗ water.mp3
┗ love.mp3
┗ …etc…

Note that things can be named whatever you like. I’ve named the audio files after the glosses, which is sometimes convenient, but use whatever works.

So here’s the kind of modification we need to make to add a little audio player for the first word (yes, it’s clunky-looking, but it’s a start):

  <p>
    <audio src="water.mp3" controls></audio>
    <strong>akvo</strong>
    <em>water</em>
  </p>
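
As a sketch only, the same modification could be produced for every word with a small JavaScript helper. It assumes each recording sits in the same directory as the page and is named after the gloss plus ".mp3" (as in the directory listing above). The names audioFileFor and entryWithAudio are hypothetical, and the forms and glosses are assumed not to contain characters like < or &.

```javascript
// Work out the src value for a gloss.
// Assumption: the recording is named "<gloss>.mp3" and sits next to the page.
function audioFileFor(gloss) {
  // encodeURI turns spaces into %20, so "to be.mp3" becomes "to%20be.mp3".
  return encodeURI(`${gloss}.mp3`);
}

// Build the markup for one entry: audio player, then form, then gloss.
function entryWithAudio({ form, gloss }) {
  return [
    "  <p>",
    `    <audio src="${audioFileFor(gloss)}" controls></audio>`,
    `    <strong>${form}</strong>`,
    `    <em>${gloss}</em>`,
    "  </p>",
  ].join("\n");
}

console.log(entryWithAudio({ form: "akvo", gloss: "water" }));
console.log(entryWithAudio({ form: "estas", gloss: "to be" }));
```

If your files are named differently (by form, or by an ID), only audioFileFor would need to change.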

I’ve got to take a break now (going to the dentist!), but here is a zip file of all the stuff I’ve just described, complete with my own dorky recordings of Esperanto words:

web-page-with-audio-tags.zip (391.5 KB)

Strictly speaking, we have already completed an example of “Creating a web page with audio tags”, but I think you’ll agree that this needs more love. So next time, we’ll remove the clunky audio players, make clicking the words play the corresponding audio, and talk about making things look a little nicer.
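
In the meantime, here is one possible way the click-to-play idea could work, as a hedged sketch rather than the approach the next installment will necessarily take. It assumes the markup shown above (a <p> per word with an <audio> inside it). The function name enableClickToPlay is invented for this example.

```javascript
// Make every entry play its recording when the entry is clicked.
// `root` is anything with querySelectorAll, normally the page's `document`.
// Returns the number of entries that were wired up.
function enableClickToPlay(root) {
  let count = 0;
  for (const entry of root.querySelectorAll("p")) {
    const audio = entry.querySelector("audio");
    if (!audio) continue; // skip paragraphs that have no recording

    // Without the controls attribute the player is no longer displayed.
    audio.removeAttribute("controls");

    entry.addEventListener("click", () => {
      audio.currentTime = 0; // start from the beginning on every click
      audio.play();
    });
    count += 1;
  }
  return count;
}

// In a browser, wire everything up once the page has loaded.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    enableClickToPlay(document);
  });
}
```

To try it, the code would go inside a <script> tag just before the closing </body> tag of the glossary page.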

In what will probably be a third installment, we can talk about questions like how to “deploy” our page — that is, put it on the internet. And we can also talk about figuring out a maintainable workflow for your team that can keep your project growing in the future.

Thanks for posting!!

mayhplumb · April 5, 2022, 3:24pm

If you’re looking for ready-made options:

Webonary can include audio files (although this requires using FLEx).
I’m also a fan of the Talking Dictionary project; I don’t know if there’s support for new dictionaries at the moment, you could ask!

xrotwng · April 6, 2022, 7:53am

Some clld web apps can do this as well, e.g. Vanuatu Voices - Doculect Aulua: Loxse-Asolokh

Now, a full-scale server-based app is overkill for only this task. But clld apps can do this easily because the data is available in a structured-enough format. In the case of the VanuatuVoices example, the data as CLDF dataset is available on GitHub. The relevant tables are the FormTable and the MediaTable - with a FormID column.

Given data formatted like this, it shouldn’t be too hard to cobble together a “visualization” of this as “web page with audio” - possibly as cldfviz command. (I made a note to myself about this at cldfviz command to create "web page with audio" · Issue #23 · cldf/cldfviz · GitHub - please comment there if that’s something you’d find relevant.)

So then the task would be reduced to formatting the data as a CLDF Wordlist with a MediaTable, which may not be a lot easier, but would have added benefits like making the data available independently of the visualization.
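
To illustrate why a FormID column is enough to connect words to recordings, here is a minimal JavaScript sketch of that join. The rows and the column names other than FormID (ID, Form, Name) are simplified placeholders for illustration, not a faithful copy of the CLDF specification or of the Vanuatu Voices dataset, and attachMedia is a made-up function name.

```javascript
// Hypothetical rows standing in for the two tables; real datasets are CSV
// files with more columns than shown here.
const forms = [
  { ID: "f1", Form: "akvo" },
  { ID: "f2", Form: "amo" },
];
const media = [
  { ID: "m1", FormID: "f1", Name: "water.mp3" },
];

// Pair every form with the recordings whose FormID points at it.
function attachMedia(formRows, mediaRows) {
  const filesByForm = new Map();
  for (const row of mediaRows) {
    if (!filesByForm.has(row.FormID)) filesByForm.set(row.FormID, []);
    filesByForm.get(row.FormID).push(row.Name);
  }
  // Forms without any recording get an empty list.
  return formRows.map((form) => ({
    ...form,
    audio: filesByForm.get(form.ID) ?? [],
  }));
}

console.log(attachMedia(forms, media));
```

Each resulting object carries the form plus a list of its audio files, which is all a page template needs in order to emit one entry with an audio tag per word.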

xrotwng · April 6, 2022, 7:59am

To expand on what I said above: I’d hope that CLDF can become the “interface” between linguistic data and tools/apps/visualizations, to allow for what programmers mean when they say code against the interface not the implementation.

pathall · April 6, 2022, 5:03pm

Huh, interesting.

Then you may link audio files to any vernacular field, most frequently the lexeme form and example sentence. If you use the record feature in FLEx, all sound files will be in .wav format and will need to be converted to .mp3 format and re-linked before uploading to Webonary. It is a good idea to make a backup of the AudioVisual folder before re-linking, as the re-linking process deletes the .wav file.

What? There’s a record feature in FLEx?

mayhplumb · April 6, 2022, 7:32pm

Yep! I’ve never actually used FLEx for audio files, but in the “pronunciation” field for a lexeme, there’s an option to add a recording or movie.

And if you add an “audio” vernacular writing system, then you get little record buttons by lexical entries too.

(Fair warning — it’s my understanding that FLEx with 500 linked audio files is (even) more lagging and awkward than regular FLEx.)

pathall · May 20, 2022, 7:03am

2 posts were split to a new topic: Building an Audio lexicon with CLDF

hp3 · June 22, 2022, 2:07pm

You might find this write-up useful: Presenting Oral Texts with Transcriptions Online | Hugh's Curriculum Vitae

cbowern · June 23, 2022, 12:14am

Yes, it doesn’t work too badly (it’s been one of the more reliable FLEx features in my experience). You can also record through the Language Forge site.

xrotwng · January 31, 2023, 10:42am

Coming back to the original “web page with audio tags” topic: I just released cldfviz 0.11 which now comes with a command cldfviz.audiowordlist. Since this command allows providing a custom Jinja2 template, the resulting HTML is quite configurable while keeping data and presentation conveniently separate.