How to Collect Language Data Remotely

With the onset of the coronavirus, the world is scrambling to reimagine a safer way to work. This is particularly vital for technology companies, which rely on the collection of linguistic field data to build tomorrow’s language technology: speech recognition, voice assistants, chatbots, and many other technologies that depend on natural language processing.

In the past, the typical method involved sending trained linguists into the “field” so that they could work near their respondents and collect the needed data in person.

For example, if a robotics company is training a Korean version of their voice-controlled robot for use in airports, a linguist will go to Korea, recruit native Korean subjects to provide language samples or evaluations, and then collect the data efficiently in a carefully controlled setting.

So, what happens if your field linguist can no longer travel? Or if travel is no longer financially feasible?

Or if your voice artists can’t leave their homes?

If you cannot travel, don’t cancel your field data collection projects just yet. There are many strategies and tools that can be deployed to carry out remote language data collection.

After you have tried them, you may find you prefer these methods even when travel is feasible!


Remote Recruitment for Language Sample Collection

These days, recruitment practices have already largely shifted to the digital realm. However, it is still not uncommon for a field linguist to recruit subjects by reaching out to them in person. Often a recruiter will go to crowded locations where a good cross-section of subjects can be found.

Online recruitment, by contrast, can be done particularly efficiently. It is best to enlist the help of a local linguist who understands the online marketplaces available (for example, Albamon in Korea), as well as local logistics for internet access or delivery of supplies.


Remote Language Data Collection

Instead of inviting subjects to make appointments at a recording studio or other data collection setting, you can invite them to provide samples from the comfort of their own homes. Of course, you will need to make sure that they can meet your audio requirements. More about that below.

Many remote data collectors use phones or digital conferencing software, but be aware that transmitting audio over the internet can degrade audio quality. If you are collecting voice data, audio quality will be particularly important.

When collecting samples remotely, it is best to deliver prompts to the subjects that can be opened and run offline, and to ask the subject to record their sample offline. After recording, the subject can send their sample to your linguist coordinator for checking.
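As one possible way for the coordinator to automate the first pass of that check, here is a minimal sketch that inspects a received WAV file using only Python's standard library. The function name `check_wav` and the three thresholds are hypothetical placeholders, not requirements from this article; replace them with your project's own specification.

```python
import wave

# Hypothetical project requirements; adjust these to your own specification.
MIN_SAMPLE_RATE_HZ = 16000
MIN_SAMPLE_WIDTH_BYTES = 2  # 2 bytes per sample = 16-bit audio
MIN_DURATION_S = 1.0


def check_wav(path):
    """Return a list of human-readable problems found in a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate

    problems = []
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if width < MIN_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width {width * 8}-bit is below {MIN_SAMPLE_WIDTH_BYTES * 8}-bit")
    if duration < MIN_DURATION_S:
        problems.append(f"duration {duration:.2f} s is shorter than {MIN_DURATION_S} s")
    return problems
```

An empty list means the file passed these basic format checks. It says nothing about what was actually said in the recording, so a linguist still needs to listen to the sample.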


Audio Requirements

Before proceeding with sample collection, confirm with your respondents that they have access to the following equipment and environment:

  • A noise-insulated room without too much echo. Tip: the respondent can surround themselves with piles of laundry, and then hide under a blanket with the mic, to reduce both echo and outside noise.
  • A quality mic or smartphone (built-in laptop mics are usually not good enough, and pick up too much computer fan noise). Blue (maker of the Snowball) and Samson make very affordable, good-quality mics.

Before recording, ask the subject to turn off air conditioners or heaters. If the computer fan is humming, it is also best to reduce the load on the computer by closing extra tabs and programs.

This background noise can be removed later if necessary, but doing so can add a lot of work if you’re collecting a large amount of data.
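One way to catch a noisy setup before a full session is to ask the respondent for a few seconds of "silence" recorded in their room and measure its level. The sketch below is an illustration under stated assumptions: it handles only 16-bit PCM WAV files, and the function name `rms_dbfs` and the -50 dBFS limit are hypothetical choices, not a standard or a recommendation from this article.

```python
import array
import math
import sys
import wave

# Hypothetical limit for a "quiet enough" room; tune it on your own recordings.
MAX_NOISE_DBFS = -50.0


def rms_dbfs(path):
    """Return the RMS level of a 16-bit PCM WAV file in dB relative to full scale."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    if not samples:
        raise ValueError("file contains no audio")

    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / 32768 ** 2)


def is_quiet_enough(path):
    """True if the silent test recording stays at or below the assumed limit."""
    return rms_dbfs(path) <= MAX_NOISE_DBFS
```

If the test recording comes back louder than the limit, the respondent can try the steps above (switching off air conditioning, closing programs, moving the mic) and record the silence again before starting on the real prompts.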

What if your respondents do not own the proper equipment? You can send it to them!

In many countries it is very convenient to order equipment to be delivered directly to your respondent. If necessary, you can pay for them to send the equipment back to you, or to deliver the equipment to a storage facility to be picked up later.


Other Tools for Remote Data Collection

Magpi: Lets you add voice interaction to your surveys and polls so that they can be conducted without requiring literacy from your respondents, or so that they can be conducted over the phone.

Teamscope: Secure online and offline data collection management.

KoboToolbox: A free and open-source tool for mobile data collection.

Are you interested in collecting linguistic field data to develop multilingual technology? Don’t hesitate to reach out to us for help.
