Voice Data Collection: How to Do It Right

Is your company developing voice-based technologies? Over the last decade our team has witnessed an explosion in requests for multilingual voice data collection: for voice assistants, voice recognition technologies, voice control technology, and more.

With our global team of multilingual freelancers, as well as our linguistic analysis expertise, we are a natural choice for businesses that need fast and high-volume voice data.


How to Order Voice Data Collection Services

1. Hire linguists who know how to work with the languages at hand

The trickiest thing about languages is that “you never know what you don’t know”. For example, if you didn’t know any Korean, you might not know that you need to specify the social hierarchy between speakers, which determines the speech level they use, in order to elicit the right utterances! If you didn’t know any Arabic, you might not know that Moroccan Arabic utterances won’t be very useful for a product aimed at Egyptian Arabic speakers.

By hiring linguists who specialize in researching language structures, you can rest assured your bases will be covered.


2. Have a clear understanding of the purpose of the voice data

Will this voice data end up training a voice command technology to be used in cars, amid traffic noise and family conversations? Or will it be used by children talking to toys (as in our case study below)?

The purpose of your data will affect how we design our elicitation techniques. That way, we can make sure the data is actually usable toward your end goal.

We can collect data using high-tech voice recording equipment, or we can use low-tech voice recorders, according to your budget and needs. We can also provide post-processing to “clean” noisy voice data to the extent possible.


3. Have a clear understanding of the audience for the voice data

Will the end user of your technology tend towards a certain “sociolect”? Or will you need data that spans a range of language patterns? Region, education, even mood can affect certain language patterns.

This is especially the case in certain languages like Korean and Turkish!

Therefore, you will want to collect voice data that will help you build a technology that is actually designed for your end users.

We can work with you to create your audience profile.


4. Communicate your timing and pricing needs with your voice data provider

We know that for developers, programmers, and product designers, language data is often the last thing you want to think about.

If you are managing a multilingual technology project, you have many other moving parts to attend to. Sometimes voice data collection is left to the last minute. Sometimes the budgets and timelines necessary for voice data collection can catch our clients by surprise.

For this reason, we’ve developed efficient procedures for handling all manner of custom requests. These include:

  • A global management team: our project managers cover all timezones
  • Secure cloud technology for seamless delivery and storage of voice data
  • A database of thousands of linguists in all timezones, working in dozens of languages, each tagged for availability and expertise
  • A variety of data collection procedures, ranging from the most low-fi to the most advanced technology deployable...


Case Study: 100 Hours of Swedish Voice Data

Collected from Native Speakers in Three Timezones, Delivered in 1 Week, for a German Toy Company (Voice Command Software in a Talking Toy Dog)

One of our favorite projects involved collecting 100 hours of native Swedish voice commands. The task included strict quotas for female, male, and child speakers.

With this rush turnaround, we could not limit our data subjects to the Swedish timezone.

Our recruitment managers put in place our trademark network recruitment model. Using this system, we trained Swedish voice participants to become local recruiters in countries as far-flung as Thailand and Argentina. That way, our linguistic analysts could work nonstop on a continuous stream of incoming data across all timezones.

Our expert linguists also worked with the client to prepare elicitation materials: what voice commands would be needed and how to elicit them in their most natural variations.

Collecting data from children involves an extra layer of care: not only to ensure their legal protections, but also to elicit useful data.

Children have short attention spans, and of course we want to make sure they feel comfortable and happy while they serve as our mini linguistic consultants! Therefore, any data elicitation must be combined with play.

Our child language specialists helped to develop child-specific elicitation materials. They also trained our recruitment and elicitation team to collect child voice data remotely, in three countries.


Security and Privacy of Voice Data

Our collection methods and data storage are GDPR-compliant, as audited by an independent legal firm.


Interested in our voice data services?

Please contact us, and we’ll guide you through the next steps.