
Speech Data Collection

Speechocean provides in-country speech data collection in more than 40 languages, covering their regional varieties and accents, for example:

          Spanish (Spain, Mexico, America, etc.)
          French (Canada, France, etc.)
          English (US, UK, Australia, China, Japan, etc.)

Speechocean also offers speech corpora in many languages. Please see the Product Catalogue for more detailed information.


Speechocean has extensive experience in carrying out international data collection projects of many types, including:

TTS speech

In-car ASR

HTS speech

Telephony ASR (mobile, fixed line)

Broadcasting speech

Desktop ASR

Emotional speech

ASR Speech with special microphones and devices

Multi-modal speech

Song humming speech


Speechocean can collect speech data at a wide range of sampling rates and with multi-channel requirements:

          8 kHz, 16-bit; 16 kHz, 16-bit; 22 kHz, 16-bit; 44 kHz, 16-bit; 48 kHz, 16-bit; etc.
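
As a rough guide to what these formats mean for storage, the sketch below estimates the raw data rate of each one. It assumes uncompressed linear PCM and reads the listed values as nominal rates in kHz (in practice, 22k and 44k often mean 22.05 kHz and 44.1 kHz). The function names and the mono default are illustrative assumptions, not part of any Speechocean tool.

```python
# Nominal sampling rates (Hz) and bit depths taken from the list above.
# Assumption: values are read as round kHz figures and audio is uncompressed PCM.
FORMATS = [(8_000, 16), (16_000, 16), (22_000, 16), (44_000, 16), (48_000, 16)]


def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels=1):
    """Raw PCM data rate: samples per second x bytes per sample x channels."""
    return sample_rate_hz * (bit_depth // 8) * channels


def pcm_megabytes_per_hour(sample_rate_hz, bit_depth, channels=1):
    """Storage for one hour of audio, in megabytes (10**6 bytes)."""
    return pcm_bytes_per_second(sample_rate_hz, bit_depth, channels) * 3600 / 1_000_000


if __name__ == "__main__":
    for rate, depth in FORMATS:
        mono = pcm_megabytes_per_hour(rate, depth)
        stereo = pcm_megabytes_per_hour(rate, depth, channels=2)
        print(f"{rate // 1000} kHz, {depth}-bit: {mono:.1f} MB/h mono, {stereo:.1f} MB/h for 2 channels")
```

Multi-channel recordings scale linearly with the channel count, so the `channels` argument is the only thing that changes for a multi-microphone setup.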


The speech collection can be performed in diverse environments and conditions:
 
         
          ♦    In-car environments (parking, city driving, highway driving under different conditions)
          ♦    Professional recording studio
          ♦    Indoor environments with different noise conditions: office, home, supermarket, cafe, restaurant, etc.
          ♦    Outdoor environments: street, park, bus, and other public places


Speechocean can collect speech data in a wide range of recording styles, including:

Scripted speech

Conference speech

Elicited speech

Emotional speech

Spontaneous speech

Song humming

Conversational speech

Psychological speech


Speechocean provides a range of hardware and software for data collection:

Speechocean's IT staff make recommendations, source equipment, and efficiently resolve any problems in a speech data collection project, based on customer requirements. Speechocean uses various kinds of recording equipment, such as microphones and sound cards, along with recording software, recording platforms, and professional studios:
 
      ♦      Special-environment recording studios, such as noise-free and echo-noise studios
      ♦      Mobile recording platform for remote servers, supporting mobile systems such as Windows Mobile, Symbian, and iPhone
      ♦      Sophisticated software tools for data collection in desktop, telephone, mobile, and other embedded applications
      ♦      Sophisticated synchronized multi-channel in-car recording platforms
      ♦      Transcription tools for audio data
      ♦      Manual alignment tools
      ♦      Multilingual TTS system test and evaluation platform
      ♦      Other software applications that assist with language data collection

Speechocean can also develop software tailored to meet any special requirements that customers may have.
The more sophisticated processes that Speechocean uses to collect data include building complex acoustic environments, recruiting speakers to meet demographic requirements, and quality assessment.

Speechocean can also provide specialized language and linguistic consulting services:

Drawing on a large pool of linguists across many languages, Speechocean can help its customers analyze the desired language and make recommendations. These cover aspects such as dialects and regional accents, the proportion of speakers across demographic groups, analysis of special language environments, phonetically balanced script design, pronunciation lexicon production, and the building of specific materials such as SMS corpora, command words, search queries, digits, and names. Speechocean is also experienced in providing special processing services in various languages, such as prosody, named entity, and grammar annotation.
Corpora such as pronunciation lexicons and SMS corpora are currently available for license. For more detailed information, please visit our Product Catalogue.