No.  Language               Speakers
1    Canadian French        150 people
2    France French          150 people
3    UK English             150 people
4    US English             150 people
5    Australian English     150 people
6    Spain Spanish          150 people
7    Mexico Spanish         150 people
8    Italy Italian          150 people
9    German                 150 people
10   Russian                150 people
11   Japanese               150 people
12   Malaysian              150 people
13   Thai                   150 people
14   Korean                 150 people
15   Romanian               150 people
16   Brazilian Portuguese   150 people
17   European Portuguese    150 people
18   Catalan                150 people
19   Romanian               150 people
20   Viennese               150 people
The objective of the SDM Program is to develop speech corpora that support research and development of speech recognition technology for mobile devices. The project consists of a series of mobile data collections in 20 languages, each conducted in the country where the language is natively spoken. The corpora are specially designed for training and testing recent mobile ASR applications and are licensed upon request.
SDC data overview
Languages: 20 languages are planned, and new languages will be added based on client demand.
Script Design
All scripts are specially designed for training and testing mobile speech recognition applications.
Recording platforms
A range of popular mobile phones, such as Nokia, Samsung, iPhone, and BlackBerry handsets, is used for recording. Several globally popular Bluetooth devices are also used and balanced across projects, including:
MOTOROLA H375
JABRA BT2020
JABRA 4010
PLANTRONICS Discovery 975
PLANTRONICS Voyager PRO
SAMSUNG WEP250
I TECH Oval 303
NOKIA BH102
Speaker's demographic information

Age group        Minimum # of speakers (%)
18 – 30 years    30%
31 – 45 years    20%
46 – 65 years    15%

No.  Scenarios      Example                                  Proportion
1    Quiet          Inside office/home                       50%
2    Light noises   Garden/quieter roadside/restaurant/bus   50%
To cover as many speaker-specific factors as possible in the database recordings, the project is carried out in native countries, and three broad categories are identified for coverage: gender distribution, age distribution, and dialectal distribution.
Gender balance
The database will consist of 50% (± 5%) male and 50% (± 5%) female speakers.
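The ± 5% tolerance can be checked mechanically once speaker counts are known. The following is a minimal sketch, not part of the corpus tooling; the function name `gender_balanced` and the example counts are hypothetical.

```python
def gender_balanced(n_male, n_female, target=0.50, tolerance=0.05):
    """Return True if the male share lies within target +/- tolerance.

    With a 50/50 target the female share is 1 - male_share, so checking
    one side is enough. The function name and defaults are illustrative.
    """
    total = n_male + n_female
    if total == 0:
        return False
    male_share = n_male / total
    # Small epsilon so a share sitting exactly on the limit is accepted.
    return abs(male_share - target) <= tolerance + 1e-9


# Hypothetical counts for one 150-speaker language set.
print(gender_balanced(80, 70))  # about 53.3% male
print(gender_balanced(90, 60))  # 60% male
```

With 150 speakers per language, the tolerance corresponds to roughly 68 to 82 speakers of either gender.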
Age distribution
For this project, speech data is collected in the age categories listed under Speaker's demographic information above.
Speakers older than 65 or younger than 18 are optional.
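The age minimums (30%, 20%, and 15% for the three age groups) lend themselves to a simple quota check during recruitment. The sketch below assumes each speaker's age is available as an integer; `AGE_MINIMUMS`, `age_shortfalls`, and the sample ages are hypothetical names and data used only for illustration.

```python
import math

# Minimum share of speakers per age group, as stated in this document.
AGE_MINIMUMS = {(18, 30): 0.30, (31, 45): 0.20, (46, 65): 0.15}


def age_shortfalls(ages):
    """Return {age_range: speakers missing} for groups below their minimum."""
    total = len(ages)
    shortfalls = {}
    for (low, high), min_share in AGE_MINIMUMS.items():
        count = sum(low <= age <= high for age in ages)
        # Round the required head count up; the epsilon guards against
        # floating-point noise such as 0.2 * 150 == 30.000000000000004.
        required = math.ceil(min_share * total - 1e-9)
        if count < required:
            shortfalls[(low, high)] = required - count
    return shortfalls


# Hypothetical roster: 50 speakers aged 25, 30 aged 40, 20 aged 55, 50 aged 70.
roster = [25] * 50 + [40] * 30 + [55] * 20 + [70] * 50
print(age_shortfalls(roster))
```

An empty result means every age group meets its minimum. Speakers outside 18 to 65 count toward the total but not toward any quota, which matches their optional status.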
Dialectal regions
The dialectal regions of each language are carefully identified, and the number of speakers recruited from each region is balanced to reflect that region's share of the language's inhabitants.
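One way to turn regional population figures into speaker counts is proportional allocation with largest-remainder rounding. This is only an illustrative sketch: the document does not state which allocation method is used, and the region names and population figures below are invented.

```python
def allocate_speakers(populations, total_speakers=150):
    """Split total_speakers across regions in proportion to population.

    Uses largest-remainder rounding so the counts always sum to
    total_speakers. Populations must be non-negative integers.
    """
    total_pop = sum(populations.values())
    quotas, remainders = {}, {}
    for region, pop in populations.items():
        # Integer division keeps the arithmetic exact.
        quotas[region], remainders[region] = divmod(total_speakers * pop, total_pop)
    leftover = total_speakers - sum(quotas.values())
    # Give the remaining slots to the regions with the largest remainders.
    ranked = sorted(remainders, key=remainders.get, reverse=True)
    for region in ranked[:leftover]:
        quotas[region] += 1
    return quotas


# Hypothetical dialect regions and populations.
regions = {"north": 6_000_000, "central": 3_000_000, "south": 2_000_000}
print(allocate_speakers(regions))  # {'north': 82, 'central': 41, 'south': 27}
```

In practice, a minimum number of speakers per region may also be needed so that small dialect regions are still represented.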
Recording Conditions
All speakers are recorded in two real-world scenarios, across different sessions and noise conditions.
Data transcribing and annotation
All data is transcribed and annotated according to a dedicated set of rules. Each signal file is accompanied by an ASCII SAM label file that contains the relevant descriptive information.
For detailed information or samples, please contact us.
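Because the label files are plain ASCII, they can be read with a few lines of code. The sketch below assumes the common SAM layout of one `MNEMONIC: value` pair per line; the exact fields in these corpora are not documented here, so the sample content and field names are illustrative only.

```python
def parse_sam_label(text):
    """Parse 'MNEMONIC: value' lines into a dict mapping keys to value lists.

    A mnemonic may repeat within a file, so each key maps to a list of
    values in file order. Lines without a colon are skipped.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


# Hypothetical label file content; real files may use different fields.
SAMPLE = """LHD: SAM, 6.0
SRC: A10001.WAV
SEX: F
AGE: 27
LBO: 0,,,turn on bluetooth
"""

label = parse_sam_label(SAMPLE)
print(label["SEX"], label["AGE"])
```

Splitting on the first colon only keeps values that contain colons themselves, such as timestamps, intact.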
Copyright Speech Ocean. All rights reserved.
