
Data Distribution Cooperation (DDC)

1. The Purpose of Data Distribution Cooperation

2. The benefits of providing data

3. Release Types

4. Intellectual Property Protection

5. How to provide the data to Speechocean

6. Distribution Status Inquiry of data

7. Other General Information Inquiries

--------------------------------------------------------------------------------------------------------------------------------------------------------

1. The Purpose of Data Distribution Cooperation

The goal of DDC is to share, exchange and promote a large data repository that supports research and development in Human-Computer Interface (HCI) technology and language study. The data repository includes, but is not limited to, multi-lingual and multi-channel speech data for TTS / ASR / language learning, parallel-language text corpora for machine translation, a variety of human-feature corpora for biometric recognition, and annotated image / audio / video data for data mining.

By distributing the data to other organizations under the General Principles of the Civil Law of the People's Republic of China, the DDC project aims to work with all kinds of data providers to build this resource platform.

2. The benefits of providing data

Speechocean staff understand the needs of both researchers and providers, and the DDC serves as a platform that benefits both parties. Below are two examples of how data providers can benefit:
     1) Speechocean has established fruitful cooperative relationships with many researchers and organizations around the world. By publishing their data, providers not only receive payment for the data license from the data user, but also make users aware of their technological products, which helps the providers reach new markets.
     2) Speechocean has an extensive database; a data provider can obtain specific data for its particular research needs by sharing its data with Speechocean or exchanging it. In such a case, the provider shall contact Speechocean in advance via email (inquiry@speechocean.com) with its detailed requirements. Speechocean will evaluate the proposed sharing or exchange and, if it qualifies, will send the provider feedback and a Data Exchange Agreement within 3-5 working days of receiving the data.

--------------------------------------------------------------------------------------------------------------------------------------------------------

3. Release Types

To satisfy the different needs of data providers, Speechocean offers three release types for providers to choose from:
     a) Data distribution: The data will be released in the Speechocean catalogue. Speechocean will work as an agent and will only distribute the data based on its agreement with the provider.
     b) Data sharing/donation: The provider will share/donate all the rights to Speechocean. In return, the provider can freely use a number of corpora listed in the Member Zone at a proportion of 1:3 at any time. These corpora can only be used for the provider's own research, including commercial research, and the member has no right to distribute the Corpus as a standalone product to any third party. Speechocean will get all the rights to license the data to both its members and non-members without any compensation to the provider.
     c) Data exchange: The provider can obtain one or more corpora from the catalogue only in exchange for its own data. The corpora obtained are for the provider's own research purposes, including commercial research, and the provider has no right to distribute the Corpus as a standalone product to any third party. After the exchange, Speechocean will get all the rights to license the data to both its members and non-members without any compensation to the provider.

4. Intellectual Property Protection

All rights of the provider under the following Speechocean agreements will be fully and reasonably protected by SPC under the Intellectual Property Law of the PRC:
     Data Distribution Agreement
     Data Sharing Agreement
     Data Exchange Agreement

For any other questions about legal protection, please contact inquiry@speechocean.com.

--------------------------------------------------------------------------------------------------------------------------------------------------------

5. How to provide the data to Speechocean

Speechocean holds various corpora from authors in academia, individual researchers, and private enterprises. To release your data successfully, please make sure that the following conditions are fully understood and satisfied:

     •    The data must serve some need in scientific research.
     •    The provider must hold the full intellectual property rights to the corpus and have the authority to sign the Standard Distribution Agreement with Speechocean independently.
     •    The documentation for the data, including a presentation of its quality standard, shall be complete so that the user can fully understand the data and decide how to use it.
     •    Contact information for post-sale service shall be provided to handle any inquiries from end users.

If you can meet these guidelines, please email us at release@speechocean.com with the following documents:

a) General introduction to the corpus, including the name of the corpus, names of the providers or co-authors, name of the project (if any) for which the corpus was developed, language, medium (text, speech, video, etc.), size of the corpus (in KB or MB; hours of speech or video data; number of unique words and total number of words for text, etc.), file format, channel count and sampling rate for speech data, sampling format, other descriptions of the corpus, and suggested applications.
b) Release type. For the Data Distribution (agent) type, please state your suggested license fee, which will be discussed further with Speechocean based on the final evaluation of the corpus.
c) Sample of the data (because of our email size limit, if the sample data is too large, please split it into files below 3 MB and send them to us separately).
d) Primary contact person (both email and telephone number for a single point of contact).
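
For item c), the following is a minimal sketch of one way to split a large sample file into parts below 3 MB before emailing them. It is an illustration only, not a Speechocean tool: the function names `split_file` and `join_parts`, the `.partNNN` naming scheme, and the reading of "below 3 MB" as fewer than 3 * 1024 * 1024 bytes of raw file size are all assumptions.

```python
from pathlib import Path

# Assumption: "below 3 MB" means strictly fewer than 3 * 1024 * 1024 bytes
# of raw file size (before any email attachment encoding).
MAX_PART_BYTES = 3 * 1024 * 1024 - 1


def split_file(source, out_dir, max_bytes=MAX_PART_BYTES):
    """Split `source` into numbered parts of at most `max_bytes` bytes each.

    Returns the list of part paths in order. The part naming scheme
    (<name>.part001, <name>.part002, ...) is hypothetical.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    parts = []
    with source.open("rb") as src:
        index = 1
        while True:
            chunk = src.read(max_bytes)
            if not chunk:  # end of file reached
                break
            part = out_dir / f"{source.name}.part{index:03d}"
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_parts(parts, target):
    """Concatenate the parts, in the given order, back into one file."""
    target = Path(target)
    with target.open("wb") as dst:
        for part in parts:
            dst.write(Path(part).read_bytes())
    return target
```

Parts produced this way are raw byte slices, so a split audio or video file is not playable until the recipient rejoins the parts in order, for example with `join_parts`. Attachment encoding can also enlarge each file in transit, so a smaller `max_bytes` may be safer.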

On receiving an initial inquiry with the documents listed above, Speechocean staff will review it and reply to the provider by email within 48 hours with an FTP address for uploading the data. They will also contact the primary contact person to set up a schedule for delivery of the data and the formal technical documents, including a description of the data and specifications and the agreements with Speechocean, as well as any interim dates, such as those for delivery of documentation, IPR agreements, or quality control methods.

After receiving the data, Speechocean staff will evaluate it and, based on this evaluation, give the provider feedback concerning a specific release date.

Speechocean will usually archive the data and will offer the original providers free copies of the digital data archives at any time on request.

--------------------------------------------------------------------------------------------------------------------------------------------------------

6. Distribution Status Inquiry of data

For the Data Distribution release type, Speechocean will send the provider a quarterly report on the distribution status. SPC will also send the provider an interim notification whenever data is sold. Providers can also inquire about the status of their data at any time by emailing inquiry@speechocean.com.

7. Other General Information Inquiries

For other information, please email inquiry@speechocean.com.

Speechocean is always open and looking forward to cooperating with any organization or individual on data publishing.