Mixed Speech Data | 5,000 Hours | Code-switching | Audio Data | Speech Recognition Data | AI Datasets

Nexdata
No reviews yet
Verified Data Provider
Volume: 50K Hours
Data Quality: 98% sentence/word
Avail. File Formats: .bin, .json, and .xml
Coverage: 28 Countries
History: 5 years

Data Dictionary

[Sample] Nexdata-Multilingual Code-switching Speech Data.csv
Attribute | Type | Example | Mapping
Dataset Name | String | 303 Hours - Mixed Speech with Chinese and English Data by... |
(blank) | String | Chinese,English | Language Name
Format | String | 16kHz |
Link | String | https://www.nexdata.ai/dataset/1080?source=Datarade |
Product Attributes
Attribute | Type | Example | Mapping
Product Name | String | Volume |
Multilingual Code-switching Speech Data | String | 5000 hours |

Description

The recorded text is a mixture of multi-language sentences, covering general scenes and human-computer interaction scenes. The audio data is rich in content and accurately transcribed.
1. Specifications
Format: 16kHz, 16bit, uncompressed wav, mono channel
Recording environment: quiet indoor environment, without echo
Recording content (read speech): general category; human-machine interaction category
Demographics: speakers are evenly distributed across all age groups, covering children, teenagers, middle-aged, elderly, etc.
Device: Android mobile phone, iPhone
Language: English-Korean, English-Japanese, German-English, Hong Kong Cantonese-English, Taiwanese-English
Application scenarios: speech recognition; voiceprint recognition
Accuracy rate: 97%

2. About Nexdata
Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data. This ready-to-go Natural Language Processing (NLP) Data supports instant delivery and helps quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
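The format stated in the specifications (16kHz, 16bit, uncompressed wav, mono channel) can be checked on a delivered file before it enters a training pipeline. The following is a minimal sketch using only Python's standard-library wave module. The function names and the idea of validating each file this way are assumptions made for illustration; they are not part of Nexdata's tooling or delivery process.

```python
import wave

# Expected values taken from the listing's "Format" line. Everything else in
# this sketch (function names, how files are obtained) is hypothetical.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the listed format."""
    try:
        # The wave module handles PCM data and raises wave.Error for
        # encodings it does not support or for files that are not valid WAV.
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
            channels = wav.getnchannels()
    except (wave.Error, EOFError) as exc:
        return [f"not a readable uncompressed WAV file: {exc}"]

    problems = []
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ}")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {width * 8} bit, expected 16")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    return problems


def raw_audio_bytes(hours):
    """Estimate raw PCM size in bytes for a duration, excluding file headers."""
    bytes_per_second = (
        EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS
    )
    return round(hours * 3600 * bytes_per_second)
```

An empty list from check_wav_format means the file matches the listed format. At this format, raw_audio_bytes(5000) gives 576,000,000,000 bytes, so 5,000 hours of audio comes to roughly 576 GB before WAV headers and transcription files are counted.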

Country Coverage

Africa (6)
Algeria
Egypt
Morocco
South Africa
Tanzania, United Republic of
Tunisia
Asia (4)
Hong Kong
Japan
Korea (Republic of)
Taiwan
Europe (10)
Finland
France
Germany
Italy
Netherlands
Portugal
Russian Federation
Spain
Switzerland
United Kingdom
North America (3)
Canada
Mexico
United States of America
Oceania (2)
Australia
New Zealand
South America (3)
Argentina
Brazil
Chile

History

5 years of historical data

Volume

50,000 Hours

Pricing

Free sample available
License | Starts at
One-off purchase | $20,000 / purchase
Monthly License | Not available
Yearly License | Not available
Usage-based | Not available

Suitable Company Sizes

Small Business
Medium-sized Business
Enterprise

Quality

98% sentence/word (self-reported by the provider)

Delivery

Methods
S3 Bucket
SFTP
Email
UI Export
REST API
SOAP API
Streaming API
Feed API
Frequency
secondly
minutely
hourly
daily
weekly
monthly
quarterly
yearly
real-time
on-demand
Format
.bin
.json
.xml
.csv
.xls
.sql
.txt

Frequently asked questions

What is Mixed Speech Data 5,000 Hours Code-switching Audio Data Speech Recognition Data AI Datasets?

The recorded text is a mixture of multi-language sentences, covering general scenes and human-computer interaction scenes. The audio data is rich in content and accurately transcribed.

What is Mixed Speech Data 5,000 Hours Code-switching Audio Data Speech Recognition Data AI Datasets used for?

This product has 4 key use cases. Nexdata recommends using the data for Artificial Intelligence (AI), Machine Learning (ML), Speech Recognition, and LLM Training. Global businesses and organizations buy Machine Learning (ML) Data from Nexdata to fuel their analytics and enrichment.

Who can use Mixed Speech Data 5,000 Hours Code-switching Audio Data Speech Recognition Data AI Datasets?

This product is best suited if you’re a Medium-sized Business or Enterprise looking for Machine Learning (ML) Data. Get in touch with Nexdata to see what their data can do for your business and find out which integrations they provide.

How far back does the data in Mixed Speech Data 5,000 Hours Code-switching Audio Data Speech Recognition Data AI Datasets go?

This product has 5 years of historical coverage. It can be delivered on a secondly, minutely, hourly, daily, weekly, monthly, quarterly, yearly, real-time, and on-demand basis.

Which countries does Mixed Speech Data 5,000 Hours Code-switching Audio Data Speech Recognition Data AI Datasets cover?

This product includes data covering 28 countries, including the USA, Japan, Germany, the UK, and France. Nexdata is headquartered in the United States of America.

How much does Mixed Speech Data 5,000 Hours Code-switching Audio Data Speech Recognition Data AI Datasets cost?

Pricing for Mixed Speech Data 5,000 Hours Code-switching Audio Data Speech Recognition Data AI Datasets starts at USD 20,000 per purchase. Connect with Nexdata to get a quote and arrange custom pricing models based on your data requirements.

How can I get Mixed Speech Data 5,000 Hours Code-switching Audio Data Speech Recognition Data AI Datasets?

Businesses can buy Machine Learning (ML) Data from Nexdata and get the data via S3 Bucket, SFTP, Email, UI Export, REST API, SOAP API, Streaming API, and Feed API. Depending on your data requirements and subscription budget, Nexdata can deliver this product in .bin, .json, .xml, .csv, .xls, .sql, and .txt format.

What is the data quality of Mixed Speech Data 5,000 Hours Code-switching Audio Data Speech Recognition Data AI Datasets?

Nexdata has reported that this product has the following quality and accuracy assurances: 98% sentence/word. You can compare and assess the data quality of Nexdata using Datarade’s data marketplace.

What are similar products to Mixed Speech Data 5,000 Hours Code-switching Audio Data Speech Recognition Data AI Datasets?

This product has 3 related products. These alternatives include In-Cabin Speech Data 15,000 Hours AI Training Data Speech Recognition Data Audio Data Natural Language Processing (NLP) Data, AI Training Data US Transcription Data Unique Consumer Sentiment Data: Transcription of the calls to the companies, and FileMarket 20,000 Voice Memos Multilingual Training Data for Conversational AI Machine Learning (ML) Data. You can compare the best Machine Learning (ML) Data providers and products via Datarade’s data marketplace and get the right data for your use case.

Nexdata

Sharpen Your AI with Better Data

Verified Provider
6h Avg. response time
100% Response rate