Accented English Speech Dataset | 1.5K+ recordings | Scripted Monologues | Global Coverage

FileMarket
Verified Data Provider
Volume: 1.53K records
Available formats: .mp3 and .wav
Coverage: 203 countries

Data Dictionary

[Sample] Accented English Dataset Samples
Attribute | Type | Example | Mapping
Name | String | Samples Dataset |
Link | String | https://www.dropbox.com/scl/fo/2nj38l50ikz54u3dmjqfg/AHSU... |

Description

Scripted English monologue set: 1531 clean indoor recordings, each ≤30 s, where a speaker reads a fixed 104-word paragraph. Fully consented and captured on users’ own devices. Rich metadata includes country and native language for accent labeling.
Volume & Protocol: 1531 audio clips (≤30 s each). Speakers read the same 104-word English paragraph in a quiet indoor setting. No background noise, filters, or multiple speakers. All contributors gave explicit consent via our web platform.

Geographic & Accent Coverage: Country distribution is led by Nigeria 58.59%, Indonesia 5.81%, Ethiopia 5.43%, Bangladesh 4.84%, India 2.97%, Kenya 1.42%, Egypt 1.23%, Iran 1.16%, Pakistan 0.84%, Uganda 0.78%, Germany 0.52%, Yemen 0.52%. Native-language backgrounds include: Yoruba 21.36%, Igbo 7.08%, Hausa 3.99%, Amharic 3.47%, Indonesian 3.28%, Arabic 1.87%, Persian 1.42%, Hindi 1.35%, Oromo 1.03%, Spanish 0.84%, Mizo 0.84%, Luganda 0.84%.

Metadata per file: Anonymized speaker ID, country, self-reported native language, recording length, capture notes, consent flag. Reference transcript is the shared 104-word paragraph.

Use cases: Accent-robust ASR, accent classification, pronunciation scoring, keyword-spotting with accent diversity, speech analytics and voice UX testing. Clean, uniformly scripted clips make the dataset plug-and-play for training and benchmarking.
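The country shares listed above are heavily skewed toward one country, which matters for accent classification. The Python sketch below turns the published percentages into approximate clip counts and derives inverse-frequency class weights. The weighting scheme is a common heuristic chosen here for illustration, not something the provider prescribes, and the function and variable names are hypothetical. Counts are estimates from rounded percentages and may differ slightly from the delivered files.

```python
# Country shares (percent of clips) copied from the listing's description.
COUNTRY_SHARE_PCT = {
    "Nigeria": 58.59, "Indonesia": 5.81, "Ethiopia": 5.43, "Bangladesh": 4.84,
    "India": 2.97, "Kenya": 1.42, "Egypt": 1.23, "Iran": 1.16,
    "Pakistan": 0.84, "Uganda": 0.78, "Germany": 0.52, "Yemen": 0.52,
}
TOTAL_CLIPS = 1531  # clip count stated in the description


def approx_clip_counts(shares_pct, total):
    """Estimate per-country clip counts from rounded percentage shares."""
    return {country: round(total * pct / 100) for country, pct in shares_pct.items()}


def inverse_frequency_weights(counts):
    """Assumed heuristic: weight = total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {country: total / (n_classes * n) for country, n in counts.items()}


counts = approx_clip_counts(COUNTRY_SHARE_PCT, TOTAL_CLIPS)
# Share of clips from countries not named in the listing.
other_share = 100 - sum(COUNTRY_SHARE_PCT.values())
weights = inverse_frequency_weights(counts)

if __name__ == "__main__":
    for country, n in counts.items():
        print(f"{country:<12} ~{n:>4} clips  weight {weights[country]:.2f}")
    print(f"Other countries: ~{other_share:.2f}% of clips")
```

Countries with only a handful of clips receive large weights under this scheme, so in practice it may be more sensible to merge them into broader accent groups before training.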

Country Coverage

Africa (58)
Algeria
Angola
Benin
Botswana
Burkina Faso
Burundi
Cabo Verde
Cameroon
Central African Republic
Chad
Comoros
Congo
Congo (Democratic Republic of the)
Côte d'Ivoire
Djibouti
Egypt
Equatorial Guinea
Eritrea
Ethiopia
Gabon
Gambia
Ghana
Guinea
Guinea-Bissau
Kenya
Lesotho
Liberia
Libya
Madagascar
Malawi
Mali
Mauritania
Mauritius
Mayotte
Morocco
Mozambique
Namibia
Niger
Nigeria
Rwanda
Réunion
Saint Helena, Ascension and Tristan da Cunha
Sao Tome and Principe
Senegal
Seychelles
Sierra Leone
Somalia
South Africa
South Sudan
Sudan
Swaziland
Tanzania, United Republic of
Togo
Tunisia
Uganda
Western Sahara
Zambia
Zimbabwe
Asia (51)
Afghanistan
Armenia
Azerbaijan
Bahrain
Bangladesh
Bhutan
Brunei Darussalam
Cambodia
China
Cyprus
Georgia
Hong Kong
India
Indonesia
Iran (Islamic Republic of)
Iraq
Israel
Japan
Jordan
Kazakhstan
Korea (Democratic People's Republic of)
Korea (Republic of)
Kuwait
Kyrgyzstan
Lao People's Democratic Republic
Lebanon
Macao
Malaysia
Maldives
Mongolia
Myanmar
Nepal
Oman
Pakistan
Palestine, State of
Philippines
Qatar
Saudi Arabia
Singapore
Sri Lanka
Syrian Arab Republic
Taiwan
Tajikistan
Thailand
Timor-Leste
Turkey
Turkmenistan
United Arab Emirates
Uzbekistan
Vietnam
Yemen
Europe (52)
Albania
Andorra
Austria
Belarus
Belgium
Bosnia and Herzegovina
Bulgaria
Croatia
Czech Republic
Denmark
Estonia
Faroe Islands
Finland
France
Germany
Gibraltar
Greece
Guernsey
Holy See
Hungary
Iceland
Ireland
Isle of Man
Italy
Jersey
Kosovo
Latvia
Liechtenstein
Lithuania
Luxembourg
Macedonia (the former Yugoslav Republic of)
Malta
Moldova (Republic of)
Monaco
Montenegro
Netherlands
Norway
Poland
Portugal
Romania
Russian Federation
San Marino
Serbia
Slovakia
Slovenia
Spain
Svalbard and Jan Mayen
Sweden
Switzerland
Ukraine
United Kingdom
Åland Islands
South America (42)
Anguilla
Antigua and Barbuda
Argentina
Aruba
Bahamas
Barbados
Bolivia (Plurinational State of)
Bonaire, Sint Eustatius and Saba
Brazil
Cayman Islands
Chile
Colombia
Cuba
Curaçao
Dominica
Dominican Republic
Ecuador
Falkland Islands (Malvinas)
French Guiana
Grenada
Guadeloupe
Guyana
Haiti
Jamaica
Martinique
Montserrat
Paraguay
Peru
Puerto Rico
Saint Barthélemy
Saint Kitts and Nevis
Saint Lucia
Saint Martin (French part)
Saint Vincent and the Grenadines
Sint Maarten (Dutch part)
Suriname
Trinidad and Tobago
Turks and Caicos Islands
Uruguay
Venezuela (Bolivarian Republic of)
Virgin Islands (British)
Virgin Islands (U.S.)

Volume

1,531 records

Pricing

License Starts at
One-off purchase
$1,990 / purchase
Monthly License Not available
Yearly License Not available
Usage-based Not available

Suitable Company Sizes

Small Business
Medium-sized Business
Enterprise

Delivery

Methods
S3 Bucket
SFTP
Email
Compressed File
Google Cloud Storage
Azure Blob Storage
Frequency
on-demand
Format
.mp3
.wav
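
The description states that each clip is at most 30 s long. A buyer receiving the .wav files could check that claim with a sketch like the one below, which uses only Python's standard `wave` module. It assumes uncompressed PCM .wav files, since `wave` does not read .mp3, and the function names are hypothetical.

```python
import wave

MAX_SECONDS = 30.0  # per-clip limit stated in the dataset description


def wav_duration_seconds(path):
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def within_limit(path, max_seconds=MAX_SECONDS):
    """True if the clip is no longer than the stated limit."""
    return wav_duration_seconds(path) <= max_seconds
```

Running `within_limit` over every delivered file and listing the failures gives a quick acceptance check before training.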

Use Cases

Machine Learning (ML)
Data-Efficient Machine Learning
Speech Recognition

Frequently asked questions

What is Accented English Speech Dataset 1.5K+ recordings Scripted Monologues Global Coverage?

Scripted English monologue set: 1531 clean indoor recordings, each ≤30 s, where a speaker reads a fixed 104-word paragraph. Fully consented and captured on users’ own devices. Rich metadata includes country and native language for accent labeling.

What is Accented English Speech Dataset 1.5K+ recordings Scripted Monologues Global Coverage used for?

This product has 3 key use cases. FileMarket recommends using the data for Machine Learning (ML), Data-Efficient Machine Learning, and Speech Recognition. Global businesses and organizations buy Natural Language Processing (NLP) Data from FileMarket to fuel their analytics and enrichment.

Who can use Accented English Speech Dataset 1.5K+ recordings Scripted Monologues Global Coverage?

This product is best suited if you’re a Small Business, Medium-sized Business, or Enterprise looking for Natural Language Processing (NLP) Data. Get in touch with FileMarket to see what their data can do for your business and find out which integrations they provide.

Which countries does Accented English Speech Dataset 1.5K+ recordings Scripted Monologues Global Coverage cover?

This product includes data covering 203 countries, such as China, Japan, Germany, India, and the UK. FileMarket is headquartered in the United States of America.

How much does Accented English Speech Dataset 1.5K+ recordings Scripted Monologues Global Coverage cost?

Pricing for Accented English Speech Dataset 1.5K+ recordings Scripted Monologues Global Coverage starts at USD 1,990 per purchase. Connect with FileMarket to get a quote and arrange custom pricing models based on your data requirements.

How can I get Accented English Speech Dataset 1.5K+ recordings Scripted Monologues Global Coverage?

Businesses can buy Natural Language Processing (NLP) Data from FileMarket and get the data via S3 Bucket, SFTP, Email, Compressed File, Google Cloud Storage, and Azure Blob Storage. Depending on your data requirements and subscription budget, FileMarket can deliver this product in .mp3 and .wav format.

What is the data quality of Accented English Speech Dataset 1.5K+ recordings Scripted Monologues Global Coverage?

You can compare and assess the data quality of FileMarket using Datarade’s data marketplace.

What are similar products to Accented English Speech Dataset 1.5K+ recordings Scripted Monologues Global Coverage?

This product has 3 related products. These alternatives include Global English Speech with Accent Conversational Dataset — Multi-Region Validated Speech with Gender, Age & Metadata for AI & NLP Training, Scripted Monologues Speech Data 65,000 Hours Generative AI Audio Data Speech Recognition Data Machine Learning (ML) Data, and Machine Learning (ML) Data 800M+ B2B Profiles AI-Ready for Deep Learning (DL), NLP & LLM Training. You can compare the best Natural Language Processing (NLP) Data providers and products via Datarade’s data marketplace and get the right data for your use case.

FileMarket

Unique Audio and Multimedia Datasets for AI

Verified Provider
100% Response rate
