As Coatue put it in a report they published last week, ‘open-source research, data, and community is at the core of the AI revolution’. In the spirit of contributing to this data-sharing community, we’ve gathered the top free data sources where anyone can access datasets without commercial barriers.
We hope it helps members of our data community innovate on their business, academic, and machine learning projects 🌐🧠
Data.world is a collaborative platform hosting a diverse range of datasets, fostering a community where users can explore and contribute to datasets spanning topics from business and science to government and education.
Kaggle is a data science hub offering a plethora of datasets, often accompanied by competitions and challenges. It provides a platform for data enthusiasts to access, analyze, and share datasets while participating in machine learning competitions to showcase their skills.
As the official U.S. government open data portal, Data.gov offers a vast collection of datasets covering various sectors, including health, energy, and education. Users can access government-generated data to drive insights and innovation.
Google Dataset Search simplifies the discovery of datasets across the web. It indexes dataset pages that publishers describe with schema.org metadata, making it a valuable starting point for researchers, data scientists, and analysts looking for openly available data.
GitHub, primarily a code repository platform, hosts numerous datasets in its repositories. It serves as a collaborative space where developers and researchers share datasets related to software development, machine learning, and various other fields.
The Datasets subreddit is a community-driven platform where users share and discuss a wide array of datasets. It's an excellent resource for those seeking diverse datasets and engaging in conversations with fellow data enthusiasts.
Eurostat, the statistical office of the European Union, provides a comprehensive collection of datasets covering topics such as economy, population, and the environment. It serves as a vital resource for understanding European socio-economic trends.
Chartr publishes data-driven charts and graphics that present datasets from a fresh angle. While not a dataset repository itself, it is a useful reference for data exploration and presentation.
AWS Open Data provides a wide array of datasets hosted on Amazon S3, covering domains like climate, genomics, and satellite imagery. It allows users to access and analyze large-scale datasets using AWS cloud services.
Statista is a statistics portal offering datasets, charts, and graphs across industries. Some statistics are accessible with a free account, while much of the catalog requires a paid subscription. It caters to researchers, businesses, and students seeking statistical information.
The World Bank's data repository offers a wealth of information on global development indicators. It includes datasets on topics such as poverty, education, and health, providing valuable insights for researchers and policymakers.
The WHO's data platform offers health-related datasets, supporting research and analysis to address global health challenges. It encompasses data on diseases, health systems, and interventions.
The Pew Research Center provides datasets on public opinion, social issues, and demographic trends. Researchers and analysts can access survey data and reports to understand societal perspectives.
Data Portals is a platform that aggregates links to various open data portals worldwide. It serves as a directory, connecting users to a multitude of datasets available on different platforms.
Open Data Network offers access to datasets from various cities and regions, facilitating the exploration of local government data. It is a valuable resource for understanding community metrics and trends.
Providing comprehensive health-related datasets, America's Health Rankings is a valuable resource for understanding health outcomes, behaviors, and determinants across the United States.
CompStak offers commercial real estate data, allowing users to access and analyze information on property transactions, lease comparables, and market trends.
Realtor.com provides real estate datasets, offering insights into property listings, market trends, and neighborhood information. It serves as a valuable resource for those involved in the real estate industry.
While primarily a financial news and quotes platform, Google Finance also provides stock market data, such as historical prices and financial metrics, which can be pulled into Google Sheets with the GOOGLEFINANCE function for analysis.
OpenCorporates is a platform offering access to a vast database of corporate information. It facilitates the exploration of company data, including legal entity details and corporate structures.
The International Monetary Fund (IMF) provides a comprehensive set of economic and financial datasets. It includes information on global economic indicators, exchange rates, and government finance.
Google Trends offers insights into the popularity of search queries over time. It's a valuable tool for understanding public interest and trends on a wide range of topics.
While not a dataset repository, The Moz Blog provides valuable insights and articles on SEO and online marketing, offering data-driven strategies for digital marketers.
Healthdata.gov hosts a variety of health-related datasets, providing access to information on healthcare outcomes, population health, and medical research.
The CDC's data and statistics portal offers a wealth of health-related datasets, including information on diseases, vaccinations, and public health trends.
The CIA World Factbook provides geopolitical and demographic data for countries worldwide, offering a comprehensive overview of global statistics.
The US Department of Labor's statistical data covers employment, wages, and workplace trends, supporting labor market analysis and policymaking.
The US Census Bureau's data portal offers a wide range of demographic and economic datasets, providing crucial information for understanding population trends and societal changes.
Data Hub is a platform that aggregates datasets from various sources, offering a diverse collection for researchers, developers, and analysts.
NASA's Earthdata provides access to a wealth of environmental datasets, including satellite imagery and climate data, supporting research on Earth's ecosystems.
The CERN Open Data Portal offers access to datasets from experiments conducted at the Large Hadron Collider, contributing to open and transparent scientific research.