awesome-deeplearning-resources
Corpus
Datatang
Corpus Online
3 Million Instacart Orders, Open Sourced
ACM Multimedia Systems Conference Dataset Archive
A comprehensive dataset for stock movement prediction from tweets and historical stock prices.
A dataset for book recommendations: ten thousand books, one million ratings
An awesome list of high-quality datasets
:star:
An awesome list of high-quality open datasets in public domains
:star:
A new dataset for Attribute Based Classification and Zero-Shot Learning
Audio Data Links
Clustering basic benchmark
CNSD: Chinese Natural Language Inference Dataset
Cool Datasets
:star:
Corpora of misspellings for download
DATASETS FOR DATA MINING
Datasets for Data Science and Machine Learning
DeepDive Open Datasets
:star:
FiveThirtyEight open visualization data
Hard Drive Data and Stats
Open Datasets
Picture and specifications scraper
Pixiv Dataset Overview
SLAC: A Sparsely Labeled ACtions Dataset from MIT and Facebook
Some good papers I like
Standardized data set for machine learning of protein structure
Telenav.AI competition public repository
The Quick, Draw! Dataset
Wolfram Data Repository
CV
300 Faces In-the-Wild Challenge
A dataset for personalized highlight detection
A Large-Scale Dataset for Vehicle Re-Identification in the Wild
A MNIST-like fashion product database
:star:
Caltech 10,000 Web Faces
CASIA WebFace Database
Cross-Age Celebrity Dataset
DeepFashion: Fashion Landmark Detection
EMOTIC Dataset
Face Recognition for Web-Scale Datasets
IMDB-WIKI – 500k+ face images with age and gender labels
Kaggle Datasets
Labeled Faces in the Wild Home
Large-scale CelebFaces Attributes (CelebA) Dataset
LLD - Large Logo Dataset
Medical imaging datasets
Media Integration and Communication Center
MegaFace Dataset
MSRA-CFW: Data Set of Celebrity Faces on the Web
Netizen-Style Commenting on Fashion Photos – Dataset and Diversity Measures
Open Images Dataset V4
SCUT HEAD is a large-scale head detection dataset
Street View Image, Pose, and 3D Cities Dataset
VGG Face Dataset
VGGFace2 Dataset
WebVision Visual Dataset 2.0
WIDER FACE: A Face Detection Benchmark
YouTube Faces DB
NLP
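Large-scale Chinese natural language processing corpus
Chinese and English corpora for dialogue systems
Sogou Labs
Sentiment analysis: a roundup of free, publicly available text corpora and training datasets
Word lists for Chinese sentiment analysis
People's Daily segmented/annotated corpus
Harbin Institute of Technology Center for Information Retrieval (HIT CIR) Language Technology Platform shared resources
Chinese sentence structure treebank
Chinese conversation corpus
Small Chinese corpus data: Some useful Chinese corpus datasets
Chinese personal names corpus: Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names
Chinese emergency events corpus
United Nations Parallel Corpus
Insurance industry corpus
Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, and Chinese characters. Provides a Xinhua Dictionary API
Corpora for training Chinese and English dialogue systems (Datasets for Training Chatbot System)
The most comprehensive database of classical Chinese poetry
PTT Gossiping board Chinese question-answering corpus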
Acemap Knowledge Graph
:star:
A dataset of 200k English plaintext jokes.
Alphabetical list of free/public domain datasets with text data for use in NLP
A New Multi-Turn, Multi-Domain, Task-Oriented Dialogue Dataset
A text file containing 479k English words for all your dictionary/word-based projects
BBC Sound Effects Archive Resource
CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB
Chat corpus collection from various open sources
Chinese NLP Corpus
Chinese Text in the Wild
CoLA - The Corpus of Linguistic Acceptability
:star:
Collections of Chinese NLP corpus
Cornell NLVR
Course materials for Text as Data Lab
Datasets of Annotated Semantic Relationships
Datasets for Entity Recognition
Japanese Word Similarity Dataset
Movie Review Data
Multi-Domain Sentiment Dataset
Open Domain Question Answering
:star:
Open Speech and Language Resources
:star:
Poetry-related datasets collected by THUAIPoet (Jiuge) group.
Public Datasets For Recommender Systems
Second International Chinese Word Segmentation Bakeoff Data
:star:
Taiga Corpus
Ten thousand books, six million ratings
The Big Bad NLP Database
The DBpedia Knowledge Base
The Movies Corpus
TriviaQA: A Large Scale Dataset for Reading Comprehension and Question Answering
Yelp Open Dataset
Database of 700,000 Chinese couplets
Video
A large-scale and high-quality dataset of annotated musical notes.
A large-scale dataset of manually annotated audio events
:star:
FMA: A Dataset For Music Analysis
Video Dataset Overview