yelp_polarity

References:

plain_text

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:yelp_polarity/plain_text')
  • Description:
Large Yelp Review Dataset.
This is a dataset for binary sentiment classification. We provide a set of 560,000 highly polar yelp reviews for training, and 38,000 for testing. 
The Yelp reviews dataset consists of reviews from Yelp. It is extracted
from the Yelp Dataset Challenge 2015 data. For more information, please
refer to

The Yelp reviews polarity dataset is constructed by
Xiang Zhang from the above dataset.
It was first used as a text classification benchmark in the following paper:
Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks
for Text Classification. Advances in Neural Information Processing Systems 28
(NIPS 2015).


The Yelp reviews polarity dataset is constructed by considering stars 1 and 2
negative, and 3 and 4 positive. For each polarity, 280,000 training samples and
19,000 testing samples are taken randomly. In total there are 560,000 training
samples and 38,000 testing samples. Negative polarity is class 1,
and positive polarity is class 2.
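The description does not show how the random per-polarity draw was performed. The following is a minimal sketch of drawing a fixed number of random samples per class, assuming records are (class_index, text) pairs. The function name `sample_per_class` and the record format are hypothetical, not the dataset author's code.

```python
import random
from collections import defaultdict


def sample_per_class(records, per_class, seed=0):
    """Draw `per_class` random records for each class index.

    Hypothetical helper: `records` is assumed to be an iterable of
    (class_index, text) pairs.
    """
    by_class = defaultdict(list)
    for class_index, text in records:
        by_class[class_index].append((class_index, text))

    rng = random.Random(seed)  # seeded so the draw is reproducible
    sampled = []
    for class_index in sorted(by_class):
        # random.sample raises ValueError if a class has fewer than per_class records.
        sampled.extend(rng.sample(by_class[class_index], per_class))
    return sampled


# Toy pool: 10 negative (class 1) and 10 positive (class 2) reviews.
pool = [(1, f"neg {i}") for i in range(10)] + [(2, f"pos {i}") for i in range(10)]
subset = sample_per_class(pool, per_class=3)
print(len(subset))  # 6
```

With per_class=280,000 and two classes, this yields the 560,000 training samples stated above. Building both a training and a testing subset would additionally require keeping the two draws disjoint, which this sketch does not handle.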

The files train.csv and test.csv contain all the training and testing samples,
respectively, as comma-separated values. There are 2 columns in them,
corresponding to the class index (1 and 2) and the review text. The review texts
are escaped using double quotes ("), and any internal double quote is escaped by
2 double quotes (""). New lines are escaped by a backslash followed by an "n"
character, that is "\n".
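The escaping rules above map onto Python's standard `csv` module: its default dialect already reads `""` inside a quoted field as one literal double quote, so only the backslash-n sequences need converting back to real newlines. This is a minimal sketch assuming the two-column layout described above; `parse_rows` is a hypothetical helper, not part of TFDS.

```python
import csv
import io


def parse_rows(csv_text):
    """Yield (class_index, review_text) from text in the train.csv/test.csv format.

    Hypothetical helper. csv.reader handles the quoting and the doubled ""
    escape; the two-character sequence backslash + "n" is then replaced
    with a real newline.
    """
    for row in csv.reader(io.StringIO(csv_text)):
        class_index, review = row  # 2 columns: class index, review text
        yield int(class_index), review.replace("\\n", "\n")


# Two rows in the described format (the "\\n" is a literal backslash + n).
sample = '"2","Great ""tacos"".\\nWill return."\n"1","Cold food."\n'
rows = list(parse_rows(sample))
print(rows[1])  # (1, 'Cold food.')
```

For a real file, pass an open handle instead, e.g. `csv.reader(open("train.csv", newline="", encoding="utf-8"))` (the UTF-8 encoding is an assumption). A review containing a literal backslash followed by "n" would also be turned into a newline, since the escaping as described does not distinguish the two.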
  • License: No known license
  • Version: 1.0.0
  • Splits:
Split      Examples
'test'     38000
'train'    560000
  • Features:
{
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "label": {
        "num_classes": 2,
        "names": [
            "1",
            "2"
        ],
        "id": null,
        "_type": "ClassLabel"
    }
}
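A ClassLabel feature stores integers from 0 to num_classes - 1, and names[i] is the name of integer i. The following is a minimal sketch of converting between those 0-based labels, the CSV class indices, and the polarity described above, assuming the label names are "1" and "2" (matching the CSV class indices). `LABEL_NAMES`, `POLARITY`, and both functions are hypothetical illustrations.

```python
# Assumed label names, matching the CSV class indices 1 and 2.
LABEL_NAMES = ["1", "2"]
# Polarity per class, as stated in the description: class 1 negative, class 2 positive.
POLARITY = {"1": "negative", "2": "positive"}


def describe_label(label):
    """Map a 0-based ClassLabel integer to (name, polarity)."""
    name = LABEL_NAMES[label]
    return name, POLARITY[name]


def csv_index_to_label(class_index):
    """Convert a CSV class index (1 or 2) to the 0-based ClassLabel integer."""
    return LABEL_NAMES.index(str(class_index))


print(describe_label(0))  # ('1', 'negative')
```

With the dataset loaded through TFDS, the `int2str` method of the `tfds.features.ClassLabel` feature should perform the same integer-to-name lookup without hard-coding the names.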