
Load a pandas.DataFrame


This tutorial provides an example of how to load a pandas dataframe into a tf.data.Dataset.

This tutorial uses a small dataset provided by the Cleveland Clinic Foundation for Heart Disease. There are a few hundred rows in the CSV. Each row describes a patient, and each column describes an attribute. You will use this information to predict whether a patient has heart disease, which in this dataset is a binary classification task.

Read data using pandas

import pandas as pd
import tensorflow as tf

Download the CSV file containing the heart dataset.

csv_file = tf.keras.utils.get_file('heart.csv', 'https://storage.googleapis.com/download.tensorflow.org/data/heart.csv')
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/heart.csv
16384/13273 [=====================================] - 0s 0us/step

Read the CSV file using pandas.

df = pd.read_csv(csv_file)
df.head()
df.dtypes
age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal         object
target        int64
dtype: object

Convert the thal column, which is an object in the dataframe, to a discrete numeric value.

df['thal'] = pd.Categorical(df['thal'])
df['thal'] = df.thal.cat.codes
df.head()
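
The pd.Categorical conversion above assigns each distinct string its own integer code. A minimal sketch on a toy frame (the values below are illustrative, not the real heart-dataset categories):

```python
import pandas as pd

# Toy frame standing in for the dataset's string-valued `thal` column.
toy = pd.DataFrame({'thal': ['fixed', 'normal', 'reversible', 'normal']})

# Categorize, then replace the strings with their integer codes.
toy['thal'] = pd.Categorical(toy['thal'])
print(toy['thal'].cat.categories)  # categories are sorted alphabetically by default
toy['thal'] = toy.thal.cat.codes
print(toy['thal'].tolist())  # → [0, 1, 2, 1]
```

Note that the code assignment depends on the category ordering, so the same string always maps to the same integer within one dataframe.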

Load data using tf.data.Dataset

Use tf.data.Dataset.from_tensor_slices to read the values from the pandas dataframe.

One of the advantages of using a tf.data.Dataset is that it allows you to write simple, highly efficient data pipelines. Read the loading data guide to find out more.

target = df.pop('target')
dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))
for feat, targ in dataset.take(5):
  print ('Features: {}, Target: {}'.format(feat, targ))
Features: [ 63.    1.    1.  145.  233.    1.    2.  150.    0.    2.3   3.    0.
   2. ], Target: 0
Features: [ 67.    1.    4.  160.  286.    0.    2.  108.    1.    1.5   2.    3.
   3. ], Target: 1
Features: [ 67.    1.    4.  120.  229.    0.    2.  129.    1.    2.6   2.    2.
   4. ], Target: 0
Features: [ 37.    1.    3.  130.  250.    0.    0.  187.    0.    3.5   3.    0.
   3. ], Target: 0
Features: [ 41.    0.    2.  130.  204.    0.    2.  172.    0.    1.4   1.    0.
   3. ], Target: 0
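
Such a pipeline can then be extended with batching and prefetching in the same chained style. A minimal sketch on a toy in-memory dataset (the values and sizes here are illustrative):

```python
import tensorflow as tf

# A tiny stand-in dataset: four (feature, label) pairs.
ds = tf.data.Dataset.from_tensor_slices(([1., 2., 3., 4.], [0, 1, 0, 1]))

# Typical pipeline: shuffle, batch, and prefetch so input
# preparation overlaps with training.
ds = ds.shuffle(4).batch(2).prefetch(tf.data.AUTOTUNE)

for feats, labels in ds:
    print(feats.numpy(), labels.numpy())
```

Each iteration now yields a batch of two examples rather than a single pair.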

Because pd.Series implements the __array__ protocol, it can be used transparently almost anywhere you would use an np.array or a tf.Tensor.

tf.constant(df['thal'])
<tf.Tensor: shape=(303,), dtype=int8, numpy=
array([2, 3, 4, 3, 3, 3, 3, 3, 4, 4, 2, 3, 2, 4, 4, 3, 4, 3, 3, 3, 3, 3,
       3, 4, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 4, 2, 4, 3, 4, 3, 4, 4,
       2, 3, 3, 4, 3, 3, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 4,
       4, 2, 3, 3, 4, 3, 4, 3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 3, 3, 4, 4, 4,
       3, 3, 4, 3, 4, 4, 3, 4, 3, 3, 3, 4, 3, 4, 4, 3, 3, 4, 4, 4, 4, 4,
       3, 3, 3, 3, 4, 3, 4, 3, 4, 4, 3, 3, 2, 4, 4, 2, 3, 3, 4, 4, 3, 4,
       3, 3, 4, 2, 4, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4,
       4, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 4, 3, 2,
       4, 4, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 2, 2, 4, 3, 4, 2, 4, 3,
       3, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 2, 2, 4, 3, 4, 3, 2, 4, 3, 3, 2,
       4, 4, 4, 4, 3, 0, 3, 3, 3, 3, 1, 4, 3, 3, 3, 4, 3, 4, 3, 3, 3, 4,
       3, 3, 4, 4, 4, 4, 3, 3, 4, 3, 4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3,
       3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 3, 2, 4, 4, 4, 4], dtype=int8)>
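
To see the __array__ protocol in action without downloading the dataset, here is a minimal sketch on a toy Series (the values are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 3, 4, 3], dtype='int8')

# __array__ lets NumPy (and, by the same mechanism, TensorFlow)
# consume the Series directly, preserving its dtype.
arr = np.asarray(s)
print(arr, arr.dtype)  # → [2 3 4 3] int8
```

This is why tf.constant(df['thal']) above works without any explicit conversion step.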

Shuffle and batch the dataset.

train_dataset = dataset.shuffle(len(df)).batch(1)

Create and train a model

def get_compiled_model():
  model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1)
  ])

  model.compile(optimizer='adam',
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
model = get_compiled_model()
model.fit(train_dataset, epochs=15)
Epoch 1/15
WARNING:tensorflow:Layer dense is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because its dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

303/303 [==============================] - 1s 2ms/step - loss: 5.5209 - accuracy: 0.5974
Epoch 2/15
303/303 [==============================] - 1s 2ms/step - loss: 1.0674 - accuracy: 0.6502
Epoch 3/15
303/303 [==============================] - 1s 2ms/step - loss: 0.7267 - accuracy: 0.6931
Epoch 4/15
303/303 [==============================] - 1s 2ms/step - loss: 0.6754 - accuracy: 0.7261
Epoch 5/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5856 - accuracy: 0.7591
Epoch 6/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5496 - accuracy: 0.7723
Epoch 7/15
303/303 [==============================] - 1s 2ms/step - loss: 0.7267 - accuracy: 0.7327
Epoch 8/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5309 - accuracy: 0.7789
Epoch 9/15
303/303 [==============================] - 1s 2ms/step - loss: 0.6029 - accuracy: 0.7558
Epoch 10/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5191 - accuracy: 0.7525
Epoch 11/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5402 - accuracy: 0.7855
Epoch 12/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5585 - accuracy: 0.7525
Epoch 13/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5182 - accuracy: 0.7855
Epoch 14/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5443 - accuracy: 0.7723
Epoch 15/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5150 - accuracy: 0.7855

<tensorflow.python.keras.callbacks.History at 0x7f8fdf5749e8>

Alternative to feature columns

Passing a dictionary as an input to a model is as easy as creating a matching dictionary of tf.keras.layers.Input layers, applying any preprocessing, and stacking them up using the functional API. You can use this as an alternative to feature columns.

inputs = {key: tf.keras.layers.Input(shape=(), name=key) for key in df.keys()}
x = tf.stack(list(inputs.values()), axis=-1)

x = tf.keras.layers.Dense(10, activation='relu')(x)
output = tf.keras.layers.Dense(1)(x)

model_func = tf.keras.Model(inputs=inputs, outputs=output)

model_func.compile(optimizer='adam',
                   loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                   metrics=['accuracy'])

The easiest way to preserve the column structure of a pd.DataFrame when used with tf.data is to convert the pd.DataFrame to a dict, and slice that dictionary.
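
The to_dict('list') conversion keeps one key per column, which is exactly the structure from_tensor_slices expects for dictionary inputs. A minimal sketch on a toy frame (columns and values illustrative):

```python
import pandas as pd

frame = pd.DataFrame({'age': [63, 67], 'thal': [2, 3]})

# One dict key per column; each value is that column as a plain list.
as_dict = frame.to_dict('list')
print(as_dict)  # → {'age': [63, 67], 'thal': [2, 3]}
```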

dict_slices = tf.data.Dataset.from_tensor_slices((df.to_dict('list'), target.values)).batch(16)
for dict_slice in dict_slices.take(1):
  print (dict_slice)
({'age': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([63, 67, 67, 37, 41, 56, 62, 57, 63, 53, 57, 56, 56, 44, 52, 57],
      dtype=int32)>, 'sex': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1], dtype=int32)>, 'cp': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 4, 4, 3, 2, 2, 4, 4, 4, 4, 4, 2, 3, 2, 3, 3], dtype=int32)>, 'trestbps': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([145, 160, 120, 130, 130, 120, 140, 120, 130, 140, 140, 140, 130,
       120, 172, 150], dtype=int32)>, 'chol': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([233, 286, 229, 250, 204, 236, 268, 354, 254, 203, 192, 294, 256,
       263, 199, 168], dtype=int32)>, 'fbs': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0], dtype=int32)>, 'restecg': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 2, 2, 0, 0, 0], dtype=int32)>, 'thalach': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([150, 108, 129, 187, 172, 178, 160, 163, 147, 155, 148, 153, 142,
       173, 162, 174], dtype=int32)>, 'exang': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0], dtype=int32)>, 'oldpeak': <tf.Tensor: shape=(16,), dtype=float32, numpy=
array([2.3, 1.5, 2.6, 3.5, 1.4, 0.8, 3.6, 0.6, 1.4, 3.1, 0.4, 1.3, 0.6,
       0. , 0.5, 1.6], dtype=float32)>, 'slope': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([3, 2, 2, 3, 1, 1, 3, 1, 2, 3, 2, 2, 2, 1, 1, 1], dtype=int32)>, 'ca': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([0, 3, 2, 0, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0, 0, 0], dtype=int32)>, 'thal': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([2, 3, 4, 3, 3, 3, 3, 3, 4, 4, 2, 3, 2, 4, 4, 3], dtype=int32)>}, <tf.Tensor: shape=(16,), dtype=int64, numpy=array([0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0])>)

model_func.fit(dict_slices, epochs=15)
Epoch 1/15
19/19 [==============================] - 0s 3ms/step - loss: 98.0478 - accuracy: 0.2739
Epoch 2/15
19/19 [==============================] - 0s 3ms/step - loss: 81.2776 - accuracy: 0.2739
Epoch 3/15
19/19 [==============================] - 0s 3ms/step - loss: 65.6498 - accuracy: 0.2739
Epoch 4/15
19/19 [==============================] - 0s 3ms/step - loss: 51.1218 - accuracy: 0.2739
Epoch 5/15
19/19 [==============================] - 0s 3ms/step - loss: 37.5949 - accuracy: 0.2739
Epoch 6/15
19/19 [==============================] - 0s 3ms/step - loss: 24.9959 - accuracy: 0.2739
Epoch 7/15
19/19 [==============================] - 0s 3ms/step - loss: 13.8553 - accuracy: 0.2739
Epoch 8/15
19/19 [==============================] - 0s 3ms/step - loss: 6.0397 - accuracy: 0.2739
Epoch 9/15
19/19 [==============================] - 0s 3ms/step - loss: 1.9379 - accuracy: 0.4455
Epoch 10/15
19/19 [==============================] - 0s 3ms/step - loss: 1.0329 - accuracy: 0.6337
Epoch 11/15
19/19 [==============================] - 0s 3ms/step - loss: 0.8738 - accuracy: 0.6601
Epoch 12/15
19/19 [==============================] - 0s 3ms/step - loss: 0.8148 - accuracy: 0.6865
Epoch 13/15
19/19 [==============================] - 0s 3ms/step - loss: 0.7816 - accuracy: 0.6898
Epoch 14/15
19/19 [==============================] - 0s 3ms/step - loss: 0.7593 - accuracy: 0.6898
Epoch 15/15
19/19 [==============================] - 0s 3ms/step - loss: 0.7429 - accuracy: 0.6997

<tensorflow.python.keras.callbacks.History at 0x7f8f3a80a438>