This tutorial provides an example of how to load a pandas DataFrame into a tf.data.Dataset.
This tutorial uses a small dataset provided by the Cleveland Clinic Foundation for Heart Disease. There are several hundred rows in the CSV. Each row describes a patient, and each column describes an attribute. We will use this information to predict whether a patient has heart disease, which in this dataset is a binary classification task.
Read data using pandas
import pandas as pd
import tensorflow as tf
Download the CSV file containing the heart dataset.
csv_file = tf.keras.utils.get_file('heart.csv', 'https://storage.googleapis.com/download.tensorflow.org/data/heart.csv')
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/heart.csv
16384/13273 [=====================================] - 0s 0us/step
Read the CSV file using pandas.
df = pd.read_csv(csv_file)
df.head()
df.dtypes
age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal         object
target        int64
dtype: object
Convert the thal column, which is an object in the dataframe, to a discrete numerical value.
df['thal'] = pd.Categorical(df['thal'])
df['thal'] = df.thal.cat.codes
df.head()
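To make the conversion above concrete, here is a small sketch on a made-up Series (the values are illustrative and not taken from heart.csv). It assumes string categories, which pandas sorts before assigning integer codes; missing entries receive the code -1.

```python
import pandas as pd

# Hypothetical stand-in for a string-valued column such as `thal`.
toy = pd.Series(['normal', 'fixed', 'reversible', 'normal', None])

cat = pd.Categorical(toy)
categories = list(cat.categories)  # sorted unique labels
codes = cat.codes.tolist()         # one integer code per row, -1 for missing
codes_dtype = str(cat.codes.dtype) # a compact integer dtype for few categories

print(categories)   # ['fixed', 'normal', 'reversible']
print(codes)        # [1, 0, 2, 1, -1]
print(codes_dtype)  # int8
```

Because the codes follow the sorted order of the labels, they carry no numeric meaning beyond identifying the category.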
Load data using tf.data.Dataset
Use tf.data.Dataset.from_tensor_slices to read the values from a pandas dataframe.
One of the advantages of using tf.data.Dataset is that it allows you to write simple, highly efficient data pipelines. Read the loading data guide to find out more.
target = df.pop('target')
dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))
for feat, targ in dataset.take(5):
  print('Features: {}, Target: {}'.format(feat, targ))
Features: [ 63. 1. 1. 145. 233. 1. 2. 150. 0. 2.3 3. 0. 2. ], Target: 0
Features: [ 67. 1. 4. 160. 286. 0. 2. 108. 1. 1.5 2. 3. 3. ], Target: 1
Features: [ 67. 1. 4. 120. 229. 0. 2. 129. 1. 2.6 2. 2. 4. ], Target: 0
Features: [ 37. 1. 3. 130. 250. 0. 0. 187. 0. 3.5 3. 0. 3. ], Target: 0
Features: [ 41. 0. 2. 130. 204. 0. 2. 172. 0. 1.4 1. 0. 3. ], Target: 0
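The features print as floats even though most columns are integers. This is because df.values has to return a single NumPy array, so mixed integer and float columns are upcast to one common dtype. The sketch below shows this on a made-up two-column frame and imitates the row-wise slicing with plain Python iteration; it illustrates the idea and is not how tf.data is implemented.

```python
import numpy as np
import pandas as pd

# Made-up miniature of the tutorial's frame: one int column, one float column.
toy_df = pd.DataFrame({'age': [63, 67, 37], 'oldpeak': [2.3, 1.5, 3.5]})
toy_target = pd.Series([0, 1, 0])

features = toy_df.values  # a single 2-D array, so ints are upcast to float64
print(features.dtype, features.shape)  # float64 (3, 2)

# Slicing along the first axis pairs each feature row with its target,
# which is the structure from_tensor_slices yields element by element.
rows = list(zip(features, toy_target.values))
first_feat, first_targ = rows[0]
print(first_feat, first_targ)
```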
Since a pd.Series implements the __array__ protocol, it can be used transparently nearly anywhere you would use a np.array or a tf.Tensor.
tf.constant(df['thal'])
<tf.Tensor: shape=(303,), dtype=int8, numpy= array([2, 3, 4, 3, 3, 3, 3, 3, 4, 4, 2, 3, 2, 4, 4, 3, 4, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 4, 2, 4, 3, 4, 3, 4, 4, 2, 3, 3, 4, 3, 3, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 4, 4, 2, 3, 3, 4, 3, 4, 3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 3, 3, 4, 4, 4, 3, 3, 4, 3, 4, 4, 3, 4, 3, 3, 3, 4, 3, 4, 4, 3, 3, 4, 4, 4, 4, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 4, 3, 3, 2, 4, 4, 2, 3, 3, 4, 4, 3, 4, 3, 3, 4, 2, 4, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 4, 3, 2, 4, 4, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 2, 2, 4, 3, 4, 2, 4, 3, 3, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 2, 2, 4, 3, 4, 3, 2, 4, 3, 3, 2, 4, 4, 4, 4, 3, 0, 3, 3, 3, 3, 1, 4, 3, 3, 3, 4, 3, 4, 3, 3, 3, 4, 3, 3, 4, 4, 4, 4, 3, 3, 4, 3, 4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 3, 2, 4, 4, 4, 4], dtype=int8)>
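As a rough illustration of the __array__ protocol, the sketch below converts both a pd.Series and a small custom container with np.asarray. ColumnLike is a hypothetical class written for this example; the copy parameter is accepted only so the method also works with newer NumPy versions that pass it.

```python
import numpy as np
import pandas as pd

class ColumnLike:
    """Hypothetical minimal container that exposes the __array__ protocol."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this method when it needs an ndarray view of the object.
        return np.array(self._values, dtype=dtype)

series = pd.Series([2, 3, 4, 3])
from_series = np.asarray(series)                    # pandas implements __array__
from_custom = np.asarray(ColumnLike([2, 3, 4, 3]))  # so does the toy class

print(from_series, from_custom)
```

Any library that converts its inputs this way can accept such objects without an explicit conversion step.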
Shuffle and batch the dataset.
train_dataset = dataset.shuffle(len(df)).batch(1)
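As a rough mental model of what shuffle followed by batch produces, the NumPy sketch below permutes row indices and cuts them into fixed-size groups. shuffle_and_batch is a hypothetical helper written for this illustration. tf.data itself uses a streaming shuffle buffer, which gives a full shuffle only when the buffer is at least as large as the dataset, as with shuffle(len(df)) above.

```python
import math
import numpy as np

def shuffle_and_batch(features, labels, batch_size, seed=0):
    """Yield (features, labels) batches in random order (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))     # full shuffle of row indices
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]  # the last batch may be smaller
        yield features[idx], labels[idx]

# Placeholder arrays with the tutorial's shape: 303 rows, 13 feature columns.
features = np.zeros((303, 13))
labels = np.arange(303)

batches_of_16 = list(shuffle_and_batch(features, labels, batch_size=16))
batches_of_1 = list(shuffle_and_batch(features, labels, batch_size=1))

print(len(batches_of_16), math.ceil(303 / 16))  # 19 19
print(len(batches_of_1))                        # 303
```

The batch count is the number of steps Keras reports per epoch, so batch(1) gives 303 steps and batch(16) gives 19.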
Create and train a model
def get_compiled_model():
  model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1)
  ])
  model.compile(optimizer='adam',
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
model = get_compiled_model()
model.fit(train_dataset, epochs=15)
Epoch 1/15
WARNING:tensorflow:Layer dense is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because its dtype defaults to floatx. If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2. To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
303/303 [==============================] - 1s 2ms/step - loss: 5.5209 - accuracy: 0.5974
Epoch 2/15
303/303 [==============================] - 1s 2ms/step - loss: 1.0674 - accuracy: 0.6502
Epoch 3/15
303/303 [==============================] - 1s 2ms/step - loss: 0.7267 - accuracy: 0.6931
Epoch 4/15
303/303 [==============================] - 1s 2ms/step - loss: 0.6754 - accuracy: 0.7261
Epoch 5/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5856 - accuracy: 0.7591
Epoch 6/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5496 - accuracy: 0.7723
Epoch 7/15
303/303 [==============================] - 1s 2ms/step - loss: 0.7267 - accuracy: 0.7327
Epoch 8/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5309 - accuracy: 0.7789
Epoch 9/15
303/303 [==============================] - 1s 2ms/step - loss: 0.6029 - accuracy: 0.7558
Epoch 10/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5191 - accuracy: 0.7525
Epoch 11/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5402 - accuracy: 0.7855
Epoch 12/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5585 - accuracy: 0.7525
Epoch 13/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5182 - accuracy: 0.7855
Epoch 14/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5443 - accuracy: 0.7723
Epoch 15/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5150 - accuracy: 0.7855
<tensorflow.python.keras.callbacks.History at 0x7f8fdf5749e8>
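The final Dense(1) layer has no activation, so the model outputs raw logits, and from_logits=True tells the loss to apply the sigmoid itself. The NumPy sketch below works through that computation using the standard numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)). bce_from_logits is a name made up for this example, and the Keras implementation may differ in details such as reduction and clipping.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_logits(logits, labels):
    """Mean binary cross-entropy computed directly from logits (illustrative)."""
    z = np.asarray(logits, dtype=np.float64)
    y = np.asarray(labels, dtype=np.float64)
    # Stable form: avoids taking log(0) when the sigmoid saturates.
    per_example = np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))
    return per_example.mean()

logits = np.array([-2.0, 0.0, 3.0])  # made-up model outputs
labels = np.array([0, 1, 1])         # made-up targets

# Reference value: apply the sigmoid first, then the textbook formula.
probs = sigmoid(logits)
naive = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
stable = bce_from_logits(logits, labels)

# A logit above 0 corresponds to a probability above 0.5.
predicted = (logits > 0).astype(int)
print(naive, stable, predicted)
```

To turn the trained model's outputs into probabilities, apply a sigmoid to the predictions; thresholding the logits at 0 gives the same classes as thresholding the probabilities at 0.5.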
Alternative to feature columns
Passing a dictionary as an input to a model is as easy as creating a matching dictionary of tf.keras.layers.Input layers, applying any pre-processing, and stacking them up using the functional API. You can use this as an alternative to feature columns.
inputs = {key: tf.keras.layers.Input(shape=(), name=key) for key in df.keys()}
x = tf.stack(list(inputs.values()), axis=-1)
x = tf.keras.layers.Dense(10, activation='relu')(x)
output = tf.keras.layers.Dense(1)(x)
model_func = tf.keras.Model(inputs=inputs, outputs=output)
model_func.compile(optimizer='adam',
                   loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                   metrics=['accuracy'])
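The tf.stack(..., axis=-1) call above turns the dictionary of per-column inputs, each of shape (batch,), into one (batch, n_features) tensor that the Dense layers can consume. The same shape logic can be sketched with np.stack on made-up columns:

```python
import numpy as np

# Made-up batch of four patients with three of the tutorial's column names.
columns = {
    'age': np.array([63.0, 67.0, 37.0, 41.0]),
    'sex': np.array([1.0, 1.0, 1.0, 0.0]),
    'oldpeak': np.array([2.3, 1.5, 3.5, 1.4]),
}

# Stacking along the last axis places one column per feature position,
# in the dictionary's insertion order.
stacked = np.stack(list(columns.values()), axis=-1)

print(stacked.shape)  # (4, 3)
print(stacked[0])     # the first patient's feature vector
```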
The easiest way to preserve the column structure of a pd.DataFrame when used with tf.data is to convert the pd.DataFrame to a dict, and slice that dictionary.
dict_slices = tf.data.Dataset.from_tensor_slices((df.to_dict('list'), target.values)).batch(16)
for dict_slice in dict_slices.take(1):
  print(dict_slice)
({'age': <tf.Tensor: shape=(16,), dtype=int32, numpy= array([63, 67, 67, 37, 41, 56, 62, 57, 63, 53, 57, 56, 56, 44, 52, 57], dtype=int32)>, 'sex': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1], dtype=int32)>, 'cp': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 4, 4, 3, 2, 2, 4, 4, 4, 4, 4, 2, 3, 2, 3, 3], dtype=int32)>, 'trestbps': <tf.Tensor: shape=(16,), dtype=int32, numpy= array([145, 160, 120, 130, 130, 120, 140, 120, 130, 140, 140, 140, 130, 120, 172, 150], dtype=int32)>, 'chol': <tf.Tensor: shape=(16,), dtype=int32, numpy= array([233, 286, 229, 250, 204, 236, 268, 354, 254, 203, 192, 294, 256, 263, 199, 168], dtype=int32)>, 'fbs': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0], dtype=int32)>, 'restecg': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 2, 2, 0, 0, 0], dtype=int32)>, 'thalach': <tf.Tensor: shape=(16,), dtype=int32, numpy= array([150, 108, 129, 187, 172, 178, 160, 163, 147, 155, 148, 153, 142, 173, 162, 174], dtype=int32)>, 'exang': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0], dtype=int32)>, 'oldpeak': <tf.Tensor: shape=(16,), dtype=float32, numpy= array([2.3, 1.5, 2.6, 3.5, 1.4, 0.8, 3.6, 0.6, 1.4, 3.1, 0.4, 1.3, 0.6, 0. , 0.5, 1.6], dtype=float32)>, 'slope': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([3, 2, 2, 3, 1, 1, 3, 1, 2, 3, 2, 2, 2, 1, 1, 1], dtype=int32)>, 'ca': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([0, 3, 2, 0, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0, 0, 0], dtype=int32)>, 'thal': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([2, 3, 4, 3, 3, 3, 3, 3, 4, 4, 2, 3, 2, 4, 4, 3], dtype=int32)>}, <tf.Tensor: shape=(16,), dtype=int64, numpy=array([0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0])>)
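A small pandas-only sketch of what gets sliced here: to_dict with the 'list' orientation maps each column name to a plain Python list, so every column can keep its own dtype instead of being merged into one float array. That is consistent with the output above, where age is int32 and oldpeak is float32. The frame below is made up for illustration, and the manual slice only imitates what batch(2) would group together.

```python
import pandas as pd

# Made-up miniature of the tutorial's frame.
toy_df = pd.DataFrame({'age': [63, 67, 37], 'oldpeak': [2.3, 1.5, 3.5]})

as_dict = toy_df.to_dict(orient='list')  # {column name: list of values}
print(as_dict)

# Taking the same row range from every list mimics one batch of two rows.
first_batch = {name: values[:2] for name, values in as_dict.items()}
print(first_batch)
```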
model_func.fit(dict_slices, epochs=15)
Epoch 1/15
19/19 [==============================] - 0s 3ms/step - loss: 98.0478 - accuracy: 0.2739
Epoch 2/15
19/19 [==============================] - 0s 3ms/step - loss: 81.2776 - accuracy: 0.2739
Epoch 3/15
19/19 [==============================] - 0s 3ms/step - loss: 65.6498 - accuracy: 0.2739
Epoch 4/15
19/19 [==============================] - 0s 3ms/step - loss: 51.1218 - accuracy: 0.2739
Epoch 5/15
19/19 [==============================] - 0s 3ms/step - loss: 37.5949 - accuracy: 0.2739
Epoch 6/15
19/19 [==============================] - 0s 3ms/step - loss: 24.9959 - accuracy: 0.2739
Epoch 7/15
19/19 [==============================] - 0s 3ms/step - loss: 13.8553 - accuracy: 0.2739
Epoch 8/15
19/19 [==============================] - 0s 3ms/step - loss: 6.0397 - accuracy: 0.2739
Epoch 9/15
19/19 [==============================] - 0s 3ms/step - loss: 1.9379 - accuracy: 0.4455
Epoch 10/15
19/19 [==============================] - 0s 3ms/step - loss: 1.0329 - accuracy: 0.6337
Epoch 11/15
19/19 [==============================] - 0s 3ms/step - loss: 0.8738 - accuracy: 0.6601
Epoch 12/15
19/19 [==============================] - 0s 3ms/step - loss: 0.8148 - accuracy: 0.6865
Epoch 13/15
19/19 [==============================] - 0s 3ms/step - loss: 0.7816 - accuracy: 0.6898
Epoch 14/15
19/19 [==============================] - 0s 3ms/step - loss: 0.7593 - accuracy: 0.6898
Epoch 15/15
19/19 [==============================] - 0s 3ms/step - loss: 0.7429 - accuracy: 0.6997
<tensorflow.python.keras.callbacks.History at 0x7f8f3a80a438>