
Load a pandas DataFrame


This tutorial provides an example of how to load pandas DataFrames into a tf.data.Dataset .

This tutorial uses a small dataset provided by the Cleveland Clinic Foundation for Heart Disease. There are several hundred rows in the CSV. Each row describes a patient, and each column describes an attribute. You will use this information to predict whether a patient has heart disease, which in this dataset is a binary classification task.

Read data using pandas

 import pandas as pd
import tensorflow as tf
 

Download the csv file containing the heart dataset.

 csv_file = tf.keras.utils.get_file('heart.csv', 'https://storage.googleapis.com/applied-dl/heart.csv')
 
Downloading data from https://storage.googleapis.com/applied-dl/heart.csv
16384/13273 [=====================================] - 0s 0us/step

Read the csv file using pandas.

 df = pd.read_csv(csv_file)
 
 df.head()
 
 df.dtypes
 
age           int64
sex           int64
cp            int64
trestbps      int64
chol          int64
fbs           int64
restecg       int64
thalach       int64
exang         int64
oldpeak     float64
slope         int64
ca            int64
thal         object
target        int64
dtype: object

Convert the thal column, which is an object in the DataFrame, to a discrete numeric value.

 df['thal'] = pd.Categorical(df['thal'])
df['thal'] = df.thal.cat.codes
 
 df.head()
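
To see what pd.Categorical and cat.codes actually produce, here is a minimal, self-contained sketch using toy values (not the heart dataset itself):

```python
import pandas as pd

# Toy series with the same object dtype as the original 'thal' column
s = pd.Series(['fixed', 'normal', 'reversible', 'normal'])

# pd.Categorical assigns each distinct value an integer code;
# categories are ordered lexicographically by default
cat = pd.Categorical(s)
print(list(cat.categories))  # ['fixed', 'normal', 'reversible']
print(list(cat.codes))       # [0, 1, 2, 1]
```

Each string is replaced by the index of its category, which is why the thal column above becomes a small-integer column.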
 

Load data using tf.data.Dataset

Use tf.data.Dataset.from_tensor_slices to read the values from a pandas DataFrame.

One of the advantages of using tf.data.Dataset is that it allows you to write simple, highly efficient data pipelines. Read the loading data guide to find out more.
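
As a sketch of such a pipeline (using toy data rather than the heart dataset), transformations like map, shuffle, batch, and prefetch chain directly on the Dataset:

```python
import tensorflow as tf

# Toy feature/label arrays standing in for the heart data
features = tf.constant([[1.0], [2.0], [3.0], [4.0]])
labels = tf.constant([0, 1, 0, 1])

ds = (tf.data.Dataset.from_tensor_slices((features, labels))
      .map(lambda x, y: (x * 2.0, y))   # elementwise preprocessing
      .shuffle(buffer_size=4)           # randomize example order
      .batch(2)                         # group examples into batches of 2
      .prefetch(tf.data.AUTOTUNE))      # overlap preprocessing with training

for x, y in ds:
    print(x.shape, y.shape)
```

Each stage returns a new Dataset, so the whole input pipeline stays a single fluent expression.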

 target = df.pop('target')
 
 dataset = tf.data.Dataset.from_tensor_slices((df.values, target.values))
 
 for feat, targ in dataset.take(5):
  print ('Features: {}, Target: {}'.format(feat, targ))
 
Features: [ 63.    1.    1.  145.  233.    1.    2.  150.    0.    2.3   3.    0.
   2. ], Target: 0
Features: [ 67.    1.    4.  160.  286.    0.    2.  108.    1.    1.5   2.    3.
   3. ], Target: 1
Features: [ 67.    1.    4.  120.  229.    0.    2.  129.    1.    2.6   2.    2.
   4. ], Target: 0
Features: [ 37.    1.    3.  130.  250.    0.    0.  187.    0.    3.5   3.    0.
   3. ], Target: 0
Features: [ 41.    0.    2.  130.  204.    0.    2.  172.    0.    1.4   1.    0.
   3. ], Target: 0

Since a pd.Series implements the __array__ protocol, it can be used transparently nearly anywhere you would use a np.array or a tf.Tensor .

 tf.constant(df['thal'])
 
<tf.Tensor: shape=(303,), dtype=int8, numpy=
array([2, 3, 4, 3, 3, 3, 3, 3, 4, 4, 2, 3, 2, 4, 4, 3, 4, 3, 3, 3, 3, 3,
       3, 4, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 4, 2, 4, 3, 4, 3, 4, 4,
       2, 3, 3, 4, 3, 3, 4, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 4,
       4, 2, 3, 3, 4, 3, 4, 3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 3, 3, 4, 4, 4,
       3, 3, 4, 3, 4, 4, 3, 4, 3, 3, 3, 4, 3, 4, 4, 3, 3, 4, 4, 4, 4, 4,
       3, 3, 3, 3, 4, 3, 4, 3, 4, 4, 3, 3, 2, 4, 4, 2, 3, 3, 4, 4, 3, 4,
       3, 3, 4, 2, 4, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4,
       4, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 4, 3, 2,
       4, 4, 3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 2, 2, 4, 3, 4, 2, 4, 3,
       3, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 2, 2, 4, 3, 4, 3, 2, 4, 3, 3, 2,
       4, 4, 4, 4, 3, 0, 3, 3, 3, 3, 1, 4, 3, 3, 3, 4, 3, 4, 3, 3, 3, 4,
       3, 3, 4, 4, 4, 4, 3, 3, 4, 3, 4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3,
       3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 3, 2, 4, 4, 4, 4], dtype=int8)>
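
The same __array__ interop works with NumPy alone. A minimal sketch with a toy Series (not the heart data):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], dtype='int8')

# np.asarray invokes the Series' __array__ method under the hood,
# so no explicit conversion step is needed
arr = np.asarray(s)
print(arr.dtype)   # int8
print(arr.sum())   # 6
```

This is why tf.constant(df['thal']) above accepts a Series directly and even preserves its int8 dtype.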

Shuffle and batch the dataset.

 train_dataset = dataset.shuffle(len(df)).batch(1)
 

Create and train a model

 def get_compiled_model():
  model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1)
  ])

  model.compile(optimizer='adam',
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=['accuracy'])
  return model
 
 model = get_compiled_model()
model.fit(train_dataset, epochs=15)
 
Epoch 1/15
WARNING:tensorflow:Layer dense is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2.  The layer has dtype float32 because it's dtype defaults to floatx.

If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2.

To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

303/303 [==============================] - 1s 2ms/step - loss: 1.4957 - accuracy: 0.6997
Epoch 2/15
303/303 [==============================] - 1s 2ms/step - loss: 0.9019 - accuracy: 0.7063
Epoch 3/15
303/303 [==============================] - 1s 2ms/step - loss: 0.8484 - accuracy: 0.7096
Epoch 4/15
303/303 [==============================] - 1s 2ms/step - loss: 0.7417 - accuracy: 0.7063
Epoch 5/15
303/303 [==============================] - 1s 2ms/step - loss: 0.6505 - accuracy: 0.7294
Epoch 6/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5874 - accuracy: 0.7360
Epoch 7/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5485 - accuracy: 0.7756
Epoch 8/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5133 - accuracy: 0.7525
Epoch 9/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5041 - accuracy: 0.7492
Epoch 10/15
303/303 [==============================] - 1s 2ms/step - loss: 0.4815 - accuracy: 0.7690
Epoch 11/15
303/303 [==============================] - 1s 2ms/step - loss: 0.4798 - accuracy: 0.7591
Epoch 12/15
303/303 [==============================] - 1s 2ms/step - loss: 0.5250 - accuracy: 0.7954
Epoch 13/15
303/303 [==============================] - 1s 2ms/step - loss: 0.4593 - accuracy: 0.7723
Epoch 14/15
303/303 [==============================] - 1s 2ms/step - loss: 0.4517 - accuracy: 0.7987
Epoch 15/15
303/303 [==============================] - 1s 2ms/step - loss: 0.4612 - accuracy: 0.8053

<tensorflow.python.keras.callbacks.History at 0x7f10f818c748>

Alternative to feature columns

Passing a dictionary as an input to a model is as easy as creating a matching dictionary of tf.keras.layers.Input layers, applying any preprocessing, and stacking them up using the functional API . You can use this as an alternative to feature columns .

 inputs = {key: tf.keras.layers.Input(shape=(), name=key) for key in df.keys()}
x = tf.stack(list(inputs.values()), axis=-1)

x = tf.keras.layers.Dense(10, activation='relu')(x)
output = tf.keras.layers.Dense(1)(x)

model_func = tf.keras.Model(inputs=inputs, outputs=output)

model_func.compile(optimizer='adam',
                   loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                   metrics=['accuracy'])
 

The easiest way to preserve the column structure of a pd.DataFrame when used with tf.data is to convert the pd.DataFrame to a dict , and slice that dictionary.
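
To see what that conversion produces, here is a small sketch with a two-column toy DataFrame (not the heart data):

```python
import pandas as pd

df_toy = pd.DataFrame({'age': [63, 67], 'thal': [2, 3]})

# The 'list' orientation gives {column_name: list_of_values}, which
# from_tensor_slices then turns into a dict of per-row tensors
print(df_toy.to_dict('list'))  # {'age': [63, 67], 'thal': [2, 3]}
```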

 dict_slices = tf.data.Dataset.from_tensor_slices((df.to_dict('list'), target.values)).batch(16)
 
 for dict_slice in dict_slices.take(1):
  print (dict_slice)
 
({'age': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([63, 67, 67, 37, 41, 56, 62, 57, 63, 53, 57, 56, 56, 44, 52, 57],
      dtype=int32)>, 'sex': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1], dtype=int32)>, 'cp': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 4, 4, 3, 2, 2, 4, 4, 4, 4, 4, 2, 3, 2, 3, 3], dtype=int32)>, 'trestbps': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([145, 160, 120, 130, 130, 120, 140, 120, 130, 140, 140, 140, 130,
       120, 172, 150], dtype=int32)>, 'chol': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([233, 286, 229, 250, 204, 236, 268, 354, 254, 203, 192, 294, 256,
       263, 199, 168], dtype=int32)>, 'fbs': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0], dtype=int32)>, 'restecg': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([2, 2, 2, 0, 2, 0, 2, 0, 2, 2, 0, 2, 2, 0, 0, 0], dtype=int32)>, 'thalach': <tf.Tensor: shape=(16,), dtype=int32, numpy=
array([150, 108, 129, 187, 172, 178, 160, 163, 147, 155, 148, 153, 142,
       173, 162, 174], dtype=int32)>, 'exang': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0], dtype=int32)>, 'oldpeak': <tf.Tensor: shape=(16,), dtype=float32, numpy=
array([2.3, 1.5, 2.6, 3.5, 1.4, 0.8, 3.6, 0.6, 1.4, 3.1, 0.4, 1.3, 0.6,
       0. , 0.5, 1.6], dtype=float32)>, 'slope': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([3, 2, 2, 3, 1, 1, 3, 1, 2, 3, 2, 2, 2, 1, 1, 1], dtype=int32)>, 'ca': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([0, 3, 2, 0, 0, 0, 2, 0, 1, 0, 0, 0, 1, 0, 0, 0], dtype=int32)>, 'thal': <tf.Tensor: shape=(16,), dtype=int32, numpy=array([2, 3, 4, 3, 3, 3, 3, 3, 4, 4, 2, 3, 2, 4, 4, 3], dtype=int32)>}, <tf.Tensor: shape=(16,), dtype=int64, numpy=array([0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0])>)

 model_func.fit(dict_slices, epochs=15)
 
Epoch 1/15
19/19 [==============================] - 0s 3ms/step - loss: 26.7893 - accuracy: 0.2739
Epoch 2/15
19/19 [==============================] - 0s 3ms/step - loss: 12.2734 - accuracy: 0.2739
Epoch 3/15
19/19 [==============================] - 0s 3ms/step - loss: 2.7571 - accuracy: 0.5083
Epoch 4/15
19/19 [==============================] - 0s 3ms/step - loss: 2.0496 - accuracy: 0.7162
Epoch 5/15
19/19 [==============================] - 0s 3ms/step - loss: 1.7324 - accuracy: 0.6436
Epoch 6/15
19/19 [==============================] - 0s 3ms/step - loss: 1.6612 - accuracy: 0.6238
Epoch 7/15
19/19 [==============================] - 0s 2ms/step - loss: 1.5567 - accuracy: 0.6469
Epoch 8/15
19/19 [==============================] - 0s 2ms/step - loss: 1.4756 - accuracy: 0.6403
Epoch 9/15
19/19 [==============================] - 0s 3ms/step - loss: 1.3848 - accuracy: 0.6535
Epoch 10/15
19/19 [==============================] - 0s 3ms/step - loss: 1.2977 - accuracy: 0.6535
Epoch 11/15
19/19 [==============================] - 0s 3ms/step - loss: 1.2133 - accuracy: 0.6601
Epoch 12/15
19/19 [==============================] - 0s 3ms/step - loss: 1.1288 - accuracy: 0.6667
Epoch 13/15
19/19 [==============================] - 0s 3ms/step - loss: 1.0468 - accuracy: 0.6634
Epoch 14/15
19/19 [==============================] - 0s 2ms/step - loss: 0.9671 - accuracy: 0.6799
Epoch 15/15
19/19 [==============================] - 0s 3ms/step - loss: 0.8915 - accuracy: 0.6865

<tensorflow.python.keras.callbacks.History at 0x7f10f818a080>