
penguins

  • Description:

Measurements for three penguin species observed in the Palmer Archipelago, Antarctica.

These data were collected from 2007 to 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program, part of the US Long Term Ecological Research Network. The data were originally imported from the Environmental Data Initiative (EDI) Data Portal and are available under a CC0 license ("No Rights Reserved") in accordance with the Palmer Station Data Policy. This copy was imported from Allison Horst's GitHub repository.

  • Citation:

@Manual{,
  title = {palmerpenguins: Palmer Archipelago (Antarctica) penguin data},
  author = {Allison Marie Horst and Alison Presmanes Hill and Kristen B Gorman},
  year = {2020},
  note = {R package version 0.1.0},
  doi = {10.5281/zenodo.3960218},
  url = {https://allisonhorst.github.io/palmerpenguins/},
}

penguins/processed (default config)

  • Config description: penguins/processed is a drop-in replacement for the iris dataset. It contains 4 normalised numerical features presented as a single tensor, has no missing values, and presents the class label (species) as an integer (n = 334).

  • Download size: 25.05 KiB

  • Dataset size: 17.61 KiB

  • Splits:

Split     Examples
'train'   334

  • Features:

FeaturesDict({
    'features': Tensor(shape=(4,), dtype=tf.float32),
    'species': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
})
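
Loading this config might look like the sketch below. It assumes the tensorflow_datasets package is installed and that the data can be downloaded on first use. The names EXPECTED_EXAMPLES, config_name, load_penguins, and demo are hypothetical helpers introduced for illustration; they are not part of TFDS.

```python
# Example counts per config, copied from the split tables on this page.
EXPECTED_EXAMPLES = {"processed": 334, "simple": 344, "raw": 344}


def config_name(config: str = "processed") -> str:
    """Return the full dataset name, e.g. 'penguins/processed'."""
    if config not in EXPECTED_EXAMPLES:
        raise ValueError(f"unknown penguins config: {config!r}")
    return f"penguins/{config}"


def load_penguins(config: str = "processed"):
    """Load the single 'train' split of one config plus its metadata."""
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(config_name(config), split="train", with_info=True)


def demo() -> None:
    """Print the first example of penguins/processed (needs TFDS and network)."""
    ds, info = load_penguins("processed")
    for example in ds.take(1):
        print(example["features"])  # one tensor holding the 4 normalised features
        print(example["species"])   # integer class label
    print(info.features["species"].names)  # label names as stored by TFDS
```

Calling demo() triggers the download on first use. Since there is only a 'train' split, any validation or test subset has to be carved out by the caller.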

penguins/simple

  • Config description: penguins/simple is processed from the raw dataset: class labels are simplified and derived from text fields, missing values are marked as NaN/NA, and only 7 significant features are retained (n = 344).

  • Download size: 13.20 KiB

  • Dataset size: 56.10 KiB

  • Splits:

Split     Examples
'train'   344

  • Features:

FeaturesDict({
    'body_mass_g': tf.float32,
    'culmen_depth_mm': tf.float32,
    'culmen_length_mm': tf.float32,
    'flipper_length_mm': tf.float32,
    'island': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'sex': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'species': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
})
  • Supervised keys (See as_supervised doc): ({'culmen_depth_mm': 'culmen_depth_mm', 'culmen_length_mm': 'culmen_length_mm', 'body_mass_g': 'body_mass_g', 'flipper_length_mm': 'flipper_length_mm', 'sex': 'sex', 'island': 'island', 'species': 'species'}, 'species')
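
The supervised keys above are a pair: a mapping of input features and the name of the label feature. The sketch below mimics, in plain Python, how such a pair turns one example into an (inputs, label) tuple. It is an illustration only, not the TFDS implementation; split_example is a hypothetical helper and the example values are made up.

```python
# Supervised keys as listed above: (input feature mapping, label key).
SUPERVISED_KEYS = (
    {
        "culmen_depth_mm": "culmen_depth_mm",
        "culmen_length_mm": "culmen_length_mm",
        "body_mass_g": "body_mass_g",
        "flipper_length_mm": "flipper_length_mm",
        "sex": "sex",
        "island": "island",
        "species": "species",
    },
    "species",
)


def split_example(example: dict, supervised_keys=SUPERVISED_KEYS):
    """Split one example dict into (inputs, label) following the key pair."""
    input_keys, label_key = supervised_keys
    inputs = {name: example[key] for name, key in input_keys.items()}
    return inputs, example[label_key]


# Made-up values, for illustration only.
example = {
    "body_mass_g": 3750.0,
    "culmen_depth_mm": 18.7,
    "culmen_length_mm": 39.1,
    "flipper_length_mm": 181.0,
    "island": 2,
    "sex": 1,
    "species": 0,
}
inputs, label = split_example(example)
```

Note that the input mapping also lists 'species', so the label is still present among the inputs. It should be dropped from the inputs before training a classifier on this config.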


penguins/raw

  • Config description: penguins/raw is the original, unprocessed copy from @allisonhorst, containing all 17 features, presented either as numeric types or as raw text (n = 344).

  • Download size: 49.72 KiB

  • Dataset size: 164.51 KiB

  • Splits:

Split     Examples
'train'   344

  • Features:

FeaturesDict({
    'Body Mass (g)': tf.float32,
    'Clutch Completion': Text(shape=(), dtype=tf.string),
    'Comments': Text(shape=(), dtype=tf.string),
    'Culmen Depth (mm)': tf.float32,
    'Culmen Length (mm)': tf.float32,
    'Date Egg': Text(shape=(), dtype=tf.string),
    'Delta 13 C (o/oo)': tf.float32,
    'Delta 15 N (o/oo)': tf.float32,
    'Flipper Length (mm)': tf.float32,
    'Individual ID': Text(shape=(), dtype=tf.string),
    'Island': Text(shape=(), dtype=tf.string),
    'Region': Text(shape=(), dtype=tf.string),
    'Sample Number': tf.int32,
    'Sex': Text(shape=(), dtype=tf.string),
    'Species': Text(shape=(), dtype=tf.string),
    'Stage': Text(shape=(), dtype=tf.string),
    'studyName': Text(shape=(), dtype=tf.string),
})
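
Many of the raw features are Text fields. When a dataset is iterated as NumPy values, such fields typically arrive as byte strings, so a small decoding step can be useful. The sketch below assumes that behaviour; decode_text_fields and iter_raw_rows are hypothetical helpers, and the sample values in the usage note are illustrative.

```python
def decode_text_fields(example: dict) -> dict:
    """Return a copy of `example` with bytes values decoded as UTF-8 text."""
    return {
        key: value.decode("utf-8") if isinstance(value, bytes) else value
        for key, value in example.items()
    }


def iter_raw_rows():
    """Yield decoded rows of penguins/raw (needs TFDS and network access)."""
    # Imported lazily so decode_text_fields works without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("penguins/raw", split="train")
    for example in tfds.as_numpy(ds):
        yield decode_text_fields(example)
```

For instance, decode_text_fields({"Island": b"Torgersen", "Sample Number": 1}) leaves the numeric field untouched and turns the island name into an ordinary string.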