forest_fires

  • Description:

This is a regression task whose aim is to predict the burned area of forest fires in the northeast region of Portugal, using meteorological and other data.

Data Set Information:

In [Cortez and Morais, 2007], the output 'area' was first transformed with a ln(x+1) function, and several data mining methods were then applied. After the models were fitted, their outputs were post-processed with the inverse of the ln(x+1) transform. Four different input setups were used. The experiments consisted of 30 runs of 10-fold cross-validation. Two regression metrics were measured: MAD and RMSE. A Gaussian support vector machine (SVM) fed with only the 4 direct weather conditions (temp, RH, wind and rain) obtained the best MAD value: 12.71 ± 0.01 (mean and 95% confidence interval, computed using a Student's t-distribution). The best RMSE was attained by the naive mean predictor. An analysis of the regression error characteristic (REC) curve shows that the SVM model predicts more examples within a lower admitted error. In effect, the SVM model is better at predicting small fires, which are the majority.
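The transform and the two metrics described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the helper names are hypothetical, MAD is taken here to be the mean absolute deviation, the clipping of negative back-transformed values is an added assumption, and the sample areas are made up rather than drawn from the dataset.

```python
import math


def to_log_space(area_ha):
    """Forward transform ln(x + 1); an area of 0.0 maps to 0.0."""
    return math.log1p(area_ha)


def from_log_space(value):
    """Inverse transform exp(y) - 1.

    Clipping at 0 is an assumption of this sketch: a model working in
    log space can output a negative value, but an area cannot be negative.
    """
    return max(0.0, math.expm1(value))


def mad(y_true, y_pred):
    """Mean absolute deviation between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Illustrative burned areas in ha (made up, not taken from the dataset).
areas = [0.0, 0.0, 1.5, 10.0, 200.0]

# A constant baseline fitted in log space, then mapped back to hectares.
log_mean = sum(to_log_space(a) for a in areas) / len(areas)
prediction = from_log_space(log_mean)
predictions = [prediction] * len(areas)

print(f"MAD:  {mad(areas, predictions):.2f}")
print(f"RMSE: {rmse(areas, predictions):.2f}")
```

Both metrics are computed on the original hectare scale, after the inverse transform, which matches the post-processing step described above. Because RMSE squares each error, a few very large fires can dominate it, while MAD weights all errors linearly.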

Attribute Information:

For more information, read [Cortez and Morais, 2007].

  1. X - x-axis spatial coordinate within the Montesinho park map: 1 to 9
  2. Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9
  3. month - month of the year: 'jan' to 'dec'
  4. day - day of the week: 'mon' to 'sun'
  5. FFMC - FFMC index from the FWI system: 18.7 to 96.20
  6. DMC - DMC index from the FWI system: 1.1 to 291.3
  7. DC - DC index from the FWI system: 7.9 to 860.6
  8. ISI - ISI index from the FWI system: 0.0 to 56.10
  9. temp - temperature in Celsius degrees: 2.2 to 33.30
  10. RH - relative humidity in %: 15.0 to 100
  11. wind - wind speed in km/h: 0.40 to 9.40
  12. rain - outside rain in mm/m2: 0.0 to 6.4
  13. area - the burned area of the forest (in ha): 0.00 to 1090.84 (this output variable is very skewed towards 0.0, so it may make sense to model it with a logarithm transform).

  • Splits:

Split    Examples
'train'  517
  • Feature structure:
FeaturesDict({
    'area': float32,
    'features': FeaturesDict({
        'DC': float32,
        'DMC': float32,
        'FFMC': float32,
        'ISI': float32,
        'RH': float32,
        'X': uint8,
        'Y': uint8,
        'day': ClassLabel(shape=(), dtype=int64, num_classes=7),
        'month': ClassLabel(shape=(), dtype=int64, num_classes=12),
        'rain': float32,
        'temp': float32,
        'wind': float32,
    }),
})
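Since the inputs are nested under 'features' and 'day' and 'month' are stored as integer class labels, it can help to flatten an example before building a table. The sketch below works on a plain Python dict shaped like the structure above. The helper `flatten_example` is hypothetical, the sample values are made up, and the label order ('jan' to 'dec', 'mon' to 'sun') is an assumption that should be checked against the dataset's ClassLabel names.

```python
# Assumed label order for the two ClassLabel features; verify against the
# dataset's own label names before relying on it.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def flatten_example(example, decode_labels=True):
    """Return one flat dict holding the 12 inputs plus the 'area' target."""
    flat = dict(example["features"])  # copy so the input is not mutated
    if decode_labels:
        flat["month"] = MONTHS[int(flat["month"])]
        flat["day"] = DAYS[int(flat["day"])]
    flat["area"] = float(example["area"])
    return flat


# A made-up example with the same nesting as the FeaturesDict above.
example = {
    "area": 0.0,
    "features": {
        "DC": 94.3, "DMC": 26.2, "FFMC": 86.2, "ISI": 5.1, "RH": 51.0,
        "X": 7, "Y": 5, "day": 4, "month": 2,
        "rain": 0.0, "temp": 8.2, "wind": 6.7,
    },
}

flat = flatten_example(example)
print(flat["month"], flat["day"], flat["area"])
```

Passing `decode_labels=False` keeps the integer codes, which is usually what a model input pipeline wants, while the decoded strings are easier to read when inspecting the data.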
  • Feature documentation:
Feature          Class         Shape  Dtype    Description
                 FeaturesDict
area             Tensor               float32
features         FeaturesDict
features/DC      Tensor               float32
features/DMC     Tensor               float32
features/FFMC    Tensor               float32
features/ISI     Tensor               float32
features/RH      Tensor               float32
features/X       Tensor               uint8
features/Y       Tensor               uint8
features/day     ClassLabel           int64
features/month   ClassLabel           int64
features/rain    Tensor               float32
features/temp    Tensor               float32
features/wind    Tensor               float32
  • Citation:
@misc{Dua:2019,
author = "Dua, Dheeru and Graff, Casey",
year = "2017",
title = "{UCI} Machine Learning Repository",
url = "http://archive.ics.uci.edu/ml",
institution = "University of California, Irvine, School of Information and Computer Sciences" }

@article{cortez2007data,
  title={A data mining approach to predict forest fires using meteorological data},
  author={Cortez, Paulo and Morais, Anibal de Jesus Raimundo},
  year={2007},
  publisher={Associa{\c{c}}{\~a}o Portuguesa para a Intelig{\^e}ncia Artificial (APPIA)}
}