- Description:
This is a regression task: predict the burned area of forest fires in the northeast region of Portugal from meteorological and other data.
Data Set Information:
In [Cortez and Morais, 2007], the output 'area' was first transformed with a ln(x+1) function. Several data mining methods were then applied. After the models were fitted, their outputs were post-processed with the inverse of the ln(x+1) transform. Four different input setups were used. The experiments were conducted using 30 runs of 10-fold cross-validation. Two regression metrics were measured: MAD (mean absolute deviation) and RMSE (root mean squared error). A Gaussian support vector machine (SVM) fed with only 4 direct weather conditions (temp, RH, wind and rain) obtained the best MAD value: 12.71 +- 0.01 (mean and 95% confidence interval using a Student's t distribution). The best RMSE was attained by the naive mean predictor. An analysis of the regression error characteristic (REC) curve shows that the SVM model predicts more examples within a lower admitted error. In effect, the SVM model is better at predicting small fires, which are the majority.
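The transform and the two metrics described above can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' code: the helper names and the five burned-area values are hypothetical, MAD is read here as the mean absolute deviation between targets and predictions, and the "log-space fit" is a simple constant predictor standing in for a real model.

```python
import math
from statistics import mean


def log_transform(area):
    """Forward transform applied to the target: ln(x + 1)."""
    return math.log1p(area)


def inverse_log_transform(value):
    """Inverse of ln(x + 1), i.e. exp(y) - 1."""
    return math.expm1(value)


def mad(y_true, y_pred):
    """Mean absolute deviation between targets and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


# Hypothetical burned areas in hectares (illustrative, not rows of the dataset).
areas = [0.0, 0.0, 1.5, 10.0, 200.0]

# Naive mean predictor on the original scale.
naive = [mean(areas)] * len(areas)

# Constant predictor fitted in log space, then mapped back to hectares.
log_mean = mean(log_transform(a) for a in areas)
back_transformed = [inverse_log_transform(log_mean)] * len(areas)

print("naive mean:    MAD=%.2f RMSE=%.2f" % (mad(areas, naive), rmse(areas, naive)))
print("log-space fit: MAD=%.2f RMSE=%.2f"
      % (mad(areas, back_transformed), rmse(areas, back_transformed)))
```

On these skewed toy values the log-space constant gives the lower MAD while the naive mean gives the lower RMSE, which is the same qualitative pattern the paragraph above reports for the SVM and the naive mean predictor.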
Attribute Information:
For more information, read [Cortez and Morais, 2007].
- X - x-axis spatial coordinate within the Montesinho park map: 1 to 9
- Y - y-axis spatial coordinate within the Montesinho park map: 2 to 9
- month - month of the year: 'jan' to 'dec'
- day - day of the week: 'mon' to 'sun'
- FFMC - FFMC index from the FWI system: 18.7 to 96.20
- DMC - DMC index from the FWI system: 1.1 to 291.3
- DC - DC index from the FWI system: 7.9 to 860.6
- ISI - ISI index from the FWI system: 0.0 to 56.10
- temp - temperature in Celsius degrees: 2.2 to 33.30
- RH - relative humidity in %: 15.0 to 100
- wind - wind speed in km/h: 0.40 to 9.40
- rain - outside rain in mm/m2: 0.0 to 6.4
- area - the burned area of the forest (in ha): 0.00 to 1090.84 (this output variable is very skewed towards 0.0, thus it may make sense to model with the logarithm transform).
Homepage: https://archive.ics.uci.edu/ml/datasets/Forest+Fires
Source code:
tfds.structured.ForestFires
Versions:
0.0.1 (default): No release notes.
Download size:
24.88 KiB
Dataset size:
162.07 KiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'train' | 517 |
- Feature structure:
FeaturesDict({
    'area': float32,
    'features': FeaturesDict({
        'DC': float32,
        'DMC': float32,
        'FFMC': float32,
        'ISI': float32,
        'RH': float32,
        'X': uint8,
        'Y': uint8,
        'day': ClassLabel(shape=(), dtype=int64, num_classes=7),
        'month': ClassLabel(shape=(), dtype=int64, num_classes=12),
        'rain': float32,
        'temp': float32,
        'wind': float32,
    }),
})
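As a rough sketch of how the nested structure above might be consumed, the helper below flattens one example into an input vector and a target. The names `FEATURE_ORDER`, `flatten_example`, and `iter_forest_fires` are hypothetical, the sample values are made up, and the dataset name `forest_fires` passed to `tfds.load` is an assumption based on the `tfds.structured.ForestFires` class name. The TFDS import is kept inside a function so the flattening logic can be tried without installing or downloading anything.

```python
# Column order for the flattened input vector (an arbitrary but fixed choice).
FEATURE_ORDER = ["X", "Y", "month", "day", "FFMC", "DMC", "DC", "ISI",
                 "temp", "RH", "wind", "rain"]


def flatten_example(example):
    """Turn one nested example into (inputs, target).

    `example` is a dict shaped like the FeaturesDict: an 'area' scalar plus a
    nested 'features' dict. The ClassLabel features ('day', 'month') are
    expected as integer class indices.
    """
    inputs = [float(example["features"][name]) for name in FEATURE_ORDER]
    return inputs, float(example["area"])


def iter_forest_fires():
    """Yield flattened examples; requires tensorflow_datasets and a download."""
    import tensorflow_datasets as tfds  # optional dependency, imported lazily

    ds = tfds.load("forest_fires", split="train")  # assumed dataset name
    for example in tfds.as_numpy(ds):
        yield flatten_example(example)


# Hand-written example with made-up values, only to show the shapes involved.
sample = {
    "area": 1.25,
    "features": {
        "X": 4, "Y": 3, "month": 7, "day": 1,
        "FFMC": 90.0, "DMC": 100.0, "DC": 500.0, "ISI": 8.0,
        "temp": 20.0, "RH": 40.0, "wind": 3.0, "rain": 0.0,
    },
}

inputs, target = flatten_example(sample)
print(len(inputs), target)
```

Keeping the column order in a single list makes it easy to select a subset of inputs, for example only the four direct weather conditions (temp, RH, wind and rain).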
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| area | Tensor | | float32 | |
| features | FeaturesDict | | | |
| features/DC | Tensor | | float32 | |
| features/DMC | Tensor | | float32 | |
| features/FFMC | Tensor | | float32 | |
| features/ISI | Tensor | | float32 | |
| features/RH | Tensor | | float32 | |
| features/X | Tensor | | uint8 | |
| features/Y | Tensor | | uint8 | |
| features/day | ClassLabel | | int64 | |
| features/month | ClassLabel | | int64 | |
| features/rain | Tensor | | float32 | |
| features/temp | Tensor | | float32 | |
| features/wind | Tensor | | float32 | |
Supervised keys (See as_supervised doc): ('area', 'features')
Figure (tfds.show_examples): Not supported.
- Citation:
@misc{Dua:2019,
author = "Dua, Dheeru and Graff, Casey",
year = "2017",
title = "{UCI} Machine Learning Repository",
url = "http://archive.ics.uci.edu/ml",
institution = "University of California, Irvine, School of Information and Computer Sciences" }
@article{cortez2007data,
title={A data mining approach to predict forest fires using meteorological data},
author={Cortez, Paulo and Morais, Anibal de Jesus Raimundo},
year={2007},
publisher={Associa{\c{c}}{\~a}o Portuguesa para a Intelig{\^e}ncia Artificial (APPIA)}
}