kddcup99

  • Description:

This is the dataset used for the Third International Knowledge Discovery and Data Mining Tools Competition, held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector: a predictive model capable of distinguishing between 'bad' connections, called intrusions or attacks, and 'good' normal connections. The database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
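
The 'bad' versus 'good' distinction described above can be sketched as a small helper that collapses a multi-class label name into a binary target. This is only an illustration, not part of the dataset's API: the function name `is_intrusion` is hypothetical, and the name of the normal class (`normal`, possibly with a trailing period) is an assumption that should be checked against the dataset's actual label names.

```python
def is_intrusion(label_name, normal_name="normal"):
    """Return True if a label name denotes an attack, False for normal traffic.

    Assumption: the normal class is called `normal_name`. Surrounding
    whitespace, letter case, and a trailing period are ignored.
    """
    cleaned = label_name.strip().rstrip(".").lower()
    return cleaned != normal_name


# Illustrative label strings, used only to exercise the helper.
labels = ["normal.", "smurf.", "neptune", "normal"]
binary_targets = [int(is_intrusion(name)) for name in labels]
```

When the data is loaded through TensorFlow Datasets, the integer `label` can be converted to its name with the `int2str` method of the `ClassLabel` feature before calling a helper like this one.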

  • Splits:

Split      Examples
'test'     311,029
'train'    4,898,431
  • Feature structure:
FeaturesDict({
    'count': tf.int32,
    'diff_srv_rate': tf.float32,
    'dst_bytes': tf.int32,
    'dst_host_count': tf.int32,
    'dst_host_diff_srv_rate': tf.float32,
    'dst_host_rerror_rate': tf.float32,
    'dst_host_same_src_port_rate': tf.float32,
    'dst_host_same_srv_rate': tf.float32,
    'dst_host_serror_rate': tf.float32,
    'dst_host_srv_count': tf.int32,
    'dst_host_srv_diff_host_rate': tf.float32,
    'dst_host_srv_rerror_rate': tf.float32,
    'dst_host_srv_serror_rate': tf.float32,
    'duration': tf.int32,
    'flag': ClassLabel(shape=(), dtype=tf.int64, num_classes=11),
    'hot': tf.int32,
    'is_guest_login': tf.bool,
    'is_hot_login': tf.bool,
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=40),
    'land': tf.bool,
    'logged_in': tf.bool,
    'num_access_files': tf.int32,
    'num_compromised': tf.int32,
    'num_failed_logins': tf.int32,
    'num_file_creations': tf.int32,
    'num_outbound_cmds': tf.int32,
    'num_root': tf.int32,
    'num_shells': tf.int32,
    'protocol_type': ClassLabel(shape=(), dtype=tf.int64, num_classes=3),
    'rerror_rate': tf.float32,
    'root_shell': tf.bool,
    'same_srv_rate': tf.float32,
    'serror_rate': tf.float32,
    'service': ClassLabel(shape=(), dtype=tf.int64, num_classes=71),
    'src_bytes': tf.int32,
    'srv_count': tf.int32,
    'srv_diff_host_rate': tf.float32,
    'srv_rerror_rate': tf.float32,
    'srv_serror_rate': tf.float32,
    'su_attempted': tf.int32,
    'urgent': tf.int32,
    'wrong_fragment': tf.int32,
})
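
As a rough sketch of how the feature structure above might be consumed, the following code loads a split through TensorFlow Datasets and separates each example into numeric inputs, categorical inputs, and the label. The helper names `load_split` and `split_example` are hypothetical, and treating every non-`ClassLabel` feature (booleans included) as a float is a simplifying assumption rather than a recommended preprocessing scheme.

```python
# Features declared as ClassLabel in the structure above (the target excluded).
CATEGORICAL_KEYS = ("flag", "protocol_type", "service")
LABEL_KEY = "label"


def load_split(split="test"):
    """Return an iterable of example dicts for one split.

    The import is local so the rest of this sketch works without
    TensorFlow Datasets installed. The first call downloads the dataset.
    """
    import tensorflow_datasets as tfds

    return tfds.as_numpy(tfds.load("kddcup99", split=split))


def split_example(example):
    """Split one example dict into (numeric, categorical, label).

    Assumption: int, float, and bool features are all cast to float;
    ClassLabel features are kept as integer class ids.
    """
    numeric = {}
    categorical = {}
    for name in sorted(example):
        if name == LABEL_KEY:
            continue
        if name in CATEGORICAL_KEYS:
            categorical[name] = int(example[name])
        else:
            numeric[name] = float(example[name])
    return numeric, categorical, int(example[LABEL_KEY])
```

A typical loop would be `for example in load_split("test"): numeric, categorical, label = split_example(example)`. Given the split sizes, starting with the smaller 'test' split is a reasonable way to try this out.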
  • Feature documentation:
Feature                      Class         Shape  Dtype       Description
                             FeaturesDict
count                        Tensor               tf.int32
diff_srv_rate                Tensor               tf.float32
dst_bytes                    Tensor               tf.int32
dst_host_count               Tensor               tf.int32
dst_host_diff_srv_rate       Tensor               tf.float32
dst_host_rerror_rate         Tensor               tf.float32
dst_host_same_src_port_rate  Tensor               tf.float32
dst_host_same_srv_rate       Tensor               tf.float32
dst_host_serror_rate         Tensor               tf.float32
dst_host_srv_count           Tensor               tf.int32
dst_host_srv_diff_host_rate  Tensor               tf.float32
dst_host_srv_rerror_rate     Tensor               tf.float32
dst_host_srv_serror_rate     Tensor               tf.float32
duration                     Tensor               tf.int32
flag                         ClassLabel           tf.int64
hot                          Tensor               tf.int32
is_guest_login               Tensor               tf.bool
is_hot_login                 Tensor               tf.bool
label                        ClassLabel           tf.int64
land                         Tensor               tf.bool
logged_in                    Tensor               tf.bool
num_access_files             Tensor               tf.int32
num_compromised              Tensor               tf.int32
num_failed_logins            Tensor               tf.int32
num_file_creations           Tensor               tf.int32
num_outbound_cmds            Tensor               tf.int32
num_root                     Tensor               tf.int32
num_shells                   Tensor               tf.int32
protocol_type                ClassLabel           tf.int64
rerror_rate                  Tensor               tf.float32
root_shell                   Tensor               tf.bool
same_srv_rate                Tensor               tf.float32
serror_rate                  Tensor               tf.float32
service                      ClassLabel           tf.int64
src_bytes                    Tensor               tf.int32
srv_count                    Tensor               tf.int32
srv_diff_host_rate           Tensor               tf.float32
srv_rerror_rate              Tensor               tf.float32
srv_serror_rate              Tensor               tf.float32
su_attempted                 Tensor               tf.int32
urgent                       Tensor               tf.int32
wrong_fragment               Tensor               tf.int32
  • Citation:
@misc{Dua:2019,
  author = "Dua, Dheeru and Graff, Casey",
  year = 2017,
  title = "{UCI} Machine Learning Repository",
  url = "http://archive.ics.uci.edu/ml",
  institution = "University of California, Irvine, School of Information and Computer Sciences"
}