tf.estimator.BoostedTreesRegressor

A Regressor for Tensorflow Boosted Trees models.

Args
feature_columns An iterable containing all the feature columns used by the model. All items in the iterable should be instances of classes derived from FeatureColumn.
n_batches_per_layer The number of batches over which to collect statistics per layer. The total number of batches is the total number of examples divided by the batch size.
model_dir Directory in which to save model parameters, the graph, and so on. This can also be used to load checkpoints from the directory into an estimator to continue training a previously saved model.
label_dimension Number of regression targets per example.
weight_column A string or a NumericColumn created by tf.feature_column.numeric_column that defines the feature column representing weights. It is used to downweight or boost examples during training, and is multiplied by the loss of the example. If it is a string, it is used as a key to fetch the weight tensor from the features. If it is a NumericColumn, the raw tensor is fetched by key weight_column.key, then weight_column.normalizer_fn is applied to it to get the weight tensor.
n_trees Number of trees to be created.
max_depth maximum depth of the tree to grow.
learning_rate Shrinkage parameter to be used when a tree is added to the model.
l1_regularization Regularization multiplier applied to the absolute weights of the tree leaves.
l2_regularization Regularization multiplier applied to the squared weights of the tree leaves.
tree_complexity regularization factor to penalize trees with more leaves.
min_node_weight Minimum hessian a node must have for a split to be considered. The value is compared with sum(leaf_hessian)/(batch_size * n_batches_per_layer).
config RunConfig object to configure the runtime settings.
center_bias Whether bias centering needs to occur. Bias centering refers to the first node in the very first tree returning the prediction that is aligned with the original labels distribution. For example, for regression problems, the first node will return the mean of the labels. For binary classification problems, it will return a logit for a prior probability of label 1.
pruning_mode One of none, pre, or post, indicating no pruning, pre-pruning (do not split a node if not enough gain is observed), or post-pruning (build the tree up to a max depth and then prune branches with negative gain). For pre- and post-pruning, you MUST provide tree_complexity>0.
quantile_sketch_epsilon float between 0 and 1. Error bound for quantile computation. This is only used for float feature columns, and the number of buckets generated per float feature is 1/quantile_sketch_epsilon.
train_in_memory bool. When true, the dataset is assumed to be in memory: input_fn should return the entire dataset as a single batch, n_batches_per_layer should be set to 1, and num_worker_replicas should be 1 and num_ps_replicas should be 0 in tf.estimator.RunConfig.
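
Several of the arguments above constrain one another: n_batches_per_layer follows from the dataset and batch size, pruning needs a positive tree_complexity, and quantile_sketch_epsilon fixes the bucket count for float features. The following is a minimal sketch of those relationships, assuming a TensorFlow release that still ships tf.estimator.BoostedTreesRegressor. The helper names (batches_per_layer, buckets_per_float_feature, regressor_kwargs, build_regressor) and the hyperparameter values are hypothetical, and rounding the batch count up is an assumption, since the description only says "divided by".

```python
import math


def batches_per_layer(num_examples, batch_size):
    # Dataset size divided by batch size. Rounding up is an assumption so
    # that a trailing partial batch is still counted.
    return max(1, math.ceil(num_examples / batch_size))


def buckets_per_float_feature(quantile_sketch_epsilon):
    # Per the parameter description: 1 / quantile_sketch_epsilon buckets.
    return round(1.0 / quantile_sketch_epsilon)


def regressor_kwargs(num_examples, batch_size, train_in_memory=False):
    # Illustrative values only; they are not recommended defaults.
    if train_in_memory:
        n_batches = 1  # the whole dataset arrives as a single batch
    else:
        n_batches = batches_per_layer(num_examples, batch_size)
    return {
        "n_batches_per_layer": n_batches,
        "n_trees": 50,
        "max_depth": 4,
        "learning_rate": 0.1,
        "center_bias": True,
        "pruning_mode": "pre",
        "tree_complexity": 0.01,  # must be > 0 for pre or post pruning
        "quantile_sketch_epsilon": 0.01,
        "train_in_memory": train_in_memory,
    }


def build_regressor(feature_columns, **kwargs):
    # TensorFlow is imported lazily so the helpers above work without it.
    import tensorflow as tf

    return tf.estimator.BoostedTreesRegressor(
        feature_columns=feature_columns, **kwargs
    )
```

With 1000 examples and a batch size of 128, regressor_kwargs sets n_batches_per_layer to 8; passing train_in_memory=True forces it to 1, matching the constraint stated for that flag.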

Raises
ValueError when wrong arguments are given or unsupported functionalities are requested.

Eager Compatibility

Estimators can be used while eager execution is enabled. Note that input_fn and all hooks are executed inside a graph context, so they have to be written to be compatible with graph mode. Note that input_fn code using tf.data generally works in both graph and eager modes.
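
Because input_fn runs inside a graph context, the usual pattern is to build the tf.data pipeline inside the function body instead of capturing eagerly created tensors. A minimal sketch of that pattern follows; make_input_fn and its arguments are hypothetical names, and the features are assumed to be a dict mapping feature names to equal-length lists.

```python
def make_input_fn(features, labels, batch_size, shuffle=False):
    """Return an input_fn that builds its dataset when the estimator calls it."""

    def input_fn():
        # Imported and executed here, so every op is created in whatever
        # graph the estimator has set up for this call.
        import tensorflow as tf

        dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
        if shuffle:
            dataset = dataset.shuffle(buffer_size=len(labels))
        return dataset.batch(batch_size)

    return input_fn
```

The returned closure takes no arguments, which is the form the estimator methods expect for input_fn.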

Attributes
config
model_dir
model_fn Returns the model_fn which is bound to self.params.
params

Methods

eval_dir

Shows the directory name where evaluation metrics are dumped.

Args
name Name of the evaluation, if the user needs to run multiple evaluations on different data sets, such as on training data vs. test data. Metrics for different evaluations are saved in separate folders and appear separately in TensorBoard.

Returns
A string which is the path of the directory that contains evaluation metrics.
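
To make the "separate folders" behavior concrete, here is a small sketch of one plausible layout: an eval subdirectory of model_dir, suffixed with the evaluation name when one is given. The layout and the function name expected_eval_dir are assumptions made for illustration; estimator.eval_dir(name) remains the authoritative source for the real path.

```python
import os


def expected_eval_dir(model_dir, name=None):
    # Assumed layout: "<model_dir>/eval" for unnamed evaluations and
    # "<model_dir>/eval_<name>" for named ones.
    subdir = "eval" if not name else "eval_" + name
    return os.path.join(model_dir, subdir)
```

Under this assumption, evaluations named "train" and "test" would write to two sibling directories, which is why they show up as separate runs.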

evaluate

Evaluates the model given evaluation data input_fn.

For each step, calls input_fn, which returns one batch of data. Evaluates until steps batches are processed or input_fn raises an end-of-input exception.

Args
input_fn A function that constructs the input data for evaluation. See Premade Estimators for more information. The function should construct and return one of the following:

  • A tf.data.Dataset object: Outputs of Dataset object must be a tuple (features, labels) with same constraints as below.
  • A tuple (features, labels): Where features is a tf.Tensor or a dictionary of string feature name to Tensor and labels is a Tensor or a dictionary of string label name to Tensor. Both features and labels are consumed by model_fn. They should satisfy the expectation of model_fn from inputs.
steps Number of steps for which to evaluate model. If None, evaluates until input_fn raises an end-of-input exception.
hooks List of tf.train.SessionRunHook subclass instances. Used for callbacks inside the evaluation call.
checkpoint_path Path of a specific checkpoint to evaluate. If None, the latest checkpoint in model_dir is used. If there are no checkpoints in model_dir, evaluation is run with newly initialized Variables instead of ones restored from checkpoint.
name Name of the evaluation, if the user needs to run multiple evaluations on different data sets, such as on training data vs. test data. Metrics for different evaluations are saved in separate folders and appear separately in TensorBoard.
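
The name argument is most useful when the same model is evaluated on several data splits. The sketch below loops over named input functions and calls evaluate once per split. The helper evaluate_on_splits is a hypothetical name, and it assumes only that the estimator exposes evaluate(input_fn, steps, name) as described above and returns some metrics object for each call.

```python
def evaluate_on_splits(estimator, input_fns, steps=None):
    """Call estimator.evaluate once per named split and collect the results.

    input_fns maps a split name (for example "train" or "test") to an
    input_fn. steps=None lets each evaluation run until its input_fn
    signals end of input.
    """
    results = {}
    for split_name, input_fn in input_fns.items():
        results[split_name] = estimator.evaluate(
            input_fn=input_fn, steps=steps, name=split_name
        )
    return results
```

Passing the split name as name keeps each evaluation's metrics in its own folder, so the runs stay distinguishable afterwards.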