Defined in tensorflow/contrib/metrics/python/metrics/

Computes the approximately best F1-score across different thresholds.

The f1_score function applies a range of thresholds to the predictions to convert them from [0, 1] to bool. Precision and recall are computed by comparing them to the labels. The F1-Score is then defined as 2 * precision * recall / (precision + recall). The best one across the thresholds is returned.

Disclaimer: In practice it may be desirable to choose the best threshold on the validation set and evaluate the F1 score with this threshold on a separate test set. Or it may be desirable to use a fixed threshold (e.g. 0.5).

This function internally creates four local variables, true_positives, true_negatives, false_positives and false_negatives that are used to compute the pairs of recall and precision values for a linearly spaced set of thresholds from which the best f1-score is derived.

This value is ultimately returned as f1-score, an idempotent operation that computes the F1-score (computed using the aforementioned variables). The num_thresholds variable controls the degree of discretization with larger numbers of thresholds more closely approximating the true best F1-score.

For estimation of the metric over a stream of data, the function creates an update_op operation that updates these variables and returns the F1-score.

Example usage with a custom estimator: def model_fn(features, labels, mode): predictions = make_predictions(features) loss = make_loss(predictions, labels) train_op = total_loss=loss, optimizer='Adam') eval_metric_ops = {'f1': f1_score(labels, predictions)} return tf.estimator.EstimatorSpec( mode=mode, predictions=predictions, loss=loss, train_op=train_op, eval_metric_ops=eval_metric_ops, export_outputs=export_outputs) estimator = tf.estimator.Estimator(model_fn=model_fn)

If weights is None, weights default to 1. Use weights of 0 to mask values.


  • labels: A Tensor whose shape matches predictions. Will be cast to bool.
  • predictions: A floating point Tensor of arbitrary shape and whose values are in the range [0, 1].
  • weights: Optional Tensor whose rank is either 0, or the same rank as labels, and must be broadcastable to labels (i.e., all dimensions must be either 1, or the same as the corresponding labels dimension).
  • num_thresholds: The number of thresholds to use when discretizing the roc curve.
  • metrics_collections: An optional list of collections that f1_score should be added to.
  • updates_collections: An optional list of collections that update_op should be added to.
  • name: An optional variable_scope name.


  • f1_score: A scalar Tensor representing the current best f1-score across different thresholds.
  • update_op: An operation that increments the true_positives, true_negatives, false_positives and false_negatives variables appropriately and whose value matches the f1_score.


  • ValueError: If predictions and labels have mismatched shapes, or if weights is not None and its shape doesn't match predictions, or if either metrics_collections or updates_collections are not a list or tuple.