


Program to build a graph based on dense input features (embeddings).


python [flags] input_features.tfr ... output_graph.tsv

This program reads input instances from one or more TFRecord files, each containing tf.train.Example protos. Each input example is expected to contain at least the following two features:

  • id: A singleton bytes_list feature that identifies each Example.
  • embedding: A float_list feature that contains the (dense) embedding of each example.

id and embedding are not necessarily the literal feature names; if your features have different names, you can use the --id_feature_name and --embedding_feature_name flags to specify them, respectively.

The program then computes the cosine similarity between all pairs of input examples based on their associated embeddings. An edge is written to the output_graph.tsv file for each pair whose similarity is at least the value of the --similarity_threshold flag. Each output edge is represented by a line in the output_graph.tsv file with the following form:


All edges in the output will be symmetric (i.e., if edge A--w-->B exists in the output, then so will edge B--w-->A).
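
The thresholding and symmetry rules described above can be illustrated with a minimal pure-Python sketch. It is not the program's actual implementation: the helper names `cosine_similarity` and `build_edges`, the in-memory dictionary of embeddings, the zero-vector handling, and the brute-force pairwise loop are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Returns the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def build_edges(embeddings, similarity_threshold):
    """Returns symmetric (source_id, target_id, weight) edges.

    `embeddings` is a hypothetical in-memory stand-in for the input examples:
    it maps each example ID to its dense embedding (a list of floats).
    """
    edges = []
    # Visit each unordered pair of examples exactly once.
    for (id_a, emb_a), (id_b, emb_b) in combinations(sorted(embeddings.items()), 2):
        weight = cosine_similarity(emb_a, emb_b)
        if weight >= similarity_threshold:
            # Emit both directions so the resulting graph is symmetric.
            edges.append((id_a, id_b, weight))
            edges.append((id_b, id_a, weight))
    return edges


embeddings = {
    "A": [1.0, 0.0],
    "B": [1.0, 1.0],
    "C": [0.0, 1.0],
}
edges = build_edges(embeddings, similarity_threshold=0.7)
```

In this toy input, A and C are orthogonal (similarity 0), so no edge connects them, while the A/B and B/C pairs each have similarity of about 0.707 and therefore produce one edge in each direction. The all-pairs loop is quadratic in the number of examples and is shown only to make the rule concrete.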

For details about this program's flags, run python --help.