Program to build a graph based on dense input features (embeddings).
python build_graph.py [flags] input_features.tfr ... output_graph.tsv
This program reads input instances from one or more TFRecord files, each
containing tf.train.Example protos. Each input example is expected to
contain at least these 2 features:
id: A singleton bytes_list feature that identifies each Example.
embedding: A float_list feature that contains the (dense) embedding of each example.
id and embedding are not necessarily the literal feature names; if your
features have different names, you can use the --id_feature_name and
--embedding_feature_name flags to specify them, respectively.
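To make the expected input concrete, here is a minimal sketch of writing a few examples with the two features described above. It assumes TensorFlow is installed and uses the default feature names id and embedding; the helper names (make_example, write_examples), the toy data, and the temporary output path are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import tensorflow as tf


def make_example(example_id, embedding):
    """Builds a tf.train.Example with an 'id' and an 'embedding' feature."""
    features = {
        # Singleton bytes_list feature that identifies the example.
        "id": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[example_id.encode("utf-8")])),
        # float_list feature that holds the dense embedding.
        "embedding": tf.train.Feature(
            float_list=tf.train.FloatList(value=embedding)),
    }
    return tf.train.Example(features=tf.train.Features(feature=features))


def write_examples(path, id_to_embedding):
    """Serializes one Example per (id, embedding) pair into a TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for example_id, embedding in id_to_embedding.items():
            writer.write(make_example(example_id, embedding).SerializeToString())


# Illustrative toy data, not taken from any real dataset.
toy_embeddings = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
toy_path = os.path.join(tempfile.mkdtemp(), "input_features.tfr")
write_examples(toy_path, toy_embeddings)
```

A file written this way could then be passed as an input_features.tfr argument on the command line shown earlier.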
The program then computes the cosine similarity between all pairs of input
examples based on their associated embeddings. An edge is written to the
output_graph.tsv file for each pair whose similarity is at least as large as
the value of the --similarity_threshold flag. Each output edge is
represented by a line in the output_graph.tsv file with the following form:
source_id<TAB>target_id<TAB>edge_weight
All edges in the output will be symmetric (i.e., if edge A--w-->B exists in
the output, then so will edge B--w-->A).
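The thresholding and symmetry rules described above can be illustrated with a small, pure-Python sketch. This is a brute-force illustration under stated assumptions, not the program's actual implementation: the function names (cosine_similarity, build_edges, format_edge) are hypothetical, embeddings are assumed to be non-zero vectors of equal length, and the three-column tab-separated line layout (source, target, weight) is assumed here.

```python
import itertools
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_edges(embeddings, similarity_threshold):
    """Compares every pair once and emits both directions of each kept edge."""
    edges = []
    for (id_a, emb_a), (id_b, emb_b) in itertools.combinations(
            embeddings.items(), 2):
        weight = cosine_similarity(emb_a, emb_b)
        # Keep the pair only if its similarity reaches the threshold.
        if weight >= similarity_threshold:
            edges.append((id_a, id_b, weight))  # A--w-->B
            edges.append((id_b, id_a, weight))  # B--w-->A (symmetric edge)
    return edges


def format_edge(source_id, target_id, weight):
    """Formats one edge as a tab-separated line (assumed layout)."""
    return f"{source_id}\t{target_id}\t{weight}"


# Illustrative toy embeddings keyed by example id.
toy = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
edges = build_edges(toy, similarity_threshold=0.7)
lines = [format_edge(*edge) for edge in edges]
```

With these toy vectors, only the pair (a, b) has a similarity of about 0.8 and clears the 0.7 threshold, so the result holds exactly two edges, one in each direction. Comparing all pairs is quadratic in the number of examples, which is fine for a sketch but may be slow for large inputs.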
For details about this program's flags, run
python build_graph.py --help.