Builds a graph based on dense embeddings and persists it in TSV format.
nsl.tools.build_graph(embedding_files, output_graph_path, similarity_threshold=0.8, id_feature_name='id', embedding_feature_name='embedding')
This function reads input instances from one or more TFRecord files, each
containing tf.train.Example protos. Each input example is expected to
contain at least the following 2 features:
id: A singleton bytes_list feature that identifies each example.
embedding: A float_list feature that contains the (dense) embedding of each example.
id and embedding are not necessarily the literal feature names; if your
features have different names, you can specify them using the
id_feature_name and embedding_feature_name arguments, respectively.
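As an illustration of the expected input, the following sketch writes a small TFRecord file whose examples carry the two features described above. It relies only on standard TensorFlow calls (tf.train.Example, tf.io.TFRecordWriter). The helper names make_example and write_embeddings, and any IDs or paths passed to them, are hypothetical and not part of this library.

```python
import tensorflow as tf


def make_example(example_id, embedding):
    """Builds a tf.train.Example with 'id' and 'embedding' features."""
    return tf.train.Example(features=tf.train.Features(feature={
        # Singleton bytes_list feature that identifies the example.
        'id': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[example_id.encode('utf-8')])),
        # float_list feature that holds the dense embedding.
        'embedding': tf.train.Feature(
            float_list=tf.train.FloatList(value=embedding)),
    }))


def write_embeddings(path, id_to_embedding):
    """Writes one serialized example per (id, embedding) pair to `path`."""
    with tf.io.TFRecordWriter(path) as writer:
        for example_id, embedding in id_to_embedding.items():
            example = make_example(example_id, embedding)
            writer.write(example.SerializeToString())
```

A file written this way could then be passed to this function as one entry of embedding_files, using the default feature names.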
This function then computes the cosine similarity between all pairs of input
examples based on their associated embeddings. An edge is written to the TSV
file named by output_graph_path for each pair whose similarity is at least
as large as similarity_threshold. Each output edge is represented by a TSV
line in the output_graph_path file with the following form:
source_id<TAB>target_id<TAB>edge_weight
All edges in the output will be symmetric (i.e., if edge A--w-->B exists in
the output, then so will edge B--w-->A).
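The pairwise thresholding described above can be sketched in plain Python. This is a brute-force illustration of the idea, not the library's actual implementation, which may be organized differently. The names cosine_similarity and build_edges are hypothetical, and embeddings are assumed to be non-zero vectors of equal length.

```python
import itertools
import math


def cosine_similarity(u, v):
    """Returns the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_edges(embeddings, similarity_threshold=0.8):
    """Returns symmetric (source_id, target_id, weight) edges.

    `embeddings` maps each example ID to its dense embedding.
    """
    edges = []
    # Visit every unordered pair of examples exactly once.
    for (id_a, emb_a), (id_b, emb_b) in itertools.combinations(
            embeddings.items(), 2):
        weight = cosine_similarity(emb_a, emb_b)
        if weight >= similarity_threshold:
            # Emit both directions so the resulting graph is symmetric.
            edges.append((id_a, id_b, weight))
            edges.append((id_b, id_a, weight))
    return edges
```

Comparing all pairs costs time quadratic in the number of examples, which is why this sketch is only suitable for small inputs.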
Note that this function can also be invoked as a binary from a shell. Sample usage:
python -m neural_structured_learning.tools.build_graph [flags]
For details about this program's flags, run:
python -m neural_structured_learning.tools.build_graph --help
embedding_files: A list of names of TFRecord files containing tf.train.Example objects, which in turn contain dense embeddings.
output_graph_path: Name of the file to which the output graph in TSV format should be written.
similarity_threshold: Threshold used to determine which edges to retain in the resulting graph.
id_feature_name: The name of the feature in the input tf.train.Example objects representing the ID of examples.
embedding_feature_name: The name of the feature in the input tf.train.Example objects representing the embedding of examples.
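Once a graph has been written to output_graph_path, it can be loaded back with the Python standard library. The sketch below assumes each line holds three tab-separated fields (source ID, target ID, edge weight). The function name read_graph is hypothetical and not part of this library.

```python
import collections
import csv


def read_graph(path):
    """Loads a TSV edge list into {source_id: {target_id: weight}}."""
    graph = collections.defaultdict(dict)
    with open(path, newline='') as f:
        for row in csv.reader(f, delimiter='\t'):
            if not row:
                continue  # Skip blank lines.
            source_id, target_id, weight = row
            graph[source_id][target_id] = float(weight)
    return dict(graph)
```

Because the output edges are symmetric, every neighbor relation appears under both endpoints in the returned dictionary.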