Retrieves the K highest-scoring items and their ids from a large dataset.

Inherits From: TopK

Used to efficiently retrieve top K query-candidate scores from a dataset, along with the top scoring candidates' identifiers.

k Number of top scores to retrieve.
handle_incomplete_batches When True, candidate batches smaller than k are handled correctly at the price of some performance. As an alternative, consider using the drop_remainder option when batching the candidate dataset.
num_parallel_calls Degree of parallelism when computing scores. Defaults to autotuning.
sorted_order Whether the resulting scores should be returned in sorted order. Setting this to False may result in a small increase in performance.

ValueError if candidate elements are not tuples.
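
The idea behind the arguments above can be sketched in plain Python. This is a minimal illustration of streaming top K under stated assumptions, not the layer's implementation: the helper names `dot` and `streaming_top_k` are hypothetical, scores are assumed to be dot products, and candidates arrive as lists of batches rather than as a dataset.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product score between a query and a candidate (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def streaming_top_k(
    query: Sequence[float],
    candidate_batches: Iterable[List[Sequence[float]]],
    k: int,
) -> Tuple[List[float], List[int]]:
    """Scans batches once, keeping only the best k (score, index) pairs."""
    running: List[Tuple[float, int]] = []
    offset = 0  # global index of the first candidate in the current batch
    for batch in candidate_batches:
        scored = [(dot(query, c), offset + i) for i, c in enumerate(batch)]
        # Merge the batch into the running state. A batch shorter than k is
        # fine here because nlargest returns at most as many items as it gets.
        running = heapq.nlargest(k, running + scored, key=lambda p: p[0])
        offset += len(batch)
    scores = [score for score, _ in running]
    indices = [index for _, index in running]
    return scores, indices
```

Memory stays proportional to k plus one batch, which is why this pattern suits candidate sets too large to score in one pass. In this sketch the last batch may be smaller than k without special handling; the layer's handle_incomplete_batches flag addresses the analogous case for batched tensors.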



Computes the K highest scores and candidate indices for a given query.

query_embeddings [query_batch_size, embedding_dim] tensor of query embeddings.
k Number of elements to retrieve. If not set, will default to the k set in the constructor.

Tuple of [query_batch_size, k] tensor of top scores for each query and [query_batch_size, k] tensor of indices for highest scoring candidates.


Sets the dataset of candidates over which to compute streaming top K.

candidates Matrix (or dataset) of candidate embeddings.
identifiers Optional tensor (or dataset) of candidate identifiers. If given, these will be returned to identify top candidates when performing searches. If not given, indices into the candidates dataset will be returned instead.

Self for chaining.
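
The calling pattern described above (index the candidates, optionally with identifiers, then query) can be illustrated with a toy stand-in. The class `ToyTopK` is hypothetical and uses plain lists and brute-force dot-product scoring. It only mirrors the documented behavior: the index step returns self for chaining, a per-call k overrides the constructor default, and identifiers replace positional indices when supplied.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


class ToyTopK:
    """Hypothetical brute-force stand-in; not the library class."""

    def __init__(self, k: int = 2) -> None:
        self._k = k
        self._candidates: List[Vector] = []
        self._identifiers: Optional[list] = None

    def index(
        self, candidates: Sequence[Vector], identifiers: Optional[Sequence] = None
    ) -> "ToyTopK":
        if identifiers is not None and len(identifiers) != len(candidates):
            raise ValueError("identifiers and candidates must have equal length")
        self._candidates = list(candidates)
        self._identifiers = None if identifiers is None else list(identifiers)
        return self  # returned so calls can be chained

    def __call__(
        self, query_embeddings: Sequence[Vector], k: Optional[int] = None
    ) -> Tuple[List[List[float]], List[list]]:
        k = self._k if k is None else k  # fall back to the constructor value
        all_scores, all_ids = [], []
        for query in query_embeddings:
            scored = [
                (sum(q * c for q, c in zip(query, cand)), i)
                for i, cand in enumerate(self._candidates)
            ]
            top = sorted(scored, key=lambda p: p[0], reverse=True)[:k]
            all_scores.append([score for score, _ in top])
            # Report identifiers when given, positional indices otherwise.
            all_ids.append(
                [i if self._identifiers is None else self._identifiers[i]
                 for _, i in top]
            )
        return all_scores, all_ids
```

Both returned lists have one row per query and up to k entries per row, matching the [query_batch_size, k] shape described for the real layer's outputs.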