class CosinePenalizedPlackettLuce: A distribution that samples items based on scores and cosine similarity.
class DescendingScoreRankingPolicy: A policy that deterministically ranks elements based on their scores.
class DescendingScoreSampler: Base neural network module class.
class NoPenaltyPlackettLuce: Identical to PlackettLuce, with input signature modified to our needs.
class NoPenaltyRankingPolicy: A class implementing ranking policies in TF Agents.
class PenalizeCosineDistanceRankingPolicy: A ranking policy that penalizes scores based on cosine distance.
class PenalizedPlackettLuce: A distribution that samples permutations and penalizes item scores.
class RankingPolicy: A class implementing ranking policies in TF Agents.
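
The summaries above name three ranking strategies: deterministic descending-score ranking, plain Plackett-Luce sampling, and Plackett-Luce sampling with a similarity penalty. The sketch below illustrates the general idea in plain Python and is not the TF-Agents implementation. The function names, the `penalty` parameter, and the exact penalty rule (subtracting `penalty * cosine_similarity` from the logits of the remaining items after each pick) are assumptions made for illustration only.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sample_penalized_ranking(scores, features, num_slots, penalty=0.0, rng=None):
    """Sample up to `num_slots` distinct item indices, one slot at a time.

    Assumption (hypothetical rule, not taken from the library): logits are the
    raw scores, and after each pick every remaining item's logit drops by
    `penalty * cosine_similarity(item, picked)`.
    With penalty=0.0 this reduces to plain Plackett-Luce sampling.
    """
    rng = rng or random.Random()
    logits = list(scores)
    remaining = list(range(len(scores)))
    ranking = []
    for _ in range(min(num_slots, len(remaining))):
        # Softmax weights over the remaining items (shifted for stability).
        top = max(logits[i] for i in remaining)
        weights = [math.exp(logits[i] - top) for i in remaining]
        picked = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(picked)
        remaining.remove(picked)
        # Penalize remaining items that are similar to the one just placed.
        for i in remaining:
            logits[i] -= penalty * cosine_similarity(features[i], features[picked])
    return ranking


def descending_score_ranking(scores, num_slots):
    """Deterministic ranking: indices of the `num_slots` highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:num_slots]
```

Calling `sample_penalized_ranking(scores, features, num_slots=3, penalty=1.0)` returns a list of distinct item indices. A larger `penalty` makes it less likely that items resembling an already placed item fill the later slots, while `descending_score_ranking` always returns the same order for the same scores.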