Module: tf_agents.bandits.policies.ranking_policy

Ranking policy.


class CosinePenalizedPlackettLuce: A distribution that samples items based on scores and cosine similarity.

class DescendingScoreRankingPolicy: A policy that deterministically ranks elements based on their scores.

class DescendingScoreSampler: A sampler module that orders items by descending score.

class NoPenaltyPlackettLuce: Identical to PlackettLuce, with input signature modified to our needs.

class NoPenaltyRankingPolicy: A class implementing ranking policies in TF Agents.

class PenalizeCosineDistanceRankingPolicy: A ranking policy that penalizes scores based on cosine distance.

class PenalizedPlackettLuce: A distribution that samples permutations and penalizes item scores.

class RankingPolicy: A class implementing ranking policies in TF Agents.
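
Several of the classes above are named after the Plackett-Luce model, which builds a ranking one slot at a time by drawing each item with probability proportional to its exponentiated score among the items not yet chosen. The following is a minimal NumPy sketch of that idea alongside a deterministic descending-score ranking. It is not the TF-Agents implementation: the function names, the `penalty` parameter, and the particular penalty form (subtracting the cosine similarity to the item just chosen) are assumptions made for illustration only.

```python
import numpy as np


def descending_score_ranking(scores, num_slots):
    """Deterministic ranking: indices of the `num_slots` highest scores, best first."""
    return np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:num_slots]


def sample_penalized_plackett_luce(scores, features, num_slots, penalty=0.0, rng=None):
    """Sample a ranking slot by slot (hypothetical helper, not a library API).

    With penalty=0 this is plain Plackett-Luce sampling without replacement.
    With penalty>0, items similar to an already chosen item become less likely.
    Assumes every feature vector has a non-zero norm.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(scores, dtype=float).copy()
    features = np.asarray(features, dtype=float)
    # Unit-normalize rows so that dot products are cosine similarities.
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)

    chosen = []
    for _ in range(num_slots):
        # Softmax over the remaining items; chosen items hold -inf, so they get 0.
        probs = np.exp(logits - np.max(logits))
        probs /= probs.sum()
        item = int(rng.choice(len(logits), p=probs))
        chosen.append(item)
        # Assumed penalty: lower the scores of items similar to the chosen one.
        logits -= penalty * (unit @ unit[item])
        # Exclude the chosen item from all later slots.
        logits[item] = -np.inf
    return np.array(chosen)
```

In this sketch, `descending_score_ranking` always returns the same order for the same scores, while `sample_penalized_plackett_luce` returns a random permutation whose diversity grows with `penalty`. `num_slots` must not exceed the number of items.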