tf_privacy.DPQuery

Interface for differentially private query mechanisms.

Differential privacy is achieved by processing records to bound sensitivity, accumulating the processed records (usually by summing them) and then adding noise to the aggregated result. The process can be repeated to compose applications of the same mechanism, possibly with different parameters.

The DPQuery interface specifies a functional approach to this process. A global state holds information that persists across applications of the mechanism. For each application, the following steps are performed:

  1. Use the global state to derive parameters to use for the next sample of records.
  2. Initialize a sample state that will accumulate processed records.
  3. For each record:
     a. Process the record.
     b. Accumulate the record into the sample state.
  4. Get the result of the mechanism, possibly updating the global state to use in the next application.
  5. Derive metrics from the global state.

Here is an example using GaussianSumQuery. Assume there is some function records_for_round(round) that returns an iterable of the records to use in a given round.

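import tensorflow as tf
import tensorflow_privacy

# num_rounds and records_for_round are assumed to be defined elsewhere.
dp_query = tensorflow_privacy.GaussianSumQuery(
    l2_norm_clip=1.0, stddev=1.0)
global_state = dp_query.initial_global_state()

# initial_sample_state needs a template with the shape and dtype of one
# record; a float32 vector of length 10 is assumed here for illustration.
record_template = tf.zeros([10], dtype=tf.float32)

for round_num in range(num_rounds):
  sample_params = dp_query.derive_sample_params(global_state)
  sample_state = dp_query.initial_sample_state(record_template)
  for record in records_for_round(round_num):
    sample_state = dp_query.accumulate_record(
        sample_params, sample_state, record)

  # get_noised_result returns (result, new_global_state, event).
  result, global_state, event = dp_query.get_noised_result(
      sample_state, global_state)
  metrics = dp_query.derive_metrics(global_state)

  # Do something with result, metrics, and event...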

Methods

accumulate_preprocessed_record

Accumulates a single preprocessed record into the sample state.

This method is intended to do only simple aggregation, typically just a sum. In the future, we might remove this method and replace it with a way to declaratively specify the type of aggregation required.

Args
sample_state: The current sample state. In standard DP-SGD training, the accumulated sum of previous clipped microbatch gradients.
preprocessed_record: The preprocessed record to accumulate.

Returns
The updated sample state.
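
A minimal sketch of sum aggregation in plain Python, not the TF Privacy implementation. For illustration only, a record is assumed to be a tuple of float vectors rather than a nested structure of TF tensors. The function returns a new state instead of mutating the old one, in keeping with the functional style of the interface.

```python
from typing import List, Tuple

# Assumed record structure for this sketch: a tuple of float vectors,
# e.g. (weights_gradient, bias_gradient).
Structure = Tuple[List[float], ...]


def accumulate_preprocessed_record(sample_state: Structure,
                                   preprocessed_record: Structure) -> Structure:
    """Returns a new sample state equal to sample_state + preprocessed_record."""
    if len(sample_state) != len(preprocessed_record):
        raise ValueError("record structure does not match sample state")
    return tuple(
        [s + r for s, r in zip(state_part, record_part)]
        for state_part, record_part in zip(sample_state, preprocessed_record)
    )


state = ([0.0, 0.0], [0.0])
for rec in (([0.5, 0.5], [1.0]), ([0.25, -0.5], [-0.5])):
    state = accumulate_preprocessed_record(state, rec)
print(state)  # ([0.75, 0.0], [0.5])
```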

accumulate_record

Accumulates a single record into the sample state.

This is a helper method that simply delegates to preprocess_record and accumulate_preprocessed_record, for the common case in which both of those functions run on a single device. The accumulation is typically a simple sum.

Args
params: The parameters for the sample. In standard DP-SGD training, the clipping norm for the sample's microbatch gradients (i.e., a maximum norm magnitude to which each gradient is clipped).
sample_state: The current sample state. In standard DP-SGD training, the accumulated sum of previous clipped microbatch gradients.
record: The record to accumulate. In standard DP-SGD training, the gradient computed for the examples in one microbatch, which may be the gradient for just one example (for size 1 microbatches).

Returns
The updated sample state. In standard DP-SGD training, the accumulated sum of previous clipped microbatch gradients plus the clipped record argument.
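
The delegation can be sketched in plain Python as follows. This is not the TF Privacy code: records are assumed to be flat lists of floats, and params is assumed to be an L2 clipping norm, as in the DP-SGD description.

```python
import math
from typing import List

Vector = List[float]


def preprocess_record(params: float, record: Vector) -> Vector:
    """Clips `record` to L2 norm at most `params` (the clipping norm)."""
    norm = math.sqrt(sum(x * x for x in record))
    scale = min(1.0, params / norm) if norm > 0.0 else 1.0
    return [x * scale for x in record]


def accumulate_preprocessed_record(sample_state: Vector,
                                   preprocessed_record: Vector) -> Vector:
    """Sum aggregation: adds the preprocessed record elementwise."""
    return [s + r for s, r in zip(sample_state, preprocessed_record)]


def accumulate_record(params: float, sample_state: Vector,
                      record: Vector) -> Vector:
    """Single-device shortcut: preprocess the record, then accumulate it."""
    return accumulate_preprocessed_record(
        sample_state, preprocess_record(params, record))


# A record with L2 norm 10 is scaled down to norm 5 before being summed.
state = accumulate_record(5.0, [0.0, 0.0], [6.0, 8.0])
print(state)  # [3.0, 4.0]
```

In a distributed setting, the two inner calls would instead run on different machines, which is why they exist as separate methods.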

derive_metrics

Derives metric information from the current global state.

Any metrics returned should be derived only from privatized quantities.

Args
global_state: The global state from which to derive metrics.

Returns
A collections.OrderedDict mapping string metric names to tensor values.
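
As a hypothetical illustration, suppose a query's global state holds a clipping norm that is updated between rounds only from noised estimates, plus a fixed noise multiplier. Both are safe to report, whereas raw, un-noised sums of records would not be. The GlobalState class and its field names are assumptions of this sketch.

```python
import collections
from typing import NamedTuple


class GlobalState(NamedTuple):
    """Hypothetical global state of an adaptive-clipping style query."""
    l2_norm_clip: float  # assumed to be updated only from noised estimates
    noise_multiplier: float  # fixed hyperparameter


def derive_metrics(global_state: GlobalState) -> collections.OrderedDict:
    """Reports only quantities that are already privatized or public."""
    return collections.OrderedDict(
        [("l2_norm_clip", global_state.l2_norm_clip),
         ("noise_multiplier", global_state.noise_multiplier)])


metrics = derive_metrics(GlobalState(l2_norm_clip=0.8, noise_multiplier=1.1))
print(dict(metrics))  # {'l2_norm_clip': 0.8, 'noise_multiplier': 1.1}
```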

derive_sample_params

Given the global state, derives parameters to use for the next sample.

For example, if the mechanism needs to clip records to bound the norm, the clipping norm should be part of the sample params. In a distributed context, this is the part of the state that would be sent to the workers so they can process records.

Args
global_state: The current global state.

Returns
Parameters to use to process records in the next sample.
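
A minimal sketch of the clipping-norm example above, assuming a hypothetical global state that holds a clipping norm and a noise standard deviation. Only the clipping norm is needed to process records, so it is the only part handed out as sample params.

```python
from typing import NamedTuple


class GlobalState(NamedTuple):
    """Hypothetical global state of a Gaussian sum-style query."""
    l2_norm_clip: float
    stddev: float


def derive_sample_params(global_state: GlobalState) -> float:
    # Workers need only the clipping norm to preprocess records; the noise
    # stddev stays with the aggregator and is used in get_noised_result.
    return global_state.l2_norm_clip


params = derive_sample_params(GlobalState(l2_norm_clip=1.0, stddev=0.8))
print(params)  # 1.0
```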

get_noised_result

Gets the query result after all records of the sample have been accumulated.

The global state can also be updated for use in the next application of the DP mechanism.

Args
sample_state: The sample state after all records have been accumulated. In standard DP-SGD training, the accumulated sum of clipped microbatch gradients (in the special case of microbatches of size 1, the clipped per-example gradients).
global_state: The global state, storing long-term privacy bookkeeping.

Returns
A tuple (result, new_global_state, event) where:

  • result is the result of the query,
  • new_global_state is the updated global state, and
  • event is the DpEvent that occurred.

In standard DP-SGD training, the result is a gradient update comprising a noised average of the clipped gradients in the sample state, with the noise and averaging performed in a manner that guarantees differential privacy.
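
A minimal sketch of a Gaussian-noise get_noised_result in plain Python. The GlobalState class is hypothetical, and the event is a placeholder tuple, not a real DpEvent object. The sketch returns a noised sum; in DP-SGD the averaging step (dividing by the number of records) would follow, and is omitted here.

```python
import random
from typing import List, NamedTuple, Optional, Tuple

Vector = List[float]


class GlobalState(NamedTuple):
    """Hypothetical global state: clipping norm and noise stddev."""
    l2_norm_clip: float
    stddev: float


def get_noised_result(
        sample_state: Vector,
        global_state: GlobalState,
        rng: Optional[random.Random] = None,
) -> Tuple[Vector, GlobalState, Tuple[str, float]]:
    """Adds independent Gaussian noise to each coordinate of the sum."""
    if rng is None:
        rng = random.Random()
    result = [x + rng.gauss(0.0, global_state.stddev) for x in sample_state]
    # Placeholder for a real DpEvent: an accountant would need the noise
    # multiplier, i.e. the stddev relative to the clipping norm.
    event = ("gaussian", global_state.stddev / global_state.l2_norm_clip)
    # This query has no adaptive state, so the global state is returned as-is.
    return result, global_state, event


state = GlobalState(l2_norm_clip=1.0, stddev=0.5)
result, new_state, event = get_noised_result([3.0, 4.0], state, random.Random(0))
print(event)  # ('gaussian', 0.5)
```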

initial_global_state

Returns the initial global state for the DPQuery.

The global state contains any state information that changes across repeated applications of the mechanism. The default implementation returns an empty tuple, which suits implementing classes that have no persistent state.

This object must be processable via tf.nest.map_structure.

Returns
The global state.

initial_sample_state

Returns an initial state to use for the next sample.

For typical DPQuery classes that are aggregated by summation, this should return a nested structure of zero tensors of the appropriate shapes, to which processed records will be aggregated.

Args
template: A nested structure of tensors, TensorSpecs, or NumPy arrays used as a template to create the initial sample state. The leaves of the structure are assumed to be Python scalars or objects that have shape and dtype properties.

Returns
An initial sample state.
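
Without TensorFlow, the idea behind initial_sample_state can be sketched as mapping each leaf of the template to zeros of the same shape and type. The structural conventions below (dicts and tuples are structure, a list of numbers is a vector leaf, a bare int or float is a scalar leaf) are assumptions of this sketch, standing in for tf.nest and tensor shapes.

```python
from typing import Any


def zeros_like_template(template: Any) -> Any:
    """Returns a zero-filled structure with the same layout as `template`."""
    if isinstance(template, dict):
        return {key: zeros_like_template(value) for key, value in template.items()}
    if isinstance(template, tuple):
        return tuple(zeros_like_template(value) for value in template)
    if isinstance(template, list):
        # A list of numbers is treated as one vector leaf.
        return [type(x)(0) for x in template]
    if isinstance(template, (int, float)):
        return type(template)(0)
    raise TypeError(f"unsupported template leaf: {type(template).__name__}")


# Hypothetical template: a weight vector and a scalar bias.
template = {"weights": [0.3, -1.2, 0.7], "bias": 0.1}
sample_state = zeros_like_template(template)
print(sample_state)  # {'weights': [0.0, 0.0, 0.0], 'bias': 0.0}
```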

merge_sample_states

Merges two sample states into a single state.

This can be useful if aggregation is performed hierarchically, where multiple sample states are used to accumulate records and then hierarchically merged into the final accumulated state. Typically this will be a simple sum.

Args
sample_state_1: The first sample state to merge.
sample_state_2: The second sample state to merge.

Returns
The merged sample state.
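
The following sketch, in plain Python with hypothetical data, shows why hierarchical merging works for sum aggregation: summing partial states from several workers gives the same result as accumulating every record in one place.

```python
from functools import reduce
from typing import List

Vector = List[float]


def merge_sample_states(sample_state_1: Vector, sample_state_2: Vector) -> Vector:
    """Merges two partial sums by adding them elementwise."""
    return [a + b for a, b in zip(sample_state_1, sample_state_2)]


# Preprocessed records split across three hypothetical workers.
worker_records = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0]],
    [[0.25, 0.25], [0.25, 0.25]],
]

# Each worker sums its own records into a partial sample state,
partials = [[sum(col) for col in zip(*records)] for records in worker_records]
# and the partial states are then merged pairwise into one.
merged = reduce(merge_sample_states, partials)
print(merged)  # [2.0, 2.0]
```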

preprocess_record

Preprocesses a single record.

This preprocessing is applied to one client's record, e.g., selecting vectors and clipping them to a fixed L2 norm. This method can be executed in a separate TF session, or even on a different machine, so it should not depend on any TF inputs other than those provided as input arguments. In particular, implementations should avoid accessing any TF tensors or variables that are stored in self.

Args
params: The parameters for the sample. In standard DP-SGD training, the clipping norm for the sample's microbatch gradients (i.e., a maximum norm magnitude to which each gradient is clipped).
record: The record to be processed. In standard DP-SGD training, the gradient computed for the examples in one microbatch, which may be the gradient for just one example (for size 1 microbatches).

Returns
A structure of tensors to be aggregated.
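
A minimal sketch of preprocessing that clips a multi-part record (for example, one gradient vector per model variable) to a fixed global L2 norm, similar in spirit to tf.clip_by_global_norm. It is a pure function of params and record, as required above. Representing records as nested Python lists is an assumption of this sketch.

```python
import math
from typing import List

Record = List[List[float]]  # e.g. one gradient vector per model variable


def preprocess_record(params: float, record: Record) -> Record:
    """Scales `record` so its global L2 norm is at most `params`.

    Depends only on its arguments, so it could run on a worker that has
    no access to the query object's own state.
    """
    global_norm = math.sqrt(sum(x * x for part in record for x in part))
    if global_norm <= params:
        return [list(part) for part in record]
    scale = params / global_norm
    return [[x * scale for x in part] for part in record]


clipped = preprocess_record(2.5, [[3.0], [4.0]])
print(clipped)  # [[1.5], [2.0]]
```

Clipping by the global norm, rather than clipping each part separately, bounds the sensitivity of the whole record with a single parameter.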