tfx.orchestration.kubeflow.v2.components.experimental.ai_platform_training_component.create_ai_platform_training

Creates a pipeline step that launches an AIP training job.

The generated TFX component has a component spec that is defined dynamically from the inputs, outputs, and parameters arguments, in the following format:

  • inputs: A mapping from input name to the connected upstream channel. The artifact type of the channel is inferred automatically.
  • outputs: A mapping from output name to the associated artifact type.
  • parameters: A mapping from execution property name to its associated value. Only primitive-typed values are supported. Note that RuntimeParameter is not supported yet.

For example:

    create_ai_platform_training(
        ...
        inputs={
            # Assuming there is an upstream node example_gen, with an output
            # 'examples' of the type Examples.
            'examples': example_gen.outputs['examples'],
        },
        outputs={
            'model': standard_artifacts.Model,
        },
        parameters={
            'n_steps': 100,
            'optimizer': 'sgd',
        }
        ...
    )

will generate a component instance with a component spec equivalent to:

    class MyComponentSpec(ComponentSpec):
        INPUTS = {
            'examples': ChannelParameter(type=standard_artifacts.Examples)
        }
        OUTPUTS = {
            'model': ChannelParameter(type=standard_artifacts.Model)
        }
        PARAMETERS = {
            'n_steps': ExecutionParameter(type=int),
            'optimizer': ExecutionParameter(type=str)
        }

with its input 'examples' connected to the example_gen output, and its execution properties 'n_steps' and 'optimizer' set to 100 and 'sgd' respectively.
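The inference of execution parameter types from primitive values, as described above, can be sketched in plain Python. This is an illustration only and not the library's implementation: infer_parameter_types is a hypothetical helper, and the set of types treated as primitive here (bool, int, float, str) is an assumption.

```python
from typing import Any, Dict

# Assumption: "primitive" means bool, int, float, or str.
# The exact set accepted by TFX may differ.
_PRIMITIVE_TYPES = (bool, int, float, str)


def infer_parameter_types(parameters: Dict[str, Any]) -> Dict[str, type]:
    """Maps each execution property name to the Python type of its value.

    Hypothetical helper: raises TypeError for any non-primitive value.
    """
    inferred = {}
    for name, value in parameters.items():
        if not isinstance(value, _PRIMITIVE_TYPES):
            raise TypeError(
                f"Parameter {name!r} has non-primitive type "
                f"{type(value).__name__}")
        inferred[name] = type(value)
    return inferred


# 'n_steps' is inferred as int and 'optimizer' as str.
types = infer_parameter_types({'n_steps': 100, 'optimizer': 'sgd'})
print(types)
```

Under these assumptions, a value such as a list or a dict is rejected up front, which mirrors the documented restriction to primitive-typed values.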

Example usage of the component:

    from tfx.dsl.component.experimental import placeholders
    from tfx.orchestration.kubeflow.v2.components.experimental.ai_platform_training_component import create_ai_platform_training
    from tfx.types import standard_artifacts

    # Assumes an upstream node example_gen with an 'examples' output.

    # A single node training job.
    my_train = create_ai_platform_training(
        name='my_training_step',
        project_id='my-project',
        region='us-central1',
        image_uri='gcr.io/my-project/caip-training-test:latest',
        args=[
            '--examples',
            placeholders.InputUriPlaceholder('examples'),
            '--n-steps',
            placeholders.InputValuePlaceholder('n_step'),
            '--output-location',
            placeholders.OutputUriPlaceholder('model')
        ],
        scale_tier='BASIC_GPU',
        inputs={'examples': example_gen.outputs['examples']},
        outputs={'model': standard_artifacts.Model},
        parameters={'n_step': 100}
    )

    # More complex settings can be expressed by providing training_input
    # directly.
    my_distributed_train = create_ai_platform_training(
        name='my_training_step',
        project_id='my-project',
        training_input={
            'scaleTier': 'CUSTOM',
            'region': 'us-central1',
            'masterType': 'n1-standard-8',
            'masterConfig': {
                'imageUri': 'gcr.io/my-project/my-dist-training:latest'
            },
            'workerType': 'n1-standard-8',
            'workerCount': 8,
            'workerConfig': {
                'imageUri': 'gcr.io/my-project/my-dist-training:latest'
            },
            'args': [
                '--examples',
                placeholders.InputUriPlaceholder('examples'),
                '--n-steps',
                placeholders.InputValuePlaceholder('n_step'),
                '--output-location',
                placeholders.OutputUriPlaceholder('model')
            ]
        },
        inputs={'examples': example_gen.outputs['examples']},
        outputs={'model': standard_artifacts.Model},
        parameters={'n_step': 100}
    )
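To make the role of the placeholders in args more concrete, the following toy model shows one way such placeholders could be turned into command line strings at run time. It is a sketch under stated assumptions: InputUri, InputValue, OutputUri, and resolve_args are local stand-ins defined here for illustration, not the TFX placeholder classes or the TFX resolution logic, and the gs:// URIs are made up.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


# Local stand-ins for illustration only; these are NOT the TFX classes.
@dataclass(frozen=True)
class InputUri:
    name: str


@dataclass(frozen=True)
class InputValue:
    name: str


@dataclass(frozen=True)
class OutputUri:
    name: str


Arg = Union[str, InputUri, InputValue, OutputUri]


def resolve_args(args: List[Arg],
                 input_uris: Dict[str, str],
                 values: Dict[str, Any],
                 output_uris: Dict[str, str]) -> List[str]:
    """Replaces each stand-in placeholder with a concrete string."""
    resolved = []
    for arg in args:
        if isinstance(arg, InputUri):
            resolved.append(input_uris[arg.name])
        elif isinstance(arg, InputValue):
            resolved.append(str(values[arg.name]))
        elif isinstance(arg, OutputUri):
            resolved.append(output_uris[arg.name])
        else:
            resolved.append(arg)  # Plain strings pass through unchanged.
    return resolved


command_args = resolve_args(
    ['--examples', InputUri('examples'),
     '--n-steps', InputValue('n_step'),
     '--output-location', OutputUri('model')],
    input_uris={'examples': 'gs://my-bucket/examples'},  # hypothetical URI
    values={'n_step': 100},
    output_uris={'model': 'gs://my-bucket/model'},  # hypothetical URI
)
print(command_args)
```

In this model the key passed to each placeholder must match a key in inputs, parameters, or outputs, which is why the examples above use 'n_step' consistently in both args and parameters.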

  • name: The name of the component. It is also needed to construct the component spec and component class dynamically.
  • project_id: The GCP project under which the AIP training job will run.
  • region: The GCE region where the AIP training job will run.
  • job_id: The unique ID of the job. Defaults to 'tfx_%Y%m%d%H%M%S'.
  • image_uri: The GCR location of the container image that will be used to execute the training program. If the same field is specified in training_input, the latter overrides image_uri.
  • args: Command line arguments that will be passed to the training program. Users can use placeholder semantics, as in tfx.dsl.component.experimental.container_component, to wire the args to component inputs/outputs/parameters.
  • scale_tier: The Cloud ML resource requested by the job. See https://cloud.google.com/ai-platform/training/docs/reference/rest/v1/projects.jobs#ScaleTier
  • training_input: The full training job spec. This field overrides other specifications where applicable. It follows the TrainingInput schema.
  • labels: User-specified labels attached to the job.
  • inputs: The dict of component inputs.
  • outputs: The dict of component outputs.
  • parameters: The dict of component parameters, also known as execution properties.
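The interaction between the individual arguments and training_input, together with the default job ID format, can be sketched as follows. This is a minimal sketch and not the library's code: build_training_input and default_job_id are hypothetical helpers, the shallow merge is an assumption about how "overrides" is applied, and whether the timestamp uses local time or UTC is not stated in this reference.

```python
import datetime
from typing import Any, Dict, List, Optional


def default_job_id(now: Optional[datetime.datetime] = None) -> str:
    """Formats a job ID using the documented 'tfx_%Y%m%d%H%M%S' pattern."""
    # Assumption: local time is used when no timestamp is supplied.
    now = now or datetime.datetime.now()
    return now.strftime('tfx_%Y%m%d%H%M%S')


def build_training_input(
        region: Optional[str] = None,
        image_uri: Optional[str] = None,
        args: Optional[List[Any]] = None,
        scale_tier: Optional[str] = None,
        training_input: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Combines individual arguments with an explicit training_input."""
    result: Dict[str, Any] = {}
    if region:
        result['region'] = region
    if image_uri:
        result['masterConfig'] = {'imageUri': image_uri}
    if args:
        result['args'] = list(args)
    if scale_tier:
        result['scaleTier'] = scale_tier
    # Assumption: a shallow merge in which keys present in training_input
    # replace the values derived from the individual arguments.
    result.update(training_input or {})
    return result


spec = build_training_input(
    region='us-central1',
    image_uri='gcr.io/my-project/caip-training-test:latest',
    scale_tier='BASIC_GPU',
    training_input={
        'masterConfig': {
            'imageUri': 'gcr.io/my-project/my-dist-training:latest'
        }
    },
)
print(spec['masterConfig']['imageUri'])
print(default_job_id(datetime.datetime(2021, 3, 4, 5, 6, 7)))
```

In this sketch the image from training_input wins over image_uri, matching the documented precedence, while region and scale_tier are kept because training_input does not set them.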

Returns a component instance that represents the AIP job in the DSL.

ValueError: Raised when image_uri is missing and masterConfig is not specified in training_input, or when region is missing and training_input does not provide region either.
TypeError: Raised when non-primitive parameters are specified.
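The two ValueError conditions can be expressed as a small standalone check. This is a sketch of the documented conditions only: validate_job_arguments is a hypothetical helper, and the error messages are illustrative rather than the ones the library emits.

```python
from typing import Any, Dict, Optional


def validate_job_arguments(
        region: Optional[str],
        image_uri: Optional[str],
        training_input: Optional[Dict[str, Any]]) -> None:
    """Raises ValueError for the two documented missing-argument cases."""
    training_input = training_input or {}
    # Case 1: no image_uri and no masterConfig in training_input.
    if not image_uri and 'masterConfig' not in training_input:
        raise ValueError(
            'image_uri is required when masterConfig is not specified '
            'in training_input.')
    # Case 2: no region argument and no region in training_input.
    if not region and 'region' not in training_input:
        raise ValueError(
            'region is required when training_input does not provide it.')


# Passes: both values come from training_input.
validate_job_arguments(
    region=None,
    image_uri=None,
    training_input={
        'region': 'us-central1',
        'masterConfig': {'imageUri': 'gcr.io/my-project/my-image:latest'},
    },
)
```

Running a check like this before constructing the component surfaces a missing image or region at pipeline definition time instead of at job submission.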