tfx.orchestration.metadata.Metadata

Class Metadata

Helper class to handle metadata I/O.

__init__

__init__(connection_config)

Initializes the helper with the connection configuration used to reach the underlying MetadataStore.

Properties

store

Returns underlying MetadataStore.

Raises:

  • RuntimeError: if this instance has not been entered, i.e. store is accessed before __enter__ or after __exit__.
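
The enter-state requirement on store follows the usual context-manager pattern. The sketch below illustrates that pattern only: GuardedMetadata is a hypothetical stand-in, not the TFX implementation, and a plain dict takes the place of a real MetadataStore.

```python
class GuardedMetadata:
    """Hypothetical stand-in that mimics the enter-state check on `store`."""

    def __init__(self, connection_config):
        self._connection_config = connection_config
        self._store = None  # populated only while inside a `with` block

    def __enter__(self):
        # A real helper would open a MetadataStore here; a dict stands in.
        self._store = {"config": self._connection_config}
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        # Drop the store; returning None lets any exception propagate.
        self._store = None

    @property
    def store(self):
        if self._store is None:
            raise RuntimeError("Metadata object is not in enter state")
        return self._store


handler = GuardedMetadata(connection_config="placeholder-config")

try:
    handler.store
    accessed_outside = True
except RuntimeError:
    accessed_outside = False

with handler as m:
    accessed_inside = m.store is not None
```

Under these assumptions, store is usable only inside the with block; touching it before entering or after exiting raises RuntimeError.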

Methods

__enter__

__enter__()

__exit__

__exit__(
    exc_type,
    exc_value,
    exc_tb
)

check_artifact_state

check_artifact_state(
    artifact,
    expected_states
)

fetch_previous_result_artifacts

fetch_previous_result_artifacts(
    output_dict,
    execution_id
)

Fetches output with artifact ids produced by a previous run.

Args:

  • output_dict: a dict from name to a list of output Artifact objects.
  • execution_id: the id of the execution that produced the outputs.

Returns:

The original output_dict with artifact ids inserted.

Raises:

  • RuntimeError: if a path changed without the metadata being cleaned.

get_all_artifacts

get_all_artifacts()

get_artifacts_by_type

get_artifacts_by_type(type_name)

get_artifacts_by_uri

get_artifacts_by_uri(uri)

get_execution_states

get_execution_states(
    pipeline_name,
    run_id
)

Gets the execution states of the components in a pipeline run.

Args:

  • pipeline_name: name of the pipeline.
  • run_id: identifier of the target pipeline run.

Returns:

A dict mapping each component id to its state.
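
A returned mapping of this shape can be consumed like any dict. In the minimal sketch below, the component ids, the state strings, and the helper components_not_in_state are all illustrative assumptions rather than values or functions from TFX.

```python
# Hypothetical result of get_execution_states(); the component ids and the
# state strings are illustrative assumptions, not values taken from TFX.
states = {
    "ExampleGen": "complete",
    "Transform": "complete",
    "Trainer": "new",
}


def components_not_in_state(states, wanted_state):
    """Returns the sorted component ids whose state differs from wanted_state."""
    return sorted(cid for cid, state in states.items() if state != wanted_state)


pending = components_not_in_state(states, "complete")
```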

previous_execution

previous_execution(
    input_artifacts,
    exec_properties,
    pipeline_info,
    component_info
)

Gets eligible previous execution that takes the same inputs.

An eligible execution takes the same inputs and execution properties, and has the same pipeline and component properties.

Args:

  • input_artifacts: inputs used by the run.
  • exec_properties: execution properties used by the run.
  • pipeline_info: info of the current pipeline run.
  • component_info: info of the current component.

Returns:

The execution id of a previous run that took the same inputs, or None if no such run is found.
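
The eligibility rule amounts to a lookup keyed on everything that must match. The sketch below shows that idea with a hypothetical in-memory ExecutionIndex. It is not how TFX queries its metadata store, and the plain-dict shapes used for the four arguments are assumptions made for illustration.

```python
import json


def cache_key(input_artifacts, exec_properties, pipeline_info, component_info):
    """Builds a deterministic key from everything that must match."""
    return json.dumps(
        {
            "inputs": input_artifacts,
            "exec_properties": exec_properties,
            "pipeline": pipeline_info,
            "component": component_info,
        },
        sort_keys=True,
    )


class ExecutionIndex:
    """Hypothetical in-memory index of recorded executions."""

    def __init__(self):
        self._ids_by_key = {}

    def record(self, execution_id, *match_on):
        self._ids_by_key[cache_key(*match_on)] = execution_id

    def previous_execution(self, *match_on):
        # dict.get returns None when no eligible execution was recorded.
        return self._ids_by_key.get(cache_key(*match_on))


index = ExecutionIndex()
inputs = {"examples": [11]}  # name -> list of artifact ids (assumed shape)
props = {"train_steps": 100}
pipeline = {"pipeline_name": "demo"}
component = {"component_id": "Trainer"}

index.record(7, inputs, props, pipeline, component)
hit = index.previous_execution(inputs, props, pipeline, component)
miss = index.previous_execution(inputs, {"train_steps": 200}, pipeline, component)
```

Here hit is the recorded id 7, while changing a single execution property makes the lookup return None.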

publish_artifacts

publish_artifacts(raw_artifact_list)

Publishes the artifacts in the list that are not already published.

publish_execution

publish_execution(
    execution_id,
    input_dict,
    output_dict,
    state=EXECUTION_STATE_COMPLETE
)

Publish an execution with input and output artifacts info.

Args:

  • execution_id: id of execution to be published.
  • input_dict: input artifacts used by the execution, with their ids already set.
  • output_dict: output artifacts produced by the execution, without ids.
  • state: optional state of the execution; defaults to EXECUTION_STATE_COMPLETE.

Returns:

Updated outputs with artifact ids.

Raises:

  • RuntimeError: If any output artifact already has id set.
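
The contract on output_dict (artifacts arrive without ids, come back with ids, and a pre-set id is an error) can be sketched as follows. FakeArtifact and assign_output_ids are hypothetical names for illustration, and ids come from a local counter rather than from a metadata store.

```python
import itertools


class FakeArtifact:
    """Hypothetical artifact with an optional id; not a TFX type."""

    def __init__(self, uri, artifact_id=None):
        self.uri = uri
        self.id = artifact_id


_id_counter = itertools.count(1)


def assign_output_ids(output_dict):
    """Gives every output artifact a fresh id, rejecting pre-assigned ones."""
    artifacts = [a for group in output_dict.values() for a in group]
    # Validate first so a failure leaves every artifact untouched.
    if any(a.id is not None for a in artifacts):
        raise RuntimeError("output artifact already has an id set")
    for artifact in artifacts:
        artifact.id = next(_id_counter)
    return output_dict


outputs = {"model": [FakeArtifact("/tmp/model/1")]}
published = assign_output_ids(outputs)
```

Calling assign_output_ids(outputs) a second time raises RuntimeError in this sketch, because the artifact now carries an id.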

register_execution

register_execution(
    exec_properties,
    pipeline_info,
    component_info,
    run_context_id=None
)

Create a new execution in metadata.

Args:

  • exec_properties: the execution properties of the execution.
  • pipeline_info: optional pipeline info of the execution.
  • component_info: optional component info of the execution.
  • run_context_id: context id for the current run; if provided, the execution is linked to it.

Returns:

execution id of the new execution.

register_run_context_if_not_exists

register_run_context_if_not_exists(pipeline_info)

Create or get the context for current pipeline run.

Args:

  • pipeline_info: pipeline information for current run.

Returns:

context id of the current run.
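
"Create or get" describes an idempotent registration: repeated calls for the same run yield the same context id. A minimal sketch of that behaviour, assuming a hypothetical in-memory RunContextRegistry keyed by pipeline name and run id (the key choice is an assumption, not taken from the TFX source):

```python
class RunContextRegistry:
    """Hypothetical in-memory registry keyed by (pipeline_name, run_id)."""

    def __init__(self):
        self._ids = {}

    def register_if_not_exists(self, pipeline_name, run_id):
        key = (pipeline_name, run_id)
        if key not in self._ids:
            # Allocate the next unused id only for unseen runs.
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


registry = RunContextRegistry()
first = registry.register_if_not_exists("demo", "run-1")
again = registry.register_if_not_exists("demo", "run-1")
other = registry.register_if_not_exists("demo", "run-2")
```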

search_artifacts

search_artifacts(
    artifact_name,
    pipeline_name,
    run_id,
    producer_component_id
)

Searches for artifacts that match the given info.

Args:

  • artifact_name: the name of the artifact, as set by the producer component. The name is logged in both the artifacts and the events when the execution is published.
  • pipeline_name: the name of the pipeline that produces the artifact.
  • run_id: the run id of the pipeline run that produces the artifact.
  • producer_component_id: the id of the component that produces the artifact.

Returns:

A list of Artifacts that match the given info.

Raises:

  • RuntimeError: when no matching execution is found given producer info.
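
The lookup described here has two stages: find the producing execution from the pipeline name, run id, and component id, then pick its artifacts by name. The sketch below mirrors those stages over plain dicts. The record layout and the helper find_artifacts are hypothetical and do not reflect how TFX stores or queries metadata.

```python
# Hypothetical records: each published execution lists its producer info and
# the named artifacts it emitted. Plain dicts stand in for metadata rows.
executions = [
    {
        "pipeline_name": "demo",
        "run_id": "run-1",
        "component_id": "Trainer",
        "outputs": {"model": ["/tmp/model/1"]},
    },
]


def find_artifacts(executions, artifact_name, pipeline_name, run_id,
                   producer_component_id):
    """Returns the named outputs of the execution matching the producer info."""
    wanted = (pipeline_name, run_id, producer_component_id)
    for execution in executions:
        producer = (execution["pipeline_name"], execution["run_id"],
                    execution["component_id"])
        if producer == wanted:
            return execution["outputs"].get(artifact_name, [])
    raise RuntimeError("no matching execution for the given producer info")


found = find_artifacts(executions, "model", "demo", "run-1", "Trainer")
```

With these records, asking for a producer that never ran (for example component id "Evaluator") raises RuntimeError, matching the documented failure mode.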

update_artifact_state

update_artifact_state(
    artifact,
    new_state
)