# Custom operators

## Example: Custom `Sin` operator

### Create a TensorFlow model

```python
import tensorflow as tf

# Define training dataset and variables
x = [-8, 0.5, 2, 2.2, 201]
y = [-0.6569866, 0.99749499, 0.14112001, -0.05837414, 0.80641841]
offset = tf.Variable(0.0)

# Define a simple model which just contains a custom operator named `Sin`
@tf.function
def sin(x):
    return tf.sin(x + offset, name="Sin")

# Train model
optimizer = tf.optimizers.Adam(0.01)
def train(x, y):
    with tf.GradientTape() as t:
        predicted_y = sin(x)
        loss = tf.reduce_sum(tf.square(predicted_y - y))
    grads = t.gradient(loss, [offset])
    optimizer.apply_gradients(zip(grads, [offset]))

for i in range(1000):
    train(x, y)

print("The actual offset is: 1.0")
print("The predicted offset is:", offset.numpy())
```
```
The actual offset is: 1.0
The predicted offset is: 1.0000001
```

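At this point, converting the model to TensorFlow Lite with the default converter flags fails with the following error:

```
Error:
Some of the operators in the model are not supported by the standard TensorFlow
Lite runtime...... Here is
a list of operators for which you will need custom implementations: Sin.
```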

### Convert to a TensorFlow Lite model

```python
converter = tf.lite.TFLiteConverter.from_concrete_functions([sin.get_concrete_function(x)], sin)
converter.allow_custom_ops = True
tflite_model = converter.convert()
```

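Running the converted model with the default interpreter then fails, because no kernel is registered under the name `Sin`:

```
Error:
Didn't find custom operator for name 'Sin'
Registration failed.
```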

### Create and register the operator

```cpp
typedef struct {
  void* (*init)(TfLiteContext* context, const char* buffer, size_t length);
  void (*free)(TfLiteContext* context, void* buffer);
  TfLiteStatus (*prepare)(TfLiteContext* context, TfLiteNode* node);
  TfLiteStatus (*invoke)(TfLiteContext* context, TfLiteNode* node);
} TfLiteRegistration;
```

```cpp
namespace tflite {
namespace ops {
namespace custom {
  TfLiteRegistration* Register_MY_CUSTOM_OP() {
    static TfLiteRegistration r = {my_custom_op::Init,
                                   my_custom_op::Free,
                                   my_custom_op::Prepare,
                                   my_custom_op::Eval};
    return &r;
  }
}  // namespace custom
}  // namespace ops
}  // namespace tflite
```
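To make the role of the four callbacks more concrete, the following is a minimal, self-contained sketch of the order in which a runtime would typically call them over the lifetime of one node. It does not use the real TensorFlow Lite headers: `DemoContext`, `DemoNode`, `DemoStatus`, `DemoRegistration`, `OpData`, and `RunLifecycle` are hypothetical stand-ins introduced only for this illustration, and the call order shown is an assumption about typical use rather than a specification.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for TfLiteContext / TfLiteNode / TfLiteStatus.
struct DemoContext {
  std::vector<std::string> log;  // records which callback ran, in order
};
struct DemoNode {
  void* user_data = nullptr;  // per-node state returned by init
};
enum DemoStatus { kDemoOk = 0, kDemoError = 1 };

// Same shape as the registration struct above, using the stand-in types.
struct DemoRegistration {
  void* (*init)(DemoContext* context, const char* buffer, size_t length);
  void (*free)(DemoContext* context, void* buffer);
  DemoStatus (*prepare)(DemoContext* context, DemoNode* node);
  DemoStatus (*invoke)(DemoContext* context, DemoNode* node);
};

namespace my_custom_op {
// Hypothetical per-node state owned by the operator.
struct OpData {
  int invoke_count = 0;
};

void* Init(DemoContext* context, const char* /*buffer*/, size_t /*length*/) {
  context->log.push_back("init");
  return new OpData();  // released again in Free
}

void Free(DemoContext* context, void* buffer) {
  context->log.push_back("free");
  delete static_cast<OpData*>(buffer);
}

DemoStatus Prepare(DemoContext* context, DemoNode* node) {
  context->log.push_back("prepare");
  return node->user_data != nullptr ? kDemoOk : kDemoError;
}

DemoStatus Eval(DemoContext* context, DemoNode* node) {
  context->log.push_back("invoke");
  static_cast<OpData*>(node->user_data)->invoke_count++;
  return kDemoOk;
}
}  // namespace my_custom_op

DemoRegistration* Register_MY_CUSTOM_OP() {
  static DemoRegistration r = {my_custom_op::Init, my_custom_op::Free,
                               my_custom_op::Prepare, my_custom_op::Eval};
  return &r;
}

// Drives one node through its lifetime and returns the recorded call order.
std::vector<std::string> RunLifecycle(int num_inferences) {
  DemoContext context;
  DemoNode node;
  DemoRegistration* reg = Register_MY_CUSTOM_OP();

  node.user_data = reg->init(&context, nullptr, 0);
  if (reg->prepare(&context, &node) == kDemoOk) {
    for (int i = 0; i < num_inferences; ++i) {
      reg->invoke(&context, &node);
    }
  }
  reg->free(&context, node.user_data);
  return context.log;
}
```

In this sketch, `RunLifecycle(2)` records `init`, `prepare`, `invoke`, `invoke`, `free`. State allocated in `Init` is handed back to `Free`, so the operator itself needs no global variables.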

### Define the kernel in the TensorFlow Lite runtime

```cpp
#include <cmath>

#include "tensorflow/lite/kernels/kernel_util.h"

// Checks the node has one input and one output, then resizes the output
// tensor to the same shape as the input.
TfLiteStatus SinPrepare(TfLiteContext* context, TfLiteNode* node) {
  using namespace tflite;
  TF_LITE_ENSURE_EQ(context, NumInputs(node), 1);
  TF_LITE_ENSURE_EQ(context, NumOutputs(node), 1);

  const TfLiteTensor* input = GetInput(context, node, 0);
  TfLiteTensor* output = GetOutput(context, node, 0);

  int num_dims = NumDimensions(input);

  // Ownership of output_size passes to ResizeTensor.
  TfLiteIntArray* output_size = TfLiteIntArrayCreate(num_dims);
  for (int i = 0; i < num_dims; ++i) {
    output_size->data[i] = input->dims->data[i];
  }

  return context->ResizeTensor(context, output, output_size);
}

// Applies sin element-wise to the float input tensor.
TfLiteStatus SinEval(TfLiteContext* context, TfLiteNode* node) {
  using namespace tflite;
  const TfLiteTensor* input = GetInput(context, node, 0);
  TfLiteTensor* output = GetOutput(context, node, 0);

  const float* input_data = input->data.f;
  float* output_data = output->data.f;

  // The element count is the product of all dimensions.
  size_t count = 1;
  int num_dims = NumDimensions(input);
  for (int i = 0; i < num_dims; ++i) {
    count *= input->dims->data[i];
  }

  for (size_t i = 0; i < count; ++i) {
    output_data[i] = std::sin(input_data[i]);
  }
  return kTfLiteOk;
}

// No per-node state is needed, so init and free are left as nullptr.
TfLiteRegistration* Register_SIN() {
  static TfLiteRegistration r = {nullptr, nullptr, SinPrepare, SinEval};
  return &r;
}
```
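When testing a kernel like this, it can help to have a plain reference implementation that does not depend on the runtime. The sketch below mirrors the two steps of `SinEval` (multiply the dimensions to get the element count, then apply `sin` element by element) on ordinary `std::vector` data. `ElementCount` and `SinReference` are hypothetical helper names introduced for this illustration and are not part of TensorFlow Lite.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Number of elements in a tensor with the given shape: the product of all
// dimensions. An empty shape (a scalar) yields 1, as in the SinEval loop.
size_t ElementCount(const std::vector<int>& dims) {
  size_t count = 1;
  for (int d : dims) {
    count *= static_cast<size_t>(d);
  }
  return count;
}

// Reference element-wise sin over a flat buffer with the given shape.
// Returns an empty vector if the buffer size does not match the shape.
std::vector<float> SinReference(const std::vector<int>& dims,
                                const std::vector<float>& input) {
  const size_t count = ElementCount(dims);
  if (input.size() != count) {
    return {};
  }
  std::vector<float> output(count);
  for (size_t i = 0; i < count; ++i) {
    output[i] = std::sin(input[i]);
  }
  return output;
}
```

For example, `SinReference({2, 2}, {0.0f, 0.5f, 2.0f, 2.2f})` returns four values that a test could compare against the output tensor of the custom kernel within a small tolerance.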

### Register the operator with the kernel library

The `OpResolver` class translates operator codes and names into actual code. It is defined as follows:

```cpp
class OpResolver {
 public:
  virtual TfLiteRegistration* FindOp(tflite::BuiltinOperator op) const = 0;
  virtual TfLiteRegistration* FindOp(const char* op) const = 0;
  virtual void AddBuiltin(tflite::BuiltinOperator op, TfLiteRegistration* registration) = 0;
  virtual void AddCustom(const char* op, TfLiteRegistration* registration) = 0;
};
```

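Regular usage starts from the `BuiltinOpResolver`:

```cpp
tflite::ops::builtin::BuiltinOpResolver resolver;
```

To add the custom operator created above, call `AddCustom` on the resolver before it is used to build the interpreter:

```cpp
resolver.AddCustom("Sin", Register_SIN());
```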
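The lookup-by-name behavior is what ties the operator name in the model to the registered kernel, and it is why an unregistered name produces a "Didn't find custom operator" error. The following toy resolver sketches that idea under simplifying assumptions: `DemoRegistration` and `DemoOpResolver` are hypothetical types written for this illustration, and the real resolver in TensorFlow Lite is more involved.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for TfLiteRegistration.
struct DemoRegistration {
  const char* label;
};

// Toy resolver: maps custom operator names to registrations.
class DemoOpResolver {
 public:
  void AddCustom(const char* op, DemoRegistration* registration) {
    custom_ops_[op] = registration;
  }

  // Returns nullptr when no kernel was registered under this exact name.
  DemoRegistration* FindOp(const char* op) const {
    auto it = custom_ops_.find(op);
    return it == custom_ops_.end() ? nullptr : it->second;
  }

 private:
  std::map<std::string, DemoRegistration*> custom_ops_;
};

DemoRegistration* Register_SIN() {
  static DemoRegistration r = {"sin kernel"};
  return &r;
}
```

In this sketch, `FindOp("Sin")` returns `nullptr` until `AddCustom("Sin", Register_SIN())` has been called. The match is on the exact string, so the name passed to `AddCustom` has to equal the operator name stored in the model.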

## Best practices

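1. Optimize memory allocations and deallocations cautiously. Allocating memory in `Prepare` is more efficient than in `Invoke`, and allocating memory before a loop is better than allocating in every iteration. Use temporary tensor data rather than allocating memory yourself (see item 2). Use pointers or references instead of copying wherever possible.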

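2. If a data structure persists for the entire operation, pre-allocate its memory using temporary tensors. You may need an `OpData` struct to reference the tensor indices in other functions. See the convolution kernel for an example. A sample code snippet follows: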

```cpp
auto* op_data = reinterpret_cast<OpData*>(node->user_data);
TfLiteIntArrayFree(node->temporaries);
node->temporaries = TfLiteIntArrayCreate(1);
node->temporaries->data[0] = op_data->temp_tensor_index;
TfLiteTensor* temp_tensor = &context->tensors[op_data->temp_tensor_index];
temp_tensor->type = kTfLiteFloat32;
temp_tensor->allocation_type = kTfLiteArenaRw;
```
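3. If it does not waste too much memory, prefer a static fixed-size array (or a `std::vector` pre-allocated in `Resize`) over a `std::vector` that is dynamically allocated in every iteration of execution.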

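4. Avoid instantiating standard library container templates that are not already in use, because they increase binary size. For example, if your operation needs a `std::map` that no other kernel uses, a `std::vector` with direct index mapping can do the same job while keeping the binary small. Look at what other kernels use to gain insight (or ask).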

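5. Check the pointer to the memory returned by `malloc`. If this pointer is `nullptr`, do not perform any operation through it. If you call `malloc` in a function and then exit on an error, free the memory before you exit.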

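6. Use `TF_LITE_ENSURE(context, condition)` to check for a specific condition. Your code must not leave memory hanging when `TF_LITE_ENSURE` returns early, so use these macros before allocating any resource that could leak.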
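Items 5 and 6 can be sketched together in a small, self-contained function. This is an illustration only: `DEMO_ENSURE` is a hypothetical macro that imitates the early-return behavior of `TF_LITE_ENSURE` without the TensorFlow Lite headers, and `DemoStatus` and `ScaleChecked` are likewise invented for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>

enum DemoStatus { kDemoOk = 0, kDemoError = 1 };

// Hypothetical stand-in for TF_LITE_ENSURE: returns an error status from the
// enclosing function when the condition does not hold.
#define DEMO_ENSURE(condition)           \
  do {                                   \
    if (!(condition)) return kDemoError; \
  } while (0)

// Multiplies `count` floats by `scale` through a scratch buffer.
// `output` is written only if every result is finite.
DemoStatus ScaleChecked(const float* input, size_t count, float scale,
                        float* output) {
  // Item 6: validate before allocating, so an early return cannot leak.
  DEMO_ENSURE(input != nullptr);
  DEMO_ENSURE(output != nullptr);
  DEMO_ENSURE(count > 0);

  // Item 5: check the pointer returned by malloc before using it.
  float* scratch = static_cast<float*>(std::malloc(count * sizeof(float)));
  if (scratch == nullptr) {
    return kDemoError;  // nothing was allocated, so nothing to free
  }

  for (size_t i = 0; i < count; ++i) {
    scratch[i] = input[i] * scale;
    if (!std::isfinite(scratch[i])) {
      std::free(scratch);  // item 5: free before exiting on an error
      return kDemoError;
    }
  }

  for (size_t i = 0; i < count; ++i) {
    output[i] = scratch[i];
  }
  std::free(scratch);
  return kDemoOk;
}
```

The scratch buffer here exists only to show the cleanup path. In a real kernel, item 1 suggests using a temporary tensor prepared ahead of time rather than calling `malloc` during evaluation.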
