TensorFlow I/O

Development

This document contains the information needed to set up the development environment and build the tensorflow-io package from source on various platforms. Once setup is complete, please refer to the STYLE_GUIDE for guidelines on adding new ops.

IDE Setup

For instructions on how to configure Visual Studio Code for developing TensorFlow I/O, please refer to this document.

Lint

TensorFlow I/O's code conforms to Bazel Buildifier, Clang Format, Black, and Pyupgrade. Please use the following commands to check the source code and identify lint issues:

# Install Bazel version specified in .bazelversion
$ curl -OL https://github.com/bazelbuild/bazel/releases/download/$(cat .bazelversion)/bazel-$(cat .bazelversion)-installer-darwin-x86_64.sh
$ sudo bash -x -e bazel-$(cat .bazelversion)-installer-darwin-x86_64.sh
$ bazel run //tools/lint:check
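
The installer URL in the commands above is assembled from the version pinned in .bazelversion plus the platform suffix. As a minimal sketch of that composition (the function name `bazel_installer_url` is hypothetical, and the URL layout is simply copied from the curl command above), the same pattern can be parameterized for other platforms:

```shell
#!/usr/bin/env bash
# Compose the Bazel installer URL for a given version, OS, and CPU architecture.
# The URL layout mirrors the curl command above; the function name is hypothetical.
bazel_installer_url() {
  local version="$1" os="$2" arch="$3"
  printf 'https://github.com/bazelbuild/bazel/releases/download/%s/bazel-%s-installer-%s-%s.sh\n' \
    "$version" "$version" "$os" "$arch"
}

# Example: print the macOS URL when .bazelversion exists in the current directory.
if [ -f .bazelversion ]; then
  bazel_installer_url "$(cat .bazelversion)" darwin x86_64
fi
```

Printing the URL before downloading makes it easy to spot a missing or malformed .bazelversion file.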

For Bazel Buildifier and Clang Format, the following command will automatically identify and fix any lint errors:

$ bazel run //tools/lint:lint

Alternatively, if you only want to run lint checks with individual linters, you can selectively pass black, pyupgrade, bazel, or clang to the above commands.

For example, a black-specific lint check can be done using:

$ bazel run //tools/lint:check -- black

A lint fix using Bazel Buildifier and Clang Format can be done using:

$ bazel run //tools/lint:lint -- bazel clang

A lint check using black and pyupgrade for an individual Python file can be done using:

$ bazel run //tools/lint:check -- black pyupgrade -- tensorflow_io/python/ops/version_ops.py

A lint fix using black and pyupgrade for an individual Python file can be done using:

$ bazel run //tools/lint:lint -- black pyupgrade -- tensorflow_io/python/ops/version_ops.py
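
The lint invocations above all follow one pattern: a target (check or lint), an optional list of linters after the first `--`, and optional file paths after a second `--`. A minimal sketch of that pattern, using a hypothetical helper `lint_command` that only prints the command instead of running it:

```shell
#!/usr/bin/env bash
# Print (rather than run) the bazel lint command for a mode, a list of linters,
# and optional files. lint_command is a hypothetical helper; the target names and
# the "--" separators are taken from the examples above.
# Note: pass a non-empty linter list whenever files are given.
lint_command() {
  local mode="$1" linters="$2"
  shift 2
  local cmd="bazel run //tools/lint:${mode}"
  if [ -n "$linters" ]; then
    cmd="$cmd -- $linters"
  fi
  if [ "$#" -gt 0 ]; then
    cmd="$cmd -- $*"
  fi
  printf '%s\n' "$cmd"
}

lint_command check "black pyupgrade" tensorflow_io/python/ops/version_ops.py
```

Printing the command first is a cheap way to confirm the argument order before handing it to bazel.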

Python

macOS

On macOS Catalina 10.15.7, it is possible to build tensorflow-io with the system-provided Python 3.8.2. Both tensorflow and bazel are needed to do so.

#!/usr/bin/env bash

# Disable arm64 build by specifying only x86_64 arch.
# Only needed for macOS's system default python 3.8.2 on macOS 10.15.7
export ARCHFLAGS="-arch x86_64"

# Use following command to check if Xcode is correctly installed:
xcodebuild -version

# Show macOS's default python3
python3 --version

# Install Bazel version specified in .bazelversion
curl -OL https://github.com/bazelbuild/bazel/releases/download/$(cat .bazelversion)/bazel-$(cat .bazelversion)-installer-darwin-x86_64.sh
sudo bash -x -e bazel-$(cat .bazelversion)-installer-darwin-x86_64.sh

# Install tensorflow and configure bazel
sudo ./configure.sh

# Add any optimization on bazel command, e.g., --compilation_mode=opt,
#   --copt=-msse4.2, --remote_cache=, etc.
# export BAZEL_OPTIMIZATION=

# Build shared libraries
bazel build -s --verbose_failures $BAZEL_OPTIMIZATION //tensorflow_io/... //tensorflow_io_gcs_filesystem/...

# Once build is complete, shared libraries will be available in
# `bazel-bin/tensorflow_io/core`, `bazel-bin/tensorflow_io/python/ops` and
# it is possible to run tests with `pytest`, e.g.:
sudo python3 -m pip install pytest
TFIO_DATAPATH=bazel-bin python3 -m pytest -s -v tests/test_serialization.py
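
The build command above expands $BAZEL_OPTIMIZATION without quotes, which matters when the variable holds more than one flag. The sketch below, assuming a POSIX shell with the default IFS and using a hypothetical helper `count_args`, shows how many arguments bazel would receive in each case:

```shell
#!/usr/bin/env bash
# Example values taken from the comment in the script above.
export BAZEL_OPTIMIZATION="--compilation_mode=opt --copt=-msse4.2"

# Print how many separate arguments were received (hypothetical helper).
count_args() { printf '%s\n' "$#"; }

# Unquoted: the shell splits on whitespace, so bazel would receive two flags.
count_args $BAZEL_OPTIMIZATION

# Quoted: the whole string arrives as one argument, which bazel would not
# parse as two flags.
count_args "$BAZEL_OPTIMIZATION"
```

When the variable is unset, the unquoted form expands to nothing, which is why the build command works with the export line left commented out.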

Troubleshooting

If Xcode is installed but $ xcodebuild -version does not display the expected output, you might need to enable the Xcode command line tools with the command:

$ xcode-select -s /Applications/Xcode.app/Contents/Developer

A terminal restart might be required for the changes to take effect.

Sample output:

$ xcodebuild -version
Xcode 12.2
Build version 12B45b
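
To check the Xcode version from a script rather than by eye, the sample output above can be parsed. This is a minimal sketch; `xcode_version` is a hypothetical helper that reads the output on stdin, so it can be tried with the sample text on any platform:

```shell
#!/usr/bin/env bash
# Extract the Xcode version number from `xcodebuild -version` output read on stdin.
# xcode_version is a hypothetical helper name.
xcode_version() {
  awk '$1 == "Xcode" { print $2; exit }'
}

# On a Mac: xcodebuild -version | xcode_version
# Here the sample output from above is used as input instead.
printf 'Xcode 12.2\nBuild version 12B45b\n' | xcode_version
```

An empty result suggests that xcodebuild did not print the expected "Xcode <version>" line, which is the situation the xcode-select command above is meant to address.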

Linux

Development of tensorflow-io on Linux is similar to macOS. The required packages are gcc, g++, git, bazel, and python 3. Versions of gcc or python newer than the system defaults might be required, though.
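
Before starting a build, it can help to confirm that the required tools are on PATH. A minimal sketch, assuming the command names gcc, g++, git, bazel, and python3 (the helper `missing_tools` is hypothetical):

```shell
#!/usr/bin/env bash
# Print every command from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper; the tool list below follows the
# required packages named above.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing="$(missing_tools gcc g++ git bazel python3)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
```

This only checks presence, not versions; whether the installed gcc or python is new enough still has to be verified separately.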

Ubuntu 20.04

Ubuntu 20.04 requires gcc/g++, git, and python 3. The following will install the dependencies and build the shared libraries on Ubuntu 20.04:

#!/usr/bin/env bash

# Install gcc/g++, git, unzip/curl (for bazel), and python3
sudo apt-get -y -qq update
sudo apt-get -y -qq install gcc g++ git unzip curl python3-pip

# Install Bazel version specified in .bazelversion
curl -sSOL https://github.com/bazelbuild/bazel/releases/download/$(cat .bazelversion)/bazel-$(cat .bazelversion)-installer-linux-x86_64.sh
sudo bash -x -e bazel-$(cat .bazelversion)-installer-linux-x86_64.sh

# Upgrade pip
sudo python3 -m pip install -U pip

# Install tensorflow and configure bazel
sudo ./configure.sh

# Alias python3 to python, needed by bazel
sudo ln -s /usr/bin/python3 /usr/bin/python

# Add any optimization on bazel command, e.g., --compilation_mode=opt,
#   --copt=-msse4.2, --remote_cache=, etc.
# export BAZEL_OPTIMIZATION=

# Build shared libraries
bazel build -s --verbose_failures $BAZEL_OPTIMIZATION //tensorflow_io/... //tensorflow_io_gcs_filesystem/...

# Once build is complete, shared libraries will be available in
# `bazel-bin/tensorflow_io/core`, `bazel-bin/tensorflow_io/python/ops` and
# it is possible to run tests with `pytest`, e.g.:
sudo python3 -m pip install pytest
TFIO_DATAPATH=bazel-bin python3 -m pytest -s -v tests/test_serialization.py

CentOS 8

The steps to build the shared libraries on CentOS 8 are the same as for Ubuntu 20.04 above, except that the following command should be used instead to install gcc/g++, git, unzip/which (for bazel), and python3:

sudo yum install -y python3 python3-devel gcc gcc-c++ git unzip which make

CentOS 7

On CentOS 7, the default python and gcc versions are too old to build tensorflow-io's shared libraries (.so). The gcc provided by Developer Toolset and rh-python36 should be used instead. Also, libstdc++ has to be linked statically to avoid a discrepancy between the libstdc++ installed on CentOS and the newer gcc version from devtoolset.

Furthermore, a special flag --//tensorflow_io/core:static_build has to be passed to Bazel in order to avoid duplication of symbols in statically linked libraries for file system plugins.

The following will install bazel, devtoolset-9, rh-python36, and build the shared libraries:

#!/usr/bin/env bash

# Install centos-release-scl, then install gcc/g++ (devtoolset), git, and python 3
sudo yum install -y centos-release-scl
sudo yum install -y devtoolset-9 git rh-python36 make

# Install Bazel version specified in .bazelversion
curl -sSOL https://github.com/bazelbuild/bazel/releases/download/$(cat .bazelversion)/bazel-$(cat .bazelversion)-installer-linux-x86_64.sh
sudo bash -x -e bazel-$(cat .bazelversion)-installer-linux-x86_64.sh

# Upgrade pip
scl enable rh-python36 devtoolset-9 \
    'python3 -m pip install -U pip'

# Install tensorflow and configure bazel with rh-python36
scl enable rh-python36 devtoolset-9 \
    './configure.sh'

# Add any optimization on bazel command, e.g., --compilation_mode=opt,
#   --copt=-msse4.2, --remote_cache=, etc.
# export BAZEL_OPTIMIZATION=

# Build shared libraries, notice the passing of --//tensorflow_io/core:static_build
BAZEL_LINKOPTS="-static-libstdc++ -static-libgcc" BAZEL_LINKLIBS="-lm -l%:libstdc++.a" \
  scl enable rh-python36 devtoolset-9 \
    'bazel build -s --verbose_failures $BAZEL_OPTIMIZATION --//tensorflow_io/core:static_build //tensorflow_io/...'

# Once build is complete, shared libraries will be available in
# `bazel-bin/tensorflow_io/core`, `bazel-bin/tensorflow_io/python/ops` and
# it is possible to run tests with `pytest`, e.g.:
scl enable rh-python36 devtoolset-9 \
    'python3 -m pip install pytest'

TFIO_DATAPATH=bazel-bin \
  scl enable rh-python36 devtoolset-9 \
    'python3 -m pytest -s -v tests/test_serialization.py'
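
One way to sanity-check the static libstdc++ linking described above is to look at the direct dependencies of a built library. The sketch below is an assumption rather than part of the project's tooling: it reads `readelf -d` output on stdin and reports whether a NEEDED entry for libstdc++ is present. The helper name `needs_dynamic_libstdcxx` is hypothetical.

```shell
#!/usr/bin/env bash
# Read `readelf -d` output on stdin and succeed if a NEEDED entry names libstdc++.
# needs_dynamic_libstdcxx is a hypothetical helper name.
needs_dynamic_libstdcxx() {
  grep -F '(NEEDED)' | grep -qF 'libstdc++.so'
}

# Usage after a build (the library path is a placeholder, not a real file name):
#   readelf -d bazel-bin/tensorflow_io/core/<library>.so | needs_dynamic_libstdcxx \
#     && echo "libstdc++ is still linked dynamically"
```

readelf is used here instead of ldd because ldd also lists transitive dependencies, so a libstdc++ entry pulled in by another library could be mistaken for a direct one.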

Docker

For Python development, a reference Dockerfile can be used to build the TensorFlow I/O package (tensorflow-io) from source. Additionally, the pre-built devel images can be used as well:

# Pull (if necessary) and start the devel container
$ docker run -it --rm --name tfio-dev --net=host -v ${PWD}:/v -w /v tfsigio/tfio:latest-devel bash

# Inside the docker container, ./configure.sh will install TensorFlow or use existing install
(tfio-dev) root@docker-desktop:/v$ ./configure.sh

# Clean up existing bazel builds (if any)
(tfio-dev) root@docker-desktop:/v$ rm -rf bazel-*

# Build TensorFlow I/O C++. For compilation optimization flags, the default (-march=native)
# optimizes the generated code for your machine's CPU type.
# Reference: https://www.tensorflow.org/install/source#configuration_options

# NOTE: Based on the available resources, please change the number of job workers to:
# -j 4/8/16 to prevent bazel server terminations and resource oriented build errors.

(tfio-dev) root@docker-desktop:/v$ bazel build -j 8 --copt=-msse4.2 --copt=-mavx --compilation_mode=opt --verbose_failures --test_output=errors --crosstool_top=//third_party/toolchains/gcc7_manylinux2010:toolchain //tensorflow_io/... //tensorflow_io_gcs_filesystem/...

# Run tests with PyTest, note: some tests require launching additional containers to run (see below)
(tfio-dev) root@docker-desktop:/v$ pytest -s -v tests/

# Build the TensorFlow I/O package
(tfio-dev) root@docker-desktop:/v$ python setup.py bdist_wheel

After the build succeeds, a package file dist/tensorflow_io-*.whl will be generated.
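
The note in the build steps above suggests choosing -j 4, 8, or 16 according to the available resources. As a minimal sketch, the helper below picks the largest of those options that does not exceed the CPU count. This rule is an assumption for illustration, not project guidance, and `pick_jobs` is a hypothetical name; available memory may call for a lower value.

```shell
#!/usr/bin/env bash
# Pick a bazel -j value from the 4/8/16 options, based on a CPU count.
# The "largest option not exceeding the CPU count" rule is an assumption.
pick_jobs() {
  local cpus="$1"
  if [ "$cpus" -ge 16 ]; then
    echo 16
  elif [ "$cpus" -ge 8 ]; then
    echo 8
  else
    echo 4
  fi
}

# getconf _NPROCESSORS_ONLN reports the number of online CPUs on Linux and macOS;
# fall back to 4 if it is unavailable.
cpus="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)"
echo "suggested flag: -j $(pick_jobs "$cpus")"
```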

Python Wheels

It is possible to build Python wheels after the bazel build is complete with the following command:

$ python setup.py bdist_wheel --data bazel-bin

The .whl file will be available in the dist directory. Note that the bazel binary directory bazel-bin has to be passed with the --data argument so that setup.py can locate the necessary shared objects, because bazel-bin is outside the tensorflow_io package directory.
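
To confirm that the wheel was actually produced, the dist directory can be checked from a script. A minimal sketch, assuming the tensorflow_io-*.whl naming pattern and a hypothetical helper `list_wheels`:

```shell
#!/usr/bin/env bash
# List wheel files found in a dist directory; return non-zero if there are none.
# list_wheels is a hypothetical helper name.
list_wheels() {
  local dist_dir="${1:-dist}" wheel found=1
  for wheel in "$dist_dir"/tensorflow_io-*.whl; do
    # When the glob matches nothing, the pattern itself is returned,
    # so test for a real file before printing.
    [ -f "$wheel" ] || continue
    printf '%s\n' "$wheel"
    found=0
  done
  return "$found"
}

list_wheels dist || echo "no wheel found in dist; run the bdist_wheel command first"
```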

Alternatively, a source install can be done with:

$ TFIO_DATAPATH=bazel-bin python -m pip install .

TFIO_DATAPATH=bazel-bin is passed here for the same reason.

Note that installing with -e differs from the above:

$ TFIO_DATAPATH=bazel-bin python -m pip install -e .

An editable install will not install the shared objects automatically, even with TFIO_DATAPATH=bazel-bin. Instead, TFIO_DATAPATH=bazel-bin has to be passed every time the program is run after the install:

$ TFIO_DATAPATH=bazel-bin python

>>> import tensorflow_io as tfio
>>> ...
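
Because the variable has to be supplied on every run after an editable install, a small wrapper can save typing. This is a minimal sketch; `with_tfio_datapath` is a hypothetical helper that sets TFIO_DATAPATH only for the command it launches:

```shell
#!/usr/bin/env bash
# Run any external command with TFIO_DATAPATH set for that command only.
# with_tfio_datapath is a hypothetical helper name.
with_tfio_datapath() {
  local datapath="$1"
  shift
  TFIO_DATAPATH="$datapath" "$@"
}

# Example: show the value that the child process sees.
with_tfio_datapath bazel-bin sh -c 'echo "TFIO_DATAPATH=$TFIO_DATAPATH"'
```

A relative path such as bazel-bin is resolved against the current working directory, so the wrapper is expected to be called from the repository root.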

Testing

Some tests require launching a test container or starting a local instance of the associated tool before running. For example, to run the kafka-related tests, which will start a local instance of kafka, zookeeper, and schema-registry, use:

# Start the local instances of kafka, zookeeper and schema-registry
$ bash -x -e tests/test_kafka/kafka_test.sh

# Run the tests
$ TFIO_DATAPATH=bazel-bin pytest -s -vv tests/test_kafka.py

Testing Datasets associated with tools such as Elasticsearch or MongoDB requires docker to be available on the system. In such scenarios, use:

# Start elasticsearch within docker container
$ bash tests/test_elasticsearch/elasticsearch_test.sh start

# Run the tests
$ TFIO_DATAPATH=bazel-bin pytest -s -vv tests/test_elasticsearch.py

# Stop and remove the container
$ bash tests/test_elasticsearch/elasticsearch_test.sh stop
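
When the start, test, and stop steps above are scripted, a failing test run can leave the container behind if the stop step is skipped. A minimal sketch of one way to always run the stop step while preserving the test exit status (`run_with_service` is a hypothetical helper, and each argument is a shell command string):

```shell
#!/usr/bin/env bash
# Run start, test, and stop commands so that stop always runs once start has
# succeeded, and return the exit status of the test command.
# run_with_service is a hypothetical helper name.
run_with_service() {
  local start_cmd="$1" test_cmd="$2" stop_cmd="$3" status=0
  sh -c "$start_cmd" || return 1
  sh -c "$test_cmd" || status=$?
  sh -c "$stop_cmd"
  return "$status"
}

# Intended usage with the commands above (not executed here):
#   run_with_service \
#     "bash tests/test_elasticsearch/elasticsearch_test.sh start" \
#     "TFIO_DATAPATH=bazel-bin pytest -s -vv tests/test_elasticsearch.py" \
#     "bash tests/test_elasticsearch/elasticsearch_test.sh stop"
```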

Additionally, testing some features of tensorflow-io does not require spinning up any additional tools, because the data is provided in the tests directory itself. For example, to run the tests related to parquet datasets, use:

# Just run the test
$ TFIO_DATAPATH=bazel-bin pytest -s -vv tests/test_parquet.py

R

We provide a reference Dockerfile so that you can use the R package directly for testing. You can build it via:

$ docker build -t tfio-r-dev -f R-package/scripts/Dockerfile .

Inside the container, you can start your R session, instantiate a SequenceFileDataset from an example Hadoop SequenceFile string.seq, and then use any transformation functions provided by the tfdatasets package on the dataset, as in the following:

library(tfio)
dataset <- sequence_file_dataset("R-package/tests/testthat/testdata/string.seq") %>%
    dataset_repeat(2)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})