AI - Building TensorFlow from source (Ubuntu 22.04.2 LTS)

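The PC I need to use does not support AVX or AVX2, so I have to build TensorFlow myself (the official pip packages are compiled with AVX instructions).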

Otherwise, running a TensorFlow script from the shell fails with this error:

Illegal instruction (core dumped)


Running dmesg afterwards shows this:

[155313.630777] traps: ttt.py[177439] trap invalid opcode ip:7f7d5c39f6ea sp:7ffffbbdca40 error:0 in libtensorflow_framework.so.2[7f7d5b402000+12e4000]
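
Before committing to a multi-hour build, it may be worth confirming that the CPU really lacks AVX. The following is a minimal sketch, assuming an x86 Linux machine where /proc/cpuinfo has a "flags" line; the function names are my own and not part of any tool.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text: str, wanted=("avx", "avx2")) -> list[str]:
    """List the wanted features that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


if __name__ == "__main__":
    try:
        text = Path("/proc/cpuinfo").read_text()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("missing features:", missing_features(text))
```

If the printed list contains avx, the stock pip package is likely to crash the same way on that machine.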


If you do not have the problem above, the simple install is all you need:

pip3 install tensorflow
pip3 install tensorflow==2.13.0rc2   # or pin a specific version



System used


CPU: Intel(R) Atom(TM) CPU C3958 @ 2.00GHz (16 cores)
Memory: 16 GB
OS: Ubuntu 22.04.2 LTS


Official guide - Build from source


https://www.tensorflow.org/install/source?hl=zh-tw



Install bazelisk

Official site, where you can find the latest release (it turns out you only need to download the binary and it runs...)

https://github.com/bazelbuild/bazelisk
https://github.com/bazelbuild/bazelisk/releases



Linux

    wget https://github.com/bazelbuild/bazelisk/releases/download/v1.17.0/bazelisk-linux-amd64
    chmod +x bazelisk-linux-amd64
    mv bazelisk-linux-amd64 /usr/local/bin/bazel


Mac (reportedly this is the binary for macOS, but I have not tried it)

wget https://github.com/bazelbuild/bazelisk/releases/download/v1.17.0/bazelisk-darwin-amd64
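
The two download URLs above differ only in the platform suffix. Here is a small sketch that picks the URL for the current machine. The helper name `bazelisk_url` is my own, and the arm64 mapping is an assumption based on the asset naming on the releases page; I have only run the linux-amd64 binary.

```python
import platform

BASE = "https://github.com/bazelbuild/bazelisk/releases/download"

# Map platform.machine() values to the suffix used in bazelisk asset names.
ARCH_NAMES = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def bazelisk_url(version="v1.17.0", system=None, machine=None):
    """Build the release download URL for a Linux or macOS machine."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    arch = ARCH_NAMES.get(machine)
    if system not in ("linux", "darwin") or arch is None:
        raise ValueError(f"unsupported platform: {system}/{machine}")
    return f"{BASE}/{version}/bazelisk-{system}-{arch}"


if __name__ == "__main__":
    try:
        print(bazelisk_url())
    except ValueError as exc:
        print(exc)
```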



Build TensorFlow


git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout v2.13.0

Note: TensorFlow 2.13.0 uses Bazel 5.3.0 (https://releases.bazel.build/5.3.0/release/bazel-5.3.0-linux-x86_64)

Find the best -march value

gcc -march=native -Q --help=target | grep march
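
The grep output can contain more than one line mentioning march, so here is a minimal sketch that pulls out just the value. It assumes gcc prints a line of the form "-march=" followed by the architecture name; `parse_march` and `native_march` are names I made up for illustration.

```python
import subprocess


def parse_march(help_target_output: str):
    """Return the value on the '-march=' line of `gcc -Q --help=target` output."""
    for line in help_target_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "-march=":
            return parts[1]
    return None


def native_march(gcc="gcc"):
    """Ask gcc which -march it resolves 'native' to; None if gcc is unavailable."""
    try:
        result = subprocess.run(
            [gcc, "-march=native", "-Q", "--help=target"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_march(result.stdout)


if __name__ == "__main__":
    print("native march:", native_march())
```

On my Atom C3958 the value is goldmont, which is what the rest of this post uses.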


Configure the TensorFlow build

./configure

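When the following prompt appears, enter the -march value found above. The defaults are fine for everything else (the default may work for this prompt as well).

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -Wno-sign-compare]: -march=goldmont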


Build command

/usr/local/bin/bazel build -c opt --copt=-march=goldmont --copt=-msse3 --copt=-msse4.1 \
--copt=-msse4.2 --copt=-mpclmul --copt=-mpopcnt --copt=-maes --copt=-mno-avx \
--copt=-mno-avx2 //tensorflow/tools/pip_package:build_pip_package --local_ram_resources=2048

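(These flags are borrowed from someone else's working build. If you run out of memory, remember to add --local_ram_resources=2048. Even with 16 GB I still needed it...)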
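
--local_ram_resources tells Bazel how much host RAM (in MB) it may assume when scheduling local actions, and --jobs caps the number of parallel actions. Below is a rough sketch for picking a --jobs value from the memory that is actually free. The 2048 MB per-job budget is purely my assumption (the heavy MLIR files seem to need a lot), not a number from Bazel or TensorFlow.

```python
import os


def mem_available_mb(meminfo_text: str):
    """Return MemAvailable from /proc/meminfo text in MB, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) // 1024  # the file reports kB
    return None


def suggest_jobs(available_mb: int, cpu_count: int, per_job_mb: int = 2048) -> int:
    """Cap parallel jobs by both CPU count and an assumed per-job memory budget."""
    by_memory = max(1, available_mb // per_job_mb)
    return max(1, min(cpu_count, by_memory))


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            available = mem_available_mb(handle.read())
    except OSError:
        available = None  # not Linux, or /proc is unavailable
    if available is not None:
        print(f"--jobs={suggest_jobs(available, os.cpu_count() or 1)}")
```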

It finally succeeded (after three days of trying, with each build taking 3 to 4 hours):

INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=178
INFO: Reading rc options for 'build' from /home/nusoft/tensorflow/.bazelrc:
Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /home/nusoft/tensorflow/.bazelrc:
'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from /home/nusoft/tensorflow/.tf_configure.bazelrc:
'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python3 --action_env PYTHON_LIB_PATH=/usr/lib/python3/dist-packages --python_path=/usr/bin/python3
INFO: Reading rc options for 'build' from /home/nusoft/tensorflow/.bazelrc:
'build' options: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils,tensorflow/core/tfrt/utils/debug
INFO: Found applicable config definition build:short_logs in file /home/nusoft/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /home/nusoft/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:linux in file /home/nusoft/tensorflow/.bazelrc: --define=build_with_onednn_v2=true --host_copt=-w --copt=-Wno-all --copt=-Wno-extra --copt=-Wno-deprecated --copt=-Wno-deprecated-declarations --copt=-Wno-ignored-attributes --copt=-Wno-array-bounds --copt=-Wunused-result --copt=-Werror=unused-result --copt=-Wswitch --copt=-Werror=switch --copt=-Wno-error=unused-but-set-variable --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++17 --host_cxxopt=-std=c++17 --config=dynamic_kernels --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /home/nusoft/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (612 packages loaded, 38194 targets configured).
INFO: Found 1 target...
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 14931.350s, Critical Path: 607.39s
INFO: 9784 processes: 40 internal, 9744 local.
INFO: Build completed successfully, 9784 total actions


Build the whl file, install it, and test

    ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
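    # "version-tags" is a placeholder; use the actual file name created in /tmp/tensorflow_pkg
    pip3 install /tmp/tensorflow_pkg/tensorflow-version-tags.whl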
    python3 -c "import tensorflow as tf; print(tf.__version__)"
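
Since the wheel file name depends on the version and platform tags, here is a small sketch that finds the newest wheel in the output directory. The helper name `newest_wheel` and the glob pattern are my own choices.

```python
from pathlib import Path


def newest_wheel(directory, pattern="tensorflow-*.whl"):
    """Return the most recently modified wheel in the directory, or None."""
    wheels = sorted(Path(directory).glob(pattern), key=lambda p: p.stat().st_mtime)
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    wheel = newest_wheel("/tmp/tensorflow_pkg")
    if wheel is None:
        print("no wheel found; run build_pip_package first")
    else:
        print(f"pip3 install {wheel}")
```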


How to clean the build

    /usr/local/bin/bazel clean



Error history


ERROR: /home/nusoft/tensorflow/tensorflow/compiler/mlir/tensorflow/BUILD:457:11: Compiling tensorflow/compiler/mlir/tensorflow/ir/tf_ops_n_z.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 210 arguments skipped)
gcc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 3393.686s, Critical Path: 2151.43s
INFO: 3080 processes: 533 internal, 2547 local.
FAILED: Build did NOT complete successfully

(This is an out-of-memory failure. Add --local_ram_resources=2048 to the build command; you can also try --jobs=5.)



ERROR: /home/nusoft/tensorflow/tensorflow/compiler/mlir/tosa/BUILD:125:11: Compiling tensorflow/compiler/mlir/tosa/transforms/legalize_tf.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 323 arguments skipped)
gcc: fatal error: Killed signal terminated program cc1plus
compilation terminated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1054.380s, Critical Path: 536.06s
INFO: 173 processes: 18 internal, 155 local.
FAILED: Build did NOT complete successfully

(Out of memory again. The build seems to die at a different place each time.)
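
"Killed signal terminated program cc1plus" usually means the kernel's OOM killer stepped in, and dmesg should record it. Below is a minimal sketch for scanning saved dmesg output for such entries. It assumes the kernel logs a line containing "Out of memory" and "Killed process PID (name)"; the exact wording varies between kernel versions, so treat the pattern as a guess. Pass it a file saved with something like `dmesg > dmesg.txt`.

```python
import re
import sys

# Matches e.g. "Killed process 4242 (cc1plus)" inside an OOM killer log line.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(dmesg_text: str):
    """Return (pid, process name) pairs for OOM kills found in dmesg text."""
    victims = []
    for line in dmesg_text.splitlines():
        if "out of memory" not in line.lower():
            continue
        match = KILLED_RE.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for pid, name in oom_victims(handle.read()):
                print(f"OOM killer stopped {name} (pid {pid})")
```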


ERROR: /home/nusoft/tensorflow/tensorflow/BUILD:1646:19: Action tensorflow/_api/v2/v2.py failed: (Exit 1): bash failed: error executing command /bin/bash -c ... (remaining 1 argument skipped)
2023-08-10 04:04:29.289891: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions:
SSE3 SSE4.1 SSE4.2, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/aec442bb1877aaa4d66dd2a84688b4d5/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/create_tensorflow.python_api_tf_python_api_gen_v2.runfiles/org_tensorflow/tensorflow/python/tools/api/generator/create_python_api.py", line 22, in <module>
from tensorflow.python.tools.api.generator import doc_srcs
File "/root/.cache/bazel/_bazel_root/aec442bb1877aaa4d66dd2a84688b4d5/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/create_tensorflow.python_api_tf_python_api_gen_v2.runfiles/org_tensorflow/tensorflow/python/__init__.py", line 37, in <module>
from tensorflow.python.eager import context
File "/root/.cache/bazel/_bazel_root/aec442bb1877aaa4d66dd2a84688b4d5/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/create_tensorflow.python_api_tf_python_api_gen_v2.runfiles/org_tensorflow/tensorflow/python/eager/context.py", line 29, in <module>
from tensorflow.core.framework import function_pb2
File "/root/.cache/bazel/_bazel_root/aec442bb1877aaa4d66dd2a84688b4d5/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/create_tensorflow.python_api_tf_python_api_gen_v2.runfiles/org_tensorflow/tensorflow/core/framework/function_pb2.py", line 5, in <module>
from google.protobuf.internal import builder as _builder
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (/home/nusoft/hs/lib/python3.10/site-packages/google/protobuf/internal/__init__.py)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /home/nusoft/tensorflow/tensorflow/python/tools/BUILD:100:10 Middleman _middlemen/tensorflow_Spython_Stools_Simport_Upb_Uto_Utensorboard-runfiles failed: (Exit 1): bash failed: error executing command /bin/bash -c ... (remaining 1 argument skipped)
INFO: Elapsed time: 6614.055s, Critical Path: 1471.55s
INFO: 3662 processes: 356 internal, 3306 local.
FAILED: Build did NOT complete successfully

(My guess at this point was that the command needed the extra flags --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2)


ERROR: /home/nusoft/tensorflow/tensorflow/BUILD:1646:19: Action tensorflow/_api/v2/v2.py failed: (Exit 1): bash failed: error executing command /bin/bash -c ... (remaining 1 argument skipped)
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/aec442bb1877aaa4d66dd2a84688b4d5/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/create_tensorflow.python_api_tf_python_api_gen_v2.runfiles/org_tensorflow/tensorflow/python/tools/api/generator/create_python_api.py", line 22, in <module>
from tensorflow.python.tools.api.generator import doc_srcs
File "/root/.cache/bazel/_bazel_root/aec442bb1877aaa4d66dd2a84688b4d5/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/create_tensorflow.python_api_tf_python_api_gen_v2.runfiles/org_tensorflow/tensorflow/python/__init__.py", line 37, in <module>
from tensorflow.python.eager import context
File "/root/.cache/bazel/_bazel_root/aec442bb1877aaa4d66dd2a84688b4d5/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/create_tensorflow.python_api_tf_python_api_gen_v2.runfiles/org_tensorflow/tensorflow/python/eager/context.py", line 29, in <module>
from tensorflow.core.framework import function_pb2
File "/root/.cache/bazel/_bazel_root/aec442bb1877aaa4d66dd2a84688b4d5/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/create_tensorflow.python_api_tf_python_api_gen_v2.runfiles/org_tensorflow/tensorflow/core/framework/function_pb2.py", line 5, in <module>
from google.protobuf.internal import builder as _builder
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (/home/nusoft/hs/lib/python3.10/site-packages/google/protobuf/internal/__init__.py)
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /home/nusoft/tensorflow/tensorflow/lite/python/BUILD:72:10 Middleman _middlemen/tensorflow_Slite_Spython_Stflite_Uconvert-runfiles failed: (Exit 1): bash failed: error executing command /bin/bash -c ... (remaining 1 argument skipped)
INFO: Elapsed time: 14405.599s, Critical Path: 862.67s
INFO: 8431 processes: 19 internal, 8412 local.
FAILED: Build did NOT complete successfully

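(I am not sure whether the import line is the real problem, because running python3 -c "from google.protobuf.internal import builder as _builder" on its own works fine.... I tried several other flag combinations and the build still failed the same way...)

Next, try upgrading protobuf... Command: pip3 install --upgrade protobuf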

$ pip3 install --upgrade protobuf
Requirement already satisfied: protobuf in /usr/local/lib/python3.10/dist-packages (4.23.4)
Collecting protobuf
Downloading protobuf-4.24.0-cp37-abi3-manylinux2014_x86_64.whl (311 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 311.6/311.6 KB 1.1 MB/s eta 0:00:00
Installing collected packages: protobuf
Attempting uninstall: protobuf
Found existing installation: protobuf 4.23.4
Uninstalling protobuf-4.23.4:
Successfully uninstalled protobuf-4.23.4
Successfully installed protobuf-4.24.0


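Some people say it may be a problem with the packaging module, but that does not seem to be the case here, since there is nothing newer to upgrade to: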

$ pip3 install --upgrade packaging
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (23.1)


What I did right before the build finally succeeded:


1. Re-download TensorFlow v2.13.0
2. In ./configure, set the --config=opt optimization flags to -march=goldmont
3. pip3 install --upgrade protobuf

Then I ran the build again:

# /usr/local/bin/bazel build -c opt --copt=-march=goldmont --copt=-msse3 --copt=-msse4.1 --copt=-msse4.2 --copt=-mpclmul --copt=-mpopcnt --copt=-maes --copt=-mno-avx --copt=-mno-avx2 //tensorflow/tools/pip_package:build_pip_package --local_ram_resources=2048

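(I did all three things in one go, planning to test them together, and the build succeeded. I do not know which step made the difference, but I think step 3 is the most likely...)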


Good luck!



