Category: IT

Setting Up an OpenCV CUDA Dev Environment with Docker and Debugging via VSCode

Docker OpenCV CUDA Python VSCode RTX 3060

To use CUDA acceleration with OpenCV, pip install opencv-python simply won’t cut it. CUDA support requires building OpenCV from source. In this post, I’ll show you how to set up a clean, isolated development environment using Docker — without polluting your host system — and debug your code directly inside the container using VSCode.

Why Can’t We Just Use pip install?

The opencv-python package on PyPI is a generic build with no CUDA support. CUDA features must be enabled at compile time by linking against the CUDA libraries.

Method	CUDA Support	Notes
pip install opencv-python	❌	Generic build, CPU only
Build from source (host)	✅	Pollutes host environment
Docker + build from source	✅	Isolated, clean — recommended
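To check which kind of build you currently have, you can inspect the build summary that OpenCV exposes through cv2.getBuildInformation(). The sketch below is a minimal example; the helper name cuda_enabled_in_build_info is made up for illustration, and it assumes the summary contains a line of the form "NVIDIA CUDA: YES (...)" as in the cmake output shown later in this post.

```python
def cuda_enabled_in_build_info(build_info: str) -> bool:
    """Return True if the build summary reports 'NVIDIA CUDA: YES ...'."""
    for line in build_info.splitlines():
        key, sep, value = line.partition(":")
        # Strip spaces and the leading "--" that cmake prints in its summary.
        if sep and key.strip(" -") == "NVIDIA CUDA":
            return value.strip().upper().startswith("YES")
    return False  # no CUDA line at all: treat it as a CPU-only build


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 is not installed")
    else:
        info = cv2.getBuildInformation()
        print("CUDA compiled in:", cuda_enabled_in_build_info(info))
```

With the PyPI wheel this should report False; with the build from this post it should report True.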

1 Prerequisites

Make sure the following are installed on your host machine:

NVIDIA Driver (verify with nvidia-smi)
Docker
nvidia-container-toolkit

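# Install nvidia-container-toolkit (NVIDIA's apt repository must already be configured)
sudo apt install nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker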

# Verify GPU is accessible inside Docker
docker run --gpus all --rm nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi

💡 Note
You do NOT need to install the CUDA Toolkit (nvcc) on your host; only the NVIDIA driver is required there. The toolkit ships inside the -devel Docker image used in the next step.
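If you want to confirm the prerequisites in one go, a small preflight script can look for the relevant executables on PATH. This is only a sketch: the function name missing_tools is hypothetical, and it assumes that nvidia-ctk is the command-line tool installed by nvidia-container-toolkit.

```python
import shutil

# Executables expected on the host: driver, Docker, container toolkit.
REQUIRED_TOOLS = ("nvidia-smi", "docker", "nvidia-ctk")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found")
```

Finding the executables does not prove that Docker can reach the GPU, so the docker run ... nvidia-smi check above is still worth running.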

2 Run the Official NVIDIA CUDA Image

docker run --gpus all -it \
  --name opencv-cuda \
  -v /your/project/path:/workspace \
  nvidia/cuda:12.8.0-cudnn-devel-ubuntu22.04 \
  bash

Verify GPU and nvcc inside the container:

nvcc --version
nvidia-smi

✅ Expected Output

nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 12.8

NVIDIA GeForce RTX 3060  |  CUDA Version: 12.8

3 Install Dependencies

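If apt is slow, switch to a closer mirror first. The example below uses the Kakao mirror, which suits users in Korea; substitute a mirror near you.

# Switch to the Kakao mirror (optional; recommended for users in Korea)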
sed -i 's/archive.ubuntu.com/mirror.kakao.com/g' /etc/apt/sources.list
sed -i 's/security.ubuntu.com/mirror.kakao.com/g' /etc/apt/sources.list

apt update && apt install -y \
  python3 python3-pip python3-dev \
  cmake git g++ \
  libgtk2.0-dev pkg-config \
  libavcodec-dev libavformat-dev libswscale-dev

pip3 install numpy

4 Build OpenCV from Source

cd /workspace

git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git

cd opencv && mkdir build && cd build

cmake .. \
  -D WITH_CUDA=ON \
  -D OPENCV_CUDA_ARCH_BIN="8.6" \
  -D CUDA_ARCH_BIN="8.6" \
  -D CUDA_ARCH_PTX="" \
  -D OPENCV_EXTRA_MODULES_PATH=/workspace/opencv_contrib/modules \
  -D WITH_CUBLAS=ON \
  -D BUILD_opencv_python3=ON \
  -D CMAKE_BUILD_TYPE=Release

💡 CUDA_ARCH_BIN by GPU
RTX 3060 → 8.6 | RTX 3090 → 8.6 | RTX 4090 → 8.9 | RTX 5070 Ti → 12.0
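Rather than looking the value up in a table, you can ask the driver directly. The sketch below assumes a reasonably recent NVIDIA driver whose nvidia-smi supports the compute_cap query field; the function names are hypothetical.

```python
import subprocess


def parse_compute_caps(csv_text):
    """Parse 'name, compute_cap' CSV lines into (name, 'major.minor') pairs."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, _, cap = line.rpartition(",")
        if name:
            gpus.append((name.strip(), cap.strip()))
    return gpus


def query_compute_caps():
    """Ask nvidia-smi for each GPU's name and compute capability."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_caps(out)


if __name__ == "__main__":
    try:
        for name, cap in query_compute_caps():
            print(f"{name}: use CUDA_ARCH_BIN={cap}")
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print("Could not query nvidia-smi:", exc)
```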

After cmake completes, verify these lines in the output:

--   NVIDIA CUDA:   YES (ver 12.8, CUFFT CUBLAS)  ✅
--     NVIDIA GPU arch:  86                        ✅
--   cuDNN:          YES (ver 9.7.0)               ✅
--   Python 3:
--     Libraries:    /usr/lib/.../libpython3.10.so ✅
--     numpy:        .../numpy/_core/include       ✅

Build and install (takes roughly 30 minutes to 1 hour):

make -j$(nproc)
make install

5 Verify the Build

python3 -c "
import cv2
print('OpenCV version:', cv2.__version__)
print('CUDA devices:', cv2.cuda.getCudaEnabledDeviceCount())
"

✅ Success

OpenCV version: 4.14.0-pre
CUDA devices: 1
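A device count of 1 shows that OpenCV can see the GPU, but not that data actually moves through it. As a slightly stronger check, the sketch below round-trips a tiny array through GPU memory. The function name cuda_smoke_test and its status strings are made up for illustration.

```python
def cuda_smoke_test():
    """Round-trip a small array through GPU memory; return a status string."""
    try:
        import cv2
        import numpy as np
    except ImportError as exc:
        return f"skipped: {exc.name} is not importable"

    cuda = getattr(cv2, "cuda", None)
    if cuda is None or cuda.getCudaEnabledDeviceCount() < 1:
        return "failed: OpenCV sees no CUDA device"

    host = np.arange(64, dtype=np.uint8).reshape(8, 8)
    gpu = cv2.cuda_GpuMat()
    gpu.upload(host)                      # host -> device
    matches = np.array_equal(gpu.download(), host)  # device -> host
    if matches:
        return "ok: upload/download round trip matches"
    return "failed: data mismatch after round trip"


print(cuda_smoke_test())
```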

6 Save the Container as an Image

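The container from step 2 was started without --rm, so it survives an exit as a stopped container. The installed OpenCV, however, lives only in that one container's writable layer, so commit it as a reusable image from a new host terminal:

# On the host
docker ps -a  # Get container ID (-a also lists stopped containers)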
docker commit <container_id> opencv-cuda:latest

# Verify
docker images | grep opencv-cuda

⚠️ Image Size
The resulting image will be around 13GB. Using a multi-stage Dockerfile build can reduce this to 4–5GB.

7 Debug Inside the Container with VSCode

Install these two VSCode extensions:

Remote – SSH (already installed)
Dev Containers (install additionally)

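Start a container from the committed image. The name opencv-cuda is still held by the container from step 2, so remove that one first (docker rm -f opencv-cuda) or pick a different name: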

docker run --gpus all -it \
  --name opencv-cuda \
  -v /your/project/path:/workspace \
  opencv-cuda:latest bash

In VSCode: click the blue button at the bottom-left → Attach to Running Container → select opencv-cuda

Create .vscode/launch.json:

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python Debugger: Current File",
      "type": "debugpy",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal"
    }
  ]
}

8 CPU vs GPU Speed Comparison

First Attempt — Single GaussianBlur

import cv2
import numpy as np
import time

img = np.random.randint(0, 256, (4096, 4096, 3), dtype=np.uint8)  # upper bound is exclusive

# CPU
start = time.time()
result_cpu = cv2.GaussianBlur(img, (21, 21), 0)
cpu_time = time.time() - start
print(f"CPU time: {cpu_time:.4f}s")

# GPU
gpu_img = cv2.cuda_GpuMat()
gpu_img.upload(img)
start = time.time()
gpu_filter = cv2.cuda.createGaussianFilter(cv2.CV_8UC3, cv2.CV_8UC3, (21, 21), 0)
result_gpu = gpu_filter.apply(gpu_img)
result_gpu.download()
gpu_time = time.time() - start
print(f"GPU time: {gpu_time:.4f}s")

Result

CPU time: 0.0437s
GPU time: 0.0982s
Speedup: 0.4x (GPU is slower 😅)

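The GPU was slower because, for a single operation, fixed overhead outweighs the computation itself. The timed region includes creating the filter and downloading the result to host memory, and the first kernel launch may also pay a one-time CUDA warm-up cost. The upload is not even counted here, so the real end-to-end gap is larger.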
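To see where the time actually goes, it can help to time upload, computation, and download as separate stages. The sketch below is one way to do that. The helpers time_stages and profile_gpu_blur are hypothetical names, and the numbers are only meaningful under the assumption that OpenCV's CUDA calls on the default stream block until the work is finished.

```python
import time


def time_stages(stages):
    """Run (name, fn) pairs in order and return {name: elapsed seconds}."""
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        timings[name] = time.perf_counter() - start
    return timings


def profile_gpu_blur(img):
    """Time upload, blur, and download separately for a BGR uint8 image."""
    import cv2  # needs the CUDA-enabled build from this post

    blur = cv2.cuda.createGaussianFilter(cv2.CV_8UC3, cv2.CV_8UC3, (21, 21), 0)

    # Untimed warm-up so one-time initialization is not charged to a stage.
    warm = cv2.cuda_GpuMat()
    warm.upload(img)
    blur.apply(warm)

    state = {"gpu": cv2.cuda_GpuMat()}
    return time_stages([
        ("upload", lambda: state["gpu"].upload(img)),
        ("blur", lambda: state.update(out=blur.apply(state["gpu"]))),
        ("download", lambda: state["out"].download()),
    ])
```

Calling profile_gpu_blur(img) with the image from the script above returns a dictionary with one entry per stage, which makes it easy to compare transfer time against compute time.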

Second Attempt — Chained Operations (with Error)

laplacian_filter = cv2.cuda.createLaplacianFilter(
    cv2.CV_8UC3, cv2.CV_8UC3  # ← 3-channel attempt
)

❌ Error

OpenCV Error: (-215:Assertion failed) scn == 1 || scn == 4
in function 'LinearFilter'

Cause: The CUDA Laplacian filter only supports 1-channel (grayscale) or 4-channel images. 3-channel BGR images are not supported.
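If you need to keep the color information, one possible workaround (not what this post ends up doing) is to pad the image to 4 channels, which the assertion above allows. The sketch below converts BGR to BGRA on the CPU, filters on the GPU, and drops the extra channel again. The function names are hypothetical, and the padding channel gets filtered as well before it is discarded.

```python
def channels_ok_for_cuda_laplacian(channels: int) -> bool:
    """Mirror the assertion 'scn == 1 || scn == 4' from the error message."""
    return channels in (1, 4)


def cuda_laplacian_bgr(img_bgr):
    """Apply the CUDA Laplacian to a 3-channel uint8 image via BGRA padding."""
    import cv2  # needs the CUDA-enabled build from this post

    if channels_ok_for_cuda_laplacian(img_bgr.shape[2]):
        raise ValueError("image already has a supported channel count")

    bgra = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2BGRA)  # 3 -> 4 channels
    gpu = cv2.cuda_GpuMat()
    gpu.upload(bgra)

    laplacian = cv2.cuda.createLaplacianFilter(cv2.CV_8UC4, cv2.CV_8UC4)
    filtered = laplacian.apply(gpu).download()
    return cv2.cvtColor(filtered, cv2.COLOR_BGRA2BGR)  # drop the padding channel
```

The extra channel costs about a third more memory and bandwidth, so converting to grayscale remains the cheaper option when color is not needed.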

Fixed Code — Grayscale + 5 Chained Operations

import cv2
import numpy as np
import time

img = np.random.randint(0, 256, (4096, 4096, 3), dtype=np.uint8)  # upper bound is exclusive
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Convert to 1 channel

# CPU
start = time.time()
result = img_gray.copy()
for _ in range(5):
    result = cv2.GaussianBlur(result, (21, 21), 0)
    result = cv2.Laplacian(result, cv2.CV_8U)
    result = cv2.GaussianBlur(result, (21, 21), 0)
cpu_time = time.time() - start
print(f"CPU time: {cpu_time:.4f}s")

# GPU — upload once, compute 15 times, download once
gpu_img = cv2.cuda_GpuMat()
gpu_img.upload(img_gray)

gaussian_filter = cv2.cuda.createGaussianFilter(cv2.CV_8UC1, cv2.CV_8UC1, (21, 21), 0)
laplacian_filter = cv2.cuda.createLaplacianFilter(cv2.CV_8UC1, cv2.CV_8UC1)

start = time.time()
gpu_result = gpu_img
for _ in range(5):
    gpu_result = gaussian_filter.apply(gpu_result)
    gpu_result = laplacian_filter.apply(gpu_result)
    gpu_result = gaussian_filter.apply(gpu_result)
result = gpu_result.download()
gpu_time = time.time() - start
print(f"GPU time: {gpu_time:.4f}s")
print(f"Speedup: {cpu_time/gpu_time:.1f}x")

✅ Final Result

CPU time: 0.1283s
GPU time: 0.0553s
Speedup: 2.3x 🎉
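A single timed run like the one above can swing noticeably from one execution to the next. A common way to get steadier numbers is to discard a few warm-up runs and report the median of several repeats. The helper below is a minimal sketch of that idea; median_time and its default parameters are assumptions, not part of the original benchmark.

```python
import statistics
import time


def median_time(fn, repeats=10, warmup=2):
    """Call fn `warmup` times untimed, then return the median of `repeats` timed calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

To use it, wrap the CPU loop and the GPU loop (including the final download) in two functions and pass each one to median_time.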

Key Takeaways

Point	Detail
pip install opencv-python	No CUDA support; must build from source
Why Docker	Isolated environment, host stays clean
GPU slower than CPU	For a single operation, transfer and setup overhead > computation time
GPU faster than CPU	More chained operations per upload/download = better GPU efficiency
Laplacian error	CUDA Laplacian filter supports only 1-channel or 4-channel input, not 3-channel BGR
VSCode debugging	Dev Containers lets you F5-debug inside a container

April 16, 2026
