{"id":16,"date":"2026-04-16T11:20:39","date_gmt":"2026-04-16T11:20:39","guid":{"rendered":"https:\/\/blog-api.minpox.com\/?p=16"},"modified":"2026-04-16T11:20:39","modified_gmt":"2026-04-16T11:20:39","slug":"setting-up-an-opencv-cuda-dev-environment-with-docker-and-debugging-via-vscode","status":"publish","type":"post","link":"https:\/\/blog-api.minpox.com\/?p=16","title":{"rendered":"Setting Up an OpenCV CUDA Dev Environment with Docker and Debugging via VSCode"},"content":{"rendered":"\n<style>\n  .blog-wrap { font-family: 'Segoe UI', sans-serif; color: #1a1a1a; line-height: 1.8; max-width: 860px; margin: 0 auto; }\n  .blog-wrap h2 { font-size: 1.5rem; font-weight: 700; margin: 2.5rem 0 1rem; padding-left: 12px; border-left: 4px solid #2563eb; color: #1e3a8a; }\n  .blog-wrap h3 { font-size: 1.1rem; font-weight: 700; margin: 1.8rem 0 0.6rem; color: #1e40af; }\n  .blog-wrap p { margin: 0.8rem 0; }\n  .blog-wrap pre { background: #0f172a; color: #e2e8f0; padding: 1.2rem 1.5rem; border-radius: 8px; overflow-x: auto; font-size: 0.88rem; line-height: 1.7; margin: 1rem 0; }\n  .blog-wrap code { font-family: 'Fira Code', 'Courier New', monospace; }\n  .blog-wrap .inline-code { background: #e0e7ff; color: #3730a3; padding: 2px 6px; border-radius: 4px; font-size: 0.9em; font-family: monospace; }\n  .blog-wrap .result-box { background: #f0fdf4; border: 1px solid #86efac; border-radius: 8px; padding: 1rem 1.5rem; margin: 1rem 0; }\n  .blog-wrap .error-box { background: #fff1f2; border: 1px solid #fca5a5; border-radius: 8px; padding: 1rem 1.5rem; margin: 1rem 0; }\n  .blog-wrap .info-box { background: #eff6ff; border: 1px solid #93c5fd; border-radius: 8px; padding: 1rem 1.5rem; margin: 1rem 0; }\n  .blog-wrap table { width: 100%; border-collapse: collapse; margin: 1rem 0; font-size: 0.95rem; }\n  .blog-wrap th { background: #1e3a8a; color: white; padding: 0.6rem 1rem; text-align: left; }\n  .blog-wrap td { padding: 0.6rem 1rem; border-bottom: 1px solid #e2e8f0; }\n  
.blog-wrap tr:nth-child(even) td { background: #f8fafc; }\n  .blog-wrap .tag { display: inline-block; background: #dbeafe; color: #1e40af; font-size: 0.8rem; padding: 2px 10px; border-radius: 20px; margin-right: 6px; margin-bottom: 4px; }\n  .blog-wrap .step-num { display: inline-block; background: #2563eb; color: white; width: 26px; height: 26px; border-radius: 50%; text-align: center; line-height: 26px; font-size: 0.85rem; font-weight: 700; margin-right: 8px; }\n  .blog-wrap hr { border: none; border-top: 1px solid #e2e8f0; margin: 2rem 0; }\n<\/style>\n\n<div class=\"blog-wrap\">\n\n  <p>\n    <span class=\"tag\">Docker<\/span>\n    <span class=\"tag\">OpenCV<\/span>\n    <span class=\"tag\">CUDA<\/span>\n    <span class=\"tag\">Python<\/span>\n    <span class=\"tag\">VSCode<\/span>\n    <span class=\"tag\">RTX 3060<\/span>\n  <\/p>\n\n  <p>\n    To use CUDA acceleration with OpenCV, <span class=\"inline-code\">pip install opencv-python<\/span> simply won&#8217;t cut it.\n    CUDA support requires building OpenCV from source. 
In this post, I&#8217;ll show you how to set up a clean,\n    isolated development environment using Docker \u2014 without polluting your host system \u2014 and debug your code\n    directly inside the container using VSCode.\n  <\/p>\n\n  <hr>\n\n  <h2>Why Can&#8217;t We Just Use pip install?<\/h2>\n  <p>\n    The <span class=\"inline-code\">opencv-python<\/span> package on PyPI is a generic build with no CUDA support.\n    CUDA features must be enabled at compile time by linking against the CUDA libraries.\n  <\/p>\n  <table>\n    <tr><th>Method<\/th><th>CUDA Support<\/th><th>Notes<\/th><\/tr>\n    <tr><td>pip install opencv-python<\/td><td>\u274c<\/td><td>Generic build, CPU only<\/td><\/tr>\n    <tr><td>Build from source (host)<\/td><td>\u2705<\/td><td>Pollutes host environment<\/td><\/tr>\n    <tr><td>Docker + build from source<\/td><td>\u2705<\/td><td>Isolated, clean \u2014 recommended<\/td><\/tr>\n  <\/table>\n\n  <hr>\n\n  <h2><span class=\"step-num\">1<\/span> Prerequisites<\/h2>\n  <p>Make sure the following are installed on your host machine:<\/p>\n  <ul>\n    <li>NVIDIA Driver (verify with <span class=\"inline-code\">nvidia-smi<\/span>)<\/li>\n    <li>Docker<\/li>\n    <li>nvidia-container-toolkit<\/li>\n  <\/ul>\n  <pre><code># Install nvidia-container-toolkit\n# (requires the NVIDIA apt repository to be configured first)\nsudo apt update\nsudo apt install -y nvidia-container-toolkit\nsudo systemctl restart docker\n\n# Verify GPU is accessible inside Docker\ndocker run --gpus all --rm nvidia\/cuda:12.8.0-base-ubuntu22.04 nvidia-smi<\/code><\/pre>\n\n  <div class=\"info-box\">\n    <strong>\ud83d\udca1 Note<\/strong><br>\n    You do NOT need to install the CUDA Toolkit (nvcc) on your host. 
It&#8217;s already included in the <span class=\"inline-code\">devel<\/span> CUDA image used in the next step (the <span class=\"inline-code\">base<\/span> image above does not ship nvcc).\n  <\/div>\n\n  <hr>\n\n  <h2><span class=\"step-num\">2<\/span> Run the Official NVIDIA CUDA Image<\/h2>\n  <pre><code>docker run --gpus all -it \\\n  --name opencv-cuda \\\n  -v \/your\/project\/path:\/workspace \\\n  nvidia\/cuda:12.8.0-cudnn-devel-ubuntu22.04 \\\n  bash<\/code><\/pre>\n\n  <p>Verify GPU and nvcc inside the container:<\/p>\n  <pre><code>nvcc --version\nnvidia-smi<\/code><\/pre>\n\n  <div class=\"result-box\">\n    <strong>\u2705 Expected Output (abridged)<\/strong>\n    <pre style=\"background:transparent; color:#166534; padding:0; margin:0.5rem 0 0;\">nvcc: NVIDIA (R) Cuda compiler driver\nCuda compilation tools, release 12.8\n\nNVIDIA GeForce RTX 3060  |  CUDA Version: 12.8<\/pre>\n  <\/div>\n\n  <hr>\n\n  <h2><span class=\"step-num\">3<\/span> Install Dependencies<\/h2>\n  <p>If apt is slow, switch to a faster mirror first:<\/p>\n  <pre><code># Switch to a faster mirror (optional; mirror.kakao.com is a Korean mirror, pick one near you)\nsed -i 's\/archive.ubuntu.com\/mirror.kakao.com\/g' \/etc\/apt\/sources.list\nsed -i 's\/security.ubuntu.com\/mirror.kakao.com\/g' \/etc\/apt\/sources.list\n\napt update && apt install -y \\\n  python3 python3-pip python3-dev \\\n  cmake git g++ \\\n  libgtk2.0-dev pkg-config \\\n  libavcodec-dev libavformat-dev libswscale-dev\n\npip3 install numpy<\/code><\/pre>\n\n  <hr>\n\n  <h2><span class=\"step-num\">4<\/span> Build OpenCV from Source<\/h2>\n  <pre><code>cd \/workspace\n\ngit clone https:\/\/github.com\/opencv\/opencv.git\ngit clone https:\/\/github.com\/opencv\/opencv_contrib.git\n\ncd opencv && mkdir build && cd build\n\ncmake .. 
\\\n  -D WITH_CUDA=ON \\\n  -D OPENCV_CUDA_ARCH_BIN=\"8.6\" \\\n  -D CUDA_ARCH_BIN=\"8.6\" \\\n  -D CUDA_ARCH_PTX=\"\" \\\n  -D OPENCV_EXTRA_MODULES_PATH=\/workspace\/opencv_contrib\/modules \\\n  -D WITH_CUBLAS=ON \\\n  -D BUILD_opencv_python3=ON \\\n  -D CMAKE_BUILD_TYPE=Release<\/code><\/pre>\n\n  <div class=\"info-box\">\n    <strong>\ud83d\udca1 CUDA_ARCH_BIN by GPU<\/strong><br>\n    RTX 3060 \u2192 <span class=\"inline-code\">8.6<\/span> &nbsp;|&nbsp;\n    RTX 3090 \u2192 <span class=\"inline-code\">8.6<\/span> &nbsp;|&nbsp;\n    RTX 4090 \u2192 <span class=\"inline-code\">8.9<\/span> &nbsp;|&nbsp;\n    RTX 5070 Ti \u2192 <span class=\"inline-code\">12.0<\/span>\n  <\/div>\n\n  <p>After cmake completes, verify these lines in the output:<\/p>\n  <div class=\"result-box\">\n    <pre style=\"background:transparent; color:#166534; padding:0; margin:0;\">--   NVIDIA CUDA:   YES (ver 12.8, CUFFT CUBLAS)  \u2705\n--     NVIDIA GPU arch:  86                        \u2705\n--   cuDNN:          YES (ver 9.7.0)               \u2705\n--   Python 3:\n--     Libraries:    \/usr\/lib\/...\/libpython3.10.so \u2705\n--     numpy:        ...\/numpy\/_core\/include       \u2705<\/pre>\n  <\/div>\n\n  <p>Build and install (takes roughly 30 minutes to 1 hour):<\/p>\n  <pre><code>make -j$(nproc)\nmake install<\/code><\/pre>\n\n  <hr>\n\n  <h2><span class=\"step-num\">5<\/span> Verify the Build<\/h2>\n  <pre><code>python3 -c \"\nimport cv2\nprint('OpenCV version:', cv2.__version__)\nprint('CUDA devices:', cv2.cuda.getCudaEnabledDeviceCount())\n\"<\/code><\/pre>\n\n  <div class=\"result-box\">\n    <strong>\u2705 Success<\/strong>\n    <pre style=\"background:transparent; color:#166534; padding:0; margin:0.5rem 0 0;\">OpenCV version: 4.14.0-pre\nCUDA devices: 1<\/pre>\n  <\/div>\n\n  <hr>\n\n  <h2><span class=\"step-num\">6<\/span> Save the Container as an Image<\/h2>\n  <p>The container from step 2 was started without <span class=\"inline-code\">--rm<\/span>, so it survives an exit in a stopped state, but the build exists only in that container&#8217;s writable layer and is lost if the container is removed. 
Commit it as a reusable image from a new host terminal:<\/p>\n  <pre><code># On the host\ndocker ps -a  # Get container ID (-a also lists stopped containers)\ndocker commit &lt;container_id&gt; opencv-cuda:latest\n\n# Verify\ndocker images | grep opencv-cuda<\/code><\/pre>\n\n  <div class=\"info-box\">\n    <strong>\u26a0\ufe0f Image Size<\/strong><br>\n    The resulting image will be around 13 GB. Using a multi-stage Dockerfile build can reduce this to 4\u20135 GB.\n  <\/div>\n\n  <hr>\n\n  <h2><span class=\"step-num\">7<\/span> Debug Inside the Container with VSCode<\/h2>\n  <p>Install these two VSCode extensions:<\/p>\n  <ul>\n    <li>Remote - SSH (already installed)<\/li>\n    <li><strong>Dev Containers<\/strong> (install additionally)<\/li>\n  <\/ul>\n  <p>Once the image is committed, remove the original build container, whose name would otherwise conflict, and start a new one from the committed image:<\/p>\n  <pre><code>docker rm -f opencv-cuda  # free the container name used in step 2\n\ndocker run --gpus all -it \\\n  --name opencv-cuda \\\n  -v \/your\/project\/path:\/workspace \\\n  opencv-cuda:latest bash<\/code><\/pre>\n\n  <p>In VSCode: click the blue button at the bottom-left \u2192 <strong>Attach to Running Container<\/strong> \u2192 select <span class=\"inline-code\">opencv-cuda<\/span>.<\/p>\n  <p>Create <span class=\"inline-code\">.vscode\/launch.json<\/span> (the <span class=\"inline-code\">debugpy<\/span> type requires the Python Debugger extension to be installed inside the container):<\/p>\n  <pre><code>{\n  \"version\": \"0.2.0\",\n  \"configurations\": [\n    {\n      \"name\": \"Python Debugger: Current File\",\n      \"type\": \"debugpy\",\n      \"request\": \"launch\",\n      \"program\": \"${file}\",\n      \"console\": \"integratedTerminal\"\n    }\n  ]\n}<\/code><\/pre>\n\n  <hr>\n\n  <h2><span class=\"step-num\">8<\/span> CPU vs GPU Speed Comparison<\/h2>\n\n  <h3>First Attempt \u2014 Single GaussianBlur<\/h3>\n  <pre><code>import cv2\nimport numpy as np\nimport time\n\nimg = np.random.randint(0, 255, (4096, 4096, 3), dtype=np.uint8)\n\n# CPU\nstart = time.time()\nresult_cpu = cv2.GaussianBlur(img, (21, 21), 0)\ncpu_time = time.time() - start\nprint(f\"CPU time: {cpu_time:.4f}s\")\n\n# GPU\ngpu_img = cv2.cuda_GpuMat()\ngpu_img.upload(img)  # note: the upload happens before the timer starts\nstart = time.time()\ngpu_filter = 
cv2.cuda.createGaussianFilter(cv2.CV_8UC3, cv2.CV_8UC3, (21, 21), 0)\nresult_gpu = gpu_filter.apply(gpu_img)\nresult_gpu.download()\ngpu_time = time.time() - start\nprint(f\"GPU time: {gpu_time:.4f}s\")<\/code><\/pre>\n\n  <div class=\"result-box\">\n    <strong>Result<\/strong>\n    <pre style=\"background:transparent; color:#166534; padding:0; margin:0.5rem 0 0;\">CPU time: 0.0437s\nGPU time: 0.0982s\nGPU is about 2.2x slower \ud83d\ude05<\/pre>\n  <\/div>\n  <p>The GPU was slower because the <strong>fixed overhead exceeded the actual computation time<\/strong> for a single operation: the timed region includes creating the filter and downloading the result back to host memory, and the upload is not even counted here.<\/p>\n\n  <h3>Second Attempt \u2014 Chained Operations (with Error)<\/h3>\n  <pre><code>laplacian_filter = cv2.cuda.createLaplacianFilter(\n    cv2.CV_8UC3, cv2.CV_8UC3  # \u2190 3-channel attempt\n)<\/code><\/pre>\n\n  <div class=\"error-box\">\n    <strong>\u274c Error<\/strong>\n    <pre style=\"background:transparent; color:#991b1b; padding:0; margin:0.5rem 0 0;\">OpenCV Error: (-215:Assertion failed) scn == 1 || scn == 4\nin function 'LinearFilter'<\/pre>\n    <p style=\"margin:0.5rem 0 0;\"><strong>Cause:<\/strong> The CUDA Laplacian filter only supports 1-channel (grayscale) or 4-channel images. 
3-channel BGR images are not supported, so convert to grayscale or to 4-channel BGRA first.<\/p>\n  <\/div>\n\n  <h3>Fixed Code \u2014 Grayscale + 15 Chained Operations (5 Iterations)<\/h3>\n  <pre><code>import cv2\nimport numpy as np\nimport time\n\nimg = np.random.randint(0, 255, (4096, 4096, 3), dtype=np.uint8)\nimg_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Convert to 1 channel\n\n# CPU\nstart = time.time()\nresult = img_gray.copy()\nfor _ in range(5):\n    result = cv2.GaussianBlur(result, (21, 21), 0)\n    result = cv2.Laplacian(result, cv2.CV_8U)\n    result = cv2.GaussianBlur(result, (21, 21), 0)\ncpu_time = time.time() - start\nprint(f\"CPU time: {cpu_time:.4f}s\")\n\n# GPU \u2014 upload once, compute 15 times, download once\ngpu_img = cv2.cuda_GpuMat()\ngpu_img.upload(img_gray)\n\ngaussian_filter = cv2.cuda.createGaussianFilter(cv2.CV_8UC1, cv2.CV_8UC1, (21, 21), 0)\nlaplacian_filter = cv2.cuda.createLaplacianFilter(cv2.CV_8UC1, cv2.CV_8UC1)\n\nstart = time.time()\ngpu_result = gpu_img\nfor _ in range(5):\n    gpu_result = gaussian_filter.apply(gpu_result)\n    gpu_result = laplacian_filter.apply(gpu_result)\n    gpu_result = gaussian_filter.apply(gpu_result)\nresult = gpu_result.download()\ngpu_time = time.time() - start\nprint(f\"GPU time: {gpu_time:.4f}s\")\nprint(f\"Speedup: {cpu_time\/gpu_time:.1f}x\")<\/code><\/pre>\n\n  <div class=\"result-box\">\n    <strong>\u2705 Final Result<\/strong>\n    <pre style=\"background:transparent; color:#166534; padding:0; margin:0.5rem 0 0;\">CPU time: 0.1283s\nGPU time: 0.0553s\nSpeedup: 2.3x \ud83c\udf89<\/pre>\n  <\/div>\n\n  <hr>\n\n  <h2>Key Takeaways<\/h2>\n  <table>\n    <tr><th>Point<\/th><th>Detail<\/th><\/tr>\n    <tr><td>pip install opencv-python<\/td><td>No CUDA \u2014 must build from source<\/td><\/tr>\n    <tr><td>Why Docker<\/td><td>Isolated environment, host stays clean<\/td><\/tr>\n    <tr><td>GPU slower than CPU<\/td><td>For a single operation, setup and transfer overhead exceeds computation time<\/td><\/tr>\n    <tr><td>GPU faster than CPU<\/td><td>More chained operations = better GPU 
efficiency<\/td><\/tr>\n    <tr><td>Laplacian error<\/td><td>The CUDA Laplacian filter accepts only 1-channel or 4-channel input, not 3-channel BGR<\/td><\/tr>\n    <tr><td>VSCode debugging<\/td><td>Dev Containers lets you F5-debug inside a container<\/td><\/tr>\n  <\/table>\n\n<\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"602\" src=\"https:\/\/blog-api.minpox.com\/wp-content\/uploads\/2026\/04\/\uc2a4\ud06c\ub9b0\uc0f7-2026-04-16-\uc624\ud6c4-8.08.00-1024x602.png\" alt=\"\" class=\"wp-image-14\" srcset=\"https:\/\/blog-api.minpox.com\/wp-content\/uploads\/2026\/04\/\uc2a4\ud06c\ub9b0\uc0f7-2026-04-16-\uc624\ud6c4-8.08.00-1024x602.png 1024w, https:\/\/blog-api.minpox.com\/wp-content\/uploads\/2026\/04\/\uc2a4\ud06c\ub9b0\uc0f7-2026-04-16-\uc624\ud6c4-8.08.00-300x176.png 300w, https:\/\/blog-api.minpox.com\/wp-content\/uploads\/2026\/04\/\uc2a4\ud06c\ub9b0\uc0f7-2026-04-16-\uc624\ud6c4-8.08.00-768x452.png 768w, https:\/\/blog-api.minpox.com\/wp-content\/uploads\/2026\/04\/\uc2a4\ud06c\ub9b0\uc0f7-2026-04-16-\uc624\ud6c4-8.08.00-1536x903.png 1536w, https:\/\/blog-api.minpox.com\/wp-content\/uploads\/2026\/04\/\uc2a4\ud06c\ub9b0\uc0f7-2026-04-16-\uc624\ud6c4-8.08.00-2048x1204.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Docker OpenCV CUDA Python VSCode RTX 3060 To use CUDA acceleration with OpenCV, pip install opencv-python simply won&#8217;t cut it. CUDA support requires building OpenCV from source. 
In this post, I&#8217;ll show you how to set up a clean, isolated development environment using Docker \u2014 without polluting your host system \u2014 and debug your code [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12],"tags":[17,5,16,18,20,19],"class_list":["post-16","post","type-post","status-publish","format-standard","hentry","category-it","tag-cuda","tag-docker","tag-opencv","tag-python","tag-rtx3060","tag-vscode"],"_links":{"self":[{"href":"https:\/\/blog-api.minpox.com\/index.php?rest_route=\/wp\/v2\/posts\/16","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog-api.minpox.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog-api.minpox.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog-api.minpox.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog-api.minpox.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=16"}],"version-history":[{"count":1,"href":"https:\/\/blog-api.minpox.com\/index.php?rest_route=\/wp\/v2\/posts\/16\/revisions"}],"predecessor-version":[{"id":17,"href":"https:\/\/blog-api.minpox.com\/index.php?rest_route=\/wp\/v2\/posts\/16\/revisions\/17"}],"wp:attachment":[{"href":"https:\/\/blog-api.minpox.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=16"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog-api.minpox.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=16"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog-api.minpox.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=16"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}