Windows 2025

NVIDIA CUDA : Install
2024/12/09

 

Install NVIDIA CUDA (Compute Unified Device Architecture).

[1] Run PowerShell with administrator privileges for all of the steps below.
First, download and install the C++ compiler (Visual Studio Build Tools).
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

PS C:\Users\Administrator> Start-BitsTransfer -Source "https://aka.ms/vs/17/release/vs_BuildTools.exe" -Destination "vs_BuildTools.exe" 

# install in silent mode
PS C:\Users\Administrator> ./vs_buildtools.exe `
--add Microsoft.Component.MSBuild `
--add Microsoft.VisualStudio.Component.CoreBuildTools `
--add Microsoft.VisualStudio.Component.VC.CoreBuildTools `
--add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 `
--add Microsoft.VisualStudio.Component.VC.Redist.14.Latest `
--add Microsoft.VisualStudio.Component.VC.CoreIde `
--add Microsoft.VisualStudio.Component.Windows11SDK.22621 `
--add Microsoft.VisualStudio.ComponentGroup.NativeDesktop.Core `
--add Microsoft.VisualStudio.Workload.MSBuildTools `
--add Microsoft.VisualStudio.Workload.VCTools `
--includeRecommended --quiet --wait 

# installation processes are running
PS C:\Users\Administrator> Get-Process -Name "vs_*", "setup*" 

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
    376      17     3520      16144       0.78   5668   0 vs_BuildTools
    914      66    29176      61820       4.92   6228   0 vs_setup_bootstrapper

# once the installation has finished, the processes above are no longer listed
PS C:\Users\Administrator> Get-Process -Name "vs_*" 


# C++ compiler is here
PS C:\Users\Administrator> Get-ChildItem "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\*\bin\Hostx64\x64\cl.exe" 

    Directory: C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----         12/8/2024   5:51 PM         862792 cl.exe

# add the compiler directory to the system Path environment variable
PS C:\Users\Administrator> $currentPath = [Environment]::GetEnvironmentVariable("Path", "Machine") 
PS C:\Users\Administrator> $currentPath += ";C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.42.34433\bin\Hostx64\x64" 
PS C:\Users\Administrator> [Environment]::SetEnvironmentVariable("Path", $currentPath, "Machine") 

# reload environment variables
PS C:\Users\Administrator> $env:Path = [System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User") 

PS C:\Users\Administrator> cl.exe 
Microsoft (R) C/C++ Optimizing Compiler Version 19.42.34435 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]
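
The Path update in step [1] hardcodes the toolset version (14.42.34433) and appends the directory again each time it is run. As a rough sketch only, the helper below appends a directory to a Path-style string only when it is not already listed. Add-PathEntry is a hypothetical name used for illustration, not a built-in cmdlet. The commented usage lines assume the same Build Tools install location as in step [1] and pick the highest version directory found there.

```powershell
# Hypothetical helper: return $PathValue with $Directory appended,
# unless an equivalent entry is already present.
function Add-PathEntry {
    param(
        [string]$PathValue,
        [string]$Directory
    )
    # Normalize by dropping a trailing backslash before comparing
    $target  = $Directory.TrimEnd('\')
    $entries = @($PathValue -split ';' | Where-Object { $_ -ne '' })
    $known   = @($entries | ForEach-Object { $_.TrimEnd('\') })

    # -contains compares strings case-insensitively by default
    if ($known -contains $target) { return $PathValue }
    return (($entries + $target) -join ';')
}

# Usage sketch (run elevated; assumes the install location from step [1]):
# $cl = Get-ChildItem "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\*\bin\Hostx64\x64\cl.exe" |
#       Sort-Object { [version]$_.Directory.Parent.Parent.Parent.Name } |
#       Select-Object -Last 1
# $machinePath = [Environment]::GetEnvironmentVariable("Path", "Machine")
# [Environment]::SetEnvironmentVariable("Path", (Add-PathEntry $machinePath $cl.DirectoryName), "Machine")
```

Keeping the string handling in a separate function means it can be tried on sample values before the machine-wide Path is touched.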
[2] Download and install CUDA.
Check which CUDA version you would like to install on the official site below.
⇒ https://developer.nvidia.com/cuda-toolkit-archive
PS C:\Users\Administrator> Start-BitsTransfer -Source "https://developer.download.nvidia.com/compute/cuda/12.6.3/local_installers/cuda_12.6.3_561.17_windows.exe" -Destination "cuda_12.6.3_561.17_windows.exe" 

# install in silent mode
PS C:\Users\Administrator> ./cuda_12.6.3_561.17_windows.exe -s 

# installation processes are running
PS C:\Users\Administrator> Get-Process -Name "cuda*", "setup*" 

Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
    213      17   757964     760556      23.53   6992   0 cuda_12.6.3_561.17_windows

# once the installation has finished, the processes above are no longer listed
PS C:\Users\Administrator> Get-Process -Name "cuda*", "setup*" 


# reload environment variables
PS C:\Users\Administrator> $env:Path = [System.Environment]::GetEnvironmentVariable("Path","Machine") + ";" + [System.Environment]::GetEnvironmentVariable("Path","User") 

PS C:\Users\Administrator> nvcc --version 
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Oct_30_01:18:48_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0

PS C:\Users\Administrator> nvidia-smi 
Sun Dec  8 19:23:04 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 561.17                 Driver Version: 561.17         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060      WDDM  |   00000000:03:00.0 Off |                  N/A |
|  0%   40C    P8              9W /  170W |      18MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1416    C+G   C:\Windows\System32\dwm.exe                 N/A      |
|    0   N/A  N/A      4192    C+G   C:\Windows\explorer.exe                     N/A      |
+-----------------------------------------------------------------------------------------+
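
Steps [1] and [2] start an installer and then poll Get-Process until it disappears. A possible alternative, shown here only as a sketch, is to launch the installer with Start-Process and its -Wait and -PassThru parameters, so that the script blocks until the process exits and can read its exit code. Invoke-SilentInstaller is a hypothetical wrapper name. The sketch also assumes that the installer process stays alive until the installation is complete and that a non-zero exit code indicates a problem, neither of which is confirmed by the steps above.

```powershell
# Hypothetical wrapper: run an installer, wait for it to exit, return its exit code.
function Invoke-SilentInstaller {
    param(
        [string]$FilePath,
        [string[]]$ArgumentList
    )
    # -Wait blocks until the process has exited;
    # -PassThru returns the process object so ExitCode can be read
    $proc = Start-Process -FilePath $FilePath -ArgumentList $ArgumentList -Wait -PassThru
    return $proc.ExitCode
}

# Usage sketch with the installer downloaded in step [2]:
# $code = Invoke-SilentInstaller -FilePath ".\cuda_12.6.3_561.17_windows.exe" -ArgumentList "-s"
# if ($code -ne 0) { Write-Warning "Installer exited with code $code" }
```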
[3] Run sample programs to verify the installation.
PS C:\Users\Administrator> Start-BitsTransfer -Source "https://github.com/NVIDIA/cuda-samples/archive/refs/heads/master.zip" -Destination "master.zip" -Dynamic 
PS C:\Users\Administrator> Expand-Archive -Path ./master.zip 

PS C:\Users\Administrator> cd ./master/cuda-samples-master/Samples/1_Utilities/deviceQuery 
PS C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\deviceQuery> nvcc -I ../../../Common deviceQuery.cpp -o deviceQuery 
deviceQuery.cpp
   Creating library deviceQuery.lib and object deviceQuery.exp
PS C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\deviceQuery> ./deviceQuery.exe 
C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\deviceQuery\deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3060"
  CUDA Driver Version / Runtime Version          12.6 / 12.6
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 12288 MBytes (12884377600 bytes)
  (028) Multiprocessors, (128) CUDA Cores/MP:    3584 CUDA Cores
  GPU Max Clock rate:                            1777 MHz (1.78 GHz)
  Memory Clock rate:                             7501 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 2359296 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 3 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.6, CUDA Runtime Version = 12.6, NumDevs = 1
Result = PASS


PS C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\deviceQuery> cd ~/master/cuda-samples-master/Samples/1_Utilities/bandwidthTest 
PS C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\bandwidthTest> nvcc -I ../../../Common bandwidthTest.cu -o bandwidthTest 
bandwidthTest.cu
tmpxft_00001170_00000000-10_bandwidthTest.cudafe1.cpp
   Creating library bandwidthTest.lib and object bandwidthTest.exp
PS C:\Users\Administrator\master\cuda-samples-master\Samples\1_Utilities\bandwidthTest> ./bandwidthTest.exe 
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: NVIDIA GeForce RTX 3060
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     12.3

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     13.0

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     320.6

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
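
Both samples above end with a "Result = PASS" line, so the check can also be scripted instead of being read by eye. The following is a minimal sketch under that assumption. Test-SamplePassed is a hypothetical helper name, and other samples may report their result differently.

```powershell
# Hypothetical helper: returns $true if any captured output line is "Result = PASS".
function Test-SamplePassed {
    param([string[]]$OutputLines)
    $hits = @($OutputLines | Where-Object { $_ -match '^\s*Result = PASS\s*$' })
    return ($hits.Count -gt 0)
}

# Usage sketch, run from the sample's directory after building it:
# if (Test-SamplePassed (./bandwidthTest.exe)) { "bandwidthTest: OK" } else { "bandwidthTest: FAILED" }
```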