
FFmpeg CUDA Hardware-Accelerated Transcoding on Ubuntu

2025-01-14 | Research | Words: 1484 | Chinese original

This article uses an Ubuntu environment. The FFmpeg build installed directly via APT on Ubuntu supports CUDA acceleration, and this demonstration uses that build. Your experience may vary if you compile FFmpeg yourself or install it from another source.

For an introduction to FFmpeg and hardware acceleration on macOS, see Convert MKV to MP4 on macOS with FFmpeg: A Hardware-Accelerated Guide - ZhongUncle GitHub Pages.

If you’re already familiar with the transcoding process, skip the first two sections and jump directly to the “Using FFmpeg for Hardware-Accelerated Transcoding” section via the sidebar to view the commands.

Transcoding Workflow (Differences Between Decoding, Encoding, and Transcoding)

Any transcoding process consists of two steps: decoding and encoding. Simply put, decoding converts a video file into a displayable video stream, while encoding converts a video stream into a video file.

Here are two examples to illustrate decoding and encoding:
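As a minimal sketch (filenames, resolution, and frame rate are placeholders), the two halves of the pipeline can be run as separate FFmpeg invocations:

```shell
# Decode: unpack input.mp4 into raw, uncompressed video frames
ffmpeg -i input.mp4 -f rawvideo -pix_fmt yuv420p frames.yuv

# Encode: compress the raw frames back into a playable file
# (raw video carries no metadata, so size, rate, and pixel format must be given)
ffmpeg -f rawvideo -pix_fmt yuv420p -s 1920x1080 -r 30 -i frames.yuv -c:v libx264 output.mp4
```

A transcode is simply these two steps fused: the decoded frames are fed straight into the encoder without touching disk.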

By default, FFmpeg uses software decoding and encoding without any special options. Hardware-accelerated decoding/encoding leverages dedicated chips designed for specific formats—offering faster performance and lower power consumption compared to general-purpose CPUs.

Rule of thumb: purpose-built hardware is much faster and more power-efficient than software running on a general-purpose CPU.

Transcoding workflow diagram (decoding → encoding)

Differences Between Software and Hardware Decoding/Encoding

Software decoding/encoding uses the CPU to run specialized software that handles the decoding/encoding process. The main advantage is flexibility—support for new formats or custom encodings can be added via software updates. In contrast, hardware encoding/decoding is hardwired into the chip during manufacturing and cannot be modified. Typically, each new generation of hardware adds support for additional formats. For example:

980 Ti supported encoding formats
1080 Ti supported encoding formats (additional 3 formats vs 980 Ti)

Important note: While video codecs are often bundled with GPUs, they are separate from CUDA or rasterization units. For example, high-performance GPUs like the Tesla A100 do not support hardware encoding/decoding because they lack dedicated codec chips (though not all data center GPUs are this way—e.g., the V100 and P100 include them).

GPU hardware codec support explanation (Tesla A100 vs V100/P100)

Additionally, hardware codecs have limited throughput: processing ten videos simultaneously will not give ten times the speed of processing one, since the jobs share the same engine. This is demonstrated later in the article.

Using FFmpeg for Hardware-Accelerated Transcoding

Simple Start

Now that you understand the basics, let’s start transcoding. Below is the simplest command:

ffmpeg -c:v h264_cuvid -i input.mp4 -c:v h264_nvenc output.mp4

Explanation:

-c:v h264_cuvid (placed before -i): decode the H.264 input with NVIDIA's CUVID hardware decoder.
-c:v h264_nvenc (placed after the input): encode the output with NVIDIA's NVENC hardware encoder.

Note: This usage differs from the official FFmpeg documentation. Following the official command format may cause transcoding issues in practice.
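If the command fails, it is worth checking which hardware codecs your build actually exposes; on a stock Ubuntu APT build these listings should include the CUVID decoders and NVENC encoders:

```shell
# List NVIDIA hardware decoders/encoders compiled into this ffmpeg build
ffmpeg -hide_banner -decoders | grep cuvid
ffmpeg -hide_banner -encoders | grep nvenc

# List the hardware acceleration APIs ffmpeg knows about (should include "cuda")
ffmpeg -hide_banner -hwaccels
```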

Setting Bitrate and Frame Rate

Using the basic command above may result in changed bitrate or frame rate (e.g., 30fps → 25fps, 6Mbps → 2Mbps). To preserve or customize these parameters:

ffmpeg -c:v h264_cuvid -i input.mp4 -c:v h264_nvenc -b:v 6000k -r 30 output.mp4

Explanation:

-b:v 6000k: set the output video bitrate to 6 Mbps.
-r 30: set the output frame rate to 30 fps.
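To match the source rather than guess, the input's bitrate and frame rate can be read first with ffprobe (shipped alongside ffmpeg); input.mp4 is a placeholder:

```shell
# Print the first video stream's bitrate (bits/s) and frame rate
ffprobe -v error -select_streams v:0 \
  -show_entries stream=bit_rate,r_frame_rate \
  -of default=noprint_wrappers=1 input.mp4
```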

With these settings, the transcoding speed reaches 16x—much faster than integrated GPU acceleration. The GPU utilization during transcoding is shown below:

GPU utilization during 16x transcoding (3060)

What Happens If You Omit the Hardware Decoder?

As mentioned earlier, FFmpeg defaults to software decoding unless a hardware decoder is specified explicitly. If you omit it, FFmpeg uses CPU-based software decoding, resulting in high CPU usage but a similar transcoding speed (sometimes slightly faster):

CPU utilization with software decoding (no hardware decoder specified)
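The software-decode variant simply drops the input decoder option while keeping the NVENC encoder; a minimal sketch, with the bitrate and frame-rate options carried over:

```shell
# Software (CPU) decode, hardware (NVENC) encode: no decoder before -i
ffmpeg -i input.mp4 -c:v h264_nvenc -b:v 6000k -r 30 output.mp4
```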

Software decoding has its use cases: When repairing corrupted video encodings, software decoders are more likely to correctly handle damaged files than hardware decoders. For example, I had a video that failed to play after the 11-minute mark in some players. Using the hardware decoder resulted in the following error:

[h264_cuvid @ 0x55ed4090d540] cuvid decode callback error
Error while decoding stream #0:0: Generic error in an external library

The transcoded video was black. However, omitting the hardware decoder (using software decoding) allowed successful transcoding.

What Happens When Running Multiple Jobs on a Single Encoder?

As noted earlier, hardware codecs have limited capacity: processing multiple videos simultaneously does not scale linearly. For example, the 3060 achieves 16x speed for a single transcoding job, but running two jobs simultaneously results in a combined speed of roughly 16x (about 8x each):

Total transcoding speed with two simultaneous jobs (3060)
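Concurrent jobs like these can be launched from the shell; a minimal sketch with placeholder filenames:

```shell
# Both jobs contend for the same NVENC engine, so their speeds sum to ~16x
ffmpeg -c:v h264_cuvid -i input1.mp4 -c:v h264_nvenc output1.mp4 &
ffmpeg -c:v h264_cuvid -i input2.mp4 -c:v h264_nvenc output2.mp4 &
wait
```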

This may seem disappointing, but consider practical use cases: Live streaming and recording only require 1x speed. With 16x total capacity, you could theoretically run up to 16 concurrent streams (fewer in practice, as workloads vary—always leave a buffer). This is still a significant advantage for multi-tasking.

Encoding Quality

FFmpeg

Transcoding is typically used to change the video codec (e.g., H.265 → H.264) or file format. Transcoding between the same codec and format is less common, but tests show the quality is excellent—nearly indistinguishable from software transcoding.
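Where a numeric check is wanted, FFmpeg's own ssim filter (not used in the visual comparisons here) can score a transcode against its source; filenames are placeholders:

```shell
# Compare the transcoded output against the original frame by frame
# (SSIM closer to 1.0 means closer to the source)
ffmpeg -i output.mp4 -i input.mp4 -lavfi "[0:v][1:v]ssim" -f null -
```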

In my previous blog about macOS FFmpeg hardware acceleration, I compared hardware (QSV) and software transcoding when converting MP4 to MOV and reducing the bitrate from 42Mbps to 10Mbps:

FFmpeg hardware vs software transcoding quality comparison (macOS QSV)

Results with CUDA hardware transcoding (3060):

FFmpeg CUDA transcoding quality (3060 NVENC)

The 3060’s NVENC encoder delivers better quality than the 8th-gen Intel QSV encoder—impressive, though it consumes more power.

OBS

Hardware encoders have a supported bitrate range—exceeding or falling below this range results in significant quality loss or encoding issues.
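In FFmpeg terms, keeping NVENC inside a target range maps onto its rate-control options; a sketch with illustrative values, not a recommendation:

```shell
# Constrain NVENC to ~8 Mbps with a VBV buffer so the rate stays in range
ffmpeg -i input.mp4 -c:v h264_nvenc -b:v 8000k -maxrate 8000k -bufsize 16000k output.mp4
```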

NVIDIA’s official recommended OBS streaming settings (resolution, bitrate, frame rate) are shown below:

NVIDIA official OBS streaming recommendations (resolution, bitrate, frame rate)

When playing Where Winds Meet, recording with OBS using NVENC H.264 encoding at 8000kbps produced quality nearly identical to software encoding. At 40Mbps, the quality was slightly lower but still acceptable. I'll cover this in more detail in a separate blog. Previously, I used an NVIDIA MX250 GPU, whose encoding speed and quality were much worse than QSV. Below are quick screenshots:

I used custom settings based on personal preference, not NVIDIA’s official recommendations (I was unaware of the official guidelines initially).

Scenes with dense objects (like wheat fields) are ideal for testing bitrate loss (colorful scenes also work well). As shown below, individual wheat ears are indistinguishable, but the overall image remains clear from a distance (the images below are re-compressed but still demonstrate acceptable quality):

NVENC encoding quality test (wheat field scene) at 40Mbps
NVENC encoding quality (close-up)
Software encoding quality (close-up) for comparison

I hope these will help someone in need~
