
Manually Parsing H.264 SPS/PPS Parameter Sets
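Introduction
When working with H.264 video streams, the SPS (Sequence Parameter Set) and PPS (Picture Parameter Set) carry the key information a decoder needs to decode the video.
When we run into hard-to-diagnose decoding problems in scenarios such as WebRTC, parsing these parameter sets by hand often helps us find the root cause.
This article is not an in-depth tutorial on H.264 streams. It records the process of parsing an SPS and a PPS by hand. The process can be a little tedious, but once you have worked through it you can apply the same approach to other cases, and it can serve as a reference for checking your own results and learning how the parsing works.
If you need it, nalu.qer.im is a website I built for parsing .h264 files online. It relies on the same parsing principles, and you are welcome to use it.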
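Prerequisites
Before starting, you need a basic understanding of the following concepts:
(1) H.264 stream format basics: an H.264 stream (in Annex B byte-stream form) consists of NALUs (Network Abstraction Layer Units). NALUs are variable-length, and each NALU is preceded by a start code, either 0x000001 or 0x00000001. One NALU therefore runs from just after a start code up to the next start code or the end of the stream.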
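The splitting rule in (1) can be sketched in a few lines of Python. This is a minimal illustration rather than a production demuxer: the function name split_nalus is hypothetical, and it assumes the whole stream is already in memory as Annex B bytes.

```python
def split_nalus(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NALUs with the start codes removed."""
    nalus = []
    i = stream.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3                          # first byte after the start code
        nxt = stream.find(b"\x00\x00\x01", start)
        end = len(stream) if nxt == -1 else nxt
        # A 4-byte start code (00 00 00 01) leaves its leading 00 at the end
        # of the previous NALU, so trailing zero bytes are dropped here.
        nalus.append(stream[start:end].rstrip(b"\x00"))
        i = nxt
    return nalus
```

Searching only for the 3-byte pattern covers both start code forms, because the 4-byte form is the 3-byte form with one extra leading zero byte.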
(2) NALU structure: with the start code removed, a NALU consists of a NAL Header and a NAL Payload. The NAL Header indicates the type of the NALU; this article focuses on NALUs of type SPS and PPS.
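(3) Emulation prevention bytes: the start code is 0x000001 or 0x00000001, but if the NALU data (the payload) happens to contain the same sequence, a decoder could mistake it for a start code and parse the stream incorrectly.
When encoding: the encoder scans the data, and whenever it finds two consecutive 0x00 bytes (0x00 0x00) followed by a third byte of 0x00, 0x01, 0x02, or 0x03, it inserts 0x03 before that third byte, giving 0x00 0x00 0x03 [original third byte]. For example, 0x00 0x00 0x01 becomes 0x00 0x00 0x03 0x01.
When decoding: the decoder scans the data, and whenever it finds 0x00 0x00 0x03 it removes the third byte (0x03). For example, 0x00 0x00 0x03 0x01 becomes 0x00 0x00 0x01.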
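The decoding step can be sketched as follows. This is a minimal illustration with a hypothetical function name; it assumes the input is a single NALU with its start code already removed.

```python
def remove_emulation_prevention(nalu: bytes) -> bytes:
    """Drop every 0x03 that directly follows two consecutive 0x00 bytes."""
    out = bytearray()
    zeros = 0                      # consecutive 0x00 bytes seen so far
    for b in nalu:
        if zeros >= 2 and b == 0x03:
            zeros = 0              # skip the inserted byte and restart the count
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

Resetting the zero count after a removed 0x03 matters: the byte that follows it belongs to the original data and starts a fresh run.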
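(4) Reading bit by bit: once we have the processed byte stream, we parse the parameters in the order the H.264 standard prescribes. The standard gives an explicit descriptor for the bit length of every parameter:
- Fixed-length parameters: u(1) means read 1 bit and u(8) means read 8 bits (1 byte); the bits are then interpreted as an unsigned integer
- Variable-length parameters: parameters marked ue(v) or se(v) are parsed with Exp-Golomb coding
Exp-Golomb coding is a variable-length code used widely in H.264. Compared with a fixed-width encoding, it saves space when small values are the most common, because small values get the shortest codes. There are two main variants:
Unsigned Exp-Golomb coding, ue(v): encodes non-negative integers
- Read consecutive 0 bits until the first 1 bit, and let M be the number of 0 bits (the 1 bit is consumed as well)
- Read the next M bits and interpret them as an unsigned integer N
- Decoded value = 2^M - 1 + N
Signed Exp-Golomb coding, se(v): encodes signed integers
- First decode an intermediate value k as for ue(v)
- The parity of k determines the final value:
- k odd: se(v) = (k+1)/2 (positive)
- k even: se(v) = -k/2 (zero or negative)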
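The two decoding rules above translate almost line for line into code. The sketch below works directly on strings of '0' and '1' characters, mirroring the manual approach used in this article; the function names read_ue and read_se are hypothetical.

```python
def read_ue(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one ue(v) value starting at bits[pos]; return (value, next_pos)."""
    m = 0
    while bits[pos + m] == "0":                  # count leading zeros -> M
        m += 1
    pos += m + 1                                 # skip the zeros and the 1 bit
    n = int(bits[pos:pos + m], 2) if m else 0    # next M bits -> N
    return (1 << m) - 1 + n, pos + m


def read_se(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one se(v) value starting at bits[pos]; return (value, next_pos)."""
    k, pos = read_ue(bits, pos)
    value = (k + 1) // 2 if k % 2 else -(k // 2)
    return value, pos


print(read_ue("0000001111000"))   # (119, 13)
print(read_se("00111"))           # (-3, 5)
```

Returning the next position alongside the value lets the caller chain reads along one bit string without any shared state.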
Worked example 1: an SPS NALU
Preprocessing: raw data
Here is a raw NALU, shown in hexadecimal:
00 00 00 01 67 64 00 28 ac d9 40 78 02 27 e5 c0 44 00 00 03 00 04 00 00 03 00 e8 3c 60 c6 58
After removing the emulation prevention bytes (0x00 0x00 0x03 -> 0x00 0x00):
00 00 00 01 67 64 00 28 ac d9 40 78 02 27 e5 c0 44 00 00 00 04 00 00 00 e8 3c 60 c6 58
Converted to a bit stream:
0000000000000000000000000000000101100111011001000000000000101000101011001101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000
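If you would rather produce the bit string yourself, a small helper is enough. This is only a convenience sketch with a hypothetical function name; it relies on the standard library's bytes.fromhex, which ignores whitespace between bytes.

```python
def hex_to_bits(hex_text: str) -> str:
    """Turn a hex dump such as '68 eb e3' into a string of '0'/'1' characters."""
    data = bytes.fromhex(hex_text)               # whitespace between bytes is fine
    return "".join(f"{byte:08b}" for byte in data)


print(hex_to_bits("67 64"))   # 0110011101100100
```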
Parsing: start code
Start code -> [00000000000000000000000000000001]01100111011001000000000000101000101011001101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000
Parsing: NAL Header
The NAL Header is 8 bits long, laid out as follows:
+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI| Type |
+---------------+
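where
F is forbidden_zero_bit: it is normally 0. When the network detects a bit error in the NAL unit it may set this bit to 1, so that the receiver can attempt recovery or drop the unit.
NRI is nal_ref_idc: it indicates the importance of the NAL unit, and a larger value means more important. When a decoder cannot keep up, it may drop NALUs whose value is 0.
Type is nal_unit_type: read as a decimal number, each value denotes a different kind of NALU. For example, 5 is an IDR slice, 7 is an SPS, and 8 is a PPS.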

Take 8 bits: NAL Header -> [01100111]011001000000000000101000101011001101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000
Parse these 8 bits:
0110 0111
0... .... # forbidden_zero_bit -> u(1) -> 0
.11. .... # nal_ref_idc -> u(2) -> 3 -> HIGHEST
...0 0111 # nal_unit_type -> u(5) -> 7 -> SPS
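The three header fields can also be pulled out with shifts and masks. This is a minimal sketch; the function name and the dictionary layout are illustrative choices.

```python
def parse_nal_header(byte: int) -> dict:
    """Split the one-byte NAL Header into its three fields."""
    return {
        "forbidden_zero_bit": (byte >> 7) & 0x01,   # top bit
        "nal_ref_idc": (byte >> 5) & 0x03,          # next 2 bits
        "nal_unit_type": byte & 0x1F,               # low 5 bits
    }


print(parse_nal_header(0x67))
# {'forbidden_zero_bit': 0, 'nal_ref_idc': 3, 'nal_unit_type': 7}
```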
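Parsing: NAL Payload
(To avoid mistakes I have written out the full remaining bit stream at every step; please bear with it.)
The SPS format is defined in section 7.3.2.1.1 of the H.264 specification, and the walkthrough below follows that syntax table.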

profile_idc -> u(8) -> [01100100]0000000000101000101011001101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 100
constraint_set_flag + reserved_zero_2bits -> u(8) -> [00000000]00101000101011001101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000
0000 0000
0... .... # constraint_set0_flag
.0.. .... # constraint_set1_flag
..0. .... # constraint_set2_flag
...0 .... # constraint_set3_flag
.... 0... # constraint_set4_flag
.... .0.. # constraint_set5_flag
.... ..00 # reserved_zero_2bits
level_idc -> u(8) -> [00101000]101011001101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 40(Supports 2Kx1K format. Enables Interlace support. 62914560 samples/sec)
seq_parameter_set_id -> ue(v) -> 1[]01011001101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=0,N=0 -> ans=2^M-1+N=0
if (profile_idc == 100) -> true
chroma_format_idc -> ue(v) -> 01[0]11001101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=1,N=0 -> ans=2^M-1+N=1
if (chroma_format_idc == 3) -> false
bit_depth_luma_minus8 -> ue(v) -> 1[]1001101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=0,N=0 -> ans=2^M-1+N=0
bit_depth_chroma_minus8 -> ue(v) -> 1[]001101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=0,N=0 -> ans=2^M-1+N=0
qpprime_y_zero_transform_bypass_flag -> u(1) -> [0]01101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 0
seq_scaling_matrix_present_flag -> u(1) -> [0]1101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 0
if (seq_scaling_matrix_present_flag) -> false
log2_max_frame_num_minus4 -> ue(v) -> 1[]101100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=0,N=0 -> ans=2^M-1+N=0
pic_order_cnt_type -> ue(v) -> 1[]01100101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=0,N=0 -> ans=2^M-1+N=0
if (pic_order_cnt_type == 0) -> true
log2_max_pic_order_cnt_lsb_minus4 -> ue(v) -> 01[1]00101000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=1,N=1 -> ans=2^M-1+N=2
max_num_ref_frames -> ue(v) -> 001[01]000000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=2,N=1 -> ans=2^M-1+N=4
gaps_in_frame_num_value_allowed_flag -> u(1) -> [0]00000011110000000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 0
pic_width_in_mbs_minus1 -> ue(v) -> 0000001[111000]0000001000100111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=6,N=56(111000) -> ans=2^M-1+N=119 -> frame_width = 16*pic_width_in_mbs = 16*(pic_width_in_mbs_minus1+1) = 16*(119+1)=1920
pic_height_in_map_units_minus1 -> ue(v) -> 0000001[000100]111111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=6,N=4(000100) -> ans=2^M-1+N=67 -> frame_height = 16*68 = 1088
frame_mbs_only_flag -> u(1) -> [1]11111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 1
if (!frame_mbs_only_flag) -> false
direct_8x8_inference_flag -> u(1) -> [1]1111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 1
frame_cropping_flag -> u(1) -> [1]111001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 1
if (frame_cropping_flag) -> true
frame_crop_left_offset -> ue(v) -> 1[]11001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=0,N=0 -> ans=2^M-1+N=0
frame_crop_right_offset -> ue(v) -> 1[]1001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=0,N=0 -> ans=2^M-1+N=0
frame_crop_top_offset -> ue(v) -> 1[]001011100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=0,N=0 -> ans=2^M-1+N=0
frame_crop_bottom_offset -> ue(v) -> 001[01]1100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> M=2,N=1(01) -> ans=2^M-1+N=4
(With chroma_format_idc == 1 and frame_mbs_only_flag == 1, one vertical crop unit is 2 luma rows, so the displayed height is 1088 - 2*4 = 1080.)
vui_parameters_present_flag -> u(1) -> [1]100000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 1
if (vui_parameters_present_flag) -> true
vui_parameters() // E.1.1 VUI parameters syntax Page.461
aspect_ratio_info_present_flag -> u(1) -> [1]00000001000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 1
if (aspect_ratio_info_present_flag) -> true
aspect_ratio_idc -> u(8) -> [00000001]000100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 1
if (aspect_ratio_idc == 255) -> false
overscan_info_present_flag -> u(1) -> [0]00100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 0
if (overscan_info_present_flag) -> false
video_signal_type_present_flag -> u(1) -> [0]0100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 0
if (video_signal_type_present_flag) -> false
chroma_loc_info_present_flag -> u(1) -> [0]100000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 0
timing_info_present_flag -> u(1) -> [1]00000000000000000000000000000001000000000000000000000000001110100000111100011000001100011001011000 -> 1
if (timing_info_present_flag) -> true
num_units_in_tick -> u(32) -> [00000000000000000000000000000001]000000000000000000000000001110100000111100011000001100011001011000 -> 1
time_scale -> u(32) -> [00000000000000000000000000111010]0000111100011000001100011001011000 -> 58
fixed_frame_rate_flag -> u(1) -> [0]000111100011000001100011001011000 -> 0
nal_hrd_parameters_present_flag -> u(1) -> [0]00111100011000001100011001011000 -> 0
if (nal_hrd_parameters_present_flag) -> false
vcl_hrd_parameters_present_flag -> u(1) -> [0]0111100011000001100011001011000 -> 0
if (vcl_hrd_parameters_present_flag) -> false
if (nal_hrd_parameters_present_flag || vcl_hrd_parameters_present_flag) -> false
pic_struct_present_flag -> u(1) -> [0]111100011000001100011001011000 -> 0
bitstream_restriction_flag -> u(1) -> [1]11100011000001100011001011000 -> 1
if (bitstream_restriction_flag) -> true
motion_vectors_over_pic_boundaries_flag -> u(1) -> [1]1100011000001100011001011000 -> 1
max_bytes_per_pic_denom -> ue(v) -> 1[]100011000001100011001011000 -> M=0,N=0 -> ans=2^M-1+N=0
max_bits_per_mb_denom -> ue(v) -> 1[]00011000001100011001011000 -> M=0,N=0 -> ans=2^M-1+N=0
log2_max_mv_length_horizontal -> ue(v) -> 0001[100]0001100011001011000 -> M=3,N=4 -> ans=2^M-1+N=11
log2_max_mv_length_vertical -> ue(v) -> 0001[100]011001011000 -> M=3,N=4 -> ans=2^M-1+N=11
max_num_reorder_frames -> ue(v) -> 01[1]001011000 -> M=1,N=1 -> ans=2^M-1+N=2
max_dec_frame_buffering -> ue(v) -> 001[01]1000 -> M=2,N=1 -> ans=2^M-1+N=4
rbsp_trailing_bits() // inferred
rbsp_stop_one_bit -> u(1) -> [1]000 -> 1
rbsp_alignment_zero_bit -> u(1) -> [0]00 -> 0
rbsp_alignment_zero_bit -> u(1) -> [0]0 -> 0
rbsp_alignment_zero_bit -> u(1) -> [0] -> 0
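To double-check the hand-parsed width, height, and crop values, the same steps can be automated. The sketch below is a simplified illustration, not a complete SPS parser: the names BitReader and sps_resolution are hypothetical, the input is assumed to be the SPS payload after the NAL Header byte with emulation prevention bytes already removed, and streams with scaling lists or pic_order_cnt_type 1 are rejected rather than handled.

```python
# Profiles whose SPS carries chroma_format_idc and the bit-depth fields.
HIGH_PROFILES = {100, 110, 122, 244, 44, 83, 86, 118, 128, 138, 139, 134, 135}
# chroma_format_idc -> (SubWidthC, SubHeightC); (1, 1) where chroma is absent or unsubsampled.
CHROMA_SUBSAMPLING = {0: (1, 1), 1: (2, 2), 2: (2, 1), 3: (1, 1)}


class BitReader:
    """Reads u(n) and ue(v) values from bytes, most significant bit first."""

    def __init__(self, data: bytes):
        self.bits = "".join(f"{b:08b}" for b in data)
        self.pos = 0

    def u(self, n: int) -> int:
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value

    def ue(self) -> int:
        m = 0
        while self.u(1) == 0:       # count leading zeros; the 1 bit is consumed too
            m += 1
        return (1 << m) - 1 + (self.u(m) if m else 0)


def sps_resolution(rbsp: bytes) -> tuple[int, int]:
    """Return (width, height) in pixels after cropping."""
    r = BitReader(rbsp)
    profile_idc = r.u(8)
    r.u(8)                          # constraint_set flags + reserved_zero_2bits
    r.u(8)                          # level_idc
    r.ue()                          # seq_parameter_set_id
    chroma_format_idc = 1           # value inferred when the field is absent
    if profile_idc in HIGH_PROFILES:
        chroma_format_idc = r.ue()
        if chroma_format_idc == 3:
            r.u(1)                  # separate_colour_plane_flag
        r.ue()                      # bit_depth_luma_minus8
        r.ue()                      # bit_depth_chroma_minus8
        r.u(1)                      # qpprime_y_zero_transform_bypass_flag
        if r.u(1):                  # seq_scaling_matrix_present_flag
            raise NotImplementedError("scaling lists are not handled")
    r.ue()                          # log2_max_frame_num_minus4
    pic_order_cnt_type = r.ue()
    if pic_order_cnt_type == 0:
        r.ue()                      # log2_max_pic_order_cnt_lsb_minus4
    elif pic_order_cnt_type == 1:
        raise NotImplementedError("pic_order_cnt_type 1 is not handled")
    r.ue()                          # max_num_ref_frames
    r.u(1)                          # gaps_in_frame_num_value_allowed_flag
    width_in_mbs = r.ue() + 1
    height_in_map_units = r.ue() + 1
    frame_mbs_only_flag = r.u(1)
    if not frame_mbs_only_flag:
        r.u(1)                      # mb_adaptive_frame_field_flag
    r.u(1)                          # direct_8x8_inference_flag
    left = right = top = bottom = 0
    if r.u(1):                      # frame_cropping_flag
        left, right, top, bottom = r.ue(), r.ue(), r.ue(), r.ue()
    sub_w, sub_h = CHROMA_SUBSAMPLING[chroma_format_idc]
    crop_x = sub_w
    crop_y = sub_h * (2 - frame_mbs_only_flag)
    width = 16 * width_in_mbs - crop_x * (left + right)
    height = 16 * (2 - frame_mbs_only_flag) * height_in_map_units - crop_y * (top + bottom)
    return width, height


rbsp = bytes.fromhex("64 00 28 ac d9 40 78 02 27 e5 c0 44 00 00 00 04 00 00 00 e8 3c 60 c6 58")
print(sps_resolution(rbsp))   # (1920, 1080)
```

The coded height is 1088 because it must be a multiple of 16; the bottom crop offset of 4, counted in units of 2 luma rows for 4:2:0 frame coding, brings it down to the displayed 1080.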
Worked example 2: a PPS NALU
Preprocessing: raw data
Here is a raw NALU, shown in hexadecimal:
00 00 00 01 68 eb e3 cb 22 c0
There are no emulation prevention bytes.
Converted to a bit stream:
00000000000000000000000000000001011010001110101111100011110010110010001011000000
Parsing: start code
Start code -> [00000000000000000000000000000001]011010001110101111100011110010110010001011000000
Parsing: NAL Header
Take 8 bits: NAL Header -> [01101000]1110101111100011110010110010001011000000
0110 1000
0... .... # forbidden_zero_bit -> u(1) -> 0
.11. .... # nal_ref_idc -> u(2) -> 3 -> HIGHEST
...0 1000 # nal_unit_type -> u(5) -> 8 -> PPS
Parsing: NAL Payload
In the H.264 specification the PPS structure is defined in section 7.3.2.2, and the walkthrough below follows that syntax table.

pic_parameter_set_id -> ue(v) -> 1[]110101111100011110010110010001011000000 -> 0
seq_parameter_set_id -> ue(v) -> 1[]10101111100011110010110010001011000000 -> 0
entropy_coding_mode_flag -> u(1) -> [1]0101111100011110010110010001011000000 -> 1
bottom_field_pic_order_in_frame_present_flag -> u(1) -> [0]101111100011110010110010001011000000 -> 0
num_slice_groups_minus1 -> ue(v) -> 1[]01111100011110010110010001011000000 -> 0
if (num_slice_groups_minus1 > 0) -> false
num_ref_idx_l0_default_active_minus1 -> ue(v) -> 01[1]11100011110010110010001011000000 -> M=1,N=1 -> ans=2^M-1+N=2
num_ref_idx_l1_default_active_minus1 -> ue(v) -> 1[]1100011110010110010001011000000 -> M=0,N=0 -> ans=2^M-1+N=0
weighted_pred_flag -> u(1) -> [1]100011110010110010001011000000 -> 1
weighted_bipred_idc -> u(2) -> [10]0011110010110010001011000000 -> 2
pic_init_qp_minus26 -> se(v) -> 001[11]10010110010001011000000 -> M=2,N=3 -> code_num=2^M-1+N=6 -> (even) ans=-code_num / 2 = -3
pic_init_qs_minus26 -> se(v) -> 1[]0010110010001011000000 -> M=0,N=0 -> code_num=2^M-1+N=0 -> (even) ans=-code_num / 2 = 0
chroma_qp_index_offset -> se(v) -> 001[01]10010001011000000 -> M=2,N=1 -> code_num=2^M-1+N=4 -> (even) ans=-code_num / 2 = -2
deblocking_filter_control_present_flag -> u(1) -> [1]0010001011000000 -> 1
constrained_intra_pred_flag -> u(1) -> [0]010001011000000 -> 0
redundant_pic_cnt_present_flag -> u(1) -> [0]10001011000000 -> 0
transform_8x8_mode_flag -> u(1) -> [1]0001011000000 -> 1
pic_scaling_matrix_present_flag -> u(1) -> [0]001011000000 -> 0
if(pic_scaling_matrix_present_flag) -> false
second_chroma_qp_index_offset -> se(v) -> 001[01]1000000 -> M=2,N=1 -> code_num=2^M-1+N=4 -> (even) ans=-code_num / 2 = -2
rbsp_trailing_bits()
rbsp_stop_one_bit -> u(1) -> [1]000000 -> 1
rbsp_alignment_zero_bit -> u(1) -> [0]00000 -> 0
rbsp_alignment_zero_bit -> u(1) -> [0]0000 -> 0
rbsp_alignment_zero_bit -> u(1) -> [0]000 -> 0
rbsp_alignment_zero_bit -> u(1) -> [0]00 -> 0
rbsp_alignment_zero_bit -> u(1) -> [0]0 -> 0
rbsp_alignment_zero_bit -> u(1) -> [0] -> 0
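The PPS walkthrough can be cross-checked with a table-driven sketch: a list of (field name, descriptor) pairs is read in order. This is an illustration under stated assumptions, not a full parser. The names PPS_FIELDS and parse_pps are hypothetical, the input is the payload after the NAL Header byte, slice groups are rejected, and the optional fields that follow redundant_pic_cnt_present_flag are left unread.

```python
PPS_FIELDS = [
    ("pic_parameter_set_id", "ue"),
    ("seq_parameter_set_id", "ue"),
    ("entropy_coding_mode_flag", "u1"),
    ("bottom_field_pic_order_in_frame_present_flag", "u1"),
    ("num_slice_groups_minus1", "ue"),
    ("num_ref_idx_l0_default_active_minus1", "ue"),
    ("num_ref_idx_l1_default_active_minus1", "ue"),
    ("weighted_pred_flag", "u1"),
    ("weighted_bipred_idc", "u2"),
    ("pic_init_qp_minus26", "se"),
    ("pic_init_qs_minus26", "se"),
    ("chroma_qp_index_offset", "se"),
    ("deblocking_filter_control_present_flag", "u1"),
    ("constrained_intra_pred_flag", "u1"),
    ("redundant_pic_cnt_present_flag", "u1"),
]


def parse_pps(rbsp: bytes) -> dict:
    """Read the leading PPS fields in order and return them by name."""
    bits = "".join(f"{b:08b}" for b in rbsp)
    pos = 0

    def u(n: int) -> int:
        nonlocal pos
        value = int(bits[pos:pos + n], 2)
        pos += n
        return value

    def ue() -> int:
        m = 0
        while u(1) == 0:            # count leading zeros; the 1 bit is consumed too
            m += 1
        return (1 << m) - 1 + (u(m) if m else 0)

    def se() -> int:
        k = ue()
        return (k + 1) // 2 if k % 2 else -(k // 2)

    readers = {"u1": lambda: u(1), "u2": lambda: u(2), "ue": ue, "se": se}
    pps = {}
    for name, kind in PPS_FIELDS:
        pps[name] = readers[kind]()
        if name == "num_slice_groups_minus1" and pps[name] > 0:
            raise NotImplementedError("slice group syntax is not handled")
    return pps


pps = parse_pps(bytes.fromhex("eb e3 cb 22 c0"))
print(pps["pic_init_qp_minus26"], pps["chroma_qp_index_offset"])   # -3 -2
```

Keeping the field order in a table makes it easy to compare the code against the syntax table in the specification one row at a time.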
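Summary
That completes both examples. It is very satisfying, and from now on no parsing task should be scary (hopefully). Thanks for reading. To finish, here is a recap of the key information needed to decode the video: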
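Key SPS parameters and what they mean
- Resolution parameters:
  - pic_width_in_mbs_minus1: determines the video width; width in pixels = (pic_width_in_mbs_minus1 + 1) × 16
  - pic_height_in_map_units_minus1: determines the video height; height in pixels = (2 - frame_mbs_only_flag) × (pic_height_in_map_units_minus1 + 1) × 16
  - frame_cropping_flag and the associated crop offsets: trim the coded size down to the area that is finally displayed
- Video characteristics parameters:
  - profile_idc: indicates the H.264 profile (Baseline, Main, High, and so on)
  - level_idc: indicates the level, which bounds the maximum resolution, frame rate, and bitrate
  - chroma_format_idc: indicates the chroma sampling format (for example YUV 4:2:0 or 4:2:2)
- Timing parameters:
  - timing_info_present_flag: whether timing information is present
  - num_units_in_tick and time_scale: used to compute the frame rate; frame rate = time_scale / (2 × num_units_in_tick)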
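As a quick check of the frame rate formula, here it is applied to the values parsed from the example SPS. This is a small sketch with a hypothetical function name. It takes the formula above at face value, and since fixed_frame_rate_flag is 0 in the example, the result is better read as a nominal rate than a guaranteed one.

```python
from fractions import Fraction


def frame_rate(time_scale: int, num_units_in_tick: int) -> Fraction:
    """Frame rate as an exact fraction: time_scale / (2 * num_units_in_tick)."""
    return Fraction(time_scale, 2 * num_units_in_tick)


print(frame_rate(58, 1))          # 29
print(frame_rate(60000, 1001))    # 30000/1001
```

Using Fraction keeps rates such as 30000/1001 exact instead of rounding them to 29.97.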
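Key PPS parameters and what they mean
- Coding control parameters:
  - entropy_coding_mode_flag: the entropy coding mode (0 is CAVLC, 1 is CABAC)
  - pic_init_qp_minus26: the initial quantization parameter, which affects video quality and compression ratio
  - deblocking_filter_control_present_flag: whether slice headers carry deblocking filter control parameters
- Reference frame parameters:
  - num_ref_idx_l0/l1_default_active_minus1: the default number of active reference indices (minus 1) for list 0 and list 1
  - weighted_pred_flag and weighted_bipred_idc: flags that control weighted prediction
(End of article)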