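I recently bought a ThinkPad T14 Gen3 as my new laptop, replacing my old MBP, which had only 8 GB of RAM and a nearly full SSD. As soon as it arrived I wiped Windows 11 and installed Arch Linux (I tried NixOS first, but for some reason the graphical session never came up, so I gave up on it for now). It has been almost 9 years since I last used a Linux desktop as my daily driver. Compared with back then, the Linux desktop has improved a lot, although... it probably still isn't the so-called "Year of the Linux Desktop".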

What went wrong?

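On this laptop, my testing showed that almost all the hardware works, except that the Intel integrated GPU occasionally misbehaves. It dumps a call trace into dmesg at boot, and again when resuming from suspend or hibernation. After a few suspend cycles, logging out can make i915 flicker the display on and off for several minutes before the GDM login screen appears (at that point logging in no longer gets you into gnome-shell, and the only way out is a reboot). More importantly, the HDMI port is dead: plugging in a cable does nothing and produces no error message, as if the port did not exist at all (although multiple monitors still work through a Type-C adapter).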

The latest Arch Linux kernel at the time of writing is 5.18.12.arch1-1. The call trace at boot looks like this:

------------[ cut here ]------------
i915 0000:00:02.0: drm_WARN_ON(intel_dp->pps.vdd_wakeref)
WARNING: CPU: 4 PID: 391 at drivers/gpu/drm/i915/display/intel_pps.c:592 intel_pps_vdd_on_unlocked+0x29d/0x2b0 [i915]
Modules linked in: cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus iwlmvm snd_soc_core snd_compress ac97_bus mac80211 snd_pcm_dmaengine libarc4 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec iwlwifi snd_hda_core snd_hwdep e1000e snd_pcm i2c_i801 intel_lpss_pci spi_intel_pci spi_intel intel_lpss iwlmei i2c_smbus snd_timer mei_me idma64 cfg80211 mei i915(+) thunderbolt processor_thermal_device_pci drm_buddy processor_thermal_device ucsi_acpi ttm processor_thermal_rfim typec_ucsi intel_vsec drm_dp_helper processor_thermal_mbox typec processor_thermal_rapl roles intel_rapl_common intel_gtt igen6_edac wmi mac_hid thinkpad_acpi ledtrig_audio platform_profile snd i2c_hid_acpi i2c_hid soundcore
 int3403_thermal int340x_thermal_zone tpm_crb tpm_tis tpm_tis_core tpm video rng_core intel_hid sparse_keymap int3400_thermal acpi_thermal_rel acpi_tad acpi_pad xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc rfkill dm_multipath dm_mod sg crypto_user fuse acpi_call(OE) bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq serio_raw atkbd libps2 vivaldi_fmap nvme xhci_pci crc32c_intel nvme_core xhci_pci_renesas i8042 serio
CPU: 4 PID: 391 Comm: systemd-udevd Tainted: G           OE     5.18.12-arch1-1 #1 96418c890ae0efcbf26c551e98cb3d72a56d7da8
Hardware name: LENOVO 21AHA014CD/21AHA014CD, BIOS N3MET04W (1.01 ) 05/04/2022
RIP: 0010:intel_pps_vdd_on_unlocked+0x29d/0x2b0 [i915]
Code: 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 4c 8d 4f d4 48 c7 c1 50 d2 d4 c0 4c 89 e2 48 c7 c7 79 2b d6 c0 48 89 c6 e8 ef 54 8b d4 <0f> 0b e9 f3 fd ff ff e8 87 d1 90 d4 0f 1f 80 00 00 00 00 66 0f 1f
RSP: 0018:ffffac5c00e536c0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff9bcf65b2a170 RCX: 0000000000000027
RDX: ffff9bda6f1216a8 RSI: 0000000000000001 RDI: ffff9bda6f1216a0
RBP: ffff9bcf59d70000 R08: 0000000000000000 R09: ffffac5c00e534d0
R10: 0000000000000003 R11: ffff9bda9f6fffe8 R12: ffff9bcf427345a0
R13: 0000000000000005 R14: ffff9bcf59d707d8 R15: 0000000000000000
FS:  00007fc45def2080(0000) GS:ffff9bda6f100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c86beaa018 CR3: 000000010587c002 CR4: 0000000000f70ee0
PKRU: 55555554
Call Trace:
 <TASK>
 ? intel_display_power_get+0x52/0x60 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 intel_dp_aux_xfer+0x127/0x7a0 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 ? hrtimer_try_to_cancel+0x19/0x100
 intel_dp_aux_transfer+0x202/0x320 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 drm_dp_dpcd_access+0xaa/0x130 [drm_dp_helper feb423ecaa405180d8bd6147672ce2c99671a810]
 drm_dp_dpcd_write+0x8a/0xd0 [drm_dp_helper feb423ecaa405180d8bd6147672ce2c99671a810]
 intel_dp_set_power+0x67/0x190 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 intel_ddi_post_disable+0x44b/0x4a0 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 intel_encoders_post_disable+0x7b/0x90 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 intel_old_crtc_state_disables+0x38/0xa0 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 intel_atomic_commit_tail+0x3b8/0x18f0 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 ? flush_workqueue+0x19d/0x420
 intel_atomic_commit+0x33d/0x390 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 intel_modeset_init+0x19d/0x280 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 i915_driver_probe+0x4af/0xda0 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 ? intel_modeset_probe_defer+0x4c/0x60 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 ? i915_pci_probe+0x43/0x160 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 local_pci_probe+0x42/0x80
 pci_device_probe+0xc1/0x220
 ? sysfs_do_create_link_sd+0x6a/0xd0
 really_probe+0x19e/0x370
 __driver_probe_device+0xfc/0x170
 driver_probe_device+0x1f/0x90
 __driver_attach+0xbf/0x1a0
 ? __device_attach_driver+0xe0/0xe0
 bus_for_each_dev+0x84/0xd0
 bus_add_driver+0x15d/0x200
 driver_register+0x8d/0xe0
 i915_init+0x23/0x80 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 ? 0xffffffffc0e8c000
 do_one_initcall+0x5a/0x220
 do_init_module+0x4a/0x240
 __do_sys_init_module+0x138/0x1b0
 do_syscall_64+0x5c/0x90
 ? exc_page_fault+0x74/0x170
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc45e89899e
Code: 48 8b 0d fd a3 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ca a3 0e 00 f7 d8 64 89 01 48
RSP: 002b:00007fff568057d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 000055c86bc79b80 RCX: 00007fc45e89899e
RDX: 00007fc45e9ff343 RSI: 000000000070502e RDI: 00007fc45c2b2010
RBP: 00007fc45e9ff343 R08: 0000000000261000 R09: 85ebca77c2b2ae63
R10: 0000000000015f91 R11: 0000000000000246 R12: 0000000000020000
R13: 000055c86bc757f0 R14: 000055c86bc79b80 R15: 000055c86bc95eb0
 </TASK>
---[ end trace 0000000000000000 ]---

And this is the call trace after suspend/hibernation:

------------[ cut here ]------------
i915 0000:00:02.0: i915 raw-wakerefs=1 wakelocks=1 on cleanup
WARNING: CPU: 1 PID: 2859 at drivers/gpu/drm/i915/intel_runtime_pm.c:629 intel_runtime_pm_driver_release+0x4e/0x60 [i915]
Modules linked in: uinput ccm rfcomm cmac algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq snd_seq_device bnep btusb btrtl btbcm btintel btmtk bluetooth ecdh_generic snd_ctl_led crc16 snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev snd_soc_dmic usbhid mc hid_multitouch iTCO_wdt spi_nor intel_pmc_bxt iTCO_vendor_support mtd mei_hdcp mei_pxp pmt_telemetry pmt_class intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl intel_cstate intel_uncore pcspkr psmouse snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel
 soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus iwlmvm snd_soc_core snd_compress ac97_bus mac80211 snd_pcm_dmaengine libarc4 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec iwlwifi snd_hda_core snd_hwdep e1000e snd_pcm i2c_i801 intel_lpss_pci spi_intel_pci spi_intel intel_lpss iwlmei i2c_smbus snd_timer mei_me idma64 cfg80211 mei i915 thunderbolt processor_thermal_device_pci drm_buddy processor_thermal_device ucsi_acpi ttm processor_thermal_rfim typec_ucsi intel_vsec drm_dp_helper processor_thermal_mbox typec processor_thermal_rapl roles intel_rapl_common intel_gtt igen6_edac wmi mac_hid thinkpad_acpi ledtrig_audio platform_profile snd i2c_hid_acpi i2c_hid soundcore int3403_thermal int340x_thermal_zone tpm_crb tpm_tis tpm_tis_core tpm video rng_core intel_hid sparse_keymap int3400_thermal acpi_thermal_rel
 acpi_tad acpi_pad xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc rfkill dm_multipath dm_mod sg crypto_user fuse acpi_call(OE) bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq serio_raw atkbd libps2 vivaldi_fmap nvme xhci_pci crc32c_intel nvme_core xhci_pci_renesas i8042 serio
CPU: 1 PID: 2859 Comm: kworker/u32:19 Tainted: G        W  OE     5.18.12-arch1-1 #1 96418c890ae0efcbf26c551e98cb3d72a56d7da8
Hardware name: LENOVO 21AHA014CD/21AHA014CD, BIOS N3MET04W (1.01 ) 05/04/2022
Workqueue: events_unbound async_run_entry_fn
RIP: 0010:intel_runtime_pm_driver_release+0x4e/0x60 [i915]
Code: b7 d9 48 8b 6f 50 48 85 ed 75 03 48 8b 2f e8 b9 2a 62 d4 45 89 e0 89 d9 48 89 ea 48 89 c6 48 c7 c7 20 93 d2 c0 e8 5e f2 9d d4 <0f> 0b 5b 5d 41 5c c3 cc 66 2e 0f 1f 84 00 00 00 00 00 66 0f 1f 00
RSP: 0018:ffffac5c01857db8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000027
RDX: ffff9bda6f0616a8 RSI: 0000000000000001 RDI: ffff9bda6f0616a0
RBP: ffff9bcf427345a0 R08: ffffffff96ee31c0 R09: 0000000000000000
R10: ffffffffffffffff R11: ffff9bda9f72510a R12: 0000000000000001
R13: ffff9bcf59d764d0 R14: 0000000000000000 R15: ffff9bcf5d1edaa8
FS:  0000000000000000(0000) GS:ffff9bda6f040000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005581377c6396 CR3: 0000000838a10002 CR4: 0000000000f70ee0
PKRU: 55555554
Call Trace:
 <TASK>
 i915_drm_suspend_late+0xed/0x110 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
 ? pci_pm_poweroff_late+0x40/0x40
 dpm_run_callback+0x47/0x150
 __device_suspend_late+0xb4/0x230
 async_suspend_late+0x1e/0x90
 async_run_entry_fn+0x31/0x130
 process_one_work+0x1c4/0x380
 worker_thread+0x51/0x380
 ? rescuer_thread+0x3a0/0x3a0
 kthread+0xdb/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
---[ end trace 0000000000000000 ]---

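A friend complained that the i915 driver's code quality is poor, and that may well be true. Searching online turned up quite a few posts reporting related errors.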

I also read the Intel Graphics page on the Arch Wiki, which didn't help.

Building a new kernel

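Following the i915 wiki's advice, I first checked whether a newer kernel had already fixed the bug. I built the drm-tip kernel following the Build Guide and the Arch Wiki's Kernel/Traditional compilation page, and added the drm.debug=0x1e log_buf_len=1M parameters in /etc/default/grub to get more debug output. The problem was still there.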
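
As a side note on what drm.debug=0x1e actually turns on: the value is a bitmask of DRM debug categories. The sketch below decodes it. The bit assignments are my reading of include/drm/drm_print.h and should be treated as an assumption to check against your own kernel tree; decode_drm_debug is a made-up helper name.

```python
# Category bits as I understand them from include/drm/drm_print.h
# (assumption: verify against the kernel version you are building).
DRM_DEBUG_CATEGORIES = {
    0x001: "CORE",
    0x002: "DRIVER",
    0x004: "KMS",
    0x008: "PRIME",
    0x010: "ATOMIC",
    0x020: "VBL",
    0x040: "STATE",
    0x080: "LEASE",
    0x100: "DP",
    0x200: "DRMRES",
}


def decode_drm_debug(mask: int) -> list[str]:
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_CATEGORIES.items() if mask & bit]


print(decode_drm_debug(0x1E))
```

Under that assumption, 0x1e enables the driver, KMS, PRIME and atomic categories, which is where the intel_bios_init and intel_dp_aux_xfer messages in this post come from.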

Then I looked at the issue "Dual eDP support is missing" and found that the reporter had given a crude but simple workaround:

--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -8835,7 +8835,7 @@ static void intel_setup_outputs(struct drm_i915_private *dev_priv)
                intel_ddi_init(dev_priv, PORT_TC1);
        } else if (IS_ALDERLAKE_P(dev_priv)) {
                intel_ddi_init(dev_priv, PORT_A);
-               intel_ddi_init(dev_priv, PORT_B);
+               // intel_ddi_init(dev_priv, PORT_B);
                intel_ddi_init(dev_priv, PORT_TC1);
                intel_ddi_init(dev_priv, PORT_TC2);
                intel_ddi_init(dev_priv, PORT_TC3);

And with drm.debug=0x1e enabled, messages like these show up:

i915 0000:00:02.0: [drm:intel_dp_aux_xfer [i915]] AUX B/DDI B/PHY B: timeout (status 0x7d4003ff)
i915 0000:00:02.0: [drm:intel_dp_aux_xfer [i915]] AUX B/DDI B/PHY B: timeout (status 0x7d4003ff)
...
i915 0000:00:02.0: [drm:drm_dp_dpcd_access [drm_dp_helper]] AUX B/DDI B/PHY B: Too many retries, giving up. First error: -110

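My laptop obviously doesn't have two built-in displays, so I tried commenting out that line, rebuilt, and rebooted. i915 stopped dumping call traces, but HDMI still didn't work. When I talked to @rtxux about it, he said "blind guess: the VBT is broken", so I continued debugging in that direction.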

VBT

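VBT stands for Video BIOS Table, which appears to be Intel's own format. It holds configuration data about the display hardware. The intel-gpu-tools package ships a VBT parser, intel_vbt_decode, so we can look at what the VBT actually contains: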

# cat /sys/kernel/debug/dri/0/i915_vbt > /tmp/i915_vbt
# intel_vbt_decode --file=/tmp/i915_vbt

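This shows the rough structure of the VBT:

  • VBT header: records the VBT's signature, version, checksum, and so on;
  • BDB header (BDB = BIOS Data Block): the BDB is where the display information is actually stored;
  • Then a series of BDB blocks. On my machine, block 2 is the general definitions block, which holds the information for each child device and is what the rest of this post focuses on. The other blocks don't matter here and are skipped.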
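
To make the "series of blocks" idea concrete, here is a minimal sketch of walking such a block list. It assumes each block is laid out as a 1-byte id, a 2-byte little-endian size, then the payload. That layout is my understanding of the kernel's intel_vbt_defs.h, not something verified in this post, and the sketch ignores any blocks the real parser special-cases. The helper name and the demo bytes are made up; for real data, use intel_vbt_decode.

```python
import struct


def iter_bdb_blocks(buf: bytes):
    """Yield (block_id, payload) pairs from a run of BDB blocks.

    Assumed per-block layout: u8 id, u16 little-endian size, payload.
    """
    offset = 0
    while offset + 3 <= len(buf):
        block_id, size = struct.unpack_from("<BH", buf, offset)
        offset += 3
        payload = buf[offset:offset + size]
        if len(payload) < size:
            break  # truncated block: stop rather than guess
        yield block_id, payload
        offset += size


# Synthetic demo data (not taken from a real VBT): two tiny blocks.
demo = (
    bytes([1]) + struct.pack("<H", 2) + b"\xaa\xbb"
    + bytes([2]) + struct.pack("<H", 3) + b"\x01\x02\x03"
)

for block_id, payload in iter_bdb_blocks(demo):
    print(block_id, payload.hex())
```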

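The rough contents of BDB block 2 are below, showing only the fields that turned out to matter (after reading this output you can probably guess the cause, but the real vbt decode output has so many fields that I missed an important clue at the time):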

BDB block 2 - General definitions block:
	// ...
	Child device size: 39
	Child device count: 10
	Child device info:
		Device handle: 0x0008 (LFP 1 (eDP))
		Device type: 0x1806 (unknown)
			Internal connector
			DisplayPort output
			Digital output
		// ...
		DVO Port: 0x0a (DP-A)
		// ...
	Child device info:
		Device handle: 0x0080 (LFP 2 (eDP))
		Device type: 0x1806 (unknown)
			Internal connector
			DisplayPort output
			Digital output
		// ...
		DVO Port: 0x07 (DP-B)
		// ...
	Child device info:
		Device handle: 0x0004 (EFP 1 (HDMI/DVI/DP))
		Device type: 0x60d2 (DVI-D)
			Power management
			Hotplug signaling
			HDMI output
			Content protection
			High speed link
			TMDS/DVI signaling
			Digital output
		// ...
		DVO Port: 0x07 (DP-B)
		// ...
	Child device info:
		Device handle: 0x0040 (EFP 2 (HDMI/DVI/DP))
		Device type: 0x68c6 (DisplayPort)
			Power management
			Hotplug signaling
			Content protection
			High speed link
			DisplayPort output
			Digital output
		// ...
		DVO Port: 0x0d (unknown)
		// ...
	Child device info:
		Device handle: 0x0020 (EFP 3 (HDMI/DVI/DP))
		Device type: 0x68c6 (DisplayPort)
			Power management
			Hotplug signaling
			Content protection
			High speed link
			DisplayPort output
			Digital output
		// ...
		DVO Port: 0x0f (unknown)
		// ...
	Child device info:
		Device handle: 0x0010 (EFP 4 (HDMI/DVI/DP))
		Device type: 0x68c6 (DisplayPort)
			Power management
			Hotplug signaling
			Content protection
			High speed link
			DisplayPort output
			Digital output
		// ...
		DVO Port: 0x11 (unknown)
		// ...
	Child device info:
		Device handle: 0x0002 (unknown)
		Device type: 0x68c6 (DisplayPort)
			Power management
			Hotplug signaling
			Content protection
			High speed link
			DisplayPort output
			Digital output
		// ...
		DVO Port: 0x13 (unknown)
		// ...

With debug output enabled, the corresponding boot log is:

i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port A VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:1 LSPCON:0 USB-Type-C:0 TBT:0 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port A VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port B VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:1 LSPCON:0 USB-Type-C:0 TBT:0 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port B VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] More than one child device for port B in VBT, using the first.
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port D VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:0 LSPCON:0 USB-Type-C:1 TBT:1 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port D VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port D VBT HDMI max TMDS clock: 297000 kHz
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port D VBT DP max link rate: 810000
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port E VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:0 LSPCON:0 USB-Type-C:1 TBT:1 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port E VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port E VBT HDMI max TMDS clock: 297000 kHz
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port E VBT DP max link rate: 810000
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port F VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:0 LSPCON:0 USB-Type-C:1 TBT:1 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port F VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port F VBT HDMI max TMDS clock: 297000 kHz
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port F VBT DP max link rate: 810000
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port G VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:0 LSPCON:0 USB-Type-C:1 TBT:1 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port G VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port G VBT HDMI max TMDS clock: 297000 kHz
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port G VBT DP max link rate: 810000
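
With this many lines it is easy to miss a flag, so a small script can pull the per-port flags out of the log. This is only a sketch: the regular expression is written against the exact message format shown above, parse_port_flags is a made-up helper name, and the sample text is copied from the log above.

```python
import re

# Matches lines such as "... Port B VBT info: CRT:0 DVI:0 HDMI:0 ..."
PORT_INFO = re.compile(r"Port (?P<port>[A-Z]) VBT info: (?P<flags>.*)$")


def parse_port_flags(log_text: str) -> dict[str, dict[str, int]]:
    """Map each port letter to its {flag name: 0 or 1} dictionary."""
    ports = {}
    for line in log_text.splitlines():
        match = PORT_INFO.search(line)
        if not match:
            continue
        flags = {}
        for item in match.group("flags").split():
            # rpartition keeps names like "USB-Type-C" intact
            name, _, value = item.rpartition(":")
            flags[name] = int(value)
        ports[match.group("port")] = flags
    return ports


sample = """\
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port B VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:1 LSPCON:0 USB-Type-C:0 TBT:0 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port B VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port D VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:0 LSPCON:0 USB-Type-C:1 TBT:1 DSC:0
"""

ports = parse_port_flags(sample)
hdmi_ports = [port for port, flags in ports.items() if flags.get("HDMI")]
print(hdmi_ports)
```

On a machine with a working HDMI port you would expect at least one port in that list.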

The kernel code that prints the VBT info is:

is_dvi = intel_bios_encoder_supports_dvi(devdata);
is_dp = intel_bios_encoder_supports_dp(devdata);
is_crt = intel_bios_encoder_supports_crt(devdata);
is_hdmi = intel_bios_encoder_supports_hdmi(devdata);
is_edp = intel_bios_encoder_supports_edp(devdata);

supports_typec_usb = intel_bios_encoder_supports_typec_usb(devdata);
supports_tbt = intel_bios_encoder_supports_tbt(devdata);

drm_dbg_kms(&i915->drm,
        "Port %c VBT info: CRT:%d DVI:%d HDMI:%d DP:%d eDP:%d LSPCON:%d USB-Type-C:%d TBT:%d DSC:%d\n",
        port_name(port), is_crt, is_dvi, is_hdmi, is_dp, is_edp,
        HAS_LSPCON(i915) && child->lspcon,
        supports_typec_usb, supports_tbt,
        devdata->dsc != NULL);

As you can see, every port prints HDMI:0, meaning i915 thinks none of them supports HDMI. Is intel_bios_encoder_supports_hdmi at fault? That was my first thought, so I looked at its implementation:

bool
intel_bios_encoder_supports_dvi(const struct intel_bios_encoder_data *devdata)
{
	return devdata->child.device_type & DEVICE_TYPE_TMDS_DVI_SIGNALING;
}

bool
intel_bios_encoder_supports_hdmi(const struct intel_bios_encoder_data *devdata)
{
	return intel_bios_encoder_supports_dvi(devdata) &&
		(devdata->child.device_type & DEVICE_TYPE_NOT_HDMI_OUTPUT) == 0;
}

Here DEVICE_TYPE_TMDS_DVI_SIGNALING is 1 << 4 and DEVICE_TYPE_NOT_HDMI_OUTPUT is 1 << 11. If the right device had been picked up, then:

Python 3.10.5 (main, Jun  6 2022, 18:49:26) [GCC 12.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 0x60d2 & (1 << 4)
16
>>> 0x60d2 & (1 << 11)
0
>>> 0x1806 & (1 << 4)
0
>>> 0x1806 & (1 << 11)
2048
>>> 0x68c6 & (1 << 4)
0
>>> 0x68c6 & (1 << 11)
2048
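
The REPL checks above can be folded into two helpers that mirror the C functions. This is a sketch for reasoning about the bits, not kernel code. The two constants are the values quoted above, the three device types come from the VBT dump, and the Python function names are my own.

```python
DEVICE_TYPE_TMDS_DVI_SIGNALING = 1 << 4
DEVICE_TYPE_NOT_HDMI_OUTPUT = 1 << 11


def supports_dvi(device_type: int) -> bool:
    """Mirror of intel_bios_encoder_supports_dvi(): TMDS/DVI bit is set."""
    return bool(device_type & DEVICE_TYPE_TMDS_DVI_SIGNALING)


def supports_hdmi(device_type: int) -> bool:
    """Mirror of intel_bios_encoder_supports_hdmi(): DVI-capable and
    the NOT_HDMI_OUTPUT bit is clear."""
    return supports_dvi(device_type) and not (
        device_type & DEVICE_TYPE_NOT_HDMI_OUTPUT
    )


for device_type in (0x1806, 0x60D2, 0x68C6):
    print(hex(device_type), supports_dvi(device_type), supports_hdmi(device_type))
```

Only 0x60d2 passes both checks, so a port that reports HDMI:0 cannot be backed by that child device.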

Of the three kinds of child device in the VBT dump above, intel_bios_encoder_supports_hdmi can only be false for 0x1806 (eDP) or 0x68c6 (DP). To verify this, I added some output:

drm_dbg_kms(&i915->drm,
        "Port %c Device type: %x",
        port_name(port), devdata->child.device_type);

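After building, installing, and rebooting, the log confirmed the guess: the HDMI device (0x60d2) is never picked up by i915. At first I wondered whether i915 was misreading the VBT. But reading the log carefully, I noticed this line: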

More than one child device for port B in VBT, using the first.

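So port B actually has more than one device? Going back to the decode output, I found that the nonexistent second eDP and the HDMI child device share the same DVO port (DVO = Digital Video Out). I then looked at the kernel source that prints this message: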

static void parse_ddi_port(struct intel_bios_encoder_data *devdata)
{
	struct drm_i915_private *i915 = devdata->i915;
	const struct child_device_config *child = &devdata->child;
	enum port port;

	port = dvo_port_to_port(i915, child->dvo_port);
	if (port == PORT_NONE)
		return;

	if (!is_port_valid(i915, port)) {
		drm_dbg_kms(&i915->drm,
			    "VBT reports port %c as supported, but that can't be true: skipping\n",
			    port_name(port));
		return;
	}

	if (i915->vbt.ports[port]) {
		drm_dbg_kms(&i915->drm,
		 	    "More than one child device for port %c in VBT, using the first.\n",
		 	    port_name(port));
		return;
	}

	sanitize_device_type(devdata, port);

	if (intel_bios_encoder_supports_dvi(devdata))
		sanitize_ddc_pin(devdata, port);

	if (intel_bios_encoder_supports_dp(devdata))
		sanitize_aux_ch(devdata, port);

	i915->vbt.ports[port] = devdata;
}

This is the only place in the entire i915 source that writes i915->vbt.ports[port], and parse_ddi_port() is called from exactly one place:

static void parse_ddi_ports(struct drm_i915_private *i915)
{
	struct intel_bios_encoder_data *devdata;
	enum port port;

	if (!has_ddi_port_info(i915))
		return;

	list_for_each_entry(devdata, &i915->vbt.display_devices, node)
		parse_ddi_port(devdata);

	for_each_port(port) {
		if (i915->vbt.ports[port])
			print_ddi_port(i915->vbt.ports[port], port);
	}
}

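It loops over every display device in the VBT and parses it. So what parse_ddi_port() does is populate i915->vbt.ports (whose type is struct intel_bios_encoder_data *ports[I915_MAX_PORTS];). If the slot for a port is already filled, a device on the same port has been parsed before, and i915 skips any later device on that port. For this laptop's VBT, that means i915 tries to initialize the nonexistent eDP and never initializes the HDMI child device. The eDP initialization times out, i915's state gets messed up, and that leads to the series of bugs described at the beginning.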
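
The "first device wins" behaviour is easy to reproduce outside the kernel. The sketch below is a toy model, not the driver's code. It keys the table directly on the raw DVO port value from the VBT dump instead of going through dvo_port_to_port(), and it uses only the first four child devices listed earlier. The function name assign_first_wins is hypothetical.

```python
# (device handle, device type, DVO port) taken from the VBT dump above.
children = [
    ("LFP 1 (eDP)", 0x1806, 0x0A),
    ("LFP 2 (eDP)", 0x1806, 0x07),
    ("EFP 1 (HDMI/DVI/DP)", 0x60D2, 0x07),
    ("EFP 2 (HDMI/DVI/DP)", 0x68C6, 0x0D),
]


def assign_first_wins(child_devices):
    """Toy model of parse_ddi_port(): the first child on a port is kept,
    later children on the same port are skipped."""
    ports = {}
    skipped = []
    for handle, device_type, dvo_port in child_devices:
        if dvo_port in ports:
            skipped.append(handle)  # "More than one child device for port ..."
            continue
        ports[dvo_port] = (handle, device_type)
    return ports, skipped


ports, skipped = assign_first_wins(children)
print(ports[0x07])
print(skipped)
```

In this model, DVO port 0x07 ends up bound to the second eDP entry and the HDMI child is the one that gets dropped, which matches what the log shows for port B.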

How to fix it?

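For end users, there is currently no option other than building your own kernel. Someone who knows VBT well could perhaps patch the VBT in the BIOS, but I don't dare try that. The code change is simple. It is in the parse_ddi_port() function in drivers/gpu/drm/i915/display/intel_bios.c: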

if (i915->vbt.ports[port]) {
    // drm_dbg_kms(&i915->drm,
    // 	    "More than one child device for port %c in VBT, using the first.\n",
    // 	    port_name(port));
    // return;
    drm_dbg_kms(&i915->drm,
            "I know that i915 stucks with an nonexisting eDP, " 
            "thus although we have more than one child device for port %c in VBT, "
            "I would like to use the latter one and ignore the first!\n",
            port_name(port));
}

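In fact, just removing that return; is enough (so that the later HDMI child device's configuration overrides the earlier, bogus eDP one). Also, don't comment out the Port B initialization, otherwise HDMI is gone. I have posted this fix in the corresponding issue as well. I tested it on the drm-tip kernel (5.19.0-rc7) and haven't found any problems so far. I haven't looked at the 5.18 or 5.15 LTS code, but the change should be similar.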
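
To see what dropping the return; changes, here is a toy comparison of the two policies. As before, this is a simplified model under my own assumptions: it keys on the raw DVO port value from the VBT dump, it ignores the sanitize_* steps, and the assign_ports helper with its keep parameter is hypothetical.

```python
# (device handle, device type, DVO port) for the two children sharing a port.
children = [
    ("LFP 2 (eDP)", 0x1806, 0x07),
    ("EFP 1 (HDMI/DVI/DP)", 0x60D2, 0x07),
]


def assign_ports(child_devices, keep="first"):
    """Toy model of the port table.

    keep="first": stock behaviour, later children on a taken port are skipped.
    keep="last":  patched behaviour, a later child overwrites the earlier one.
    """
    ports = {}
    for handle, device_type, dvo_port in child_devices:
        if dvo_port in ports and keep == "first":
            continue
        ports[dvo_port] = (handle, device_type)
    return ports


print(assign_ports(children, keep="first")[0x07])
print(assign_ports(children, keep="last")[0x07])
```

With keep="last" the shared port is bound to the HDMI child (0x60d2), which is the effect the patch relies on. It also shows why the change is risky in general: on a machine where the first entry is the real one, the same policy would pick the wrong device.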

This change probably can't be pushed to i915 as-is, because other machines may well depend on the "use the first child device for a given port" logic.

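Likewise, I'm not sure whether Lenovo's BIOS VBT is reasonable. Maybe the VBT was copied from another model and adapted without removing the second eDP, and Intel's Windows driver happens to handle that case fine. Overall, the simplest fix is probably for Lenovo to ship a BIOS update. Having i915 handle it would likely be messier.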

Also, although S3 sleep can be enabled in the UEFI settings, I only tested the patched kernel under S0ix, because S0ix, well, "isn't that good enough too?" (meme image)

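I haven't quantitatively compared the power draw of S3 and S0ix. It's just that resuming from S0ix doesn't seem to break anything, whereas S3 sleep makes the touchpad stutter (apparently fixable by reloading psmouse), and hibernation breaks the sof audio firmware (fixable by reloading snd_sof_pci_intel_tgl).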