A debugging story: getting i915 to work properly on the ThinkPad T14 Gen3
Update 1 (2022/10/30): The T14 Gen3 firmware update N3MET08W has fixed this problem; updating with fwupdmgr is all that is needed.
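I recently bought a ThinkPad T14 Gen3 as my new laptop, replacing an MBP that had only 8GB of memory and a nearly full SSD. As soon as it arrived I wiped Windows 11 and installed Arch Linux (I tried NixOS first, but for some reason the graphical interface would never come up, so I gave up on it for now). It has been almost 9 years since I last used a Linux desktop as my daily driver. Compared with back then, the Linux desktop has made quite a lot of progress, although... it probably still has not reached the so-called "Year of the Linux Desktop".
What went wrong?
In my testing, almost all of the hardware on this laptop works fine, except that the Intel integrated GPU occasionally misbehaves: it dumps a call trace into dmesg at boot, and again around suspend and hibernation. After a few sleep cycles, logging out can make i915 toggle the display on and off for several minutes before the GDM login page appears (at that point logging in no longer reaches gnome-shell, so a reboot is the only option). More importantly, the HDMI port is broken: plugging in a cable does nothing at all and produces no error message, as if the port did not exist (although multiple monitors still work through a Type-C adapter).
At the time of writing, the latest Arch Linux kernel is 5.18.12.arch1-1. The call trace at boot looks like this: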
------------[ cut here ]------------
i915 0000:00:02.0: drm_WARN_ON(intel_dp->pps.vdd_wakeref)
WARNING: CPU: 4 PID: 391 at drivers/gpu/drm/i915/display/intel_pps.c:592 intel_pps_vdd_on_unlocked+0x29d/0x2b0 [i915]
Modules linked in: cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus iwlmvm snd_soc_core snd_compress ac97_bus mac80211 snd_pcm_dmaengine libarc4 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec iwlwifi snd_hda_core snd_hwdep e1000e snd_pcm i2c_i801 intel_lpss_pci spi_intel_pci spi_intel intel_lpss iwlmei i2c_smbus snd_timer mei_me idma64 cfg80211 mei i915(+) thunderbolt processor_thermal_device_pci drm_buddy processor_thermal_device ucsi_acpi ttm processor_thermal_rfim typec_ucsi intel_vsec drm_dp_helper processor_thermal_mbox typec processor_thermal_rapl roles intel_rapl_common intel_gtt igen6_edac wmi mac_hid thinkpad_acpi ledtrig_audio platform_profile snd i2c_hid_acpi i2c_hid soundcore
int3403_thermal int340x_thermal_zone tpm_crb tpm_tis tpm_tis_core tpm video rng_core intel_hid sparse_keymap int3400_thermal acpi_thermal_rel acpi_tad acpi_pad xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc rfkill dm_multipath dm_mod sg crypto_user fuse acpi_call(OE) bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq serio_raw atkbd libps2 vivaldi_fmap nvme xhci_pci crc32c_intel nvme_core xhci_pci_renesas i8042 serio
CPU: 4 PID: 391 Comm: systemd-udevd Tainted: G OE 5.18.12-arch1-1 #1 96418c890ae0efcbf26c551e98cb3d72a56d7da8
Hardware name: LENOVO 21AHA014CD/21AHA014CD, BIOS N3MET04W (1.01 ) 05/04/2022
RIP: 0010:intel_pps_vdd_on_unlocked+0x29d/0x2b0 [i915]
Code: 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 4c 8d 4f d4 48 c7 c1 50 d2 d4 c0 4c 89 e2 48 c7 c7 79 2b d6 c0 48 89 c6 e8 ef 54 8b d4 <0f> 0b e9 f3 fd ff ff e8 87 d1 90 d4 0f 1f 80 00 00 00 00 66 0f 1f
RSP: 0018:ffffac5c00e536c0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff9bcf65b2a170 RCX: 0000000000000027
RDX: ffff9bda6f1216a8 RSI: 0000000000000001 RDI: ffff9bda6f1216a0
RBP: ffff9bcf59d70000 R08: 0000000000000000 R09: ffffac5c00e534d0
R10: 0000000000000003 R11: ffff9bda9f6fffe8 R12: ffff9bcf427345a0
R13: 0000000000000005 R14: ffff9bcf59d707d8 R15: 0000000000000000
FS: 00007fc45def2080(0000) GS:ffff9bda6f100000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c86beaa018 CR3: 000000010587c002 CR4: 0000000000f70ee0
PKRU: 55555554
Call Trace:
<TASK>
? intel_display_power_get+0x52/0x60 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
intel_dp_aux_xfer+0x127/0x7a0 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
? hrtimer_try_to_cancel+0x19/0x100
intel_dp_aux_transfer+0x202/0x320 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
drm_dp_dpcd_access+0xaa/0x130 [drm_dp_helper feb423ecaa405180d8bd6147672ce2c99671a810]
drm_dp_dpcd_write+0x8a/0xd0 [drm_dp_helper feb423ecaa405180d8bd6147672ce2c99671a810]
intel_dp_set_power+0x67/0x190 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
intel_ddi_post_disable+0x44b/0x4a0 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
intel_encoders_post_disable+0x7b/0x90 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
intel_old_crtc_state_disables+0x38/0xa0 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
intel_atomic_commit_tail+0x3b8/0x18f0 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
? flush_workqueue+0x19d/0x420
intel_atomic_commit+0x33d/0x390 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
intel_modeset_init+0x19d/0x280 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
i915_driver_probe+0x4af/0xda0 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
? intel_modeset_probe_defer+0x4c/0x60 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
? i915_pci_probe+0x43/0x160 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
local_pci_probe+0x42/0x80
pci_device_probe+0xc1/0x220
? sysfs_do_create_link_sd+0x6a/0xd0
really_probe+0x19e/0x370
__driver_probe_device+0xfc/0x170
driver_probe_device+0x1f/0x90
__driver_attach+0xbf/0x1a0
? __device_attach_driver+0xe0/0xe0
bus_for_each_dev+0x84/0xd0
bus_add_driver+0x15d/0x200
driver_register+0x8d/0xe0
i915_init+0x23/0x80 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
? 0xffffffffc0e8c000
do_one_initcall+0x5a/0x220
do_init_module+0x4a/0x240
__do_sys_init_module+0x138/0x1b0
do_syscall_64+0x5c/0x90
? exc_page_fault+0x74/0x170
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc45e89899e
Code: 48 8b 0d fd a3 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ca a3 0e 00 f7 d8 64 89 01 48
RSP: 002b:00007fff568057d8 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
RAX: ffffffffffffffda RBX: 000055c86bc79b80 RCX: 00007fc45e89899e
RDX: 00007fc45e9ff343 RSI: 000000000070502e RDI: 00007fc45c2b2010
RBP: 00007fc45e9ff343 R08: 0000000000261000 R09: 85ebca77c2b2ae63
R10: 0000000000015f91 R11: 0000000000000246 R12: 0000000000020000
R13: 000055c86bc757f0 R14: 000055c86bc79b80 R15: 000055c86bc95eb0
</TASK>
---[ end trace 0000000000000000 ]---
And the call trace after suspend/hibernate:
------------[ cut here ]------------
i915 0000:00:02.0: i915 raw-wakerefs=1 wakelocks=1 on cleanup
WARNING: CPU: 1 PID: 2859 at drivers/gpu/drm/i915/intel_runtime_pm.c:629 intel_runtime_pm_driver_release+0x4e/0x60 [i915]
Modules linked in: uinput ccm rfcomm cmac algif_hash algif_skcipher af_alg snd_seq_dummy snd_hrtimer snd_seq snd_seq_device bnep btusb btrtl btbcm btintel btmtk bluetooth ecdh_generic snd_ctl_led crc16 snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes joydev mousedev uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev snd_soc_dmic usbhid mc hid_multitouch iTCO_wdt spi_nor intel_pmc_bxt iTCO_vendor_support mtd mei_hdcp mei_pxp pmt_telemetry pmt_class intel_rapl_msr think_lmi firmware_attributes_class wmi_bmof intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd rapl intel_cstate intel_uncore pcspkr psmouse snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel
soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus iwlmvm snd_soc_core snd_compress ac97_bus mac80211 snd_pcm_dmaengine libarc4 snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec iwlwifi snd_hda_core snd_hwdep e1000e snd_pcm i2c_i801 intel_lpss_pci spi_intel_pci spi_intel intel_lpss iwlmei i2c_smbus snd_timer mei_me idma64 cfg80211 mei i915 thunderbolt processor_thermal_device_pci drm_buddy processor_thermal_device ucsi_acpi ttm processor_thermal_rfim typec_ucsi intel_vsec drm_dp_helper processor_thermal_mbox typec processor_thermal_rapl roles intel_rapl_common intel_gtt igen6_edac wmi mac_hid thinkpad_acpi ledtrig_audio platform_profile snd i2c_hid_acpi i2c_hid soundcore int3403_thermal int340x_thermal_zone tpm_crb tpm_tis tpm_tis_core tpm video rng_core intel_hid sparse_keymap int3400_thermal acpi_thermal_rel
acpi_tad acpi_pad xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc rfkill dm_multipath dm_mod sg crypto_user fuse acpi_call(OE) bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq serio_raw atkbd libps2 vivaldi_fmap nvme xhci_pci crc32c_intel nvme_core xhci_pci_renesas i8042 serio
CPU: 1 PID: 2859 Comm: kworker/u32:19 Tainted: G W OE 5.18.12-arch1-1 #1 96418c890ae0efcbf26c551e98cb3d72a56d7da8
Hardware name: LENOVO 21AHA014CD/21AHA014CD, BIOS N3MET04W (1.01 ) 05/04/2022
Workqueue: events_unbound async_run_entry_fn
RIP: 0010:intel_runtime_pm_driver_release+0x4e/0x60 [i915]
Code: b7 d9 48 8b 6f 50 48 85 ed 75 03 48 8b 2f e8 b9 2a 62 d4 45 89 e0 89 d9 48 89 ea 48 89 c6 48 c7 c7 20 93 d2 c0 e8 5e f2 9d d4 <0f> 0b 5b 5d 41 5c c3 cc 66 2e 0f 1f 84 00 00 00 00 00 66 0f 1f 00
RSP: 0018:ffffac5c01857db8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000027
RDX: ffff9bda6f0616a8 RSI: 0000000000000001 RDI: ffff9bda6f0616a0
RBP: ffff9bcf427345a0 R08: ffffffff96ee31c0 R09: 0000000000000000
R10: ffffffffffffffff R11: ffff9bda9f72510a R12: 0000000000000001
R13: ffff9bcf59d764d0 R14: 0000000000000000 R15: ffff9bcf5d1edaa8
FS: 0000000000000000(0000) GS:ffff9bda6f040000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005581377c6396 CR3: 0000000838a10002 CR4: 0000000000f70ee0
PKRU: 55555554
Call Trace:
<TASK>
i915_drm_suspend_late+0xed/0x110 [i915 c225ea1470c00163fb5e68b44bc6b2ae3d333a49]
? pci_pm_poweroff_late+0x40/0x40
dpm_run_callback+0x47/0x150
__device_suspend_late+0xb4/0x230
async_suspend_late+0x1e/0x90
async_run_entry_fn+0x31/0x130
process_one_work+0x1c4/0x380
worker_thread+0x51/0x380
? rescuer_thread+0x3a0/0x3a0
kthread+0xdb/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
---[ end trace 0000000000000000 ]---
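A friend of mine complains that the code quality of the i915 driver is poor, and that may well be true. A search online turned up quite a few reports of related errors: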
- [ADL_P] Dual eDP support is missing, PPS state tracking gets confused, backlight does not work
- i915 - Oops during boot with i7-1270p
- Thinkpad T14 Gen 3 has no available HDMI ports under Linux.
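I also read the Intel Graphics page on the Arch Wiki, which did not help.
Building a new kernel
Following the advice on the i915 wiki, the first step was to check whether a newer kernel had already fixed the bug. I built a drm-tip kernel by following the Build Guide and the Arch Wiki's Kernel/Traditional compilation page, and added the drm.debug=0x1e log_buf_len=1M parameters to /etc/default/grub to get more debugging output. The problem was still there.
Then I looked at the "Dual eDP support is missing" issue and found that its reporter had posted a crude but simple workaround: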
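For reference, drm.debug is a bitmask of DRM debug categories. The sketch below decodes 0x1e into category names. The table is my reading of the DRM_UT_* values in the kernel's drm_print.h and is an assumption, not something taken from this post, so check it against the kernel tree you are actually building.

```python
# Assumed mapping of drm.debug bits to DRM_UT_* category names
# (based on include/drm/drm_print.h; verify against your kernel tree).
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
    0x100: "DP",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled by a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_CATEGORIES.items() if mask & bit]


print(decode_drm_debug(0x1E))  # -> ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

Under that assumption, 0x1e enables the driver, KMS, PRIME and atomic messages, and the KMS category is the one that carries the drm_dbg_kms() lines quoted later in this post.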
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -8835,7 +8835,7 @@ static void intel_setup_outputs(struct drm_i915_private *dev_priv)
         intel_ddi_init(dev_priv, PORT_TC1);
     } else if (IS_ALDERLAKE_P(dev_priv)) {
         intel_ddi_init(dev_priv, PORT_A);
-        intel_ddi_init(dev_priv, PORT_B);
+        // intel_ddi_init(dev_priv, PORT_B);
         intel_ddi_init(dev_priv, PORT_TC1);
         intel_ddi_init(dev_priv, PORT_TC2);
         intel_ddi_init(dev_priv, PORT_TC3);
With drm.debug=0x1e enabled, the log also showed messages like these:
i915 0000:00:02.0: [drm:intel_dp_aux_xfer [i915]] AUX B/DDI B/PHY B: timeout (status 0x7d4003ff)
i915 0000:00:02.0: [drm:intel_dp_aux_xfer [i915]] AUX B/DDI B/PHY B: timeout (status 0x7d4003ff)
...
i915 0000:00:02.0: [drm:drm_dp_dpcd_access [drm_dp_helper]] AUX B/DDI B/PHY B: Too many retries, giving up. First error: -110
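My laptop obviously does not have two built-in displays, so I tried commenting out that line, rebuilt, and rebooted. i915 stopped producing call traces, but HDMI still did not work properly. When I talked with @rtxux about it, he said "wild guess: the VBT is broken", so that is the direction I debugged next.
VBT
VBT stands for Video BIOS Table, which appears to be Intel's own mechanism. It contains configuration information about the display hardware. The intel-gpu-tools package includes intel_vbt_decode, a tool for parsing the VBT, so we can take a look at what the VBT actually records: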
# cat /sys/kernel/debug/dri/0/i915_vbt > /tmp/i915_vbt
# intel_vbt_decode --file=/tmp/i915_vbt
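The overall structure of the VBT can be seen in the output:
- VBT header: records the VBT signature, version, checksum, and so on;
- BDB header: BDB = BIOS Data Block. The BDB is where the display information is actually recorded;
- Then come multiple BDB blocks. On my device, block 2 is the general definitions block; it contains the information for each child device and is what the rest of this post focuses on. The other blocks are not important here and are ignored.
The relevant part of BDB block 2 is shown below, trimmed to just the information that ended up mattering. (After reading the output below you can probably guess the cause, but the real vbt decode output has a great many entries, so at the time I missed an important clue.)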
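The layout in the list above can be walked with a few lines of Python. This is only a sketch: the field layout (a 20-byte signature starting with $VBT, a 32-bit bdb_offset at byte 28, a 16-byte BDB signature, and blocks made of a 1-byte id plus a 2-byte size) is my recollection of the kernel's intel_vbt_defs.h, not something taken from this post, and list_bdb_blocks is a hypothetical helper name.

```python
import struct

# Assumed little-endian layouts, modelled on the kernel's intel_vbt_defs.h.
# signature[20], version, header_size, vbt_size, checksum, reserved, bdb_offset
VBT_HEADER = struct.Struct("<20sHHHBBI")
# signature[16], version, header_size, bdb_size
BDB_HEADER = struct.Struct("<16sHHH")
# block id, block size (payload follows)
BLOCK_HEADER = struct.Struct("<BH")


def list_bdb_blocks(vbt):
    """Return (block_id, payload_size) for each BDB block in a raw VBT image."""
    signature, _, _, _, _, _, bdb_offset = VBT_HEADER.unpack_from(vbt, 0)
    if not signature.startswith(b"$VBT"):
        raise ValueError("missing $VBT signature")

    bdb_sig, _, bdb_header_size, bdb_size = BDB_HEADER.unpack_from(vbt, bdb_offset)
    if not bdb_sig.startswith(b"BIOS_DATA_BLOCK"):
        raise ValueError("missing BDB signature")

    blocks = []
    pos = bdb_offset + bdb_header_size
    end = min(bdb_offset + bdb_size, len(vbt))
    while pos + BLOCK_HEADER.size <= end:
        block_id, size = BLOCK_HEADER.unpack_from(vbt, pos)
        blocks.append((block_id, size))
        # Caveat: newer MIPI sequence blocks may use a wider size field,
        # which this sketch does not handle.
        pos += BLOCK_HEADER.size + size
    return blocks
```

With the dump from the command above, list_bdb_blocks(open("/tmp/i915_vbt", "rb").read()) should list block 2 among the results if the assumed layout is right. For anything serious, intel_vbt_decode remains the tool to trust.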
BDB block 2 - General definitions block:
// ...
Child device size: 39
Child device count: 10
Child device info:
Device handle: 0x0008 (LFP 1 (eDP))
Device type: 0x1806 (unknown)
Internal connector
DisplayPort output
Digital output
// ...
DVO Port: 0x0a (DP-A)
// ...
Child device info:
Device handle: 0x0080 (LFP 2 (eDP))
Device type: 0x1806 (unknown)
Internal connector
DisplayPort output
Digital output
// ...
DVO Port: 0x07 (DP-B)
// ...
Child device info:
Device handle: 0x0004 (EFP 1 (HDMI/DVI/DP))
Device type: 0x60d2 (DVI-D)
Power management
Hotplug signaling
HDMI output
Content protection
High speed link
TMDS/DVI signaling
Digital output
// ...
DVO Port: 0x07 (DP-B)
// ...
Child device info:
Device handle: 0x0040 (EFP 2 (HDMI/DVI/DP))
Device type: 0x68c6 (DisplayPort)
Power management
Hotplug signaling
Content protection
High speed link
DisplayPort output
Digital output
// ...
DVO Port: 0x0d (unknown)
// ...
Child device info:
Device handle: 0x0020 (EFP 3 (HDMI/DVI/DP))
Device type: 0x68c6 (DisplayPort)
Power management
Hotplug signaling
Content protection
High speed link
DisplayPort output
Digital output
// ...
DVO Port: 0x0f (unknown)
// ...
Child device info:
Device handle: 0x0010 (EFP 4 (HDMI/DVI/DP))
Device type: 0x68c6 (DisplayPort)
Power management
Hotplug signaling
Content protection
High speed link
DisplayPort output
Digital output
// ...
DVO Port: 0x11 (unknown)
// ...
Child device info:
Device handle: 0x0002 (unknown)
Device type: 0x68c6 (DisplayPort)
Power management
Hotplug signaling
Content protection
High speed link
DisplayPort output
Digital output
// ...
DVO Port: 0x13 (unknown)
// ...
The corresponding boot-time output with debug logging enabled:
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port A VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:1 LSPCON:0 USB-Type-C:0 TBT:0 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port A VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port B VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:1 LSPCON:0 USB-Type-C:0 TBT:0 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port B VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] More than one child device for port B in VBT, using the first.
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port D VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:0 LSPCON:0 USB-Type-C:1 TBT:1 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port D VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port D VBT HDMI max TMDS clock: 297000 kHz
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port D VBT DP max link rate: 810000
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port E VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:0 LSPCON:0 USB-Type-C:1 TBT:1 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port E VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port E VBT HDMI max TMDS clock: 297000 kHz
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port E VBT DP max link rate: 810000
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port F VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:0 LSPCON:0 USB-Type-C:1 TBT:1 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port F VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port F VBT HDMI max TMDS clock: 297000 kHz
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port F VBT DP max link rate: 810000
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port G VBT info: CRT:0 DVI:0 HDMI:0 DP:1 eDP:0 LSPCON:0 USB-Type-C:1 TBT:1 DSC:0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port G VBT HDMI level shift: 0
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port G VBT HDMI max TMDS clock: 297000 kHz
i915 0000:00:02.0: [drm:intel_bios_init [i915]] Port G VBT DP max link rate: 810000
The kernel code that prints the VBT info line is as follows:
is_dvi = intel_bios_encoder_supports_dvi(devdata);
is_dp = intel_bios_encoder_supports_dp(devdata);
is_crt = intel_bios_encoder_supports_crt(devdata);
is_hdmi = intel_bios_encoder_supports_hdmi(devdata);
is_edp = intel_bios_encoder_supports_edp(devdata);
supports_typec_usb = intel_bios_encoder_supports_typec_usb(devdata);
supports_tbt = intel_bios_encoder_supports_tbt(devdata);

drm_dbg_kms(&i915->drm,
            "Port %c VBT info: CRT:%d DVI:%d HDMI:%d DP:%d eDP:%d LSPCON:%d USB-Type-C:%d TBT:%d DSC:%d\n",
            port_name(port), is_crt, is_dvi, is_hdmi, is_dp, is_edp,
            HAS_LSPCON(i915) && child->lspcon,
            supports_typec_usb, supports_tbt,
            devdata->dsc != NULL);
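As the log shows, however, every port reports HDMI:0, meaning i915 believes none of them supports HDMI encoding. Is intel_bios_encoder_supports_hdmi at fault? That was my first thought, so I went to look at its implementation: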
bool
intel_bios_encoder_supports_dvi(const struct intel_bios_encoder_data *devdata)
{
    return devdata->child.device_type & DEVICE_TYPE_TMDS_DVI_SIGNALING;
}

bool
intel_bios_encoder_supports_hdmi(const struct intel_bios_encoder_data *devdata)
{
    return intel_bios_encoder_supports_dvi(devdata) &&
           (devdata->child.device_type & DEVICE_TYPE_NOT_HDMI_OUTPUT) == 0;
}
Here DEVICE_TYPE_TMDS_DVI_SIGNALING is 1 << 4 and DEVICE_TYPE_NOT_HDMI_OUTPUT is 1 << 11. If the correct device were recognised, the checks would come out as follows:
Python 3.10.5 (main, Jun 6 2022, 18:49:26) [GCC 12.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 0x60d2 & (1 << 4)
16
>>> 0x60d2 & (1 << 11)
0
>>> 0x1806 & (1 << 4)
0
>>> 0x1806 & (1 << 11)
2048
>>> 0x68c6 & (1 << 4)
0
>>> 0x68c6 & (1 << 11)
2048
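The same checks can be wrapped in two small functions that mirror the kernel helpers quoted above. This is only a Python restatement for illustration; the two bit values come from the text above, and supports_dvi / supports_hdmi are hypothetical names.

```python
# Bit values as given in the text above.
DEVICE_TYPE_TMDS_DVI_SIGNALING = 1 << 4
DEVICE_TYPE_NOT_HDMI_OUTPUT = 1 << 11


def supports_dvi(device_type):
    """Mirror of intel_bios_encoder_supports_dvi(): TMDS/DVI signaling bit set."""
    return bool(device_type & DEVICE_TYPE_TMDS_DVI_SIGNALING)


def supports_hdmi(device_type):
    """Mirror of intel_bios_encoder_supports_hdmi(): DVI-capable and not flagged non-HDMI."""
    return supports_dvi(device_type) and not (device_type & DEVICE_TYPE_NOT_HDMI_OUTPUT)


# The three device types that appear in the VBT dump.
for device_type in (0x1806, 0x60D2, 0x68C6):
    print(hex(device_type), "DVI:", supports_dvi(device_type),
          "HDMI:", supports_hdmi(device_type))
```

Only 0x60d2 passes both checks, so a port that reports HDMI:0 cannot be backed by that child device.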
For the three kinds of child device in the VBT dump above, intel_bios_encoder_supports_hdmi can only return false when the device type is 0x1806 (eDP) or 0x68c6 (DP). To verify this, I added one more line of output:
drm_dbg_kms(&i915->drm,
"Port %c Device type: %x",
port_name(port), devdata->child.device_type);
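The log after building, installing, and rebooting confirmed the guess: the HDMI device (0x60d2) was never picked up by i915. At first I wondered whether i915 had failed to read the VBT correctly, but on reading the log carefully I noticed this line: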
More than one child device for port B in VBT, using the first.
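So port B actually has more than one device? Looking at the decode output again, I found that the nonexistent second eDP and the HDMI child device occupy the same DVO port (DVO = Digital Video Out). I went to look at the kernel source that prints this message: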
static void parse_ddi_port(struct intel_bios_encoder_data *devdata)
{
    struct drm_i915_private *i915 = devdata->i915;
    const struct child_device_config *child = &devdata->child;
    enum port port;

    port = dvo_port_to_port(i915, child->dvo_port);
    if (port == PORT_NONE)
        return;

    if (!is_port_valid(i915, port)) {
        drm_dbg_kms(&i915->drm,
                    "VBT reports port %c as supported, but that can't be true: skipping\n",
                    port_name(port));
        return;
    }

    if (i915->vbt.ports[port]) {
        drm_dbg_kms(&i915->drm,
                    "More than one child device for port %c in VBT, using the first.\n",
                    port_name(port));
        return;
    }

    sanitize_device_type(devdata, port);

    if (intel_bios_encoder_supports_dvi(devdata))
        sanitize_ddc_pin(devdata, port);

    if (intel_bios_encoder_supports_dp(devdata))
        sanitize_aux_ch(devdata, port);

    i915->vbt.ports[port] = devdata;
}
In the whole i915 source, this is the only place that writes to i915->vbt.ports[port], and parse_ddi_port() is called from only one place:
static void parse_ddi_ports(struct drm_i915_private *i915)
{
    struct intel_bios_encoder_data *devdata;
    enum port port;

    if (!has_ddi_port_info(i915))
        return;

    list_for_each_entry(devdata, &i915->vbt.display_devices, node)
        parse_ddi_port(devdata);

    for_each_port(port) {
        if (i915->vbt.ports[port])
            print_ddi_port(i915->vbt.ports[port], port);
    }
}
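It loops over every display device in the VBT and parses it. So what parse_ddi_port() does is populate i915->vbt.ports (whose type is struct intel_bios_encoder_data *ports[I915_MAX_PORTS];). If the ports slot for a given port already has a value, an earlier device on the same port has already been parsed, and i915 simply skips any later device on that port. With this laptop's VBT, therefore, i915 tries to initialise the nonexistent eDP and never initialises the HDMI child device at all. The eDP initialisation times out and fails, which leaves i915's state confused and leads to the series of bugs described at the beginning.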
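To make the "first child device wins" behaviour concrete, here is a small Python model of the loop, fed with the child devices from the VBT dump above in the order they are listed there. It is a sketch, not the kernel logic: the DVO-port-to-port table is inferred from the dump and the debug log in this post rather than copied from the kernel, and assign_ports_first_wins is a hypothetical name.

```python
# (label, device_type, dvo_port), in the order shown in the VBT dump.
CHILD_DEVICES = [
    ("LFP 1 (eDP)", 0x1806, 0x0A),
    ("LFP 2 (eDP)", 0x1806, 0x07),
    ("EFP 1 (HDMI/DVI/DP)", 0x60D2, 0x07),
    ("EFP 2 (HDMI/DVI/DP)", 0x68C6, 0x0D),
    ("EFP 3 (HDMI/DVI/DP)", 0x68C6, 0x0F),
    ("EFP 4 (HDMI/DVI/DP)", 0x68C6, 0x11),
    ("unknown", 0x68C6, 0x13),
]

# Assumption: inferred from the dump and the "Port X VBT info" log lines,
# not taken from the kernel's real mapping table.
DVO_PORT_TO_PORT = {0x0A: "A", 0x07: "B", 0x0D: "D", 0x0F: "E", 0x11: "F", 0x13: "G"}


def assign_ports_first_wins(children):
    """Model of parse_ddi_ports(): keep the first child seen for each port."""
    ports = {}
    skipped = []
    for label, device_type, dvo_port in children:
        port = DVO_PORT_TO_PORT.get(dvo_port)
        if port is None:
            continue  # corresponds to PORT_NONE
        if port in ports:
            skipped.append((port, label))  # "More than one child device ..."
            continue
        ports[port] = (label, device_type)
    return ports, skipped


ports, skipped = assign_ports_first_wins(CHILD_DEVICES)
print("port B ->", ports["B"][0], hex(ports["B"][1]))
print("skipped:", skipped)
```

In this model port B ends up bound to the second eDP entry and the HDMI child device is the only one skipped, which matches the single "More than one child device for port B" line in the log.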
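How to fix it?
For end users, there is currently no option other than building a kernel yourself. People who know the VBT well might be able to patch the VBT in the BIOS themselves, but I, at least, would not dare try that. The code change is simple and lives in the parse_ddi_port() function in drivers/gpu/drm/i915/display/intel_bios.c: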
if (i915->vbt.ports[port]) {
    // drm_dbg_kms(&i915->drm,
    //             "More than one child device for port %c in VBT, using the first.\n",
    //             port_name(port));
    // return;
    drm_dbg_kms(&i915->drm,
                "I know that i915 stucks with an nonexisting eDP, "
                "thus although we have more than one child device for port %c in VBT, "
                "I would like to use the latter one and ignore the first!\n",
                port_name(port));
}
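In fact, just removing this return; is enough (so that the configuration of the later HDMI child device overrides that of the bogus eDP device before it). Also, do not comment out the initialisation of Port B (otherwise HDMI disappears). I have attached this fix to the corresponding issue as well. I tested it on a drm-tip kernel (5.19.0-rc7) and have not found any problems so far. I have not looked at the 5.18 or 5.15 LTS code, but the change should be similar there.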
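The effect of dropping the early return can be illustrated with the two child devices that share port B. This is a toy model under the same caveats as before: it only captures which entry ends up in the port slot, pick_child is a hypothetical name, and the device types are the ones from the VBT dump.

```python
# The two child devices that the VBT assigns to port B, in VBT order.
PORT_B_CHILDREN = [
    ("LFP 2 (eDP)", 0x1806),
    ("EFP 1 (HDMI/DVI/DP)", 0x60D2),
]


def pick_child(children, keep_first):
    """Return the child that ends up in the port slot.

    keep_first=True models the stock code (early return on a filled slot);
    keep_first=False models the patched code, where later entries overwrite.
    """
    chosen = None
    for child in children:
        if chosen is not None and keep_first:
            continue  # stock behaviour: "using the first"
        chosen = child
    return chosen


print("stock:  ", pick_child(PORT_B_CHILDREN, keep_first=True))
print("patched:", pick_child(PORT_B_CHILDREN, keep_first=False))
```

With the stock behaviour the slot holds the eDP entry (0x1806); with the return removed it holds the HDMI entry (0x60d2), which is what lets the HDMI checks described earlier succeed for port B.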
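This change probably cannot be pushed into i915 as is, because other machines may well depend on the "use the first child device for a given port" logic.
Likewise, I am not sure whether Lenovo's BIOS VBT is reasonable as written. For all I know, the VBT was copied from another model and modified without removing the second eDP, and Intel's Windows driver happens to handle that case fine. Overall, the simplest solution is probably for Lenovo to ship a new BIOS update. Handling it in i915 would likely make things more complicated.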
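In addition, although S3 sleep can be enabled in the UEFI settings, I only tested the modified kernel under S0ix, because S0ix, well:
isn't-this-good-enough.jpg
I have not quantitatively compared the power consumption of S3 and S0ix. It is just that resuming from S0ix sleep does not seem to break anything, whereas S3 sleep makes the touchpad stutter (reloading psmouse seems to fix it), and hibernation makes the sof audio firmware misbehave (reloading snd_sof_pci_intel_tgl fixes it).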
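To check which suspend mode a machine is actually using, one option is to read /sys/power/mem_sleep, where the kernel lists the available modes and marks the active one with square brackets (for example "s2idle [deep]"). The sketch below parses that format; parse_mem_sleep is a hypothetical helper, and treating s2idle as the S0ix path and deep as S3 is a simplification, since s2idle alone does not guarantee that the platform reaches S0ix.

```python
def parse_mem_sleep(text):
    """Parse the contents of /sys/power/mem_sleep.

    Returns (active_mode, available_modes); active_mode is None if no
    entry is bracketed.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


# Example with a literal string; on a real system, read the sysfs file instead.
print(parse_mem_sleep("s2idle [deep]\n"))  # -> ('deep', ['s2idle', 'deep'])
```

On a live system the input would come from open("/sys/power/mem_sleep").read().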