#125 G43T: NIC NVM fails e1000 checksum, ethernet device does not work

Open
opened 1 year ago by biovoid · 1 comments
biovoid commented 1 year ago

After flashing the precompiled 20220710 release on G45T/G43T-AM3, Linux gives the message at boot: `e1000e [...] The NVM Checksum Is Not Valid`. The eth device is not available in-OS. The factory ROM does the same after flash, but clearing CMOS corrects it. Doing so does not fix Libreboot.

I eventually worked around this by patching the e1000 driver to skip the checksum check, as documented here: https://superuser.com/a/1106641 . But there is probably a better solution.

Following are some notes from myself and Leah on IRC:

23:34 < biovoid> this issue has been solved in the past by others under factory, by using intel's proprietary tool to recalculate the checksum (https://superuser.com/questions/1197908)
23:36 < biovoid> trying a newer kernel, as this seems nearly related: https://bugzilla.kernel.org/show_bug.cgi?id=213667
23:37 < biovoid> but I fear I will have to fall back to this more complicated workaround: https://superuser.com/a/1106641

(kernel update didn't matter, but the "complicated workaround" included information on where to patch out the checksum)

01:03 < biovoid> I managed to get it working, by commenting out the checksum check in the source and loading it as a dkms module
01:18 < biovoid> (I did also need to change the MAC address in-OS)
01:39 <@leah> biovoid: can you document all that on the bug tracker?
01:39 <@leah> or actually
01:39 <@leah> document it on lbwww
01:39 <@leah> email me a patch
01:43 <@leah> perhaps a kernel patch can be submitted, so that you can disable the check in a kernel option
01:43 <@leah> e.g. nogbechecksum
01:47 <@leah> shouldn't be too hard
01:48 <@leah> one for the BSDs could be done too
01:48 <@leah> e.g. /boot/loader.conf in freebsd
01:49 <@leah> disabling that check is a bad idea in most cases, but having it as a kernel argument would be great. unless there's another way besides skipping that check
01:50 <@leah> i bet i could replace that intel tool
01:50 <@leah> is it just reading gbe?
01:51 <@leah> the real question is, why is the checksum bad
01:52 <@leah> biovoid: i *think* maybe the vendor BIOS is setting the MAC address, and then you get the checksum correct. could you check?
01:52 <@leah> set the mac address to what's on the sticker
01:52 <@leah> i think  on that board there's a file in cbfs where you set the mac address
01:52 <@leah> like on g41m boards
01:54 <@leah> biovoid: in that stackexchange link they're fixing the checksum
01:55 <@leah> biovoid: can you look at how ethtool is doing that? look how that works
01:55 <@leah> biovoid: i want to know how to write nvm (gbe) on that board
01:55 <@leah> i could probably whip something up in coreboot
01:56 <@leah> even if the nvm is read-only as on nic/board, i gather that some people correct the checksum at boot time
01:56 <@leah> the nvm is likely loaded somewhere in memory
01:56 <@leah> mmio
01:56 <@leah> so yeah maybe just
01:56 <@leah> write a checksum from coreboot
01:56 <@leah> mac address and checksum
01:56 <@leah> i could adapt nvmutil code for that!
01:57 <@leah> then you wouldn't need to mess about
01:57 <@leah> but yeah put all you wrote and all i wrote, and those links, in a bug tracker page of libreboot
01:57 <@leah> i'll look into it
01:57 <@leah> fixing broken hacky intel shit could be fun :S
01:58 <@leah> your current solution is to disable the checksum verification, but i believe this step can be bypassed
01:58 <@leah> that board is descriptorless but has a working intel nic
01:59 <@leah> which means the gbe rom is likely read-only
01:59 <@leah> baked in to the nic
01:59 <@leah> but when you boot, it's loaded into memory
01:59 <@leah> and you could just set it from coreboot probably
01:59 <@leah> before linux/bsd sees it
01:59 <@leah> in other words
01:59 <@leah> this is *for coreboot* to fix
02:00 <@leah> i only need to figure out how to write it
02:00 <@leah> the actual checksum calculation is easy
02:00 <@leah> in nvm, every word is 2-bytes little endian in the rom. add them all up accordingly, truncated, 0x00 to 0x3f, and it should become 0xBABA
02:01 <@leah> word 0x3f is the checksum word, it is changed so that the result is 0xBABA
02:01 <@leah> the hardware does not enforce this
02:01 <@leah> only linux/bsd etc will check it. it's a software validation thing, totally optional, that's why you can get it working with the check disabled
02:02 <@leah> what's in rom is probably already wrong, but bios is setting the mac
02:02 <@leah> and setting checksum
02:02 <@leah> vendor bios i mean
02:02 <@leah> and by rom here i refer to gbe, not on the main flash
02:02 <@leah> it almost certainly is a rom
02:02 <@leah> probably a mask rom embedded deep in the nic
02:02 <@leah> but you can modify it memory-mapped in a volatile way. that's likely the way this goes
02:09 <@leah> biovoid: i recall that in coreboot land, you do set the mac address, but you're told to use the assigned one
02:10 <@leah> i'm not 100% sure but i think coreboot can already set the mac,
02:10 <@leah> but no idea whether it sets a checksum
02:10 <@leah> everything i've told you today could be bollocks btw
02:10 <@leah> make a report. write everything you and i have written, and those links, in a bug tracker page on libreboot
02:10 <@leah> i will look into this myself at a later date. unless you can fashion something in the meantime
02:11 <@leah> i think attacking this from coreboot is the Most Correct Way
02:11 <@leah> basically what ethtool is doing but from coreboot

09:09 < biovoid> I wondered if passing `--ifd -i bios` on flash and leaving gbe alone might get around it. but that just gave me aforementioned brick
09:09 < biovoid> not sure if that's actually sound, or if that's how that works
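
The checksum rule described in the log (64 little-endian 16-bit words, 0x00 to 0x3f, summing to 0xBABA after truncation, with word 0x3f as the adjustable one) can be sketched offline in Python. This is only an illustration of the arithmetic on a byte buffer; the function names are made up for this example, and it does not show how to read or write the NVM on this board, which is still the open question.

```python
import struct

NVM_WORDS = 64             # words 0x00..0x3f, as described in the log
NVM_CHECKSUM_WORD = 0x3F   # the word that is adjusted
NVM_SUM_TARGET = 0xBABA    # expected truncated 16-bit sum


def nvm_words(image):
    """Decode the first 64 little-endian 16-bit words of an NVM image."""
    if len(image) < NVM_WORDS * 2:
        raise ValueError("need at least 128 bytes (64 words)")
    return list(struct.unpack("<64H", image[:128]))


def checksum_is_valid(image):
    """True if words 0x00..0x3f add up to 0xBABA, truncated to 16 bits."""
    return sum(nvm_words(image)) & 0xFFFF == NVM_SUM_TARGET


def fix_checksum(image):
    """Return a copy of the image with word 0x3f set so the sum is 0xBABA."""
    words = nvm_words(image)
    partial = sum(words[:NVM_CHECKSUM_WORD]) & 0xFFFF
    words[NVM_CHECKSUM_WORD] = (NVM_SUM_TARGET - partial) & 0xFFFF
    # Anything past the first 128 bytes is passed through untouched.
    return struct.pack("<64H", *words) + image[128:]
```

For example, `checksum_is_valid(fix_checksum(dump))` should hold for any `dump` of at least 128 bytes, and only bytes 126 and 127 of the buffer change.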
Leah Rowe commented 1 year ago
Owner

i've added notes about this, here: https://libreboot.org/docs/hardware/acer_g43t-am3.html

it's linked from the main hardware compatibility list

could you send me the factory rom for your board, i want to see if i can glean anything from that (i doubt i'll find anything)

i think we need to set that checksum from coreboot, because the default ROM size for that board is 2MB right? probably not IFD-based, so it's running descriptorless which means ICH7 behaviour. which means GbE config is on an eeprom probably, for the NIC. it probably does actually have a bad checksum but the factory bios is actually setting the checksum at boot, or loading entirely different info (with your mac address in there as a result) - i doubt it. it probably is just setting checksum

figuring that out (how to set checksum) is where i'd look first, assuming that's it

study those tools talked about on that stackexchange post, see what they do
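
If the fix really is just "MAC address and checksum", the data side of it might look like the sketch below, operating on a dump held in memory. It assumes the usual Intel layout where the MAC address occupies the first six bytes (words 0x00 to 0x02), which has not been confirmed for this board, and the function name is hypothetical. How to get the result back into the NIC from coreboot is not covered here.

```python
import struct


def set_mac_and_checksum(image, mac):
    """Write a MAC into bytes 0..5 and recompute checksum word 0x3f.

    Assumption: the MAC lives in words 0x00..0x02 of the NVM image.
    """
    octets = bytes.fromhex(mac.replace(":", ""))
    if len(octets) != 6:
        raise ValueError("MAC address must have six octets")
    if len(image) < 128:
        raise ValueError("need at least 128 bytes (64 words)")

    buf = bytearray(image)
    buf[0:6] = octets

    # Sum words 0x00..0x3e, then choose word 0x3f so the total is 0xBABA.
    partial = sum(struct.unpack("<63H", bytes(buf[:126])))
    struct.pack_into("<H", buf, 126, (0xBABA - partial) & 0xFFFF)
    return bytes(buf)
```

The MAC passed in would be the one on the board's sticker, written as six colon-separated hex octets.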
