#195 Higher and Constant CPU Usage on Arch Linux compared to Fedora.

Closed
opened 2 years ago by pureshores · 24 comments

Hello, first of all thank you for this project!

I have a question. I've been switching between Fedora and Arch, and one of the things I noticed is that CPU usage on Arch is quite high and constant compared to Fedora.

By constant I mean that the CPU usage stays the same regardless of what is happening in the game, unlike on Fedora.

I'm not sure what causes this. I use the same Wine runner in Lutris, so I don't think it's Wine. Could it be the kernel?

The first screenshot is from Arch; the second is a simulated picture of what the CPU usage on Fedora would look like, since I don't have my Fedora install anymore and could not test there. This was in an open area in the game.

Thank you very much!

Edit: This only happens in this game; every other game behaves the same on both, as far as I can remember.

Krock commented 2 years ago
Owner

@pureshores From the game's point of view, there's no difference at all. It must be caused by a difference in your setup, such as fsync support in the kernel (if used), enabled CPU features (e.g. after a BIOS reset) or the active graphics driver.

Also note that CPU usage might increase over long game sessions and varies with the actions happening simultaneously.

If your Fedora installation still exists, consider comparing the configurations. The packages are largely identical across distributions, so their configuration and version are the only variables.
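One way to compare the kernel side of two installations is to diff their kernel config files. Below is a minimal Python sketch (not part of Dawn or any tool mentioned in this thread; all function names are hypothetical). It assumes you have copied each system's config somewhere readable; on many distributions it is installed as /boot/config-<release>, and kernels built with CONFIG_IKCONFIG_PROC also expose it as /proc/config.gz.

```python
from __future__ import annotations

import gzip
from pathlib import Path


def parse_kconfig(text: str) -> dict[str, str]:
    """Parse kernel .config text into {option: value}.

    '# CONFIG_FOO is not set' lines are recorded as 'n', so an explicitly
    disabled option can be told apart from one that is missing entirely.
    """
    options: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
    return options


def load_kconfig(path: str) -> dict[str, str]:
    """Read a plain or gzip-compressed (e.g. /proc/config.gz) config file."""
    p = Path(path)
    data = p.read_bytes()
    if p.suffix == ".gz":
        data = gzip.decompress(data)
    return parse_kconfig(data.decode("utf-8", errors="replace"))


def diff_kconfig(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str | None, str | None]]:
    """Return {option: (value_in_a, value_in_b)} for every option that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

For example, diff_kconfig(load_kconfig("arch-config.gz"), load_kconfig("fedora-config")), with those two file names being placeholders for your copies, lists every option whose value differs between the two kernels, including the tick rate setting CONFIG_HZ.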

pureshores commented 2 years ago
Poster

Thanks for the reply, I'll recheck Fedora this weekend if time allows.

pureshores commented 2 years ago
Poster

So I reinstalled Fedora 34 and upgraded to Fedora 35 Beta to rule out "too new libraries" as the cause, and it still exhibits the same correct behavior as on Fedora 34.

I have no idea why this happens on Arch; I believe it's been like this ever since I started playing. I actually tried fsync before, because esync would lag badly with CPU usage reaching 90-95%.

I tried switching the CPU governor to performance just to make sure it's not a CPU scheduling issue, and nope, still the same CPU usage on Arch.

My last theory is that the NVIDIA driver isn't playing nice with rolling-release software, but other games are fine, so that might not be the case.

So I think I'm gonna close this, as I can't think of any explanation for why this happens.

pureshores commented 2 years ago
Poster

I think I found the issue. I'll post it here in case someone has a similar issue and a pre-Rocket Lake CPU (especially Haswell and older, where mitigations have the most impact).

The higher CPU usage is caused by the mitigations: Arch Linux has all of them enabled, while Fedora has 3 of them disabled. Turning off all mitigations on Arch results in the same behavior as Fedora, but disabling only those 3 does not, so I have no idea what else Fedora does to keep the CPU usage low (my guess is that Fedora's microcode might not be as up to date as Arch's). Edit: Invalid. See below. DO NOT DISABLE MITIGATIONS!

ppplayer commented 2 years ago

Hi @pureshores, could you elaborate on the list of mitigations you enabled/disabled to address the issue? I'm using a Kaby Lake CPU on Arch Linux with a few CPU vulnerability mitigations enabled, so I'm curious whether a similar performance fix applies to me.

pureshores commented 2 years ago
Poster

@ppwwyyxx The 3 mitigations mentioned/disabled are MDS, Spec Store Bypass and SRBDS; the rest are enabled. Edit: This was a red herring. DO NOT DISABLE MITIGATIONS!
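To compare which mitigations are active on two systems without changing anything (per the edit in the comment above, do not disable them), Linux reports the status of each known CPU vulnerability as a small text file under /sys/devices/system/cpu/vulnerabilities. A minimal, read-only Python sketch that collects those status lines; the function name is hypothetical:

```python
from __future__ import annotations

from pathlib import Path

# Directory where Linux reports the status of each known CPU vulnerability.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def mitigation_status(vuln_dir: Path = VULN_DIR) -> dict[str, str]:
    """Map each vulnerability file (e.g. 'mds', 'spec_store_bypass', 'srbds')
    to its status line. Read-only: nothing on the system is changed."""
    if not vuln_dir.is_dir():
        return {}  # not Linux, or a kernel that does not expose this directory
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(vuln_dir.iterdir())
        if entry.is_file()
    }


if __name__ == "__main__":
    for name, status in mitigation_status().items():
        print(f"{name:24} {status}")
```

Running it on both machines and comparing the output shows which mitigations differ, which is all that is needed for a like-for-like comparison.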

pureshores commented 2 years ago
Poster

Update: Turning off mitigations was a red herring. I just noticed my CPU usage is actually still high; I'm not sure why it seemed to work before.

So I did some "analysis" (through a lot of guesses/theories, lol).

The actual issue lies in a difference in the kernel configuration, specifically the kernel tick rate CONFIG_HZ: Arch's linux kernel sets it to 300 Hz, while Fedora's is set to 1000 Hz. So it seems the game is sensitive to the tick rate?
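To check which tick rate a kernel was built with, a sketch like the following reads CONFIG_HZ from the running kernel's config. The two file locations it tries are common conventions, not guarantees, and the function names are hypothetical. Note that getconf CLK_TCK (os.sysconf("SC_CLK_TCK") in Python) reports USER_HZ, which is usually 100 regardless of CONFIG_HZ, so it cannot answer this question. The tick period is simply 1000 / CONFIG_HZ milliseconds: about 3.33 ms at 300 Hz versus 1 ms at 1000 Hz.

```python
from __future__ import annotations

import gzip
import os
from pathlib import Path


def read_config_hz(config_text: str) -> int | None:
    """Return the value of CONFIG_HZ from kernel config text, or None if absent."""
    for line in config_text.splitlines():
        if line.startswith("CONFIG_HZ="):
            return int(line.split("=", 1)[1])
    return None


def running_kernel_hz() -> int | None:
    """Try two common config locations (an assumption: which one exists
    depends on the distribution and the kernel's build options)."""
    release = os.uname().release
    for path in (Path("/proc/config.gz"), Path(f"/boot/config-{release}")):
        try:
            raw = path.read_bytes()
        except OSError:
            continue  # missing or unreadable: try the next location
        if path.suffix == ".gz":
            raw = gzip.decompress(raw)
        return read_config_hz(raw.decode("utf-8", errors="replace"))
    return None


def tick_period_ms(hz: int) -> float:
    """Length of one timer tick in milliseconds."""
    return 1000.0 / hz


if __name__ == "__main__":
    hz = running_kernel_hz()
    if hz is None:
        print("kernel config not found")
    else:
        print(f"CONFIG_HZ={hz} (one tick = {tick_period_ms(hz):.2f} ms)")
```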

Second issue: the game shows higher CPU usage with fsync (on the 1000 Hz config) than with esync, so I guess this game runs better with esync. (I tried another Unity game, Subnautica, and it doesn't seem to be affected.)

I did a small benchmark (only on the login screen). Arch with esync and a CONFIG_HZ of 300 has the highest usage of all, while linux-zen with a CONFIG_HZ of 1000 shows lower usage, though a bit higher with fsync enabled; the same goes for Fedora.

This was recorded with MangoHud for 2 minutes.

| OS (Sync method, kernel package, CONFIG_HZ) | Min | Average | Max |
|---|---|---|---|
| Arch Linux (Esync, linux, 300 Hz) | 81% | 91% | 96% |
| Arch Linux (Fsync, linux, 300 Hz) | 71% | 86% | 98% |
| Arch Linux (Esync, linux-zen, 1000 Hz) | 21% | 25% | 42% |
| Arch Linux (Fsync, linux-zen, 1000 Hz) | 36% | 40% | 60% |
| Fedora Linux (Esync, linux (Default), 1000 Hz) | 20% | 25% | 53% |
| Fedora Linux (Fsync, linux (Default), 1000 Hz) | 34% | 38% | 57% |
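Min/average/max figures like the ones in this table can be computed from a MangoHud log. This Python sketch assumes the log is a CSV file whose sample rows follow a header row containing a cpu_load column, possibly preceded by system-information rows; the column name is an assumption and may differ between MangoHud versions, so it is a parameter. The function name is hypothetical.

```python
from __future__ import annotations

import csv
import io
from statistics import mean


def cpu_load_summary(log_text: str, column: str = "cpu_load") -> tuple[float, float, float]:
    """Return (min, average, max) of `column` from a MangoHud-style CSV log.

    Rows before the header that contains `column` (e.g. system info) are skipped.
    """
    index = None
    samples: list[float] = []
    for row in csv.reader(io.StringIO(log_text)):
        if index is None:
            if column in row:
                index = row.index(column)  # found the header row
            continue
        if len(row) > index and row[index].strip():
            samples.append(float(row[index]))
    if not samples:
        raise ValueError(f"no samples found for column {column!r}")
    return min(samples), mean(samples), max(samples)
```

Reading the log with open(path).read() and passing it to cpu_load_summary gives one table row per recording, so two systems can be compared over the same scene and duration.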
ppplayer commented 2 years ago

@pureshores Thanks for sharing these. I'm on Arch Linux and can confirm that CPU usage dropped greatly after switching to the off-the-shelf linux-zen kernel. On my 4-core/8-thread CPU, utilization went down from around 400% to 200%.
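Assuming the 400%/200% figures come from a tool like top, where 100% means one fully busy logical CPU, they can be converted into a share of the whole 8-thread CPU. A trivial sketch; the function name is hypothetical:

```python
def share_of_whole_cpu(per_thread_percent: float, logical_cpus: int) -> float:
    """Convert top-style usage (100% = one logical CPU) into a percentage
    of the total capacity of all logical CPUs."""
    return per_thread_percent / (100 * logical_cpus) * 100
```

share_of_whole_cpu(400, 8) gives 50.0 and share_of_whole_cpu(200, 8) gives 25.0, i.e. the game went from using half of the CPU to a quarter of it.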

pureshores commented 1 year ago
Poster

Hello @Krock, sorry for the bother.

I discovered that the patch is what actually caused this issue all along. Running an unpatched, base.dll-renamed game on non-zen kernels makes this issue disappear entirely, regardless of kernel and esync/fsync configuration.

May I know if Dawn makes some modification tied to CONFIG_HZ that might "interfere" with the inner workings of the game? I find it a bit odd that a change in CONFIG_HZ results in such a large utilization difference with Dawn compared to the unpatched game. Do you have an idea why this happens?

Krock commented 1 year ago
Owner

@pureshores

I proofread the patch again for mistakes but could not find any. It currently only modifies functions that run once, unrelated to the kernel's timer frequency. If you are sure this is not caused by the deletion of the DXVK cache file, try this command (for the 3.1.0 patched DLL): https://pastebin.com/raw/QJ1emw8V. If there is no difference with that command, the issue must be elsewhere.
pureshores commented 1 year ago
Poster

@Krock

DXVK does not seem to have an effect on this; I have my state cache symlinked so I can easily restore it.

I ran the dd command; here are the results:

Unpatched (Dawn not applied)

  • No modification: driver error (expected)
  • UnityPlayer.dll dd'ed, mhypbase.dll not renamed: game launches, high utilization (problematic)
  • UnityPlayer.dll not dd'ed, mhypbase.dll renamed: game launches, normal utilization (the normal behavior)
  • UnityPlayer.dll dd'ed, mhypbase.dll renamed: game launches, normal utilization (the normal behavior)

Patched (Dawn applied)

  • UnityPlayer.dll not dd'ed, mhypbase.dll not renamed: game launches, high utilization (problematic)
  • UnityPlayer.dll not dd'ed, mhypbase.dll renamed: game launches, normal utilization (the normal behavior)
  • UnityPlayer.dll dd'ed, mhypbase.dll not renamed: game launches, high utilization (problematic)
  • UnityPlayer.dll dd'ed, mhypbase.dll renamed: game launches, normal utilization (the normal behavior)

Based on my tests, dd did not have an effect; it seems that mhypbase.dll is what's causing the problem after all (I assumed it was Dawn, my bad). Interestingly, though, this issue predates the introduction of mhypbase.dll into the game files.
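The conclusion drawn from the results above can be checked mechanically: encode each reported launch (leaving out the unmodified case, which ended in a driver error) and test which single factor fully determines whether utilization was high. A small Python sketch; the factor names are labels chosen here for illustration, not identifiers from Dawn:

```python
# Reported launches: (dawn_applied, unityplayer_dd, mhypbase_renamed) -> high utilization?
OBSERVATIONS = {
    (False, True, False): True,
    (False, False, True): False,
    (False, True, True): False,
    (True, False, False): True,
    (True, False, True): False,
    (True, True, False): True,
    (True, True, True): False,
}

FACTORS = ("dawn_applied", "unityplayer_dd", "mhypbase_renamed")


def explains_outcome(index: int) -> bool:
    """True if the outcome is fully determined by the factor at `index` alone,
    i.e. each value of that factor always comes with the same outcome."""
    seen = {}
    for combo, high_usage in OBSERVATIONS.items():
        if seen.setdefault(combo[index], high_usage) != high_usage:
            return False  # same factor value, different outcomes
    return True


if __name__ == "__main__":
    print([name for i, name in enumerate(FACTORS) if explains_outcome(i)])
```

Only mhypbase_renamed passes: each dd and Dawn setting appears with both outcomes, while every run with mhypbase.dll renamed had normal utilization and every run without the rename had high utilization.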

Krock commented 1 year ago
Owner

@pureshores My theory is that mhypbase.dll calls a function (Wine -> Kernel userspace) that falls back to intense polling after completing a less demanding task.
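As a generic illustration of the polling theory (not a reconstruction of what mhypbase.dll or Wine actually does), a thread that repeatedly checks a condition consumes CPU for the entire wait, while a thread that blocks lets the kernel park it. This Python sketch measures the process CPU time of both approaches with time.process_time; the function names are hypothetical:

```python
import threading
import time


def busy_poll(flag: threading.Event, duration: float) -> float:
    """Repeatedly check `flag` without ever sleeping, for up to `duration` seconds.

    Returns the CPU time (seconds) this process consumed meanwhile.
    """
    start_cpu = time.process_time()
    deadline = time.monotonic() + duration
    while not flag.is_set() and time.monotonic() < deadline:
        pass  # the check itself is the only work, so the core stays busy
    return time.process_time() - start_cpu


def blocking_wait(flag: threading.Event, duration: float) -> float:
    """Block in Event.wait so the kernel can park the thread until the timeout."""
    start_cpu = time.process_time()
    flag.wait(timeout=duration)
    return time.process_time() - start_cpu


if __name__ == "__main__":
    print(f"polling:  {busy_poll(threading.Event(), 0.5):.3f} CPU s")
    print(f"blocking: {blocking_wait(threading.Event(), 0.5):.3f} CPU s")
```

On an otherwise idle machine, busy_poll should report roughly the full wait duration in CPU seconds, while blocking_wait should report close to zero, even though both wait for the same wall-clock time.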

Skipping mhypbase results in a crash, which is however fixable by the (not yet written) "anti logincrash" patch. What shall be done?

Krock reopened 1 year ago
pureshores commented 1 year ago
Poster

> What shall be done?

The only solution/workaround I can think of is using a kernel compiled with CONFIG_HZ=1000, if the intent is to keep mhypbase.dll intact. Downsides: not all distros use CONFIG_HZ=1000 by default; only Fedora does, plus distros that ship a low-latency/desktop-oriented kernel (Arch and derivatives with the Zen kernel). Another is that it's only effective with esync, as fsync wastes a few more CPU cycles, adding around 20% CPU usage based on my observations. (This workaround might be outside the scope of Dawn.)

> Skipping mhypbase results in a crash which however is fixable by the "anti logincrash" (yet not written).

I think this is an option as well, delivered as an optional patch for those who have slow CPUs and don't have access to custom kernels. The downside is a possibly increased chance of a ban (since the anti-debug protection is now being bypassed too?).

Krock commented 1 year ago
Owner

As of commit 8b7f7c0 I integrated the mhypbase renaming into the patch_anti_logincrash.sh script.

Notes for testers:

  • Using testing accounts for this patch is highly advised.
  • Determine your timer frequency: https://stackoverflow.com/a/13604847

It would be nice to see some in-game CPU comparisons to judge how many systems are affected by this issue. Apparently MangoHud can do such recordings (see above).
pureshores commented 1 year ago
Poster

@Krock Thank you for the patch. So far I haven't encountered any issues with it.

Update: Some time after playing, one freeze occurred, but I'm not certain whether it was caused by the patch or the anti-crash patch.


Here is a graph I took earlier to show a real-world example rather than just the login screen. This is a 5-minute session recorded from the start of the program through logging in and doing the daily commissions:

I used fsync for the test.

Edit: I replaced the graph with a table

| | Min (%) | Average (%) | Max (%) |
|---|---|---|---|
| patch_anti_logincrash.sh not applied | 44 | 81 | 100 |
| patch_anti_logincrash.sh applied | 1 | 48 | 74 |

As @Krock said, it would indeed be great to see others' measurements for comparison.


servtestator123 commented

On wine-staging 7.17 without the anti-login patch, at the door screen: 44% CPU usage (i.e. the entirety of one core). With the anti-login patch, at the door: 28%. I have a full dynticks kernel with low-latency preemption and a 300 Hz timer. Very nice patch. On a semi-unrelated note, Genshin white-screens before the door, but after showing the Genshin logo, with any patch combination using wine-staging 7.18.
Nadats commented 1 year ago

I'm using the default kernel on openSUSE Tumbleweed, which is set to 250 Hz.

The CPU used is a Ryzen 5 2600.

Tested by running along the same path twice.

Tool used for recording is MangoHud.

Average CPU Usage:

  • Without the patch: 39.9%
  • With the patch: 23.0%
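Because the reports in this thread come from different machines, kernels and scenes, the relative drop in average CPU load is a fairer way to compare them than the absolute numbers. A small sketch; the function name is hypothetical:

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`, relative to `before`
    (not a difference in percentage points)."""
    return (before - after) / before * 100
```

relative_reduction(39.9, 23.0) is about 42.4%, relative_reduction(81, 48) (the averages from the 5-minute session table) about 40.7%, and relative_reduction(44, 28) (the door-screen figures) about 36.4%.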
Krock commented 1 year ago
Owner

Thank you for the reports so far. There is indeed a remarkable difference in performance. I will let the testing phase continue and include this patch by default in the next update cycle if it proves to be safe and the out-of-the-box CPU usage remains unreasonably high.


@servtestator123

> Of a semi unrelated note genshin white screens [...] using wine-staging 7.18.

Please provide more information in a separate issue (or by email, as you prefer) so that I can write an entry in TROUBLESHOOTING.md: the crash dump and log file (*.zip by email, see FAQ.md), the Wine output log (if there is no crash dump) and working Wine 7.18 builds (if known).

jingai commented 1 year ago

Out of curiosity, is it removing the DLL or patching xlua that solves the issue? Or both..?

pureshores commented 1 year ago
Poster

@jingai mhypbase.dll is what causes the high utilization, while patch_anti_logincrash.sh prevents the crash upon login that happens when mhypbase.dll is not present.

Krock commented 1 year ago
Owner

I would like some quick feedback from people who are (or were) testing this patch to judge its stability before including it by default. Thanks in advance.

Were there side effects or other undesirable behaviour that could be related to this patch?

pureshores commented 1 year ago
Poster

I've been testing the patch daily with patch_anti_logincrash.sh since it was published. Over the entire testing period, only one freeze/crash happened (I'm not certain it was caused specifically by patch_anti_logincrash.sh). Other than that, no freezes or crashes have happened. It's been a flawless experience for me.

Nadats commented 1 year ago

Hopefully this is still useful one week late: I have been running the second patch for a while now and have not encountered any issues related to it. This goes for 3.1 as well as 3.2.

Krock commented 1 year ago
Owner

Thank you for the feedback. The secondary patch will be included in future versions until the out-of-the-box negative CPU impact is gone.

Please let me know when the issue is gone - either as a reply here or by email.
