Ethernet Hardware Error Intel e1000e CentOS 7 “jiffies”


I came upon this after happily switching to CentOS 7 from Ubuntu (Cosmic Cuttlefish), thinking many of the issues I had been facing were due to my chosen distribution. That was clearly a novice mistake, because every distro has its own issues that take a while to massage away, especially with non-standard or custom hardware. I gave up on the most recent Ubuntu when my attached USB drives caused the latest kernel to crash on boot; significant “research” couldn’t alleviate that issue, so I switched to CentOS 7 and later migrated the same configuration to new hardware.

While breaking in the new platform I started noticing intermittent disconnections in my remote VDI sessions to an external host. That was just unacceptable, so after a sufficient amount of profanity and googling I was able to isolate and identify the issue. The log below is what pointed me toward a resolution; it relates specifically to my e1000e network adapter:

Feb 15 17:52:05 nuuk kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:#012 TDH <88>#012 TDT #012 next_to_use #012 next_to_clean <85>#012buffer_info[next_to_clean]:#012 time_stamp <1000a93a1>#012 next_to_watch <88>#012 jiffies <1000a99b4>#012 next_to_watch.status <0>#012MAC Status <80083>#012PHY Status <796d>#012PHY 1000BASE-T Status <3c00>#012PHY Extended Status <3000>#012PCI Status <10>
Feb 15 17:52:07 nuuk kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:#012 TDH <88>#012 TDT #012 next_to_use #012 next_to_clean <85>#012buffer_info[next_to_clean]:#012 time_stamp <1000a93a1>#012 next_to_watch <88>#012 jiffies <1000aa184>#012 next_to_watch.status <0>#012MAC Status <80083>#012PHY Status <796d>#012PHY 1000BASE-T Status <3c00>#012PHY Extended Status <3000>#012PCI Status <10>
Feb 15 17:52:09 nuuk kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:#012 TDH <88>#012 TDT #012 next_to_use #012 next_to_clean <85>#012buffer_info[next_to_clean]:#012 time_stamp <1000a93a1>#012 next_to_watch <88>#012 jiffies <1000aa954>#012 next_to_watch.status <0>#012MAC Status <80083>#012PHY Status <796d>#012PHY 1000BASE-T Status <3c00>#012PHY Extended Status <3000>#012PCI Status <10>
Feb 15 17:52:11 nuuk kernel: e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:#012 TDH <88>#012 TDT #012 next_to_use #012 next_to_clean <85>#012buffer_info[next_to_clean]:#012 time_stamp <1000a93a1>#012 next_to_watch <88>#012 jiffies <1000ab124>#012 next_to_watch.status <0>#012MAC Status <80083>#012PHY Status <796d>#012PHY 1000BASE-T Status <3c00>#012PHY Extended Status <3000>#012PCI Status <10>
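The #012 sequences are how syslog writes an embedded newline (octal 012) when it escapes control characters, so each of those entries is really a multi-line report from the driver. A minimal sketch for making them readable, assuming GNU sed; expand_hang_report is a hypothetical helper name, not part of any tool:

```shell
#!/bin/bash
# Hypothetical helper: read syslog lines on stdin, keep the e1000e hang
# reports, and turn each "#012" escape back into a real newline so every
# register value lands on its own line. Assumes GNU sed (for \n).
expand_hang_report() {
    grep 'Detected Hardware Unit Hang' | sed 's/#012/\n/g'
}
```

On CentOS 7 the kernel messages normally end up in /var/log/messages, so as root something like `expand_hang_report < /var/log/messages` should print the reports one field per line.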

…jiffies… sure, that brings to mind Chelsea, Michigan and muffins that taste like a prophylactic stuffed with sawdust. If someone could explain why that shows up in an error log, I would love to know!
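The short version: jiffies is the kernel's counter of timer ticks since boot, and the driver prints it next to time_stamp, the tick at which the stuck transmit buffer was queued. Subtracting one from the other shows how long the adapter has been sitting on that packet. A rough sketch in bash using the hex values from the first hang report; ticks_between is a made-up helper, and HZ=1000 is an assumption that happens to fit this log, where jiffies advances by 0x7d0 (2000) between entries stamped two seconds apart:

```shell
#!/bin/bash
# Hypothetical helper: number of timer ticks between two hex jiffies values.
ticks_between() {
    echo $(( 0x$2 - 0x$1 ))
}

HZ=1000   # assumption: timer ticks per second on this kernel

# time_stamp and jiffies taken from the first hang report in the log
stuck=$(ticks_between 1000a93a1 1000a99b4)
echo "stuck for ${stuck} ticks, about $(( stuck * 1000 / HZ )) ms"
```

The same subtraction on the later entries gives larger and larger numbers while time_stamp stays put, which is the watchdog's way of saying the same packet never left the queue.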

And some additional error output from syslog:

Feb 15 17:52:11 nuuk kernel: CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 3.10.0-957.5.1.el7.x86_64 #1
Feb 15 17:52:11 nuuk kernel: Hardware name: LENOVO 10AXS0C900/SHARKBAY, BIOS FHKT46AUS 05/16/2014
Feb 15 17:52:11 nuuk kernel: Call Trace:
Feb 15 17:52:11 nuuk kernel: [] dump_stack+0x19/0x1b
Feb 15 17:52:11 nuuk kernel: [] __warn+0xd8/0x100
Feb 15 17:52:11 nuuk kernel: [] warn_slowpath_fmt+0x5f/0x80
Feb 15 17:52:11 nuuk kernel: [] dev_watchdog+0x248/0x260
Feb 15 17:52:11 nuuk kernel: [] ? dev_deactivate_queue.constprop.26+0x60/0x60
Feb 15 17:52:11 nuuk kernel: [] call_timer_fn+0x38/0x110
Feb 15 17:52:11 nuuk kernel: [] ? dev_deactivate_queue.constprop.26+0x60/0x60
Feb 15 17:52:11 nuuk kernel: [] run_timer_softirq+0x24d/0x300
Feb 15 17:52:11 nuuk kernel: [] __do_softirq+0xf5/0x280
Feb 15 17:52:11 nuuk kernel: [] call_softirq+0x1c/0x30
Feb 15 17:52:11 nuuk kernel: [] do_softirq+0x65/0xa0
Feb 15 17:52:11 nuuk kernel: [] irq_exit+0x105/0x110
Feb 15 17:52:11 nuuk kernel: [] smp_apic_timer_interrupt+0x48/0x60
Feb 15 17:52:11 nuuk kernel: [] apic_timer_interrupt+0x162/0x170
Feb 15 17:52:11 nuuk kernel: [] ? hrtimer_start_range_ns+0x1ed/0x3c0
Feb 15 17:52:11 nuuk kernel: [] ? cpuidle_enter_state+0x57/0xd0
Feb 15 17:52:11 nuuk kernel: [] ? cpuidle_enter_state+0x4d/0xd0
Feb 15 17:52:11 nuuk kernel: [] cpuidle_idle_call+0xde/0x230
Feb 15 17:52:11 nuuk kernel: [] arch_cpu_idle+0xe/0xc0
Feb 15 17:52:11 nuuk kernel: [] cpu_startup_entry+0x14a/0x1e0
Feb 15 17:52:11 nuuk kernel: [] rest_init+0x77/0x80
Feb 15 17:52:11 nuuk kernel: [] start_kernel+0x44b/0x46c
Feb 15 17:52:11 nuuk kernel: [] ? repair_env_string+0x5c/0x5c
Feb 15 17:52:11 nuuk kernel: [] ? early_idt_handler_array+0x120/0x120
Feb 15 17:52:11 nuuk kernel: [] x86_64_start_reservations+0x24/0x26
Feb 15 17:52:11 nuuk kernel: [] x86_64_start_kernel+0x154/0x177
Feb 15 17:52:11 nuuk kernel: [] start_cpu+0x5/0x14
Feb 15 17:52:11 nuuk kernel: ---[ end trace 9c76f7ff07fb727a ]---
Feb 15 17:52:11 nuuk kernel: e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly

I did not experience this behavior under any flavor of Ubuntu using the same hardware. For my use case it is simply unacceptable: connections were dropped, and multi-factor authentication (RSA-Key) is always required to reconnect.

The Remedy:

vi /etc/rc.local

In this case, simply add the following line to your /etc/rc.local and bounce the box: ethtool -K eno1 gso off gro off tso off (eno1 is my interface name, as seen in the log; substitute your own).

Mine looks like this:

#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.

touch /var/lock/subsys/local

ethtool -K eno1 gso off gro off tso off
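The same ethtool -K line can also be run by hand as root to try the change before rebooting, and ethtool -k (lowercase k lists features, uppercase K changes them) shows whether it took. A small sketch, assuming the interface is eno1 as in the log; offload_state is a hypothetical helper that only filters the feature list down to the three settings being changed:

```shell
#!/bin/bash
# Hypothetical helper: keep only the three offload settings from
# "ethtool -k" output read on stdin.
offload_state() {
    grep -E '^(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):'
}

IFACE=eno1   # assumption: adapter name taken from the log above

if command -v ethtool >/dev/null 2>&1; then
    ethtool -k "$IFACE" 2>/dev/null | offload_state \
        || echo "could not read offload features for $IFACE"
fi
```

Once the rc.local line has run, all three settings should read off.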

The Impact?

No clue. This is one of those solutions that I fell ass backwards into and it happened to work; I am not familiar with this command string or its effect. As far as I can tell, the three flags stand for generic segmentation offload (gso), generic receive offload (gro) and TCP segmentation offload (tso), and I have read others reporting performance and other issues with this solution. In my scenario, everything runs fine and my connection is no longer intermittently interrupted. Everything works and I happen to like things that work. Any negatives to this solution? Let me know below.

If this doesn’t run at startup, make sure you have enabled your rc.local:

chmod +x /etc/rc.d/rc.local
systemctl enable rc-local
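
A quick way to see why the line might not be running: on CentOS 7, /etc/rc.local is normally a symlink to /etc/rc.d/rc.local, and, as the comment block in the file itself says, the script is only executed at boot if it is executable. A minimal sketch of that check; rc_local_ready is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: report whether an rc.local file exists and is
# executable. Defaults to the CentOS 7 location when no path is given.
rc_local_ready() {
    local path="${1:-/etc/rc.d/rc.local}"
    if [ -f "$path" ] && [ -x "$path" ]; then
        echo "$path is executable"
    else
        echo "$path is missing or not executable"
        return 1
    fi
}

rc_local_ready || true
```

After a reboot, systemctl status rc-local shows whether the script actually ran and how it exited.
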
Lima

About the author

Lima is the visual nautical indicator for "stop instantly."
