What is EDAC
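EDAC stands for Error Detection and Correction. In computer systems, EDAC is a technique for detecting and correcting hardware errors in memory. Such errors can corrupt data or crash the system; with EDAC in place, the system can detect an error when it occurs and correct it, preserving data integrity and system stability.
EDAC usually relies on hardware and software working together. The hardware side consists of the memory controller and related circuitry, which detect memory errors and correct them automatically where possible. The software side consists of drivers and tools that monitor hardware errors, log error information, and notify the system administrator.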
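How EDAC works
In essence, EDAC works by adding redundant information that is used to detect and correct hardware errors in memory.
1. Error detection: when data is stored, the system also stores some redundant information, such as a checksum, parity bits, or an error-correcting code. When the data is read back, the system recomputes this redundant information and compares it with the stored value. If the two do not match, the system knows an error has occurred.
2. Error correction: in EDAC systems that can correct errors, the system repairs the data automatically by applying a specific algorithm to the redundant information.
3. Fault location and notification: when an error occurs, the EDAC system records information about it, including the error type and where it occurred.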
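To make the detect-then-correct idea concrete, here is a minimal sketch using a Hamming(7,4) code. This is a textbook code chosen for illustration only; it is not the ECC scheme that server memory controllers actually use, and the function names are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (0..15) into a 7-bit codeword (list, positions 1..7)."""
    code = [0] * 8  # index 0 is unused so that indices match bit positions
    # Data bits live at positions 3, 5, 6, 7; parity bits at 1, 2, 4.
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    return code[1:]


def hamming74_decode(bits):
    """Return (nibble, syndrome). A non-zero syndrome is the corrected position."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        code[syndrome] ^= 1  # flip the single faulty bit back
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, syndrome


word = hamming74_encode(0b1011)
word[4] ^= 1  # simulate a single-bit memory error at position 5
print(hamming74_decode(word))
```

The syndrome plays the role of the "fault location" step: it is zero when the stored redundancy matches the data, and otherwise names the bit that flipped. A code like this corrects one flipped bit per word; two flipped bits would be miscorrected, which is why real memory ECC adds further redundancy to at least detect double-bit errors.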
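EDAC drivers
The Linux kernel ships several EDAC drivers. If an older kernel lacks the EDAC driver for your platform, look for it in a newer kernel; you can then either backport the driver to the older kernel or upgrade the system kernel. If no kernel version has the driver, the only option is to contact the vendor, for example Intel or AMD.
The one I used is the MC Driver for Intel 10nm server processors. The 4.19 kernel does not include the i10nm_edac driver, so I backported it from kernel 5.10 (it took several days to get working).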
Loading the i10nm_edac driver
[ 3.945980] EDAC MC0: Giving out device to module i10nm_edac controller Intel_10nm Socket#0 IMC#0: DEV 0000:fe:0c.0 (INTERRUPT)
[ 3.945985] EDAC i10nm: v0.0.3
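EINJ error injection
EINJ (Error Injection) is an ACPI mechanism for injecting hardware errors, used here for memory errors. It is typically used to test and validate a system's error handling: by simulating hardware errors in memory, you can check that the system's error detection, correction, and handling mechanisms work.
EINJ usually involves the following key elements:
Error type: EINJ defines different error types, for example correctable memory errors (typically single-bit) and uncorrectable memory errors (typically multi-bit), the latter either non-fatal or fatal.
Error location: EINJ specifies where the error is injected, usually a memory address or a memory module.
Injection parameters: EINJ also takes some injection parameters, such as error severity, injection count, and injection rate. These parameters let the user control how, and at what scale, errors are injected.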
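The error type is passed around as a bitmask. The sketch below decodes such a mask into names; the bit assignments follow the ACPI EINJ error-type definitions as I understand them (the three memory values match the `available_error_type` output shown later in this article), and the helper name `decode_error_types` is hypothetical.

```python
# Bit values of the EINJ error-type mask (assumed from the ACPI specification).
EINJ_ERROR_TYPES = {
    0x001: "Processor Correctable",
    0x002: "Processor Uncorrectable non-fatal",
    0x004: "Processor Uncorrectable fatal",
    0x008: "Memory Correctable",
    0x010: "Memory Uncorrectable non-fatal",
    0x020: "Memory Uncorrectable fatal",
    0x040: "PCI Express Correctable",
    0x080: "PCI Express Uncorrectable non-fatal",
    0x100: "PCI Express Uncorrectable fatal",
    0x200: "Platform Correctable",
    0x400: "Platform Uncorrectable non-fatal",
    0x800: "Platform Uncorrectable fatal",
}


def decode_error_types(mask):
    """Return the names of all error types whose bit is set in mask."""
    return [name for bit, name in EINJ_ERROR_TYPES.items() if mask & bit]


# A platform that only supports memory injection would report 0x38.
for name in decode_error_types(0x38):
    print(name)
```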
Injection driver
root@sonic:/home/admin# insmod einj.ko
root@sonic:/home/admin# dmesg
[ 66.882973] EINJ: Error INJection is initialized.
Injection tools
As far as I recall, these were provided by Intel.
root@sonic:/home/admin/test_tools# chmod +x inject_ce inject_ue
The EINJ table
Viewing the EINJ table
1. cat /sys/firmware/acpi/tables/EINJ > EINJ.bin
2. iasl -d EINJ.bin
3. cat EINJ.dsl
root@bsp:/home/bsp-server# cat /sys/firmware/acpi/tables/EINJ > EINJ.bin
root@bsp:/home/bsp-server# iasl -d EINJ.bin
Command 'iasl' not found, but can be installed with:
apt install acpica-tools
root@bsp:/home/bsp-server# apt install acpica-tools
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  acpica-tools
0 upgraded, 1 newly installed, 0 to remove and 91 not upgraded.
Need to get 900 kB of archives.
After this operation, 2,570 kB of additional disk space will be used.
Get:1 jammy/universe amd64 acpica-tools amd64 20200925-6 [900 kB]
Fetched 900 kB in 4s (223 kB/s)
Selecting previously unselected package acpica-tools.
(Reading database ... 316633 files and directories currently installed.)
Preparing to unpack .../acpica-tools_20200925-6_amd64.deb ...
Unpacking acpica-tools (20200925-6) ...
Setting up acpica-tools (20200925-6) ...
Processing triggers for man-db (2.10.2-1) ...
root@bsp:/home/bsp-server# iasl -d EINJ.bin
Intel ACPI Component Architecture
ASL+ Optimizing Compiler/Disassembler version 20200925
Copyright (c) 2000 - 2020 Intel Corporation
File appears to be binary: found 324 non-ASCII characters, disassembling
Binary file appears to be a valid ACPI table, disassembling
Input file EINJ.bin, Length 0x170 (368) bytes
ACPI: EINJ 0x0000000000000000 000170 (v01 HPE Server 00000001 INTL 00000001)
Acpi Data Table [EINJ] decoded
Formatted output: EINJ.dsl - 10805 bytes
root@bsp:/home/bsp-server# cat EINJ.dsl
/*
 * Intel ACPI Component Architecture
 * AML/ASL+ Disassembler version 20200925 (64-bit version)
 * Copyright (c) 2000 - 2020 Intel Corporation
 *
 * Disassembly of EINJ.bin, Thu Feb 29 11:19:54 2024
 *
 * ACPI Data Table [EINJ]
 *
 * Format: [HexOffset DecimalOffset ByteLength] FieldName : FieldValue
 */
[000h 0000 4] Signature : "EINJ" [Error Injection table]
[004h 0004 4] Table Length : 00000170
[008h 0008 1] Revision : 01
[009h 0009 1] Checksum : DC
[00Ah 0010 6] Oem ID : "HPE "
[010h 0016 8] Oem Table ID : "Server "
[018h 0024 4] Oem Revision : 00000001
[01Ch 0028 4] Asl Compiler ID : "INTL"
[020h 0032 4] Asl Compiler Revision : 00000001
[024h 0036 4] Injection Header Length : 0000000C
[028h 0040 1] Flags : 00
[029h 0041 3] Reserved : 000000
[02Ch 0044 4] Injection Entry Count : 0000000A

[030h 0048 1] Action : 00 [Begin Operation]
[031h 0049 1] Instruction : 03 [Write Register Value]
[032h 0050 1] Flags (decoded below) : 01
    Preserve Register Bits : 1
[033h 0051 1] Reserved : 00
[034h 0052 12] Register Region : [Generic Address Structure]
[034h 0052 1] Space ID : 00 [SystemMemory]
[035h 0053 1] Bit Width : 40
[036h 0054 1] Bit Offset : 00
[037h 0055 1] Encoded Access Width : 04 [QWord Access:64]
[038h 0056 8] Address : 00000000A0DCA018
[040h 0064 8] Value : 0000000055AA55AA
[048h 0072 8] Mask : 00000000FFFFFFFF

[050h 0080 1] Action : 01 [Get Trigger Table]
[051h 0081 1] Instruction : 00 [Read Register]
[052h 0082 1] Flags (decoded below) : 00
    Preserve Register Bits : 0
[053h 0083 1] Reserved : 00
[054h 0084 12] Register Region : [Generic Address Structure]
[054h 0084 1] Space ID : 00 [SystemMemory]
[055h 0085 1] Bit Width : 40
[056h 0086 1] Bit Offset : 00
[057h 0087 1] Encoded Access Width : 04 [QWord Access:64]
[058h 0088 8] Address : 00000000A0DCA048
[060h 0096 8] Value : 0000000000000000
[068h 0104 8] Mask : FFFFFFFFFFFFFFFF

[070h 0112 1] Action : 02 [Set Error Type]
[071h 0113 1] Instruction : 02 [Write Register]
[072h 0114 1] Flags (decoded below) : 01
    Preserve Register Bits : 1
[073h 0115 1] Reserved : 00
[074h 0116 12] Register Region : [Generic Address Structure]
[074h 0116 1] Space ID : 00 [SystemMemory]
[075h 0117 1] Bit Width : 40
[076h 0118 1] Bit Offset : 00
[077h 0119 1] Encoded Access Width : 04 [QWord Access:64]
[078h 0120 8] Address : 00000000A0DCA020
[080h 0128 8] Value : 0000000000000000
[088h 0136 8] Mask : 00000000FFFFFFFF

[090h 0144 1] Action : 03 [Get Error Type]
[091h 0145 1] Instruction : 00 [Read Register]
[092h 0146 1] Flags (decoded below) : 00
    Preserve Register Bits : 0
[093h 0147 1] Reserved : 00
[094h 0148 12] Register Region : [Generic Address Structure]
[094h 0148 1] Space ID : 00 [SystemMemory]
[095h 0149 1] Bit Width : 40
[096h 0150 1] Bit Offset : 00
[097h 0151 1] Encoded Access Width : 04 [QWord Access:64]
[098h 0152 8] Address : 00000000A0DCA050
[0A0h 0160 8] Value : 0000000000000000
[0A8h 0168 8] Mask : 00000000FFFFFFFF

[0B0h 0176 1] Action : 04 [End Operation]
[0B1h 0177 1] Instruction : 03 [Write Register Value]
[0B2h 0178 1] Flags (decoded below) : 01
    Preserve Register Bits : 1
[0B3h 0179 1] Reserved : 00
[0B4h 0180 12] Register Region : [Generic Address Structure]
[0B4h 0180 1] Space ID : 00 [SystemMemory]
[0B5h 0181 1] Bit Width : 40
[0B6h 0182 1] Bit Offset : 00
[0B7h 0183 1] Encoded Access Width : 04 [QWord Access:64]
[0B8h 0184 8] Address : 00000000A0DCA018
[0C0h 0192 8] Value : 0000000000000000
[0C8h 0200 8] Mask : 00000000FFFFFFFF

[0D0h 0208 1] Action : 05 [Execute Operation]
[0D1h 0209 1] Instruction : 03 [Write Register Value]
[0D2h 0210 1] Flags (decoded below) : 01
    Preserve Register Bits : 1
[0D3h 0211 1] Reserved : 00
[0D4h 0212 12] Register Region : [Generic Address Structure]
[0D4h 0212 1] Space ID : 01 [SystemIO]
[0D5h 0213 1] Bit Width : 10
[0D6h 0214 1] Bit Offset : 00
[0D7h 0215 1] Encoded Access Width : 02 [Word Access:16]
[0D8h 0216 8] Address : 00000000000000B2
[0E0h 0224 8] Value : 000000000000009A
[0E8h 0232 8] Mask : 000000000000FFFF

[0F0h 0240 1] Action : 06 [Check Busy Status]
[0F1h 0241 1] Instruction : 01 [Read Register Value]
[0F2h 0242 1] Flags (decoded below) : 00
    Preserve Register Bits : 0
[0F3h 0243 1] Reserved : 00
[0F4h 0244 12] Register Region : [Generic Address Structure]
[0F4h 0244 1] Space ID : 00 [SystemMemory]
[0F5h 0245 1] Bit Width : 40
[0F6h 0246 1] Bit Offset : 00
[0F7h 0247 1] Encoded Access Width : 04 [QWord Access:64]
[0F8h 0248 8] Address : 00000000A0DCA058
[100h 0256 8] Value : 0000000000000001
[108h 0264 8] Mask : 0000000000000001

[110h 0272 1] Action : 07 [Get Command Status]
[111h 0273 1] Instruction : 00 [Read Register]
[112h 0274 1] Flags (decoded below) : 01
    Preserve Register Bits : 1
[113h 0275 1] Reserved : 00
[114h 0276 12] Register Region : [Generic Address Structure]
[114h 0276 1] Space ID : 00 [SystemMemory]
[115h 0277 1] Bit Width : 40
[116h 0278 1] Bit Offset : 00
[117h 0279 1] Encoded Access Width : 04 [QWord Access:64]
[118h 0280 8] Address : 00000000A0DCA060
[120h 0288 8] Value : 0000000000000000
[128h 0296 8] Mask : 00000000000001FE

[130h 0304 1] Action : 08 [Set Error Type With Address]
[131h 0305 1] Instruction : 02 [Write Register]
[132h 0306 1] Flags (decoded below) : 01
    Preserve Register Bits : 1
[133h 0307 1] Reserved : 00
[134h 0308 12] Register Region : [Generic Address Structure]
[134h 0308 1] Space ID : 00 [SystemMemory]
[135h 0309 1] Bit Width : 40
[136h 0310 1] Bit Offset : 00
[137h 0311 1] Encoded Access Width : 04 [QWord Access:64]
[138h 0312 8] Address : 00000000A0DCA078
[140h 0320 8] Value : 0000000000000000
[148h 0328 8] Mask : 00000000FFFFFFFF

[150h 0336 1] Action : 09 [Get Execute Timings]
[151h 0337 1] Instruction : 00 [Read Register]
[152h 0338 1] Flags (decoded below) : 00
    Preserve Register Bits : 0
[153h 0339 1] Reserved : 00
[154h 0340 12] Register Region : [Generic Address Structure]
[154h 0340 1] Space ID : 00 [SystemMemory]
[155h 0341 1] Bit Width : 40
[156h 0342 1] Bit Offset : 00
[157h 0343 1] Encoded Access Width : 04 [QWord Access:64]
[158h 0344 8] Address : 00000000A0DCA09C
[160h 0352 8] Value : 00007FFF00003FFF
[168h 0360 8] Mask : FFFFFFFFFFFFFFFF

Raw Table Data: Length 368 (0x170)
0000: 45 49 4E 4A 70 01 00 00 01 DC 48 50 45 20 20 20 // EINJp.....HPE
0010: 53 65 72 76 65 72 20 20 01 00 00 00 49 4E 54 4C // Server ....INTL
0020: 01 00 00 00 0C 00 00 00 00 00 00 00 0A 00 00 00 // ................
0030: 00 03 01 00 00 40 00 04 18 A0 DC A0 00 00 00 00 // .....@..........
0040: AA 55 AA 55 00 00 00 00 FF FF FF FF 00 00 00 00 // .U.U............
0050: 01 00 00 00 00 40 00 04 48 A0 DC A0 00 00 00 00 // .....@..H.......
0060: 00 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF // ................
0070: 02 02 01 00 00 40 00 04 20 A0 DC A0 00 00 00 00 // .....@.. .......
0080: 00 00 00 00 00 00 00 00 FF FF FF FF 00 00 00 00 // ................
0090: 03 00 00 00 00 40 00 04 50 A0 DC A0 00 00 00 00 // .....@..P.......
00A0: 00 00 00 00 00 00 00 00 FF FF FF FF 00 00 00 00 // ................
00B0: 04 03 01 00 00 40 00 04 18 A0 DC A0 00 00 00 00 // .....@..........
00C0: 00 00 00 00 00 00 00 00 FF FF FF FF 00 00 00 00 // ................
00D0: 05 03 01 00 01 10 00 02 B2 00 00 00 00 00 00 00 // ................
00E0: 9A 00 00 00 00 00 00 00 FF FF 00 00 00 00 00 00 // ................
00F0: 06 01 00 00 00 40 00 04 58 A0 DC A0 00 00 00 00 // .....@..X.......
0100: 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 // ................
0110: 07 00 01 00 00 40 00 04 60 A0 DC A0 00 00 00 00 // .....@..`.......
0120: 00 00 00 00 00 00 00 00 FE 01 00 00 00 00 00 00 // ................
0130: 08 02 01 00 00 40 00 04 78 A0 DC A0 00 00 00 00 // .....@..x.......
0140: 00 00 00 00 00 00 00 00 FF FF FF FF 00 00 00 00 // ................
0150: 09 00 00 00 00 40 00 04 9C A0 DC A0 00 00 00 00 // .....@..........
0160: FF 3F 00 00 FF 7F 00 00 FF FF FF FF FF FF FF FF // .?..............
root@bsp:/home/bsp-server#
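The layout visible in the dump is regular: a 48-byte header with the entry count at offset 0x2C, followed by 32-byte injection instruction entries. As a rough sketch (not a replacement for iasl), the entries can be unpacked directly in Python. The offsets and the action and instruction names are taken from the disassembly above; the function name `parse_einj` is hypothetical.

```python
import struct

ACTIONS = {
    0: "Begin Operation", 1: "Get Trigger Table", 2: "Set Error Type",
    3: "Get Error Type", 4: "End Operation", 5: "Execute Operation",
    6: "Check Busy Status", 7: "Get Command Status",
    8: "Set Error Type With Address", 9: "Get Execute Timings",
}
INSTRUCTIONS = {
    0: "Read Register", 1: "Read Register Value",
    2: "Write Register", 3: "Write Register Value",
}

ENTRY_COUNT_OFFSET = 0x2C  # "Injection Entry Count" in the dump
FIRST_ENTRY_OFFSET = 0x30  # first "Action" field in the dump
# action, instruction, flags, reserved pad, then the Generic Address Structure
# (space id, bit width, bit offset, access width, address), value, mask.
ENTRY = struct.Struct("<BBBx BBBBQ QQ")  # 32 bytes, little-endian


def parse_einj(table):
    """Decode the injection entries of a raw EINJ table (bytes)."""
    if table[:4] != b"EINJ":
        raise ValueError("not an EINJ table")
    (count,) = struct.unpack_from("<I", table, ENTRY_COUNT_OFFSET)
    entries = []
    for i in range(count):
        (action, instr, flags, space_id, bit_width, bit_offset,
         access_width, address, value, mask) = ENTRY.unpack_from(
            table, FIRST_ENTRY_OFFSET + i * ENTRY.size)
        entries.append({
            "action": ACTIONS.get(action, f"unknown({action})"),
            "instruction": INSTRUCTIONS.get(instr, f"unknown({instr})"),
            "preserve_register": bool(flags & 1),
            "space_id": space_id,  # 0 = SystemMemory, 1 = SystemIO
            "bit_width": bit_width,
            "address": address,
            "value": value,
            "mask": mask,
        })
    return entries
```

On a real machine this would be called, as root, on the bytes read from /sys/firmware/acpi/tables/EINJ. In the table above, the "Execute Operation" entry writes 0x9A to I/O port 0xB2, while all other entries touch registers in system memory near 0xA0DCA000.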
Installing edac-utils
root@sonic:/home/admin# apt install edac-utils
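Injection methods
1. Injecting with the tools
CE injection
When injecting CEs (correctable errors), note that the BIOS on this machine sets the CE threshold to 2000. The first injection of 1900 errors therefore reports nothing; after the second injection of 100 errors, the CE count becomes visible with "edac-util -v". The threshold differs from machine to machine.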
root@sonic:/home/admin/test_tools# chmod +x inject_*
root@sonic:/home/admin/test_tools# ./inject_ce -c 1900
repeat_times = 1900
allocate memory in virt addr=0x7fad9d079000 , physical addr=0x143c64000
[2023-9-22 14:15:30] inject 1900 CE error to paddr = 0x143c64000 begin...
[2023-9-22 14:15:41] inject 1900 CE error to paddr = 0x143c64000 finish.
root@sonic:/home/admin/test_tools# ./inject_ce -c 100
repeat_times = 100
allocate memory in virt addr=0x7f75020d9000 , physical addr=0x135935000
[2023-9-22 14:16:02] inject 100 CE error to paddr = 0x135935000 begin...
[2023-9-22 14:16:02] inject 100 CE error to paddr = 0x135935000 finish.
root@sonic:/home/admin/test_tools#
root@sonic:/home/admin/test_tools# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: CPU_SrcID#0_MC#0_Chan#1_DIMM#0: 2000 Corrected Errors
root@sonic:/home/admin/test_tools# ./inject_ce -c 100
repeat_times = 100
allocate memory in virt addr=0x7f22f5384000 , physical addr=0x130454000
[2023-9-22 14:16:19] inject 100 CE error to paddr = 0x130454000 begin...
[2023-9-22 14:16:20] inject 100 CE error to paddr = 0x130454000 finish.
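When the injection is scripted, the counters need to be read back automatically. The following is a minimal sketch that parses `edac-util -v` output into a dictionary, assuming the output format matches the sample in this article; the function name `parse_edac_util` is hypothetical and other edac-util versions may print different lines.

```python
import re

# Matches lines such as "mc0: csrow0: 0 Uncorrected Errors".
LINE = re.compile(
    r"^(?P<loc>.+?):\s+(?P<count>\d+)\s+(?P<kind>Corrected|Uncorrected) Errors"
)


def parse_edac_util(text):
    """Map (location, kind) to the error count reported by edac-util -v."""
    counts = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            counts[(m["loc"], m["kind"])] = int(m["count"])
    return counts


sample = """\
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: CPU_SrcID#0_MC#0_Chan#1_DIMM#0: 2000 Corrected Errors
"""
for key, count in parse_edac_util(sample).items():
    print(key, count)
```

On a live system the text could come from `subprocess.run(["edac-util", "-v"], capture_output=True, text=True).stdout`.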
UE injection
When injecting a UE (uncorrectable error), whether the system goes down depends on the BIOS and OS configuration.
root@sonic:/home/admin/test_tools# ./inject_ue -addr=0x0003f004
inject addr = 0x0
allocate memory in virt addr=0x7efd8e3d7000 , physical addr=0x189bfe000
[2023-9-22 14:17:37] inject 1 UE error to paddr = 0x189bfe000 begin...
[2023-9-22 14:17:37] inject 1 UE error to paddr = 0x189bfe000 finish.
Message from syslogd@sonic at Sep 22 14:17:39 ...
 kernel:[ 219.279745] mce: [Hardware Error]: CPU 12: Machine Check Exception: f Bank 1: bd80000000100134
Bus error (core dumped)
root@sonic:/home/admin/test_tools#
Message from syslogd@sonic at Sep 22 14:17:39 ...
 kernel:[ 219.384566] mce: [Hardware Error]: RIP 33:<0000000000401b23>
Message from syslogd@sonic at Sep 22 14:17:39 ...
 kernel:[ 219.455029] mce: [Hardware Error]: TSC 8efdf8d6b7 ADDR 189bfe000 MISC 86 PPIN 5d6f3e0a91d69cd8
Message from syslogd@sonic at Sep 22 14:17:39 ...
 kernel:[ 219.560913] mce: [Hardware Error]: PROCESSOR 0:606c1 TIME 1695363457 SOCKET 0 APIC 9 microcode 1000211
root@sonic:/home/admin/test_tools# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 3 Uncorrected Errors
mc0: csrow0: CPU_SrcID#0_MC#0_Chan#1_DIMM#0: 4000 Corrected Errors
root@sonic:/home/admin/test_tools#
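The 64-bit value after "Bank 1:" in the MCE message is the bank's status register. As a hedged sketch, its top flag bits can be decoded as follows. The bit positions are my reading of the IA32_MCi_STATUS layout in the Intel SDM (the S and AR bits are only meaningful on processors that support software error recovery), and `decode_mci_status` is a hypothetical helper name.

```python
# Flag bits of IA32_MCi_STATUS (assumed from the Intel SDM).
MCI_STATUS_FLAGS = [
    (63, "VAL"),    # register contents are valid
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error reporting enabled
    (59, "MISCV"),  # MISC register is valid
    (58, "ADDRV"),  # ADDR register is valid
    (57, "PCC"),    # processor context corrupted
    (56, "S"),      # signaled via machine check exception
    (55, "AR"),     # software recovery action required
]


def decode_mci_status(status):
    """Split a machine-check status value into flag names and error codes."""
    flags = [name for bit, name in MCI_STATUS_FLAGS if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_code": status & 0xFFFF,            # architectural error code
        "model_code": (status >> 16) & 0xFFFF,  # model-specific error code
    }


info = decode_mci_status(0xBD80000000100134)
print(info["flags"], hex(info["mca_code"]), hex(info["model_code"]))
```

Read this way, the logged value has UC set and PCC clear with a valid address, which is consistent with the log: only the injecting process was killed with a bus error and the shell prompt came back.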
2. Injecting through the /sys nodes
Check the memory address ranges
root@bsp:/home/bsp-server# cat /proc/iomem | grep "System RAM"
00001000-0008dfff : System RAM
00090000-0009ffff : System RAM
00100000-89d7c017 : System RAM
89d7c018-89dade57 : System RAM
89dade58-89dae017 : System RAM
89dae018-89edf457 : System RAM
89edf458-89ee4fff : System RAM
89f16000-8a02efff : System RAM
8a037000-8a752fff : System RAM
8a754000-94695fff : System RAM
95733000-95736fff : System RAM
9697a000-9e98a017 : System RAM
9e98a018-9e9bbe57 : System RAM
9e9bbe58-9e9bc017 : System RAM
9e9bc018-9e9ede57 : System RAM
9e9ede58-9e9ee017 : System RAM
9e9ee018-9ea1fe57 : System RAM
9ea1fe58-9ea20017 : System RAM
9ea20018-9ea82857 : System RAM
9ea82858-9ea83017 : System RAM
9ea83018-9eae5857 : System RAM
9eae5858-9eae6017 : System RAM
9eae6018-9eaf0c57 : System RAM
9eaf0c58-9eaf1017 : System RAM
9eaf1018-9eaf9057 : System RAM
9eaf9058-a0179fff : System RAM
a347a000-a347afff : System RAM
a34fc000-afffffff : System RAM
100000000-203fffefff : System RAM
cd /sys/kernel/debug/apei/einj/
# Supported error types
cat available_error_type
0x00000008 Memory Correctable
0x00000010 Memory Uncorrectable non-fatal
0x00000020 Memory Uncorrectable fatal
# Error type to inject
echo 0x8 > error_type
# Memory address mask
echo 0xfffffffffffff000 > param2
# Memory address
echo 0x32dec000 > param1
# Write 0x0; if set to 1, the trigger step is skipped
echo 0x0 > notrigger
# Inject the error
echo 1 > error_inject
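The same sequence can be scripted. The sketch below checks that the target address falls inside a "System RAM" range and then writes the debugfs files in the order used above. The helper names are hypothetical, the directory is passed in as a parameter so the logic can be tried against a scratch directory first, and on a real system it must run as root with the einj module loaded.

```python
from pathlib import Path


def parse_system_ram(iomem_text):
    """Return (start, end) pairs for every 'System RAM' line of /proc/iomem."""
    ranges = []
    for line in iomem_text.splitlines():
        span, sep, name = line.partition(" : ")
        if sep and name.strip() == "System RAM":
            start, end = (int(x, 16) for x in span.strip().split("-"))
            ranges.append((start, end))
    return ranges


def in_system_ram(addr, ranges):
    """True if addr lies inside one of the inclusive (start, end) ranges."""
    return any(start <= addr <= end for start, end in ranges)


def inject_memory_error(einj_dir, error_type, addr,
                        mask=0xFFFFFFFFFFFFF000, notrigger=0):
    """Write the EINJ debugfs files in the same order as the shell commands."""
    d = Path(einj_dir)
    (d / "error_type").write_text(f"{error_type:#x}\n")
    (d / "param2").write_text(f"{mask:#x}\n")       # address mask
    (d / "param1").write_text(f"{addr:#x}\n")       # physical address
    (d / "notrigger").write_text(f"{notrigger:#x}\n")
    (d / "error_inject").write_text("1\n")          # starts the injection
```

A typical call would be `inject_memory_error("/sys/kernel/debug/apei/einj", 0x8, 0x32dec000)`, guarded by `in_system_ram(0x32dec000, parse_system_ram(Path("/proc/iomem").read_text()))`. Note that /proc/iomem shows real addresses only to root. Keeping the range check separate from the write makes it harder to inject into a reserved or MMIO region by mistake.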