Solved Fixing an Unbootable Windows 11 OS


One thing you can do is mount the D: drive's SYSTEM registry hive, and check if the Service was actually removed. I don't have a PC with the Intel CPS feature, so I'm going by a 3rd-party site's ZIP copy of the extracted files.

reg load HKLM\TEMP D:\Windows\System32\config\SYSTEM
regedit

Now search under HKLM\TEMP\ControlSet001\Services (the hive is mounted directly at HKLM\TEMP, so there is no extra SYSTEM level in the path). Look for "INTCCoSvc". If it's there, delete that whole subkey.

Exit regedit
reg unload HKLM\TEMP
 

My Computer

System One

  • OS
    Windows 7
I removed the whole subkey and the DriverEntry failed 0xc0020035 error no longer appears!

The system, however, still does not boot... So now I'm a bit out of ideas.

I'm currently going through the Microsoft Debugging help articles to hopefully find a way to determine what's going wrong in the kernel.
 

My Computer

System One

  • OS
    Windows 11
    Computer type
    PC/Desktop
    CPU
    AMD Ryzen 7 7800X3D
    Motherboard
    ASUS X670E-E
    Memory
    Trident Z5 Neo RGB, DDR5-6000 CL30-38-38-96, 2x16GB
    Graphics Card(s)
    EVGA RTX 3060 XC
    Monitor(s) Displays
    Samsung 34" Odyssey OLED G8 & Dell U2518D
    Screen Resolution
    3440 x 1440 & 2560 x 1440
    Hard Drives
    Samsung 990 Pro 2TB & Samsung 850 Evo 1TB
    PSU
    Corsair RM1000x
    Case
    Fractal North
    Cooling
    EK-Nucleus AIO Lux CR240 D-RGB
    Internet Speed
    250Mbps/25Mbps
Ok so I’m back and I can happily say that after another month or so of tinkering around in the kernel, I have finally fixed the problem! After my last message I was pretty much out of ideas, so as a bit of a last-ditch effort I decided to see if ChatGPT could guide me through debugging the core processes of the Windows boot sequence. Overall, it proved to be extremely useful in this task. Below is a summary of the process I followed to diagnose the core issue. Obviously, I made many mistakes and encountered numerous dead ends and red herrings along the way, but I won’t include them for the sake of brevity and to keep my self-esteem intact haha.

The first thing I did was record the sequence of user processes loaded in safe mode by the broken system and compare that to the sequence loaded by a separate working system of the same Windows version. To identify when a process is loaded, it was recommended to set a breakpoint at the function “NtCreateUserProcess” (command: bp NtCreateUserProcess) and, every time it fired, to dump out the ‘user process parameters’, a pointer to which is located on the stack at rsp+0x48. To dump the parameters I used the command dt _RTL_USER_PROCESS_PARAMETERS poi(@rsp+0x48). This displays all the parameters of the process about to be loaded, the key part in my case being the CommandLine parameter, which shows the path of the process being loaded. E.g. CommandLine : _UNICODE_STRING "\SystemRoot\System32\winlogon.exe" indicated that winlogon was being launched by the NtCreateUserProcess function.
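As a rough illustration of the comparison step, here is a minimal Python sketch that takes the CommandLine values recorded on each system (copied out of the debugger by hand) and reports the first point where the two launch sequences differ. The function names and the short sample lists are hypothetical and only stand in for real recordings.

```python
def image_name(command_line):
    """Return the lower-cased executable name from a CommandLine value."""
    text = command_line.strip()
    # The executable is the first token; it may be wrapped in double quotes.
    if text.startswith('"'):
        first = text[1:].split('"', 1)[0]
    else:
        first = text.split(" ", 1)[0]
    # Keep only the last path component so differing prefixes still match.
    return first.rsplit("\\", 1)[-1].lower()


def first_divergence(working, broken):
    """Return (index, expected, actual) where the sequences first differ, else None."""
    for index, expected in enumerate(working):
        actual = broken[index] if index < len(broken) else None
        if actual is None or image_name(expected) != image_name(actual):
            return index, expected, actual
    if len(broken) > len(working):
        # The broken system launched something the working one never did.
        return len(working), None, broken[len(working)]
    return None


# Illustrative sequences only; a real boot launches more processes than this.
working = [
    r"\SystemRoot\System32\smss.exe",
    r"\SystemRoot\System32\winlogon.exe",
    "LogonUI.exe",
]
broken = working[:2]  # the broken system never reaches LogonUI
print(first_divergence(working, broken))
```

Because this compares executable names only, every svchost.exe looks the same to it; for those it would make more sense to compare the full command lines.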

Through this method I discovered that one of the last processes launched by the broken system was winlogon.exe. Winlogon was then supposed to launch LogonUI.exe, but this never occurred. After a lot of ferreting around in the Winlogon process, I narrowed the problem down to a function called winlogon!WinMain and disassembled it. Within it I found that the call to launch LogonUI (winlogon!StartLogonUI) was never reached.

To disassemble a user process function, I first had to switch from kernel mode debugging to user mode debugging of the specific process that contained it. To do this, after the breakpoint fired at the ‘NtCreateUserProcess’ call that was going to launch the desired process (in this case winlogon), I let NtCreateUserProcess complete (I lazily stepped through it using the p command) so that the process had launched far enough to appear in the process list. To display the process list, I then used the command: !process 0 0. This lists all the running processes, with the most recently launched one at the bottom. Along with the name of each process, it gives the value I refer to as its Process ID (strictly, this is the address of the process object shown after PROCESS, not the numeric PID shown as Cid), which can be used to user mode debug that process or set a breakpoint within it.

Unfortunately, at this point the process has not loaded far enough into memory for its symbols to be loaded into the debugger. These symbols are labels for the memory addresses of all the components of the process and are required to make sense of what the process is doing. To let the process run long enough for the symbols to be loaded, I arbitrarily decided to let it run to the point where it launched its first thread. To do this I set a breakpoint at the function which did this, named nt!KiStartUserThread. To set this breakpoint within a specific process (with a Process ID of ‘ffffb58b909e3080’ for example), I used the command bp /p ffffb58b909e3080 nt!KiStartUserThread. When this fires, the process is at a point where its symbols can be loaded. To switch to debugging this process in user mode and to load its symbols I used the following command: ‘.process /p /r ffffb58b909e3080’. After doing so I could set a breakpoint at the function ‘winlogon!WinMain’ and disassemble it to see what it contained and what path the system took through it. To disassemble this function, I used the command ‘uf winlogon!WinMain’.
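Since the same three commands get retyped for every process of interest, a small helper can assemble them from the address reported by !process 0 0. This is only a convenience sketch in Python: the function name is made up, and it emits nothing beyond the commands quoted in the post.

```python
def attach_commands(process_address, function):
    """Build the debugger commands for disassembling `function` in one process."""
    # WinDbg prints 64-bit values as ffffb58b`909e3080; drop the backtick and any 0x.
    addr = process_address.lower().replace("`", "").removeprefix("0x")
    if len(addr) != 16 or any(c not in "0123456789abcdef" for c in addr):
        raise ValueError(f"expected a 64-bit process address, got {process_address!r}")
    return [
        # Break when this process starts its first user thread.
        f"bp /p {addr} nt!KiStartUserThread",
        # Switch to the process context and reload its user mode symbols.
        f".process /p /r {addr}",
        # Disassemble the whole function.
        f"uf {function}",
    ]


for command in attach_commands("ffffb58b`909e3080", "winlogon!WinMain"):
    print(command)
```

The first command has to be entered and the target resumed until it fires before the other two are useful, as described above.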

Instead of launching LogonUI, the winlogon!WinMain function would hang after calling another function named ‘winsta!_WinStationWaitForConnect’. From here, winlogon's stack trace could be used to determine what the process was waiting for. One way to view the stack trace of a process is with the command ‘!process ffffe606abbe4080 17’ (for this example I used the Process ID of the process in the example screenshot below). An example of the stack trace at this point is shown below.

Winlogon.webp

Here (with help from ChatGPT) it could be seen that the process was waiting for the LSM (Local Session Manager) service to start. The function in the stack that made this obvious was the one called ‘winsta!WaitForLsmStart’ which is above the previously mentioned ‘winsta!_WinStationWaitForConnect’, itself above ‘winlogon!WinMain’. Now I had to determine how LSM was started and what was preventing it from doing so.

After a bit of a debate with ChatGPT, in which it insisted that LSM was a process itself, it was determined that while this used to be the case, in recent versions of Windows LSM became a dll that is hosted by a svchost process. So, the next thing that needed to be identified was the svchost that hosted LSM. This part was a little tricky but eventually I came up with the following method. Svchosts are launched by the services.exe process, so by using the same method that I used above for winlogon, I got to the point where services.exe was launched and obtained its Process ID. From there I was able to set a breakpoint for NtCreateUserProcess within the process (bp /p <Process ID> NtCreateUserProcess) and identify whenever it launched a svchost. Again, I let the NtCreateUserProcess function complete and then switched to user mode debugging of the new process. This time, though, I dumped out its Process Environment Block using the !peb command (!peb <Process ID>). This again showed the command line for the process, which indicated which svchost was being launched. After conducting this procedure on the working system, I determined that I was looking for the svchost with the following command line: 'C:\Windows\system32\svchost.exe -k DcomLaunch -p -s LSM'. However, this process never appeared on the broken system. From here, the next thing to work out was where in the services.exe process it failed to launch the svchost that hosts LSM.

Looking at the working system, I found that the LSM svchost was the third svchost to launch. Before it came ‘svchost.exe -k DcomLaunch -p’ and then ‘svchost.exe -k RPCSS -p’. On the broken system I found that ‘svchost.exe -k DcomLaunch -p’ did actually launch, but the RPCSS (Remote Procedure Call System Service) and LSM svchosts didn’t. Since the RPCSS svchost was supposed to launch before the LSM one, I decided to interrogate that one first. Looking at the working system, I switched to user mode debugging of services.exe and recorded the stack trace of the process at the point it launched the RPCSS svchost. This can be seen below.

Services1.webp

I then switched back to the broken system and used the above as a map to see how close the services process got to the NtCreateUserProcess function (which I knew never got called). By setting a series of breakpoints on the functions listed in that stack, I discovered that it got as far as the function right below NtCreateUserProcess, called KERNELBASE!CreateProcessInternalW. I then disassembled that function to determine at which point it failed.

Within CreateProcessInternalW is a call to another function, KERNELBASE!GetFileAttributesW, which is (as the name suggests) where the process attempts to retrieve the attributes of the file it is about to launch. After this point, I found that the broken system takes a different path through the CreateProcessInternalW function than the working system. With a lot of help from ChatGPT I worked out why it diverged at this point. Firstly I had to determine what file it was trying to get the attributes for. To do this, at the point where the GetFileAttributesW function starts, I was instructed to dump the string pointed to by the stack location rsp+258 (a hexadecimal offset) with the command ‘du poi(@rsp+258)’. This returned a result of ‘C:\windows\system32\svchost.exe’, which makes sense. Next, I needed to determine if the system gave any indication as to why the function failed. I found that when GetFileAttributesW completes, it leaves its return value in the eax register, which CreateProcessInternalW uses to determine which path to take through itself next. At that point I dumped out the value of eax and found it to be “0xFFFFFFFF”. This value (INVALID_FILE_ATTRIBUTES) indicates a failure of GetFileAttributesW, and it directs CreateProcessInternalW to take an alternate path which attempts to resolve the problem by retrying certain elements of the function. Below is an excerpt of the disassembled CreateProcessInternalW function showing the area where GetFileAttributesW is called (call), eax is compared (cmp) to 0xFFFFFFFF, followed by a jump to another part of the function if eax is equal (je) to 0xFFFFFFFF.

CPI1.webp
However, a value of 0xFFFFFFFF doesn’t exactly tell me what caused the GetFileAttributesW function to fail, so to determine what exactly went wrong I needed to disassemble it. Within GetFileAttributesW, another function named ‘NtQueryAttributesFile’ is called. When this function is launched, the stack trace appears as below.

Services2.webp
After running, NtQueryAttributesFile places its return value in register eax to indicate whether it failed or not. On failure, GetFileAttributesW later converts this to 0xFFFFFFFF so it can be interpreted by CreateProcessInternalW. The value that NtQueryAttributesFile put into eax was 0xC0000022. ChatGPT was kindly and instantly able to decode this as STATUS_ACCESS_DENIED.
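For anyone wanting to decode such a value by hand, the following Python sketch splits an NTSTATUS into its standard bit fields and models, in a simplified way, how a failing status ends up as 0xFFFFFFFF. The function names are made up for illustration, the lookup table holds only three well-known codes, and win32_file_attributes is a simplification of what GetFileAttributesW does rather than its actual implementation.

```python
INVALID_FILE_ATTRIBUTES = 0xFFFFFFFF

# A tiny lookup of well-known values; the full list is far longer.
KNOWN_STATUS = {
    0x00000000: "STATUS_SUCCESS",
    0xC0000022: "STATUS_ACCESS_DENIED",
    0xC0000034: "STATUS_OBJECT_NAME_NOT_FOUND",
}
SEVERITY = ("success", "informational", "warning", "error")


def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS into severity, facility and code fields."""
    value &= 0xFFFFFFFF
    return {
        "severity": SEVERITY[value >> 30],   # top two bits
        "facility": (value >> 16) & 0xFFF,   # bits 16 to 27
        "code": value & 0xFFFF,              # low 16 bits
        "name": KNOWN_STATUS.get(value, "unknown"),
    }


def nt_success(value):
    """Mirror the NT_SUCCESS test: the status is non-negative as a signed 32-bit value."""
    return (value & 0x80000000) == 0


def win32_file_attributes(status, attributes):
    """Simplified model: return the attributes on success, else INVALID_FILE_ATTRIBUTES."""
    return attributes if nt_success(status) else INVALID_FILE_ATTRIBUTES


print(decode_ntstatus(0xC0000022))
print(hex(win32_file_attributes(0xC0000022, 0x20)))
```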

This was a massive clue and had me instantly checking the permissions of the svchost.exe file. Here I discovered that it only had an administrator (full control) permission and a couple of inherited user permissions from when I was accessing the folder from the working system. This was far less than the NT SERVICE\TrustedInstaller, BUILTIN\Administrators, NT AUTHORITY\SYSTEM, BUILTIN\Users, ALL APPLICATION PACKAGES and ALL RESTRICTED APPLICATION PACKAGES permissions that were granted to the svchost of the working system. I applied the correct permissions to the file and reinspected the same functions within the services.exe process. I found that register eax was no longer set to 0xC0000022 and then 0xFFFFFFFF like before. The CreateProcessInternalW function then ran identically to that of the working system, NtCreateUserProcess was called and the RPCSS svchost started! This was looking great!... However, the svchost failed shortly after and the LSM svchost never started.
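The comparison between the two copies of svchost.exe can also be done mechanically. The Python sketch below parses the text that icacls prints for a single file and lists which principals are present on the working system but missing on the broken one. The function names are hypothetical, and the sample output and rights shown are illustrative rather than captured from a real machine.

```python
def parse_icacls(path, output):
    """Map each principal in icacls output for one file to its list of rights."""
    aces = {}
    for raw in output.splitlines():
        line = raw.strip()
        # The first entry shares its line with the file path.
        if line.lower().startswith(path.lower()):
            line = line[len(path):].strip()
        if ":(" not in line:
            continue  # blank lines and the "Successfully processed ..." summary
        principal, rights = line.split(":(", 1)
        aces.setdefault(principal, []).append("(" + rights)
    return aces


def missing_principals(reference, target):
    """Return principals that appear in `reference` but not in `target`."""
    return sorted(set(reference) - set(target))


PATH = r"C:\Windows\System32\svchost.exe"
# Illustrative output only; the rights on a real system may differ.
working_output = r"""
C:\Windows\System32\svchost.exe NT SERVICE\TrustedInstaller:(F)
                                BUILTIN\Administrators:(RX)
                                NT AUTHORITY\SYSTEM:(RX)
                                BUILTIN\Users:(RX)

Successfully processed 1 files; Failed processing 0 files
"""
broken_output = r"""
C:\Windows\System32\svchost.exe BUILTIN\Administrators:(F)
"""

working = parse_icacls(PATH, working_output)
broken = parse_icacls(PATH, broken_output)
for principal in missing_principals(working, broken):
    print("missing:", principal, working[principal])
```

This only reports principals that are absent altogether; a principal that is present with the wrong rights would need a further comparison of the rights lists.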

Having found a permission misconfiguration on one file, I was led to check the configuration of other files within the Windows folder. I discovered that almost all files and folders within the Windows directory had lost most of their required permissions. In most cases only NT AUTHORITY\SYSTEM: (F) and BUILTIN\Administrators: (F) remained. Most folders on the drive were also affected, with Program Files strangely being the only folder untouched.

From here began the long process of reapplying permissions to most of the files on my drive. Fortunately this wasn’t as arduous as it sounds, as most of the files inherited their permissions from their parent folder and a lot could be copied from my working system. For this process I used the command prompt icacls commands, a description of which can be found here. Most useful were the /grant, /remove, /inheritancelevel, /save and /restore commands. However, command prompt didn’t have the required authority to modify some of the files. To overcome this problem I started by using the takeown command, but this created a problem where I ended up with multiple files having the incorrect ownership. A better solution was to run command prompt as Trusted Installer rather than Administrator. To do this I used a program called Advanced Run, which can be found here.

With the permissions of most of the core files corrected, I attempted to boot again. This time it booted successfully!!! I was able to log on; however, I was greeted with several error messages and core features like the start menu weren’t functional. Using the Process Monitor utility (which is found here), I was able to filter its displayed events to those that resulted in access denied. To do this I added the following filter: 'Result | contains | DENIED | then Include'. This showed me exactly which file or folder the faulting processes were trying to access, so with this information I could specifically target the permissions of those files and folders.
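If the Process Monitor trace is saved to CSV, the same filter can be applied offline and the results grouped, which makes it easier to see which paths come up most often. This Python sketch assumes the export has "Process Name", "Path" and "Result" columns; the function name and the sample rows, including the process names and paths, are made up.

```python
import csv
import io
from collections import Counter


def denied_paths(csv_text):
    """Count (process, path) pairs whose Result column contains DENIED."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    hits = Counter()
    for row in reader:
        # Same idea as the filter: Result | contains | DENIED | then Include.
        if "DENIED" in (row.get("Result") or "").upper():
            hits[(row["Process Name"], row["Path"])] += 1
    return hits


# Hypothetical rows standing in for a real export.
sample = "\n".join([
    '"Time of Day","Process Name","PID","Operation","Path","Result","Detail"',
    r'"10:00:01","example.exe","1234","CreateFile","C:\Example\one.dll","ACCESS DENIED",""',
    r'"10:00:02","example.exe","1234","CreateFile","C:\Example\one.dll","ACCESS DENIED",""',
    r'"10:00:03","other.exe","5678","ReadFile","C:\Example\two.dat","SUCCESS",""',
])

for (process, path), count in denied_paths(sample).most_common():
    print(count, process, path)
```

Sorting by count puts the most frequently denied paths first, which are usually the ones worth fixing first.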

I am now at a point where my system is 90% functional and I’m just mopping up rare errors as they arise. It’s been a fascinating journey and I’ve learnt an incredible amount along the way. If anyone has any questions or if I haven’t explained something very well, I’m happy to elaborate further upon this post. My method of finding the problem through live debugging of the system is probably far from the cleanest or most correct way of doing it, but it was the best that I could do from what I was able to learn in the time I had. The problem seems so obvious now and I’m surprised that checking permissions didn’t come up earlier in my troubleshooting process, or in any of the guidance I could find online. After such a long process to fix this problem, I’m astonished that I reached this outcome at all. I couldn’t have done it without all the knowledgeable people on forums like this, the wealth of information available on the Microsoft Learn website, a working system to use as a comparison and ChatGPT, which turned out to be an amazing resource, able to guide me through areas like kernel debugging, where there isn’t a lot of guidance available online.

Finally, I would be interested to know if anyone has a theory as to how my permissions were corrupted in the first place. ChatGPT reckons it could have been an offline DISM/SFC operation on the wrong volume or a Chkdsk/NTFS corruption recovery, but I’m not sure how this could cause my problem or what it really means by this. Any input would be appreciated, so I can take steps to prevent it from happening a second time.

Thanks again!
 

