Microsoft Azure AZ-801 — Section 19: Troubleshoot Windows Server virtual machines in Azure

112. Troubleshoot deployment failures

I’d like to get into the concepts now of troubleshooting deployment failures in the Windows environment.

So, first off, let’s make sure we understand the installation. The real key to understanding how to troubleshoot a deployment installation of Windows is to understand a little bit about the different phases and the fact that it generates logs during these different phases.

So, essentially what happens here is, let’s say you format the hard drive and then setup runs. It’s going to install, usually, into the C:\Windows folder, right? And then from there, it goes through these different phases. Windows is running in what’s called Windows PE, the Preinstallation Environment, which is essentially a very lightweight, command-line version of Windows that drives the deployment. If you look at the little graphic at the bottom here, you’ll see that the first thing that happens, obviously, is BIOS.

Now, it says BIOS on the screen, but it could be UEFI, which is more accurate in this day and age. The Basic Input/Output System has pretty much been replaced by UEFI, the Unified Extensible Firmware Interface. But anyway, the very first thing that happens after that is the setup specialize phase, and of course that gets into where Windows is going to be installed. You’ve chosen the version of Windows you want, because a lot of Windows installation media will have different versions you can choose from. And if you’ve used an answer file, the answer file lets you turn things on and off during this specialize phase as well. I might want to have a specific thing disabled or enabled right out of the gate, and you can do that with an answer file.

Microsoft has a toolkit you can get called the Windows ADK, the Windows Assessment and Deployment Kit, and in it there’s a tool called SIM, the System Image Manager, that lets you edit an answer file, the unattend XML file. This little XML file can turn things on and turn things off. That’s why they call this the specialize phase, because that’s what would be happening if you weren’t just installing right off of a DVD, a flash drive, or a download, and you were actually specializing things. Those are the logs you’ll see there, and I’ll talk about the logs more in just a moment, but those are the logs that would get generated from there.

Then it’ll do a reboot, and at that point it gets into the out-of-box experience, which is where a user gets to set certain options, like, for example, the keyboard layout and the language settings they want, and they can turn certain things on or off during this out-of-box experience phase.

And then from there it’ll do another reboot and you’ll get to the logon UI, which is where the user can officially log on. And of course it goes through the OEM first run. This is where, if the machine was not activated, it would try to get activated, and if there are any issues there, it would generate logs for that. And then, of course, you come to the desktop, which is the standard environment where whoever is sitting in front of the computer can use Windows however they want.

So, talking about these logs, there are various logs here that you could analyze if there was a deployment problem, the first one being setupact.log. You can see here, this is the primary log file, and it’s going to capture just about every type of error that can occur, depending upon where you’re at in the installation. And you can see there that it shows you the path where that log usually gets written, under the Windows folders there.

So, you can go into that file. If you did have a deployment failure, you could open that file and try to get down to what might be going on. If you’ve got a fresh hard drive, it’s going to be pretty rare that you get a deployment failure anyway. Deployment failures happen a lot of times when you’re trying to do upgrades, or because a piece of hardware is malfunctioning, like a hard drive that’s got a problem with it. But at least these logs can help you. Then you’ve got the setuperr.log file. That’s another high-level list of errors, and it tells you where that would be located. Down at the bottom you have setupapi.offline.log, and this is for driver-related issues.

So, if the problem you’re having relates to drivers, then more often than not, this log is going to assist you with that. The next log would be cbs_unattend.log. This is for when you’re doing an unattended installation using the answer file I was mentioning a moment ago. If it’s related to that, it would be logged there. And you’ve got setupapi.dev.log. This is another driver-related one, but it involves the out-of-box experience.

So, you may be aware that in Windows you have mission-critical drivers and then you have what are called optional drivers.

So, the mission-critical drivers are the drivers that get loaded first, and those would be more related to this setupapi.offline.log file. But then when you get to the out-of-box experience, that’s where this setupapi.dev.log file comes in.

So, those would be more the optional drivers as opposed to the mission-critical ones. The next log mentioned there is sessions.xml. This is actually a transaction log that basically tracks every activity that’s going on during the deployment.

So, this is a good one for figuring out if something stopped, or if you were imaging the server or the client operating system, because again, this relates to both servers and clients. If you are imaging the machine, then this sessions.xml file would tell you if there were any imaging-related problems. For example, you’ll see DISM there, the Deployment Image Servicing and Management tool, which is used for the deployment of images. And then the last thing you’ll see there is a log file called CBS.log. This is a servicing log, and CBS stands for Component-Based Servicing. It’s going to involve the different services that are starting up, so if a service fails to start, you would see that here. It’s also mostly oriented towards imaging, so if you’re imaging the machine, this log plays a major role in displaying any errors.
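As a rough illustration of how you might work with these logs, here is a minimal Python sketch. It records the purpose of each log as described above and pulls out lines that look like errors. The line layout it assumes (a timestamp, a comma, then a severity word such as Error), the sample text, and the function names are all assumptions made for illustration, not a guaranteed format.

```python
# Purpose of each Windows Setup log, summarized from the discussion above.
SETUP_LOGS = {
    "setupact.log": "primary log of setup actions and errors",
    "setuperr.log": "high-level list of setup errors",
    "setupapi.offline.log": "driver issues (mission-critical drivers)",
    "cbs_unattend.log": "unattended (answer file) servicing issues",
    "setupapi.dev.log": "driver issues during the out-of-box experience",
    "sessions.xml": "transaction log of servicing activity",
    "CBS.log": "component-based servicing log",
}


def logs_about(keyword):
    """Return the log names whose summary mentions the keyword."""
    keyword = keyword.lower()
    return [name for name, purpose in SETUP_LOGS.items()
            if keyword in purpose.lower()]


def find_error_lines(log_text, severity="Error"):
    """Return lines that carry the given severity.

    Assumed layout (hypothetical): 'timestamp, Severity  message'.
    """
    marker = ", " + severity
    return [line for line in log_text.splitlines() if marker in line]


# Made-up sample text, only to show the filtering.
sample = (
    "2024-01-01 10:00:00, Info   Setup started\n"
    "2024-01-01 10:00:05, Error  Disk is not accessible\n"
    "2024-01-01 10:00:06, Info   Rolling back\n"
)

if __name__ == "__main__":
    print(logs_about("driver"))
    for line in find_error_lines(sample):
        print(line)
```

For a driver problem, `logs_about("driver")` points you at the two setupapi logs, and `find_error_lines` narrows a large log down to the entries worth reading first.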

The last thing here to be aware of is something that was originally introduced with Windows 10 a few years ago, but it’s not just for Windows 10. It supports the newer operating systems as well as the server operating systems. It’s called SetupDiag.

Now, SetupDiag is built into Windows Setup, as SetupDiag.exe. You can also download it: go out there and do a Google search or a Bing search for “download SetupDiag” and you’ll find you can download it directly and run it yourself. But if Windows is having problems getting deployed, and it detects that it’s generating errors and isn’t able to continue, then Setup will automatically run SetupDiag, and it will try to diagnose what’s going on based on the errors it’s finding. At the bare minimum, if you wanted to run this yourself, you could download it, go to a command prompt, and run it. There’s documentation out there on the internet that gets into the parameters you can use for troubleshooting the errors a deployment might have. There’s not really anything there I’d say you need to memorize. Here again, if you were doing this in the real world, you could look it up. But I would remember that there is a command called SetupDiag, which is built into Windows Setup, and that you can also download it and run it directly if you want. So, if you are having an issue with deployment, this can be a good tool for troubleshooting. All right.
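If you do run SetupDiag yourself, the call might be assembled along these lines. This is only a sketch: it assumes the /Output: and /LogsPath: parameters from Microsoft’s SetupDiag documentation, the paths and the helper name are hypothetical, and you should check the current documentation before relying on any of it. The helper only builds the argument list; actually running it is left commented out because it only makes sense on a Windows machine that has SetupDiag.exe.

```python
import subprocess


def build_setupdiag_command(output_path, logs_path=None, exe="SetupDiag.exe"):
    """Build the argument list for a SetupDiag run (nothing is executed)."""
    command = [exe, "/Output:" + output_path]
    if logs_path is not None:
        # Point SetupDiag at a folder of collected setup logs.
        command.append("/LogsPath:" + logs_path)
    return command


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_setupdiag_command(r"C:\SetupDiag\Results.log", r"D:\Logs")
    print(" ".join(cmd))
    # On a Windows machine with SetupDiag.exe available you could then run:
    # subprocess.run(cmd, check=True)
```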

Those are the various things that I would recommend being aware of when it comes to troubleshooting a deployment.

113. Troubleshoot booting failures

I’d like to get into the concepts now of troubleshooting boot problems with Windows Server. All right.

The first thing to be aware of is that if Windows Server is not booting correctly, it will automatically boot into a mode called the Windows Recovery Environment, which can help you with troubleshooting. Now, you can also trigger the Windows Recovery Environment yourself by holding down the Shift key, right-clicking Start, and choosing Shut down and then Restart. Although in Hyper-V, if you hold down Shift and try to restart the VM, it won’t trigger. That’s just the way it is with Hyper-V and these VMs. What I can do is shut down my VM here, NYC-SVR1. All right.

Keep in mind, holding down Shift when you start the VM does not achieve anything. You’d have to be restarting for this to happen. But here’s the other thing: if a VM fails to start successfully twice, then on the third attempt it will automatically take you into the recovery environment.

Some of us remember back in the day when you could hit F8 and all that. That’s been gone for a long time. So, I’m going to hit Start, but I’m going to force this VM to fail at booting. So, here we go. We’re starting it. I’m going to let it get started here, and as soon as the color changes on the screen, I’m going to turn it off. All right. It’s still in the process of starting. I’m going to go ahead and close that, go here, and say Turn Off. All right. So, that’s the first failure. It tried to boot and it failed. Let’s try it again.

So, Start. We’re going to do the exact same thing. And again, the purpose here is to simulate a boot failure, right? So, here we are. It’s trying to boot. The screen is going to change to blue, and it’s going to ask me to change my resolution. All right, it didn’t ask that time. That’s fine. Turn Off. It’s failed a second time.

So, on this third attempt, it should boot us into the Windows Recovery Environment. As you can see, that little bar that just flashed across the screen is letting us know it has detected that it’s already failed twice. So, that’s going to bring us into the Windows Recovery Environment.
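The behavior in that demo can be pictured with a toy model. The Python sketch below is not how Windows implements it; it just encodes the rule as described here (two failed starts in a row, then recovery on the next attempt), and the names are made up for illustration.

```python
# Threshold as described in the demo above: two failed starts in a row.
FAILURES_BEFORE_RECOVERY = 2


def next_boot_target(consecutive_failed_boots):
    """Toy model: decide where the next start attempt lands."""
    if consecutive_failed_boots >= FAILURES_BEFORE_RECOVERY:
        return "Windows Recovery Environment"
    return "normal boot"


if __name__ == "__main__":
    for failures in range(3):
        print(failures, "failed boots ->", next_boot_target(failures))
```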

Let’s see if we can make this a little bit bigger on the screen. Let’s do View, then Zoom Level, at 150%. Let’s try 200%. There we go. So, right here, I’m going to click See advanced repair options. I could continue to boot, I could use a device like a USB drive, I could troubleshoot, or I could turn off the PC. Obviously, I want to troubleshoot, because that’s what we’re talking about here.

The first thing I’ve got here is Startup Settings, which changes Windows startup behavior. That’ll let you boot into Safe Mode or Safe Mode with Command Prompt, disable driver signature enforcement, disable automatic restart, all of that stuff.

So, if I went into Safe Mode with Command Prompt, I could run commands if I needed to. I can enable low-resolution video mode. That’s for when you’ve got a graphics problem for some reason: maybe a bad driver has been installed, or maybe your resolution is too high for the monitor and the screen goes dark every time you boot. You could boot in low-resolution mode to fix that. Debugging mode turns on debugging, which produces a massive amount of debug output and would be used if you’re on the phone with Microsoft tech support trying to troubleshoot a problem. You can enable boot logging, which is the one I mainly wanted to talk about here. Boot logging generates a log while you’re booting that can help you troubleshoot why you’re unable to boot. There’s Safe Mode. There’s disable driver signature enforcement, which would be used in a situation where you wanted to install a driver that was not digitally signed. There’s disable early launch anti-malware protection. That’s another reason you might not be able to boot: some type of malware is being detected, or something is being detected as malware that maybe isn’t. Then you could disable that. And then you can disable automatic restart on system failure. That would be for when a blue screen of death occurs and maybe it’s flashing on the screen too fast for you to read. If every time you boot your server up it blue-screens, you could disable the automatic reboot so that you’d have time to look at it. All right.

You could do System Image Recovery if you have a backup. With the backup tool that Windows provides, you can back up the machine as an image and you can restore it as an image. You have UEFI Firmware Settings. That’s for when there’s something involving your Unified Extensible Firmware Interface that you need to fix. And then of course there is Command Prompt, which gives you the ability to run various commands, like bcdedit and bootrec.

So what is all this? bcdedit is a command that shows you your boot configuration data. It’ll show you which operating system is being booted. Now, there isn’t much here you’d really have to mess with. If somebody went in, edited it, and screwed something up, you could go in and use bcdedit to fix it. Mostly, the times you’d do that would be if you set up some kind of dual boot, where you’re booting two operating systems. It’s very rare that anybody does that nowadays because of virtual machines. Most importantly, though, if there’s anything here I could get you to remember, it would be this right here: bootrec. This command is only available in this mode. You can’t run it in a normal command prompt, but it’s going to fix certain problems. Let’s say you’re booting up and Windows gives you a message that says the MBR is missing or corrupted. You could run bootrec /fixmbr and that’ll fix that problem.

Now, if Windows boots up and says that your boot sector is missing or corrupted, you can run bootrec /fixboot. If Windows boots up and tells you there are issues with a Windows boot file, then you can try bootrec /scanos. And lastly, if it says that your BCD, your boot configuration data, is corrupted, you can run bootrec /rebuildbcd and that’ll fix your problem.
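One way to keep the four bootrec switches straight is as a simple lookup from symptom to command. The Python sketch below only restates the pairings given above; the symptom wording and the function name are made up for illustration, and bootrec itself is something you would type at the recovery environment’s command prompt, not run from Python.

```python
# Symptom -> bootrec command, as paired in the discussion above.
BOOTREC_FIXES = {
    "mbr missing or corrupted": "bootrec /fixmbr",
    "boot sector missing or corrupted": "bootrec /fixboot",
    "issues with a windows boot file": "bootrec /scanos",
    "bcd corrupted": "bootrec /rebuildbcd",
}


def suggest_bootrec(symptom):
    """Return the bootrec command for a known symptom, or None."""
    return BOOTREC_FIXES.get(symptom.strip().lower())


if __name__ == "__main__":
    for symptom in BOOTREC_FIXES:
        print(symptom, "->", suggest_bootrec(symptom))
```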

So, I definitely would recommend knowing the switches right here. The last thing, too, is there’s a command called chkdsk, check disk. You might be very familiar with this one. It’s an oldie but a goodie, as I like to say. The chkdsk command can tell you if there are any bad sectors on your disk.

So, if the machine wasn’t shut down properly, something could get corrupted. I can’t tell you how many times over the years I’ve had this happen, where somebody didn’t shut a machine down properly. I don’t care if it’s a client operating system or a server, you can get corrupted sectors, and bad sectors will cause the machine not to boot. At that point you can run chkdsk.

Now, in order for it to actually fix anything, you have to run chkdsk /f. So, just keep that in mind. I’ll also tell you that if things are really bad and you can’t get into the Windows Recovery Environment like we are here, then if you’ve got Windows on a DVD or a USB drive, you can boot off that DVD or USB drive and get into this mode that way as well.

So, that’s another way you can get in for troubleshooting in this mode, and I’ve had to do that before. In fact, I had to fix somebody’s computer that way a few years ago. Again, it doesn’t matter if you’re talking about a server or a client operating system. It was a bad sector problem, so the computer wouldn’t boot. It literally said the operating system can’t be found. I fixed it by booting off of the media and then running the chkdsk /f command. It took about 30 minutes to fix the bad sectors, but it did fix them, and then everything was back up and running like it’s supposed to be. All right.
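The difference between a plain chkdsk and chkdsk /f can be sketched as a small command builder. This is a Python illustration only: the helper name is hypothetical, it just assembles the command text, and the command itself would be typed at the recovery command prompt.

```python
def build_chkdsk_command(drive_letter, fix=True):
    """Build a chkdsk command as a list of arguments (nothing is executed)."""
    # Accept "c", "C:" or "C:\" and normalize to "C:".
    drive = drive_letter.rstrip(":\\").upper() + ":"
    command = ["chkdsk", drive]
    if fix:
        # Without /f, chkdsk only reports problems; /f tells it to fix them.
        command.append("/f")
    return command


if __name__ == "__main__":
    print(" ".join(build_chkdsk_command("c")))
    print(" ".join(build_chkdsk_command("D:", fix=False)))
```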

So, those are your various options for troubleshooting boot problems in a Windows Server environment.