Tag Archives: troubleshooting

Error Draining Roles in Hyper-V 2012 R2 Failover Cluster

I was trying to do some maintenance on my Hyper-V 2012 R2 Failover Cluster and I was unable to drain one of the nodes in order to install Windows Updates.

An error occurred pausing node 'RDU-HV01'
Error Code: 0x80071748
"The requested operation can not be completed because a resource has locked status"
I love specific error messages.

In Hyper-V Manager the VM was stuck in a “Backing Up” status, and this was after I manually Live Migrated all other VMs to my second node.

The VM stuck in a “Backing Up” status in Hyper-V Manager

When trying to manually Live Migrate this VM I was prompted to override the locked resources and try again… and just like any good System Administrator I saw an opportunity to try and force something to work, while potentially producing an extremely horrible outcome, so I naturally clicked “Yes”. YOLO.

The virtual equivalent of kicking the computer.

But it still failed with an error!

Failed to Live migrate virtual machine 'VM_name'
The action 'Move' did not complete.
Error Code: 0x80070057
The parameter is incorrect
#fail

I restarted the VM but that had no effect so I shut it down.

Now that the VM was shut down, I could restart the node, which I did from a command prompt using the shutdown command. However, the node would not restart; it was stuck somewhere in the shutdown process. I could still see it in Server Manager, and running systeminfo from the command prompt showed a System Boot Time confirming it had not restarted yet. Since I was doing this remotely, I could not go into the server room to shut down the server, and I had no out-of-band (OoB) management configured, so I had to do a little digging. I found that others with this issue were able to fix it by restarting the Hyper-V Virtual Machine Management service. I tried stopping this service (vmms is the Service Name) from Server Manager on my Windows 8.1 laptop, but it did not seem to work.

I then opened a command prompt to try using SC.exe to stop or restart the vmms service. By the time I figured out the correct syntax, I noticed that the node had just gone down for a restart. Maybe it timed out, or maybe my command from Server Manager just took a minute to go through.

The correct syntax would have been:

sc \\rdu-hv01 query vmms
sc \\rdu-hv01 stop vmms
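
If you would rather stay in PowerShell than use SC.exe, something like this should do the same job from the management machine (a sketch, assuming PowerShell remoting is enabled on the node):

#query the state of the Hyper-V Virtual Machine Management service, then restart it
Invoke-Command -ComputerName rdu-hv01 -ScriptBlock {
    Get-Service -Name vmms
    Restart-Service -Name vmms -Force
}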

The VM that was stuck in the “Backing up…” state was automatically moved to my second node, and the first node restarted itself. The previously stuck VM started properly on the second node, and the “Backing up…” status was no longer showing.

Once the first node came back up from its restart I was able to Pause and Drain Roles to go on with my maintenance.

If this happens again, I would suggest shutting down the VM that is stuck in the “Backing up…” status. Then Live Migrate everything else (don’t forget your storage!) so that the only thing left on the node is the VM that is off. I would then attempt to restart the vmms service. If that does not work, restart the node.
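
For reference, the pause-and-drain part can also be scripted with the FailoverClusters cmdlets (a rough sketch using my node name; run it from a machine with the cluster management tools installed):

#pause the node and drain all roles off of it
Suspend-ClusterNode -Name RDU-HV01 -Drain
#when maintenance is done, resume the node and fail the roles back
Resume-ClusterNode -Name RDU-HV01 -Failback Immediate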

Slow Network Performance on Server 2012 R2 Core

In going through the motions of upgrading our Hyper-V cluster from 2008 R2 to 2012 R2, I had originally started to deploy a Hyper-V 2012 cluster. While learning more about 2012 R2, I realized that there is no real way to upgrade a Hyper-V cluster in place, so I would need to burn down our 2012 cluster completely in order to use that hardware to create a 2012 R2 cluster. I wanted the new functionality of 2012 R2, and had not migrated more than a couple of VMs to the 2012 cluster, so I evicted one node from the 2012 cluster and installed 2012 R2 on it. The VMs from the 2012 cluster were now living on the single remaining node of that (now one-node) cluster.

Once I had the new node (a Dell PowerEdge R620 with 128 GB of RAM) running Server 2012 R2 Core Edition, I performed the initial setup of configuring the server properties with sconfig, configuring network settings using PowerShell, joining the server to the domain, running Windows Updates, installing Corefig, installing EMC software such as PowerPath and the Navisphere Agent, and a few other things to prepare the server for deployment.
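
For anyone curious, the rename and domain-join pieces of that initial setup look roughly like this in PowerShell (a sketch; the server name, domain, and account are placeholders, and sconfig can do the same thing interactively):

#rename the server, then reboot
Rename-Computer -NewName "RDU-HV02" -Restart
#after the reboot, join the domain and reboot again
Add-Computer -DomainName "contoso.local" -Credential contoso\admin -Restart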

I even created my new 2012 R2 cluster at this point, even though it was not needed quite yet since there was only one node running with Server 2012 R2.

After everything was ready for deployment, I created a test VM running Server 2012 R2. Since we run Server Core edition I used a 2012 R2 VM in my 2008 R2 Failover Cluster to manage the new node, using Hyper-V Manager to create the VM. Once the test VM was ready to be sent into “test production” I closed the Console connection and used Remote Desktop Connection to log on to my new VM.

I noticed that the performance of the VM via RDP was very slow. Even my RDP sessions to a remote site were better than my RDP session to this test VM, which was in the server room at the main office (where I was sitting). A simple ping test against the VM came back with poor results, while pinging the node this VM was running on, via its management interface, was fine – all response times were between <1ms and 1ms.

The storage network (connectivity to my SAN over two 1 Gb NICs via iSCSI, with EMC PowerPath) was performing fine. Ping was normal, and data transfer speeds between the test VM and the SAN matched those between the node and the SAN, as well as those from my 2012 cluster VMs to the same SAN.

Something was obviously wrong, but what?

The first thing I tried was to make another VM from scratch and see if it had the same results in an RDP session. The outcome was the same – poor performance.

I thought it might be a settings issue, so I compared all of the settings related to networking with my Server 2012 node which was the exact same hardware. The only difference was that it was running Server 2012 and my new node was running Server 2012 R2. I compared settings of VMs themselves, settings of the Virtual Switch attached to these VMs, and the NIC Teaming settings on the nodes. Only one setting was different and it was the “Load balancing mode” of the NIC team dedicated to Cluster traffic (all VM traffic). I changed this to match, but it had no effect.

I figured something might be wrong that I couldn’t see via the GUI, so I recreated all of the virtual networking components tied to this machine. Since this node was so new, there was no production system running on it and I was able to do this outside of an official maintenance window. I deleted the Virtual Switch and destroyed the NIC Team. I then rebuilt the networking and attempted a test – the same problem occurred.

Every experienced IT Pro has been in this situation before. You have something going wrong and you’ve almost run out of ideas. But on the bright side, you’re probably going to learn something new…

Like I said, I was almost out of ideas.

My next troubleshooting steps included thinking about the physical components. I thought maybe a LAN cable was bad. I was going to test this by trying new cables, but I wanted to try something else first before getting physical.

After doing more research on NIC Teaming with Windows Server 2012 R2 and learning more about Teaming mode and Load balancing mode, I destroyed the NIC Team and recreated it once more for good measure. I noticed that when I recreated the NIC Team it took some time for the second NIC in the team to become Active. Whether or not this observation had any merit, it got me thinking on the right track:

DRIVERS!

Before I went down the road of troubleshooting drivers I wanted to try the test I had in mind, which was segregating the NICs and testing them individually. If it was a bad cable, I would be able to tell which one (if only one and not both) was having problems.

So I destroyed the NIC team again and assigned the NICs static IP addresses. I didn’t need to assign static IPs to run my test because DHCP was working, but I wanted to reinforce some PowerShell learning. I opted to give out static IP addresses and also disable the interfaces from registering with DNS.

I don’t want these interfaces registering in DNS because they will be used for Cluster traffic only; I will not be allowing the Host OS to use the network adapter (a Hyper-V Virtual Switch setting which I will disable). If the host registers these interfaces in DNS I could have some issues, so I opted to remove the DNS registration.

My saved PowerShell code for setting a static IP address:

#call the network adapter by name
$netadapter = Get-NetAdapter -Name "name of NIC"
#disable DHCP on this network adapter
$netadapter | Set-NetIPInterface -Dhcp Disabled
#set the IPv4 address, prefix length (subnet mask), and type
$netadapter | New-NetIPAddress -AddressFamily IPv4 -IPAddress 192.168.1.100 -PrefixLength 24 -Type Unicast
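
To double-check the result afterwards, something like this should show the new address and confirm DHCP is off (a quick sketch; "name of NIC" is the same placeholder as above):

#confirm the new IPv4 address and that DHCP is now disabled on the interface
Get-NetIPAddress -InterfaceAlias "name of NIC" -AddressFamily IPv4
Get-NetIPInterface -InterfaceAlias "name of NIC" -AddressFamily IPv4 | Format-Table InterfaceAlias, Dhcp -AutoSize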

Then with help from this thread on TechNet I was able to prepare a script to disable DNS registration. I know how to do it with netsh:

netsh interface ipv4 set dnsservers name="name of NIC" source=static address=172.20.1.5 register=none

but I wanted to do it with PowerShell.

#get the adapter configuration by adapter name (NetConnectionID)
$na = Get-WmiObject Win32_NetworkAdapter -Filter "NetConnectionID = 'name of NIC'"
$config = $na.GetRelated('Win32_NetworkAdapterConfiguration')
#display the current settings for DNS registration
$config | Select-Object DomainDNSRegistrationEnabled, FullDNSRegistrationEnabled
#disable DNS registration
$config | ForEach-Object { $_.SetDynamicDNSRegistration($false, $false) }
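
For what it’s worth, Server 2012 R2 also ships a DnsClient cmdlet that can toggle the same setting without going through WMI (a sketch, using the same placeholder NIC name):

#disable DNS registration for this interface via the DnsClient module
Set-DnsClient -InterfaceAlias "name of NIC" -RegisterThisConnectionsAddress $false
#verify the setting
Get-DnsClient -InterfaceAlias "name of NIC" | Format-Table InterfaceAlias, RegisterThisConnectionsAddress -AutoSize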

Now that I had my static IP addresses set, I did the ping test to each static IP.

They both came back perfect. The results were <1ms to 1ms for both endpoints. This cemented my belief that it was something to do with the NIC Team and/or the driver.

Immediately I remembered that NIC Teaming is now “in the box” with Server 2012 (and 2012 R2), i.e., now officially supported by Microsoft.

This led me to believe that it might have some functionality issues due to it being a new feature in Windows Server. I decided that I would update the drivers of my network adapters to see if this would resolve the issue. Since Server 2012 R2 was still fairly new, I figured my Broadcom NICs probably needed the latest OEM driver rather than the one that Windows Server 2012 R2 installed on its own.

I knew I had Broadcom NICs but I didn’t know exactly which model. I tried to view the Device Manager remotely, since we run Core Edition of Windows Server, but found out after a few hours of research that viewing the Device Manager remotely is no longer supported! (Cue Sad Trombone)

However I did learn that you can get the same Device Manager information via PowerShell:
http://blogs.technet.com/b/wincat/archive/2012/09/06/device-management-powershell-cmdlets-sample-an-introduction.aspx

After installing the Device Management PowerShell cmdlets and trying to figure out how to get the information I wanted, I resorted to using Corefig. I was already about two hours in just trying to figure out what NICs I had in the server. I had contemplated changing over to GUI mode in order to run Device Manager, but I really did not want to have to go that far.

By using Corefig, I was able to view the “System Information” and look at hardware components to find information about the network adapters and what driver they were currently using.

Notes from Beyond The Post: Little did I know I could have opened System Information by typing “msinfo32” in the command prompt.
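
If all you need is the driver version rather than the full Device Manager view, a WMI query can also pull it (a sketch; the Broadcom filter is specific to my hardware):

#list driver versions for the Broadcom adapters
Get-WmiObject Win32_PnPSignedDriver |
    Where-Object { $_.DeviceName -like "*Broadcom*" } |
    Select-Object DeviceName, DriverVersion, DriverDate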

The driver file is b57nd60a.sys so I looked that up on the Internet and it led me to http://www.broadcom.com/support/ethernet_nic/netxtreme_server.php

I scrolled down to “Windows 2012-R2 (x64)” and saw that the latest driver version was 16.2.0.4.

The driver listed in “System Information” was old – version 15.6.0.10

Copy/Paste output from “System Information”:

Name [00000010] Broadcom NetXtreme Gigabit Ethernet
 Adapter Type Ethernet 802.3
 Product Type Broadcom NetXtreme Gigabit Ethernet
 Installed Yes
 PNP Device ID PCI\VEN_14E4&DEV_165F&SUBSYS_1F5B1028&REV_00\000090B11C1DBA1D00
 Last Reset 2/18/2014 10:16 AM
 Index 10
 Service Name b57nd60a
 IP Address Not Available
 IP Subnet Not Available
 Default IP Gateway Not Available
 DHCP Enabled No
 DHCP Server Not Available
 DHCP Lease Expires Not Available
 DHCP Lease Obtained Not Available
 MAC Address 90:B1:1C:1D:BA:1D
 Memory Address 0xD91A0000-0xD91AFFFF
 Memory Address 0xD91B0000-0xD91BFFFF
 Memory Address 0xD91C0000-0xD91CFFFF
 IRQ Channel IRQ 4294967266
 …
 …
 IRQ Channel IRQ 4294967242
 Driver c:\windows\system32\drivers\b57nd60a.sys (15.6.0.10, 444.20 KB (454,864 bytes), 8/1/2013 8:34 PM)

So I downloaded the new driver and put it on the node in the C:\Drivers folder.

I then used pnputil to install the driver.

I knew that I would get disconnected since I was doing all of this remotely, but if I wasn’t able to reconnect to the node I would just walk into the server room and hop on the server with our KVM in the rack.

pnputil -i -a c:\drivers\broadcom_win_b57_x64\b57nd60a.inf

Yep, I got disconnected. But I knew my session would reconnect after the driver update completed (as long as things went well).

And it did!

Using pnputil to update the Broadcom drivers in 2012 R2 Core

Now the exciting part – did this work to fix the network performance??

Seriously. I was excited. This is the part of my job that I really enjoy. I went straight to my Server 2012 R2 VM to manage the node remotely and rebuild the NIC Team as quickly as possible. I used Server Manager on this management VM to launch “Configure NIC Teaming” and build the team. This time, I set the Load balancing mode to “Dynamic” after learning more about that setting.
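
The same team can be built straight from PowerShell with the NetLbfo cmdlets (a sketch; the team name and member NIC names are placeholders for my own):

#create a switch independent team with the Dynamic load balancing mode
New-NetLbfoTeam -Name "ClusterTeam" -TeamMembers "NIC3","NIC4" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic
#check that both members join the team (the second one took a while for me)
Get-NetLbfoTeamMember -Team "ClusterTeam"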

In order to learn more about NIC Teaming Mode in Server 2012 R2 I used the “Windows Server 2012 R2 NIC Teaming (LBFO) Deployment and Management” guide.

According to section 3.4.3 of this guide (emphasis my own):

3.4.3 Switch Independent configuration / Dynamic distribution
This configuration will distribute the load based on the TCP Ports address hash as modified by the Dynamic load balancing algorithm. The Dynamic load balancing algorithm will redistribute flows to optimize team member bandwidth utilization so individual flow transmissions may move from one active team member to another. The algorithm takes into account the small possibility that redistributing traffic could cause out-of-order delivery of packets so it takes steps to minimize that possibility.
The receive side, however, will look identical to Hyper-V Port distribution. Each Hyper-V switch port’s traffic, whether bound for a virtual NIC in a VM (vmNIC) or a virtual NIC in the host (vNIC), will see all its inbound traffic arriving on a single NIC.
This mode is best used for teaming in both native and Hyper-V environments except when:
a) Teaming is being performed in a VM,
b) Switch dependent teaming (e.g., LACP) is required by policy, or
c) Operation of a two-member Active/Standby team is required by policy.

Once the NIC Team was built I went to Hyper-V Manager and opened the Virtual Switch Manager for the node in question. I then created my Virtual Switch that would carry VM traffic.

Hyper-V Virtual Switch Manager

Now that the Virtual Switch was created, I attached it to the VM’s network adapter and clicked Start.
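
The Hyper-V side of that can be scripted too (a sketch; the switch, team, and VM names are placeholders for my own):

#create an external virtual switch bound to the team, without sharing it with the host OS
New-VMSwitch -Name "VM Traffic" -NetAdapterName "ClusterTeam" -AllowManagementOS $false
#attach the test VM's network adapter to the new switch and start the VM
Connect-VMNetworkAdapter -VMName "TestVM" -SwitchName "VM Traffic"
Start-VM -Name "TestVM"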

Once the machine was online and accessible, I did a ping test just as I had done a long time ago (at this point it had been more time than I care to admit!) and SUCCESS! The pings were all between <1ms and 1ms!

Updating the Broadcom drivers to the latest version for Server 2012 R2 was the solution to my issue. I could not be happier to resolve this, as now I could go full steam into migrating our VM infrastructure from Hyper-V 2008 R2 to Hyper-V 2012 R2.

Just to verify everything, I used “System Information” again to look at the drivers post-update:

System Information after updating the Broadcom drivers

The driver path here technically applies to the NIC just above the one shown (cut off in the screenshot), but it is the same adapter; it is listed 8 times in “System Information” because I have two 4-port Broadcom NICs.

As always if you see any way I could have improved this process or have anything to add, please leave a comment below!

 

Installation Configuration: Hyper-V Server 2012 R2

Upon first boot of Hyper-V Server 2012 R2, you are presented with a blue Server Configuration window.

sconfig window in Hyper-V Server 2012

This window is launched automatically by the system, but the script behind it (Sconfig.cmd) can be found in C:\Windows\System32.

(Read more about Sconfig.cmd here: http://technet.microsoft.com/en-us/library/jj647766.aspx)

If you closed the window you can always access it via command line by typing:

sconfig
Helpful tip for Core mode: If you closed the command prompt and have nothing on the screen, press CTRL + ALT + END
 

Even after enabling Remote Management, I was not able to connect to my server via RDP. I enabled ping in the sconfig Network Settings to ensure that I could see the server from my desktop, and while ping was successful I still could not open an RDP session to manage my Core installation.

I figured it was the firewall, so I disabled it completely:

netsh advfirewall set currentprofile state off
Turning off the firewall with netsh
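
The PowerShell equivalent, if you prefer it over netsh, targets a specific profile (a sketch; I am assuming the Public profile here, which turned out to be the one this server was using):

#turn off the firewall for the Public profile
Set-NetFirewallProfile -Profile Public -Enabled False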

After turning the firewall off for its current profile, I was able to successfully remote into my Hyper-V Server 2012 R2 installation and finish up my settings.

Since this is a home lab setup, I am fine with the firewall being off. In a domain environment, you might turn the firewall off as well (depending on your security protocols).

*******

At first I did not want to disable the firewall completely so I went down the path of figuring out why I could not RDP to my server. This turned out to be somewhat of a challenge.

I started by checking out the Remote Desktop rules. This command returns the names of the Remote Desktop rules and whether they are enabled:

Get-NetFirewallRule -DisplayGroup "Remote Desktop" | Format-Table Name, Enabled -AutoSize
List of Windows Firewall rules that are part of the “Remote Desktop” group

If we had run this command before using sconfig to enable Remote Desktop, we would only see the first three rules. The second set of three rules (the ones named with GUIDs) are added and enabled when sconfig enables Remote Desktop, while the first three built-in rules stay disabled. Why did sconfig create three new rules instead of just enabling the built-in ones?

You can see here that the GUID rule (the first in the image, after I renamed it from the GUID to match the DisplayName) matches the second rule shown, “RemoteDesktop-UserMode-In-TCP” (no spaces), except for the Profile attribute: for the GUID rule (top) it is Domain, Private, and for the built-in rule (bottom) it is Public.

GUID Rule (top, renamed), Built-In rule (bottom)
(Read my post on renaming firewall rules with PowerShell)

Still, Remote Desktop is not working, so we can enable the three built-in rules that are still disabled:

Enable-NetFirewallRule -DisplayGroup "Remote Desktop"
Enabling all rules in the “Remote Desktop” group

Another check of the rules:

Now all rules in the “Remote Desktop” group are enabled

Remote Desktop rules are all now enabled and I was able to successfully RDP to my server!

These steps would be the same for a Core installation of the full Windows Server 2012 R2 product, whether Standard or Datacenter edition.

But WHY?!

I ended up figuring all of this out after a re-install of Hyper-V Server 2012 R2. What happened was that sconfig added the firewall rules for RDP (the GUID rules), but it added them for the Domain and Private firewall profiles. My server was set on the Public profile. Therefore, the rules that were added via sconfig were not applicable. Why does this happen out of the box? I suppose that is a question for Microsoft.

In the end I simply added the Domain and Private Profiles to the built-in rules, then enabled the group as above. I did NOT enable Remote Desktop with sconfig because I did not want it to add those three “extra” GUID rules. I suppose if you were going to have multiple connections using different firewall profiles then you would want separate rules, but this is for a lab setup and I like to make things less confusing!

In order to add the Domain and Private profiles to the built-in firewall rules, I used the following command. I included the Public profile just to be complete, even though it is already part of that rule:

Set-NetFirewallRule -Name RemoteDesktop-Shadow-In-TCP -Profile Domain,Private,Public
Add Domain and Private profiles to the existing firewall rules
All firewall profiles are now part of the built-in firewall rule

Now repeat this command for your other two rules:

RemoteDesktop-UserMode-In-TCP
RemoteDesktop-UserMode-In-UDP
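
Alternatively, all three built-in rules can be handled in one shot by targeting the display group (a sketch of the same change):

#add all three profiles to every rule in the Remote Desktop group, then enable the group
Get-NetFirewallRule -DisplayGroup "Remote Desktop" | Set-NetFirewallRule -Profile Domain,Private,Public
Enable-NetFirewallRule -DisplayGroup "Remote Desktop"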

Happy Administration!