ESXi network routes

I honestly don’t know why this is still a problem. Support for routed vMotion traffic was added back in vSphere 6, yet here we are on vSphere 7 and still have to set the gateway/routes for the vMotion stack through esxcli.

Either way, here is how it works.

vMotion stack

Each TCP/IP stack can only have one default gateway, which makes sense. So if you want to keep your management and vMotion traffic separated, you need two TCP/IP stacks.

This is nicely done through the vCenter GUI, and there is a KB for it. You even have the option to override the default gateway and specify the right one for your vMotion stack.

[root@dc1esxcompx-xx:~] esxcli network ip route ipv4 list -N vmotion
Network     Netmask        Gateway  Interface  Source
----------  -------------  -------  ---------  ------
10.1.115.0  255.255.255.0  0.0.0.0  vmk1       MANUAL

But when looking at the routing table from esxcli, the gateway is not set. If you know why, feel free to give me a kick and enlighten me.

ESXCLI add a static route

So to actually set a default route, I have to do it as shown below:

[root@dc1esxcompx-xx:~] esxcli network ip route ipv4 add -g 10.1.115.1 -n 0.0.0.0/0 -N vmotion

[root@dc1esxcompx-xx:~] esxcli network ip route ipv4 list -N vmotion
Network     Netmask        Gateway     Interface  Source
----------  -------------  ----------  ---------  ------
default     0.0.0.0        10.1.115.1  vmk1       MANUAL
10.1.115.0  255.255.255.0  0.0.0.0     vmk1       MANUAL
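To make it concrete why the extra default row matters, here is a small Python sketch of how a destination would be resolved against the vmotion table above, assuming ordinary longest-prefix-match IPv4 routing. The destination 10.1.215.42 is a hypothetical vMotion peer in another routed subnet; without the default row it simply has no route.

```python
import ipaddress

# The vmotion netstack table from "esxcli network ip route ipv4 list -N vmotion"
# after adding the default route: (network in CIDR form, gateway).
# A gateway of 0.0.0.0 means the network is directly connected (on-link via vmk1).
VMOTION_ROUTES = [
    ("0.0.0.0/0", "10.1.115.1"),   # the "default" row, netmask 0.0.0.0
    ("10.1.115.0/24", "0.0.0.0"),  # the connected subnet, netmask 255.255.255.0
]


def next_hop(destination, routes=VMOTION_ROUTES):
    """Return the next hop for destination using longest-prefix match."""
    dst = ipaddress.IPv4Address(destination)
    candidates = []
    for cidr, gateway in routes:
        network = ipaddress.IPv4Network(cidr)
        if dst in network:
            candidates.append((network.prefixlen, gateway))
    if not candidates:
        raise LookupError(f"no route to {destination}")
    # The most specific route (longest prefix) wins.
    _, gateway = max(candidates)
    # On-link destinations are reached directly, everything else via the gateway.
    return destination if gateway == "0.0.0.0" else gateway


if __name__ == "__main__":
    print(next_hop("10.1.115.42"))  # same subnet: delivered directly
    print(next_hop("10.1.215.42"))  # hypothetical routed vMotion peer: via 10.1.115.1
```

Dropping the first entry from VMOTION_ROUTES reproduces the situation before the esxcli command: the routed peer raises LookupError.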

PowerCLI

Need to do it on a cluster with multiple hosts? No problem, LucD from the VMware community has you covered. I only did a little customization, and it works for my needs.

connect-viserver -Server 

$stackName = 'vmotion'
$ipGateway = '10.1.115.1'
# VMkernel adapter on the vMotion stack (vmk1 in the esxcli output above, adjust to your hosts)
$ipDevice = 'vmk3'
$cluster = "computexx"
$vmhosts = Get-Cluster $cluster | Get-VMHost

foreach($esx in $vmhosts)
{
    # The host's network system is what applies netstack changes
    $netSys = Get-View -Id $esx.ExtensionData.ConfigManager.NetworkSystem
    # Current configuration of the netstack we want to change
    $stack = $esx.ExtensionData.Config.Network.NetStackInstance | Where-Object {$_.Key -eq $stackName}
    # Build an "edit" spec that carries the new default gateway and gateway device
    $config = New-Object VMware.Vim.HostNetworkConfig
    $spec = New-Object VMware.Vim.HostNetworkConfigNetStackSpec
    $spec.Operation = [VMware.Vim.ConfigSpecOperation]::edit
    $spec.NetStackInstance = $stack
    $spec.NetStackInstance.IpRouteConfig.DefaultGateway = $ipGateway
    $spec.NetStackInstance.IpRouteConfig.GatewayDevice = $ipDevice
    $config.NetStackSpec += $spec
    # Apply as a modification of the existing config, not a full replacement
    $netSys.UpdateNetworkConfig($config, [VMware.Vim.HostConfigChangeMode]::modify)
}

Conclusion

Manipulating the vMotion stack routing table with either esxcli or PowerCLI works great.

Need to know more? There are plenty of good bloggers and KBs out there.

ssacli – CLI configuring SmartArray

When you install an HPE server with the VMware custom image for HPE servers, you automatically get all the HPE tools for configuring the hardware. Neat.

Here is a small guide on how to clean and set up the array. ssacli is located in /opt/smartstorageadmin/ssacli/bin on the ESXi server; start there and follow the commands below.

Show the existing config:

[root@esx2:/opt/smartstorageadmin/ssacli/bin] ./ssacli ctrl slot=0 ld all show
Smart Array P440ar in Slot 0 (Embedded)
   Array A
      logicaldrive 1 (279.37 GB, RAID 1, Failed)
   Array B
      logicaldrive 2 (3.27 TB, RAID 1+0, OK)

Delete the existing logical drives:

[root@esx2:/opt/smartstorageadmin/ssacli/bin] ./ssacli ctrl slot=0 ld 1 delete
Warning: Deleting an array can cause other array letters to become renamed.
         E.g. Deleting array A from arrays A,B,C will result in two remaining
         arrays A,B ... not B,C 
Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n)  y
[root@esx2:/opt/smartstorageadmin/ssacli/bin] ./ssacli ctrl slot=0 ld 2 delete
Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n)  y
[root@esx2:/opt/smartstorageadmin/ssacli/bin] 

Show all physical drives in the server:

[root@esx2:/opt/smartstorageadmin/ssacli/bin] ./ssacli ctrl slot=0 pd all show
Smart Array P440ar in Slot 0 (Embedded)
   Array A
      physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:13 (port 1I:box 1:bay 13, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:14 (port 1I:box 1:bay 14, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:15 (port 1I:box 1:bay 15, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:16 (port 1I:box 1:bay 16, SAS HDD, 900 GB, OK)

Create a new volume with available physical drives:

[root@esx2:/opt/smartstorageadmin/ssacli/bin] ./ssacli ctrl slot=0 create type=ld drives=1I:1:9,1I:1:10,1I:1:11,1I:1:12,1I:1:13,1I:1:14,1I:1:15,1I:1:16 raid=6

Warning: Controller cache is disabled. Enabling logical drive cache will not take effect until this has been resolved.
[root@esx2:/opt/smartstorageadmin/ssacli/bin] 
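Typing the drives= list by hand is error-prone with eight bays. Below is a hedged Python sketch (parse_physical_drives, create_ld_command and raid6_usable_gb are made-up helper names, not part of ssacli) that turns the pd all show output above into the same create command and estimates RAID 6 capacity, assuming equal-size drives where two drives' worth of space goes to parity.

```python
import re

# "ssacli ctrl slot=0 pd all show" output, copied from above.
PD_ALL_SHOW = """\
Smart Array P440ar in Slot 0 (Embedded)
   Array A
      physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:13 (port 1I:box 1:bay 13, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:14 (port 1I:box 1:bay 14, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:15 (port 1I:box 1:bay 15, SAS HDD, 900 GB, OK)
      physicaldrive 1I:1:16 (port 1I:box 1:bay 16, SAS HDD, 900 GB, OK)
"""

# Matches: physicaldrive <id> (<location>, <type>, <size> GB|TB, <status>)
PD_LINE = re.compile(
    r"^\s*physicaldrive\s+(\S+)\s+\(.*?,\s*([\d.]+)\s*(GB|TB),\s*([^)]+)\)"
)


def parse_physical_drives(output):
    """Return a list of (drive_id, size_gb, status) tuples."""
    drives = []
    for line in output.splitlines():
        match = PD_LINE.match(line)
        if match:
            drive_id, size, unit, status = match.groups()
            size_gb = float(size) * (1000 if unit == "TB" else 1)  # assumes decimal TB
            drives.append((drive_id, size_gb, status.strip()))
    return drives


def create_ld_command(drives, raid="6", slot=0):
    """Build the ssacli create command for all drives reported as OK."""
    ids = ",".join(drive_id for drive_id, _, status in drives if status == "OK")
    return f"./ssacli ctrl slot={slot} create type=ld drives={ids} raid={raid}"


def raid6_usable_gb(drives):
    """Approximate RAID 6 capacity: two drives' worth goes to parity."""
    if len(drives) < 4:
        raise ValueError("RAID 6 needs at least four drives")
    # The smallest member limits the usable size of every member.
    return (len(drives) - 2) * min(size for _, size, _ in drives)


drives = parse_physical_drives(PD_ALL_SHOW)
print(create_ld_command(drives))
print(raid6_usable_gb(drives))
```

With the eight 900 GB drives above, the estimate is 6 × 900 = 5400 GB before any formatting overhead.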

Conclusion

Nice and easy. Give the HPE ssacli manual a read for more commands (starting on page 57), or use ssacli help.

[root@esx2:/opt/smartstorageadmin/ssacli/bin] ./ssacli help

CLI Syntax
   A typical SSACLI command line consists of three parts: a target device, 
   a command, and a parameter with values if necessary. Using angle brackets to
   denote a required variable and plain brackets to denote an optional 
   variable, the structure of a typical SSACLI command line is as follows:

      <target> <command> [parameter=value]

   <target> is of format:
      [controller all|slot=#|serialnumber=#]
      [array all|<id>]
...........

VMware CSE – Stuck cluster deployment

After upgrading to CSE 3.1.3 with VCD 10.3.1, I encountered a problem when creating clusters from the Ubuntu 20.04 native cluster template.

Basically, the mstr node would be deployed and started, VMware Tools would become ready, and the first script injection would happen. Then, all of a sudden, the VM would reboot and the cluster creation would fail because CSE could no longer see the process. This would sometimes leave a cluster in the “Creation in progress” status, after which it could no longer be managed.

22-06-02 10:42:34 | cluster_service_2_x:2811 - _wait_for_tools_ready_callback | DEBUG :: waiting for guest tools, status: vm='vim.VirtualMachine:vm-835608', status=guestToolsNotRunning
22-06-02 10:42:39 | cluster_service_2_x:2811 - _wait_for_tools_ready_callback | DEBUG :: waiting for guest tools, status: vm='vim.VirtualMachine:vm-835608', status=guestToolsRunning
22-06-02 10:42:41 | cluster_service_2_x:2817 - _wait_for_guest_execution_callback | DEBUG :: waiting for process 1706 on vm 'vim.VirtualMachine:vm-835608' to finish (1)
22-06-02 10:42:46 | cluster_service_2_x:2817 - _wait_for_guest_execution_callback | DEBUG :: process [0, <Response [200]>, <Response [200]>] on vm 'vim.VirtualMachine:vm-835608' finished, exit code: 0
22-06-02 10:42:46 | cluster_service_2_x:2869 - _execute_script_in_nodes | DEBUG :: about to execute script on mstr-7e34 (vm='vim.VirtualMachine:vm-835608'), wait=True
22-06-02 10:42:48 | cluster_service_2_x:2817 - _wait_for_guest_execution_callback | DEBUG :: waiting for process 1729 on vm 'vim.VirtualMachine:vm-835608' to finish (1)
22-06-02 10:42:58 | cluster_service_2_x:2896 - _execute_script_in_nodes | ERROR :: Error executing script in node mstr-7e34: process not found (pid=1729) (vm='vim.VirtualMachine:vm-835608')
Traceback (most recent call last):
  File "/opt/vmware/cse/python/lib/python3.7/site-packages/container_service_extension/rde/backend/cluster_service_2_x.py", line 2879, in _execute_script_in_nodes
    callback=_wait_for_guest_execution_callback)

I opened an SR with Cloud Director GSS for both the failed deployment and the stuck clusters that could not be deleted. Multiple screen-sharing sessions later, there was still no result.

Then I found the GitHub repository for Container Service Extension, where the issues page had a very tempting title: Failed deployments using TKGm on VCD. Many seem to have the same problem. There is no fix for the deployments, but one user had a fix for deleting the stuck clusters.

The workaround

You need to find the ID of the user that owns the cluster. You can see the owner in VCD under More > Kubernetes Clusters.

Once you have the owner, go to Administration > Users > <User>. The URL will then contain the ID of the user.

vcd.ramsgaard.me/tenant/tenant1/administration/access-control/users/v9993018-ebf5-4ded-8134-27ddcc4ccbf0/general

With the userId you can fill out the body for the next API call.

$vdchost = "vcd.ramsgaard.me"
$apiusername = "svc-cse@system"
$password = 'Ye.........iks12!'

$base64AuthInfo = [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(("{0}:{1}" -f $apiusername,$password)))
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12
$auth =Invoke-WebRequest -Uri "https://$vdchost/api/sessions" -Headers @{Accept = "application/*;version=32.0";Authorization="Basic $base64AuthInfo"} -Method Post

$accessBody = '{
    "grantType": "MembershipAccessControlGrant",
    "accessLevelId": "urn:vcloud:accessLevel:FullControl",
    "memberId": "urn:vcloud:user:e96cf9e8-535f-45d8-8a87-b9dac659f85f"
  }' | ConvertFrom-Json

$status = Invoke-RestMethod -Uri "https://$vdchost/cloudapi/1.0.0/entities/urn:vcloud:type:cse:nativeCluster:2.1.0/accessControls" -Headers @{Accept = "application/json;version=36.1";Authorization="Bearer $($auth.Headers.'X-VMWARE-VCLOUD-ACCESS-TOKEN')"} -ContentType "application/json" -Method post -Body ($accessBody | ConvertTo-Json)
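The two values in the script above that you assemble by hand are the memberId URN and the Basic authentication header. As a hedged Python sketch (the function names are illustrative, not part of any VCD or CSE tooling), this is how they are derived: the user ID is the path segment after /users/ in the tenant portal URL, and the header is the Base64 encoding of username:password, mirroring the ASCII encoding used in the PowerShell.

```python
import base64
import json
from urllib.parse import urlparse


def user_id_from_url(url):
    """Return the path segment that follows '/users/' in a VCD tenant portal URL."""
    # urlparse only splits host and path when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    parts = urlparse(url).path.strip("/").split("/")
    return parts[parts.index("users") + 1]  # ValueError if there is no /users/ segment


def access_control_body(user_id):
    """JSON body for the accessControls call, granting FullControl to the user."""
    return json.dumps({
        "grantType": "MembershipAccessControlGrant",
        "accessLevelId": "urn:vcloud:accessLevel:FullControl",
        "memberId": f"urn:vcloud:user:{user_id}",
    })


def basic_auth_header(username, password):
    """Value for the Authorization header of the POST to /api/sessions."""
    token = base64.b64encode(f"{username}:{password}".encode("ascii")).decode("ascii")
    return f"Basic {token}"


url = "vcd.ramsgaard.me/tenant/tenant1/administration/access-control/users/v9993018-ebf5-4ded-8134-27ddcc4ccbf0/general"
print(access_control_body(user_id_from_url(url)))
```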

When the API call is done, you should be able to delete the stuck cluster.

If you are so unfortunate that the cluster is stuck in a “not resolved” state and deletion through the VCD GUI still fails, you need to use the vcd cse CLI.

### Login to VCD system or tenant organization
vcd login vcd.ramsgaard.me system jr
### Show clusters
vcd cse cluster list
### Force delete the cluster
vcd cse cluster delete tanzu1 --force

Conclusion:

The problem occurred in the first place due to a bug in the MQTT bus in VCD 10.3.1, which made the cluster creation fail. 10.3.2 or 10.3.3 fixed the bug. (Of course, VMware Tanzu Kubernetes Grid should be used going forward.)

It took some time to find the workaround. I hope future versions of CSE will be more fault tolerant, so these situations do not occur.

Until then, there is a way out of the stuck cluster situation.

Disk mapping Windows <-> VMware – Part 2

A couple of years ago I did a post on how to map your Windows disks to the actual disks in VMware. This post is an extension of it, with updated commands.

Why do you need to know the mapping? It comes up when you stumble upon a VM with many disks attached. If the disks vary in size, you can normally match the sizes with the disks in VMware, but when all disks have the same size, that approach becomes difficult.

Windows serial number:

In Windows, we can retrieve the serial number of the disk we need to expand and then map that serial number to the VMware disk. In newer Windows Server versions this is fairly easy, but on versions older than 2012 you are missing PowerShell cmdlets like Get-Disk. Someone on StackOverflow posted a way that works on Windows Server 2008 through 2022.

$DriveLetter = "C:"
Get-CimInstance -ClassName Win32_DiskDrive |
Get-CimAssociatedInstance -Association Win32_DiskDriveToDiskPartition |
Get-CimAssociatedInstance -Association Win32_LogicalDiskToPartition |
Where-Object DeviceId -eq $DriveLetter |
Get-CimAssociatedInstance -Association Win32_LogicalDiskToPartition |
Get-CimAssociatedInstance -Association Win32_DiskDriveToDiskPartition |
Select-Object -Property SerialNumber

VMware disk:

From VMware’s side, it’s straightforward to find the disk and its serial number. Below is a scripted way of finding the disk and then adding the extra capacity.

Connect-VIServer ""

$VMname = ""
$disksn = "6000c295ec128b3d14472bdbf8e65aee"
$vmDisk = (Get-VM $VMname | Get-HardDisk) | Where-Object {$_.ExtensionData.Backing.uuid.Replace("-","") -eq $disksn } 

$ExpandSizeGb = 50
$vmDisk | Set-HardDisk -CapacityGB ($vmDisk.CapacityGB + $ExpandSizeGb) -Confirm:$false 
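The matching rule the script relies on is that the Windows serial number is the VMDK backing UUID with the dashes removed, compared case-insensitively (PowerShell’s -eq ignores case). A minimal Python sketch of that rule, using a hypothetical two-disk inventory of same-size disks; the dictionaries only stand in for what Get-HardDisk returns:

```python
def normalize_disk_id(value):
    """Drop dashes and surrounding whitespace, then lower-case for comparison."""
    return value.replace("-", "").strip().lower()


def find_disk(disks, windows_serial):
    """Return the first disk whose backing UUID matches the Windows serial, or None."""
    target = normalize_disk_id(windows_serial)
    for disk in disks:
        if normalize_disk_id(disk["uuid"]) == target:
            return disk
    return None


# Hypothetical inventory: two same-size disks that are hard to tell apart by size.
disks = [
    {"name": "Hard disk 1", "uuid": "6000C291-1111-2222-3333-444455556666", "capacity_gb": 100},
    {"name": "Hard disk 2", "uuid": "6000C295-ec12-8b3d-1447-2bdbf8e65aee", "capacity_gb": 100},
]
disk = find_disk(disks, "6000c295ec128b3d14472bdbf8e65aee")
new_capacity_gb = disk["capacity_gb"] + 50  # same +50 GB as $ExpandSizeGb
print(disk["name"], new_capacity_gb)
```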

Conclusion:

Instead of having to guess which Windows disk maps to which VMware disk, you now have a more automated way. The commands for retrieving the disk serial number work up to Windows Server 2022.

Brocade – Fibre Channel zoning with CLI

When you have to do zoning in the Brocade world, you still have to use an old Java GUI client. The GUI is not always easy to open, since not all Brocade switches have been updated, and all kinds of Java security settings will prevent you from doing your job. With the CLI we can skip the GUI and get to business instantly.

CLI config also has the great advantage of being sort of self-documenting 🙂 Here we will look at what I would call basic zoning: a new device has been added to your Fibre Channel fabric, and you need to create a zone for the new WWN and the storage system.

Adding ssh key

This is, in my opinion, a good thing to set up, since it makes it way easier to log in next time. Of course it is optional, but it is a great advantage.

The switch fetches the public SSH key from a remote system that allows SSH-based logins. Here is how to set it up:

SAN-SW-03:admin> sshutil importpubkey
Enter user name for whom key is imported:admin
Enter IP address:10.1.100.20
Enter remote directory:/home/username
Enter public key name(must have .pub suffix):username.pub
Enter login name:username
username@10.1.100.20's password: 
public key is imported successfully.

Create alias

Below you first see the syntax, and afterwards the actual command used in production.

# Syntax
alicreate "ALIAS_NAME", "WWPN"

# Production command
alicreate "dc2esxmgmt1_11", "50:01:43:80:31:80:50:98"

Create zone

# Syntax
zonecreate "ZONE_NAME", "WWPN_alias_1;WWPN_alias_2"

# Production command
zonecreate "dc2storv5030_dc2esxmgmt1_11", "dc2esxmgmt1_11;dc2storv5030_01_01_p1v;dc2storv5030_01_01_p2v;dc2storv5030_01_02_p1v;dc2storv5030_01_02_p2v"

Add zone to config

cfgadd "fabric1", "dc2storv5030_dc2esxmgmt1_11"

SAN-SW-03:admin> cfgsave
WARNING!!!
The changes you are attempting to save will render the
Effective configuration and the Defined configuration
inconsistent. The inconsistency will result in different
Effective Zoning configurations for switches in the fabric if
a zone merge or HA failover happens. To avoid inconsistency
it is recommended to commit the configurations using the
'cfgenable' command.

Do you want to proceed with saving the Defined
zoning configuration only?  (yes, y, no, n): [no] yes
Updating flash ...

Enable new config

SAN-SW-03:admin> cfgenable fabric1 
You are about to enable a new zoning configuration.
This action will replace the old zoning configuration with the
current configuration selected. If the update includes changes 
to one or more traffic isolation zones, the update may result in  
localized disruption to traffic on ports associated with
the traffic isolation zone changes.
Do you want to enable 'fabric1' configuration  (yes, y, no, n): [no] yes
zone config "fabric1" is in effect
Updating flash ...
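Since the CLI more or less documents itself, the same sequence can be generated from a few inputs. A hedged Python sketch (zoning_commands is a made-up helper that only prints text; it does not talk to the switch) that validates the WWPN format and emits the commands in the order used above. It assumes the storage aliases already exist on the fabric.

```python
import re

# Eight colon-separated hex bytes, e.g. 50:01:43:80:31:80:50:98
WWPN_RE = re.compile(r"^(?:[0-9a-f]{2}:){7}[0-9a-f]{2}$", re.IGNORECASE)


def zoning_commands(host_alias, host_wwpn, storage_aliases, zone_name, cfg_name):
    """Return the Brocade CLI commands for one new host zone."""
    if not WWPN_RE.match(host_wwpn):
        raise ValueError(f"not a WWPN: {host_wwpn}")
    members = ";".join([host_alias, *storage_aliases])
    return [
        f'alicreate "{host_alias}", "{host_wwpn}"',
        f'zonecreate "{zone_name}", "{members}"',
        f'cfgadd "{cfg_name}", "{zone_name}"',
        "cfgsave",
        f"cfgenable {cfg_name}",
    ]


cmds = zoning_commands(
    "dc2esxmgmt1_11",
    "50:01:43:80:31:80:50:98",
    ["dc2storv5030_01_01_p1v", "dc2storv5030_01_01_p2v",
     "dc2storv5030_01_02_p1v", "dc2storv5030_01_02_p2v"],
    "dc2storv5030_dc2esxmgmt1_11",
    "fabric1",
)
for command in cmds:
    print(command)
```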

Conclusion

It’s easy to do basic Fibre Channel zoning. The only thing I miss from the GUI, and haven’t found in the CLI yet, is a view of all discovered WWNs from which to pick the WWN for the new alias. In this case, the server management system shows the WWN of the FC adapter, so it’s easy to copy and paste into the CLI commands.

So if you don’t have HPE OneView or some other fancy FC provisioning tool, you now know a basic, low-key way of doing it.

And again, did I mention that the CLI basically documents itself? 😉

Juniper upgrade process

Junos is, in my opinion, an awesome OS for your network. I enjoy the CLI, where commands are alike across all of Juniper’s products. I also like the many features, and the fact that it’s not Cisco.

BUT it also has its drawbacks. Honestly, I have seen some weird bugs, and keeping track of all the PRs from Juniper is a full-time job. Last but not least, software upgrades are kind of a pain, especially on Junos devices older than 18.x.

EX3400 – format/install

In this case, I had a new EX3400 with older firmware, 15.1X53-D58.3. I needed to upgrade to the latest SR in the newest train, but from the device CLI you can only jump up to three releases at a time:

15.1 > 18.1 > 18.4 > 19.3 > 20.2 > 21.1

But you can also do a format install, where you interrupt the boot process and load a new firmware image onto the device from a TFTP server. This is all done outside of Junos, so you can jump to whatever version you want.

Jumping many versions might make your config invalid, so be aware.

Juniper has a LOT of KB articles for this process, and they all vary. So here is the process in my own words.

Process of format install

First, we need to get the right image from the Juniper support site. It needs to be the install media image, and the extension is .tgz.

  • Download the image to your TFTP server.

In my case, the TFTP server is a Linux box. If you prefer Windows, then TFTPd3264 is the way to go. For macOS, look here.

root@tftp:/srv/tftp# wget -O junos-install-media-net-ex-arm-32-21.4R1.12.tgz  'https://cdn.juniper.net/software/junos/21.4R1.12/junos-install-media-net-ex-arm-32-21.4R1.12.tgz?SM_USER=jv......5ce43fbdad2'
Resolving cdn.juniper.net (cdn.juniper.net)... 23.78.40.231
Connecting to cdn.juniper.net (cdn.juniper.net)|23.78.40.231|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 393745989 (376M) [application/octet-stream]
Saving to: ‘junos-install-media-net-ex-arm-32-21.4R1.12.tgz’

junos-install-media-net-ex-arm-32-21.4R1.12.tgz      100%[==================================================================>] 375.50M  3.48MB/s    in 2m 4s

2022-01-26 20:47:46 (3.03 MB/s) - ‘junos-install-media-net-ex-arm-32-21.4R1.12.tgz’ saved [393745989/393745989]

root@tftp:/srv/tftp# ls
junos-install-media-net-ex-arm-32-21.4R1.12.tgz
  • Now reboot the switch and interrupt the “first” boot loader. Just keep hitting Ctrl+C after the reboot; when you see the “=>” prompt, you are in the right place. Here we set the IP address on the me0 interface and boot into the next boot loader.
Board: EX3400-24T
Base MAC: C00380FAAD2E
arm_clk=1000MHz, axi_clk=500MHz, apb_clk=125MHz, arm_periph_clk=500MHz
Net:   Registering eth
Broadcom BCM IPROC Ethernet driver 0.1
Using GMAC0 (0x18022000)
et0: ethHw_chipAttach: Chip ID: 0xdc14; phyaddr: 0x1
et0: gmac_serdes_init read sdctl(0xf4141c)
et0: gmac_serdes_init() serdes_status0: 0xf100ff00; serdes_status1: 0xf00
et0: gmac_serdes_init() PLL ready brought up exit
serdes_reset_core pbyaddr(0x1) id2(0xf)
bcmiproc_eth-0
Last Reset Reason: 0
Hit ^C to stop autoboot:  0
=>setenv ipaddr 10.1.100.253
=>setenv gatewayip 10.1.100.1
=>setenv netmask 255.255.255.0
=>setenv serverip 10.1.101.130
=>save
=>boot
Saving Environment to SPI Flash...
SF: Detected MX25L6405D with page size 256 Bytes, erase size 64 KiB, total 8 MiB, mapped at 0001faa0
Erasing SPI flash...Writing to SPI flash...done
Erasing SPI flash...Writing to SPI flash...done
SF: Detected MX25L6405D with page size 256 Bytes, erase size 64 KiB, total 8 MiB
device 0 offset 0x3c0000, size 0x10000
SF: 65536 bytes @ 0x3c0000 Read: OK
  • Wait a few seconds for the next boot loader to appear and press Ctrl+C again. You will now see a menu; choose 5 twice, and you should see “loader>”.
Hit ^C to stop autoboot:  0 
Options Menu

1.  Recover [J]unos volume
2.  Recovery mode - [C]LI

3.  Check [F]ile system
4.  Enable [V]erbose boot
5.  [B]oot prompt
6.  [M]ain menu
Choice: 
Type 'menu' to go back to the menu
Type 'boot-junos' to boot into Junos
Type 'reboot' to reboot

5 5
  • We now run install --format with the TFTP location of the image we downloaded in the first step.
Type '?' for a list of commands, 'help' for more detailed help.
loader> install --format tftp://10.1.101.130/junos-install-media-net-ex-arm-32-21.4R1.12.tgz
/kernel text=0x105b888 data=0x640fc+0x1fbf04 syms=[0x4+0x914a0+0x4+0x9b821]
/ex3400.dtb size=0x1f76
/crypto.ko text=0x419e0 data=0xe58+0x2a0 syms=[0x4+0x4740+0x4+0x2ba5]
/iflib.ko text=0x11f10 data=0x910+0x58 syms=[0x4+0x2b10+0x4+0x2194]
/miibus.ko text=0x19f38 data=0x10c4+0x78 syms=[0x4+0x51f0+0x4+0x3491]
/if_gmac.ko text=0xbc3c data=0x688+0xc syms=[0x4+0x1cc0+0x4+0x15ad]
/contents.iso size=0x279b000
Using DTB from loaded file '/ex3400.dtb'.
Kernel entry at 0xc1000180...
Kernel args: (null)
---<<BOOT>>---
GDB: no debug ports present
K cache
Release APs
WARNING: WITNESS option enabled, expect reduced performance.
mwill now attempt to reach the remote host.
<====== LOADS OF OUTPUT TO CONSOLE ======>
<====== LOADS OF OUTPUT TO CONSOLE ======>
Downloading /junos-install-media-net-ex-arm-32-21.4R1.12.tgz from 10.1.101.130 ...
rmed on 1024 samples passed.t-up health tests perfo
  300.6MB  03:52random: unblocking device.
  393.7MB  05:04
Installing Junos OS release ...

After 15-20 minutes the install will be finished, and the switch is ready for you to log in and start loading your config.

FreeBSD/arm (Amnesiac) (ttyu0)
login: 
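The boot loader settings are easy to get wrong at the console, so here is a hedged Python sketch (loader_plan is a hypothetical helper) that checks the values used above before you type them: the gateway must sit inside the me0 subnet, and if the TFTP server does not, the transfer has to go through that gateway. It also builds the same setenv lines and the loader install command.

```python
import ipaddress


def loader_plan(ipaddr, netmask, gatewayip, serverip, image):
    """Return (setenv lines, loader install command, TFTP server on-link?)."""
    me0 = ipaddress.IPv4Interface(f"{ipaddr}/{netmask}")
    if ipaddress.IPv4Address(gatewayip) not in me0.network:
        raise ValueError("gatewayip must be inside the me0 subnet")
    if not image.endswith(".tgz"):
        raise ValueError("expected the .tgz install media image")
    setenv = [
        f"setenv ipaddr {ipaddr}",
        f"setenv gatewayip {gatewayip}",
        f"setenv netmask {netmask}",
        f"setenv serverip {serverip}",
        "save",
    ]
    install = f"install --format tftp://{serverip}/{image}"
    # If the server is off-subnet, the TFTP traffic must be routed via gatewayip.
    on_link = ipaddress.IPv4Address(serverip) in me0.network
    return setenv, install, on_link


setenv, install, on_link = loader_plan(
    "10.1.100.253", "255.255.255.0", "10.1.100.1", "10.1.101.130",
    "junos-install-media-net-ex-arm-32-21.4R1.12.tgz",
)
print("\n".join(setenv))
print(install, "| server on-link:", on_link)
```

With the values from this install, the TFTP server (10.1.101.130) is outside 10.1.100.0/24, so gatewayip really matters here.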

Conclusion

This is a very helpful process and might come in handy with new switches running old firmware. Skipping the smaller version jumps saves time.

This format install can also be done with a USB key. That process is also quite simple, but it requires physical access to the switch.

In my case, I have a console over ssh and can manage the switch out-of-band so TFTP is the easy way.

Veeam – retrieve saved passwords from VBR

Ever needed to retrieve a saved Veeam password? I did, and found the process for it on the Veeam forums.

  • Open SQL Server Management Studio as administrator and connect to the Veeam DB instance
  • Run the query below on the VeeamBackup database
SELECT TOP (1000) [id]
,[user_name]
,[password]
,[usn]
,[description]
,[visible]
,[change_time_utc]
FROM [VeeamBackup].[dbo].[Credentials]
Query the Veeam DB for all stored credentials to backup infrastructure components

Get the encrypted password from the results (match the description to the one you need). Then run the PowerShell below with the value you grabbed.

Add-Type -Path "C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.Common.dll"
$encoded = 'AQAAANCM....RhQ'
[Veeam.Backup.Common.ProtectedStorage]::GetLocalString($encoded)
Password revealed and ready to use

Conclusion:

Is this a security problem? It depends, but it is a reminder of how important it is to keep your Veeam VBR server safe: never domain-join it, and keep the firewall as closed as possible. If a malicious person gets onto your Veeam server, they can grab the keys to the rest of your infrastructure, including your backups of course. In most cases that would mean game over.

Faster and more scripted way:

$instance = (Get-ItemProperty -Path "HKLM:\SOFTWARE\Veeam\Veeam Backup and Replication" -name SqlInstanceName).SqlInstanceName
$server = (Get-ItemProperty -Path "HKLM:\SOFTWARE\Veeam\Veeam Backup and Replication" -name SqlServerName).SqlServerName
$result = Invoke-Sqlcmd -Query "SELECT TOP (1000) [user_name],[password],[description] FROM [VeeamBackup].[dbo].[Credentials]" -ServerInstance "$server\$instance"
Add-Type -Path "C:\Program Files\Veeam\Backup and Replication\Backup\Veeam.Backup.Common.dll"
$result | ForEach-Object { [Veeam.Backup.Common.ProtectedStorage]::GetLocalString($($_.password))}

Cloud Director 10.3 – Update certificates

Since my last article on how to update Cloud Director SSL certificates, there has been a major change: no more binary Java truststore. Yaaay.

Cloud Director has changed over to what I think is a better and more standard way of storing the private and public keys: PEM format. According to the release notes, the change actually happened in 10.2, but the certificate path changed again in 10.3. If you are in doubt about where the certificates are, look inside global.properties:

/opt/vmware/vcloud-director/etc/global.properties

VMware’s own documentation states that we can now just swap the .pem files, use the cell-management-tool to import them, and restart the cell.

What we will do and what is needed

  • Get a new publicly signed certificate
    • Either in PEM format as .key and .pem (certificate including intermediate)
    • Or as a PFX so it can be exported
  • Backup existing certificates
  • Replace existing certificates with your new certificate
  • Run VCD tool to import and define the private key encryption password
  • Restart cell(s)

Process

If you have a PFX, you can use this article to extract the key and certificate. If you already have the two files, .key and .pem, you can proceed.

We will follow VMware documentation and create a backup of the existing files.

cp /opt/vmware/vcloud-director/etc/user.http.pem /opt/vmware/vcloud-director/etc/user.http.pem.original
cp /opt/vmware/vcloud-director/etc/user.http.key /opt/vmware/vcloud-director/etc/user.http.key.original
cp /opt/vmware/vcloud-director/etc/user.consoleproxy.pem /opt/vmware/vcloud-director/etc/user.consoleproxy.pem.original
cp /opt/vmware/vcloud-director/etc/user.consoleproxy.key /opt/vmware/vcloud-director/etc/user.consoleproxy.key.original
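The same backup can be scripted. A hedged Python sketch (backup_certificates is a hypothetical helper) that copies the four files to .original like the cp commands above, but refuses to run if any backup already exists, so a second run cannot overwrite the originals with already replaced files. The directory is the 10.3 path used above; check global.properties if yours differs.

```python
import shutil
from pathlib import Path

CERT_DIR = Path("/opt/vmware/vcloud-director/etc")
CERT_FILES = [
    "user.http.pem",
    "user.http.key",
    "user.consoleproxy.pem",
    "user.consoleproxy.key",
]


def backup_certificates(cert_dir=CERT_DIR, suffix=".original"):
    """Copy each certificate/key file to <name><suffix> and return the backup paths."""
    cert_dir = Path(cert_dir)
    pairs = [(cert_dir / name, cert_dir / (name + suffix)) for name in CERT_FILES]
    # Check every target first so nothing is copied if any backup already exists.
    existing = [str(dst) for _, dst in pairs if dst.exists()]
    if existing:
        raise FileExistsError(f"backup already exists: {', '.join(existing)}")
    for src, dst in pairs:
        shutil.copy2(src, dst)  # copy2 also keeps timestamps and permission bits
    return [dst for _, dst in pairs]
```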

Now we can either SCP in our key and certificate, or edit the files on the server and paste in the content from the files you have, whichever you find easiest.

Forgot your root password for the Cloud Director appliance? Of course not. But anyway, here is a link to reset it....

After the “user.http.pem/key” and “user.consoleproxy.pem/key” files have been updated with the new certificate data, we can tell Cloud Director to update its config with the commands below. This also sets the encryption password for the private key.

If you don’t care about security, you can also update without --key-password; your private key then needs to be in unencrypted format in the .key files.

/opt/vmware/vcloud-director/bin/cell-management-tool certificates -j --cert /opt/vmware/vcloud-director/etc/user.consoleproxy.pem --key /opt/vmware/vcloud-director/etc/user.consoleproxy.key --key-password PASSWD
/opt/vmware/vcloud-director/bin/cell-management-tool certificates -p --cert /opt/vmware/vcloud-director/etc/user.http.pem --key /opt/vmware/vcloud-director/etc/user.http.key --key-password PASSWD

If everything works out, it will tell you the certificates have been updated and that you need to restart VCD for the change to take effect.

SSL configuration has been updated. You will need to restart the cell for changes to take effect.

Now safely shut down your cell(s) with the command below. This ensures that VCD only shuts down once all running tasks are done.

/opt/vmware/vcloud-director/bin/cell-management-tool cell -i $(service vmware-vcd pid cell) -s

Start again with the command below

systemctl start vmware-vcd

Conclusion

VMware has made it much easier to change a certificate in Cloud Director. The new way of storing certificates is a welcome change.

I did see a few different placements for the .key and .pem files depending on the version, and on whether the cells were built on plain Linux or the appliance, but you can always look in the config file (global.properties) placed in the same folder as the certificates.

Storage DRS recommendations – with PowerCLI

Many things can happen when you let Storage DRS run fully automated. If you have it on from the beginning, it will probably only do good things. But enabling it on a large datastore cluster with imbalanced space usage might be a bit too risky.

There are many things Storage DRS is not aware of, like the storage underneath running out of space on the pool/aggregate, or the operations being too IO-heavy to run within business hours.

Call me a wimp, but in this case it seems better to stay in control and apply the recommendations little by little. Using the GUI for that is a pain: you need to go to Storage Cluster > Monitor > Storage DRS > Recommendations, and from there override the selections and uncheck boxes so you can run smaller batches of Storage vMotions.

I will just use VMware PowerCLI cmdlets…

Well, unfortunately not all of the vSphere API is exposed through PowerCLI cmdlets, but after a bit of googling it turned out to be quite easy to call the vSphere API directly from PowerShell.

One post that came to my attention contained most of the code needed.

Solution:

I’m not that deep into what the ServiceInstance or StorageResourceManager is, but as I understand it, ServiceInstance is the root object of the vSphere API, and from it you reach the managers (like StorageResourceManager) that expose the functionality you are looking for.

# DSC (datastore cluster) you want to work with
$dscName = 'DatastoreCluster'

# Get DSC info
$dsc = Get-View -ViewType StoragePod -Filter @{'Name'=$dscName}

# Get Service Instance
$si = Get-View ServiceInstance

# Get the StorageResourceManager
$storMgr = Get-View -Id $si.Content.StorageResourceManager

# Refresh SDRS Recommendation on DSC
$storMgr.RefreshStorageDrsRecommendation($dsc.MoRef)

# Update dsc object with fresh recommendation data
$dsc.UpdateViewData()

# Filter on reason for storage balance. Select only 40 VMs.
$balance = $dsc.PodStorageDrsEntry.Recommendation | Where-Object {$_.Reason -eq "balanceDatastoreSpaceUsage"}  | Select-Object -First 40

# Do a run of each VM and start the storage vMotion process
foreach($vm in $balance){
   $message = "Moving VM: {0} to datastore: {1}" -f $(get-vm -id $("VirtualMachine-"+$($vm.Action[0].Target.Value))).name, $(get-datastore -id $vm.Action[0].Destination).name
   write-host $message -ForegroundColor Green
   $storMgr.ApplyStorageDrsRecommendationToPod($dsc.MoRef,$vm.Key)
} 
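The filtering above (only balanceDatastoreSpaceUsage, at most 40 at a time) is what makes the little-by-little approach work. A hedged Python sketch of the same logic split into successive batches; the recommendations are hypothetical dicts with "key" and "reason" fields standing in for Recommendation.Key and Recommendation.Reason. Since Storage DRS recalculates after moves, refreshing recommendations between batches, as the script does, is the safer pattern than applying precomputed batches blindly.

```python
from itertools import islice


def batches(recommendations, reason="balanceDatastoreSpaceUsage", size=40):
    """Yield lists of at most `size` recommendations with the given reason."""
    if size < 1:
        raise ValueError("size must be at least 1")
    matching = (r for r in recommendations if r["reason"] == reason)
    while True:
        batch = list(islice(matching, size))
        if not batch:
            return
        yield batch


# Hypothetical data shaped like Recommendation.Key / Recommendation.Reason.
recs = [{"key": str(i), "reason": "balanceDatastoreSpaceUsage"} for i in range(100)]
recs.append({"key": "100", "reason": "someOtherReason"})
for number, batch in enumerate(batches(recs), start=1):
    print(f"batch {number}: {len(batch)} recommendations")
```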

Conclusion:

I was expecting to use some PowerCLI cmdlets for granular balancing of the storage cluster. Unfortunately, those did not exist.

But thanks to the great community, I found out how to use the vSphere API through PowerShell and in the end got the functionality I was looking for.

Maybe there is an easier way to do the same; if so, let me know. Until next time, I have a bit of vSphere SDK googling to do.

Veeam – Network Extension Appliance performance

This post gives a brief write-up of what to expect from a network perspective when using the Veeam Network Extension Appliance (NEA).

Since you found this post, I don’t think an introduction is needed. But anyway, here is a quick description of the network so you can visualize how the tests were performed.

  • The green line indicates the L2 VPN from each NEA to the Cloud Gateway
  • The on-prem environment with 10gbit internet uplink
  • Service provider with multiple 10gbit internet uplinks
  • 4ms between on-prem and service provider

Tests:

So, when a replica VM has been failed over and the NEA L2 tunnel is running, what can you expect? Veeam does not give you any information on NEA performance, and Veeam support was not able to provide a performance chart either. So here are the results from ping and iperf tests.

Test 1 – ping of latency over the L2 tunnel:

Ping to 185.177.120.140 shows the latency over the internet. Ping to 192.168.12.151 shows the latency over the L2 VPN.

So from a latency perspective it looks good, adding only 1ms to the internet latency.

Test 2 – iperf over tunnel

iperf test from a VM on the service provider side to an on-prem server.

About 110 Mbit/s, which is not very good compared to the 10 Gbit/s the internet connection is capable of.

iperf test from a VM on the service provider side to an on-prem server, this time with -P 8 for 8 parallel streams.

8 parallel streams do not give any additional bandwidth.

Test 3 – Multiple VLAN with multiple NEA

It’s always interesting to find where the bottleneck could be. Since iperf over the internet is giving a completely different result. then it must be within NEA. When I tried to do multiple VLAN bridges to the cloud resources in Cloud Director I get the same results pr NEA. Meaning it could be something in NEA or its components making the bottleneck.

The good news, of course, is that you see the same result per NEA even when running iperf tests to the same target at the other end. So the aggregate bandwidth scales linearly with the number of NEAs.

A look from the Veeam Cloud Connect gateway, which brokers the L2 VPN connections.
View from the iperf server, showing the connections from the servers at the other end of the L2 VPN.

Conclusion

NEA is a very helpful solution, especially for large migrations where L2 between datacenters is required during the migration. The bandwidth is not great, but I would say it is OK. The L2 connection should only be used for a short period while doing the actual migration.

In numbers, NEA seems to add about 1ms to the latency seen over the internet between the two environments, and the bandwidth is between 110 and 140Mbit/s per NEA.

Manual mount VMFS datastore

Have a datastore that shows as Not Consumed? From time to time I stumble across them, and from what I have found there is really only one way around it: manually mount the datastore from the shell of the ESXi host.

Not sure what the root cause for it is, but if you know, then please let me know 🙂

Workaround

What we need to do is list the VMFS UUIDs of the unmounted volumes on the block devices, and then mount the datastore using that UUID.

### List all available datastores that are not mounted
esxcfg-volume -l
### Persistently mount a specific datastore with the UUID found with -l
esxcfg-volume -M <UUID>
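If the listing contains several volumes, copying UUIDs by hand gets tedious. Below is a minimal Python sketch that turns pasted `esxcfg-volume -l` output into one mount command per volume. It assumes each volume is reported on a line starting with "VMFS UUID/label:" followed by `<uuid>/<label>` (some versions print "VMFS3"); check that against your own host's output before relying on it.

```python
import re

# Assumption: each unmounted volume appears on a line like
# "VMFS UUID/label: <uuid>/<label>" (some hosts print "VMFS3 UUID/label:").
_UUID_LINE = re.compile(r"^VMFS\d*\s+UUID/label:\s*([0-9a-fA-F-]+)/(.*)$")


def mount_commands(listing: str) -> list[tuple[str, str]]:
    """Return (label, persistent mount command) for every volume in the listing."""
    commands = []
    for line in listing.splitlines():
        match = _UUID_LINE.match(line.strip())
        if match:
            uuid, label = match.groups()
            # -M mounts persistently, matching the command used above
            commands.append((label, f"esxcfg-volume -M {uuid}"))
    return commands
```

Review the list it returns before pasting any of the commands into the ESXi shell.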

Conclusion

Since -M mounts the volume persistently, it should stay mounted across reboots. Hope it works for you too.

Change VM MoRef in VBR database

This information can be found in many other places on the big internet, but since I can never find it myself, I am writing a post about the procedure.

When you switch ESXi host or vCenter, or remove a VM from the inventory and add it again, the VM gets a new ID. In the world of VMware, this is called the MoRef ID.

When this happens, Veeam loses its link to the VM and the backup fails with:
– Virtual Machine <> is unavailable and will be skipped from processing.
– Nothing to process. All machines were excluded from task list.

How to verify there is a MoRef mismatch:

From a VMware perspective it’s easy:

connect-viserver <vcenter> -Credential $cred
Get-VM | select name, id

This will give you something like:

PS C:\Windows\system32> Get-VM | select name, id
Name Id
---- --
VirtualMachine-vm-71326

From the Veeam perspective it’s a bit harder, since you need to query the MS SQL database that Veeam uses. So download SQL Server Management Studio (SSMS) from Microsoft.

Open SQL Server Management Studio as administrator on the server to gain access to the Veeam database. You can use the following query to find the MoRef stored in the Veeam database:

SELECT [dbo].[BObjects].id, [dbo].[BObjects].object_id, [dbo].[BObjects].host_id, [dbo].[BObjectsSensitiveInfo].object_name, [dbo].[BObjectsSensitiveInfo].path
FROM [dbo].[BObjects]
INNER JOIN [dbo].[BObjectsSensitiveInfo] ON [dbo].[BObjectsSensitiveInfo].bObject_id=[dbo].[BObjects].id  
WHERE object_name = '<vmname>'

Verify:

We can now see that the VM in VMware has MoRef “vm-71326”, but the Veeam database has “vm-992”. From here on you know what’s wrong, and you need to open a Veeam support case to get the supported procedure.

If you don’t care about supported procedures, you can update the database with the VM’s new MoRef ID from VMware, and your VBR job should run again. The SQL query would look like this:

UPDATE [dbo].[BObjects]
SET [object_id] = 'new-id'
WHERE [object_id] = 'old-id'
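With many VMs, the comparison can be scripted. Here is a minimal Python sketch, assuming you have exported the name/Id pairs from `Get-VM | select name, id` and the name/object_id pairs from the SELECT query above into dictionaries (the VM names in the example are hypothetical). It strips PowerCLI's `VirtualMachine-` prefix to get the bare MoRef, reports mismatches, and keeps the UPDATE parameterised instead of pasting IDs into the SQL text. The same warning applies: this is unsupported.

```python
# Parameterised version of the UPDATE above ("?" placeholders, as used by e.g. pyodbc)
UPDATE_SQL = "UPDATE [dbo].[BObjects] SET [object_id] = ? WHERE [object_id] = ?"


def moref_from_powercli_id(powercli_id: str) -> str:
    """'VirtualMachine-vm-71326' -> 'vm-71326'; other strings are returned unchanged."""
    prefix = "VirtualMachine-"
    return powercli_id[len(prefix):] if powercli_id.startswith(prefix) else powercli_id


def find_mismatches(vcenter: dict[str, str], veeam: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map VM name -> (MoRef stored in Veeam, current MoRef in vCenter) where they differ."""
    mismatches = {}
    for name, powercli_id in vcenter.items():
        current = moref_from_powercli_id(powercli_id)
        stored = veeam.get(name)
        if stored is not None and stored != current:
            mismatches[name] = (stored, current)
    return mismatches


def update_params(stored: str, current: str) -> tuple[str, str]:
    """Parameters for UPDATE_SQL in placeholder order: new id first, then old id."""
    return (current, stored)
```

With a pyodbc cursor, for example, this would become `cursor.execute(UPDATE_SQL, update_params(*mismatch))`.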

Conclusion

It’s not that hard to change the MoRef in the VBR database. But remember: if you care about having a supported installation, you need to create a Veeam support case and have them help you. Something could also have changed in the VBR database schema since this post.

Massdelete files

Not sure what happened, but thousands of small files piled up in the path below.

C:\ProgramData\Microsoft\Crypto\RSA\S-1-5-18

My google-fu wasn’t sufficient to find out what the files in that folder actually do. An article from Kofax said that just deleting the files could be troublesome, but I also found another article where the author deleted all files older than 30 days and hasn’t had problems yet. So I dared to do the same. The command was found on a blog, so all credit goes to that author.

forfiles /D -30 /C "cmd /C attrib -s @file & echo @file & del @file"

forfiles is a very nice command that iterates through the files in a folder according to its parameters. Without /P it works on the current directory, so run it from inside the folder above. /D -30 selects all files last modified 30 or more days ago. attrib -s removes the System attribute, which is needed for DEL to work. The echo is there so you can see that it is doing its job.

The PowerShell equivalent would be:

Get-ChildItem -Path C:\ProgramData\Microsoft\Crypto\RSA\S-1-5-18 -File -Force | Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-30) } | Remove-Item -Force
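If you would rather see what would be deleted before deleting anything, here is a minimal Python sketch of the same idea: list, and optionally delete, the files in one folder whose last-modified time is more than N days ago. Like forfiles without /S it is not recursive, and on Windows it clears the read-only attribute before deleting. Treat it as a sketch and run it as a dry run first.

```python
import os
import stat
import time
from pathlib import Path


def delete_older_than(folder: str, days: int, dry_run: bool = True) -> list[str]:
    """Return (and unless dry_run, delete) files in `folder` last modified more than `days` days ago."""
    cutoff = time.time() - days * 86400
    matched = []
    for entry in Path(folder).iterdir():
        # Non-recursive, like forfiles without /S
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            matched.append(entry.name)
            if not dry_run:
                if os.name == "nt":
                    # Clear the read-only attribute, otherwise unlink() fails on Windows
                    os.chmod(entry, stat.S_IWRITE)
                entry.unlink()
    return sorted(matched)
```

Run `delete_older_than(r"C:\ProgramData\Microsoft\Crypto\RSA\S-1-5-18", 30)` first, check the list, and only then call it again with `dry_run=False`.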

Extend disk with LVM

Here is a quick walkthrough showing how to expand an LVM volume in Linux: first grow the partition and the physical volume, then resize the logical volume, and finally resize the file system to take advantage of the additional space.

Note: In this example we are working in Ubuntu; some commands may differ in other Linux distributions.

DISCLAIMER: Make sure that you have proper backups in place before starting the resize procedure on your VMs! Create any backups necessary to ensure that if something goes wrong, you can always go back to a previous working state. If you lose any information or wipe out your disks, that is your own fault. You use this procedure at your own risk.

If you are unsure about LVM and its components, I suggest you read up on it first. There are tons of articles, for example on DigitalOcean.

Process

With LVM this is easy, as it can be done on the fly with no downtime: you can perform it on a mounted volume without interruption. To increase the size of a logical volume, the volume group it belongs to must have free space available. The process goes as follows:

  • Add more space at the hardware level, either on the RAID controller or in the hypervisor.
  • Resize your partition to contain the extra space
  • Resize the PV in LVM
  • Expand the LV in LVM
  • Resize filesystem

To view the free space, run the pvdisplay command as shown below and look at the “Free PE” field (vgdisplay shows the same for the whole volume group as “Free PE / Size”). In this case there is no free space.

root@www:~# pvdisplay  
--- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               server1-vg
  PV Size               99.00 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              25343
  Free PE               0
  Allocated PE          25343
  PV UUID               cgPMYF-PkeW-1iaS-FxUZ-Ky9r-Zoa8-ktpEk5

Since we have added more space at the hardware level, we need to grow the partition. This is done by deleting it and creating a new one that starts at the same sector but ends at a later sector than before. This does not touch the data on the disk, as long as the new partition starts at exactly the same sector (2048 here). In the output below we can see how to extend the partition.

root@www:~# fdisk -l /dev/sdb
Disk /dev/sdb: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xb97da1a0

Device     Boot Start       End   Sectors Size Id Type
/dev/sdb1        2048 207620095 207618048  99G 83 Linux
root@www:~# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.27.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): d
Selected partition 1
Partition 1 has been deleted.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-209715199, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-209715199, default 209715199): +99.9G

Created a new partition 1 of type 'Linux' and of size 99.9 GiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Re-reading the partition table failed.: Device or resource busy

The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).

root@www:~# partprobe

The disk partition is now extended. Next, we need to tell LVM to grow its PV:

root@www:~# pvresize /dev/sdb1
  Physical volume "/dev/sdb1" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized

Now we can verify that there is free space available to grow the LV.

root@www:~# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               server1-vg
  PV Size               99.88 GiB / not usable 3.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              25568
  Free PE               225
  Allocated PE          25343
  PV UUID               cgPMYF-PkeW-1iaS-FxUZ-Ky9r-Zoa8-ktpEk5
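The pvdisplay numbers are easy to sanity-check, because free space is simply Free PE × PE Size. A small Python sketch using the values from the output above (4 MiB extents):

```python
def extents_to_gib(extents: int, pe_size_mib: float = 4.0) -> float:
    """Convert a number of LVM physical extents to GiB."""
    return extents * pe_size_mib / 1024


total_pe, allocated_pe = 25568, 25343  # from the pvdisplay output above
free_pe = total_pe - allocated_pe
print(free_pe, f"{extents_to_gib(free_pe):.2f} GiB")  # prints: 225 0.88 GiB
```

The same conversion gives 196.51 GiB for 50307 extents and 197.39 GiB for 50532 extents, which matches the lvextend output below: growing the LV by 225 extents adds roughly 0.88 GiB.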

Lastly, we grow the LV and extend the filesystem.

root@www:~# lvextend -l +100%FREE /dev/mapper/server1--vg-root
  Size of logical volume server1-vg/root changed from 196.51 GiB (50307 extents) to 197.39 GiB (50532 extents).
  Logical volume root successfully resized.

root@www:~# resize2fs /dev/mapper/server1--vg-root
resize2fs 1.42.13 (17-May-2015)
Filesystem at /dev/mapper/server1--vg-root is mounted on /; on-line resizing required
old_desc_blocks = 13, new_desc_blocks = 13
The filesystem on /dev/mapper/server1--vg-root is now 51744768 (4k) blocks long.
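As a final sanity check, the resized filesystem should exactly fill the LV: with 4 MiB extents and 4 KiB filesystem blocks, each extent holds 1024 blocks. A short Python sketch of that arithmetic, assuming the 4 MiB PE size and 4k block size shown above:

```python
def expected_fs_blocks(lv_extents: int, pe_size_mib: int = 4, block_kib: int = 4) -> int:
    """Number of filesystem blocks that exactly fill an LV of `lv_extents` extents."""
    return lv_extents * pe_size_mib * 1024 // block_kib


# 50532 extents after lvextend; resize2fs reported 51744768 (4k) blocks
print(expected_fs_blocks(50532))  # prints: 51744768
```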

Conclusion

We have now successfully expanded a file system and corresponding LVM logical volume without any downtime. This was done by first expanding the partition of the disk, then the logical volume and finally performing an online resize of the file system.

No servers were harmed doing this procedure 🙂

Cisco ASA HA member swap

The old Intel Atom C2000 bug is still among us. Not all devices have been affected yet; for me, this is the first. So here are my two cents on how to remediate the cluster with an RMA unit.

https://www.cisco.com/c/en/us/support/docs/field-notices/642/fn64228.html

Process:

  1. Label all cables on the faulty unit, so you don't have to worry about any cables ending up in the wrong port when you swap the unit. I also disabled the links on the switch side, so that I could calmly plug them in again without fearing that something would go wrong. Better safe than sorry.
  2. Make sure the firmware level on the new device is the same as on the existing member; if not, a firmware upgrade or downgrade is needed.
  3. Back up the config of the HA member that is still running, just in case something bad happens.
  4. Configure an IP on the HA link interface. From this point, it should find the existing member and start replicating the config.

Firmware upgrade:

I booted the unit with only a console cable connected. I can now see the firmware version, and it needs an upgrade.

ciscoasa> en
Password:
ciscoasa#
ciscoasa# sh version

Cisco Adaptive Security Appliance Software Version 9.8(2)20
Firepower Extensible Operating System Version 2.2(2.63)
Device Manager Version 7.5(1)

Compiled on Fri 02-Feb-18 06:10 PST by builders
System image file is "disk0:/asa982-20-lfbff-k8.SPA"
Config file at boot was "startup-config"

ciscoasa up 30 secs

Hardware:   ASA5516, 8192 MB RAM, CPU Atom C2000 series 2416 MHz, 1 CPU (8 cores)
Internal ATA Compact Flash, 8000MB
BIOS Flash M25P64 @ 0xfed01000, 16384KB

I prepared a FAT-formatted USB disk and downloaded the matching firmware from cisco.com. When you plug the USB key into the ASA, you should see a disk1 that you can copy from. To see the content of disk1, issue the command “show disk1”.
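Before copying the image to the USB disk, you may want to check it on your workstation against the SHA512 checksum shown on the cisco.com download page, so a corrupt download is caught before it ever reaches the ASA. A minimal Python sketch; the file name in the usage line is the one from this post, and the checksum string is whatever you copy from cisco.com.

```python
import hashlib


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-512 of a file, read in chunks so a ~100 MB image is not loaded at once."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published: str) -> bool:
    """Compare with a pasted checksum, ignoring case, spaces and line breaks."""
    return sha512_of(path) == "".join(published.split()).lower()
```

Usage: `matches_published("asa9-13-1-lfbff-k8.SPA", "<SHA512 copied from cisco.com>")`.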

We now copy over the files and make the system boot the new firmware.

ciscoasa# copy disk1:/asa9-13-1-lfbff-k8.SPA disk0:/asa9-13-1-lfbff-k8.SPA

Source filename [asa9-13-1-lfbff-k8.SPA]?

Destination filename [asa9-13-1-lfbff-k8.SPA]?

Copy in progress...CCCCC
Verifying file disk0:/asa9-13-1-lfbff-k8.SPA...
Computed Hash   SHA2: 80500c1790c76e90dde61488c3f977b8
                      69711278b6e550eeb8ea8830e19c4a23
                      8cf03fe64d1d9927d4a78e77b6090234
                      98485fbf9bc058eb3820b32e7a56f91f

Embedded Hash   SHA2: 80500c1790c76e90dde61488c3f977b8
                      69711278b6e550eeb8ea8830e19c4a23
                      8cf03fe64d1d9927d4a78e77b6090234
                      98485fbf9bc058eb3820b32e7a56f91f


Digital signature successfully validated

Writing file disk0:/asa9-13-1-lfbff-k8.SPA...

107543456 bytes copied in 26.40 secs (4136286 bytes/sec)
ciscoasa# config t
ciscoasa(config)# boot system disk0:/asa9-13-1-lfbff-k8.SPA
ciscoasa(config)# wr mem
ciscoasa(config)# reload

After the reload, the system is up and I can confirm that it has booted the new firmware.

ciscoasa> show version

Cisco Adaptive Security Appliance Software Version 9.13(1)
SSP Operating System Version 2.7(1.107)
Device Manager Version 7.5(1)

Compiled on Mon 23-Sep-19 09:28 PDT by builders
System image file is "disk0:/asa9-13-1-lfbff-k8.SPA"
Config file at boot was "startup-config"

ciscoasa up 29 secs

Hardware:   ASA5516, 8192 MB RAM, CPU Atom C2000 series 2416 MHz, 1 CPU (8 cores)
Internal ATA Compact Flash, 8000MB
BIOS Flash M25P64 @ 0xfed01000, 16384KB

Joining the HA cluster

We have now verified that the two ASA firewalls are on the correct, matching firmware level. Now connect all the cables to the firewall. On the switch side, all data links are administratively down; the HA links between the two ASAs are dedicated links, and those are the links we are now going to configure.

You can grab these lines from the running config of the existing member. If you don’t have the failover key, you can also reset it on the primary/existing member.

failover lan unit secondary
failover lan interface HA_FAILOVERLINK GigabitEthernet1/7
failover key ***
failover link HA_STATELINK GigabitEthernet1/8
failover interface ip HA_FAILOVERLINK 172.16.254.1 255.255.255.0 standby 172.16.254.2
failover interface ip HA_STATELINK 172.16.255.1 255.255.255.0 standby 172.16.255.2

The new ASA is now ready to contact the primary member of the cluster and start the replication. In my case, the HA interfaces were administratively down, so we now enable the links and then enable failover.

interface GigabitEthernet 1/7
no shut
interface GigabitEthernet 1/8
no shut
failover

If something in the config is not OK, for example missing files, it is listed, and you have to remediate it before you can try to enable HA again with the “failover” command. The output below shows what happens when the failover command is enabled successfully.

Detected an Active mate
Beginning configuration replication from mate.
End configuration replication from mate.

You can now check the failover status (“show failover”) and see whether the standby member is in the Standby Ready state. If not, try reloading the standby from the active unit:

failover reload-standby

Once the standby member is in a ready state, you are ready to do a live failover. Remember to enable the ports on the switch side again.

Conclusion

The process was not as bad as I thought, and there was no downtime involved. In my case, the ASDM and AnyConnect packages were missing on the new standby node. I downloaded them from the existing primary node and copied them to a USB disk. With the USB disk plugged into the standby ASA, I could then copy the files to the ASA flash:

copy /noconfirm disk1:/anyconnect-win-4.8.01090-webdeploy-k9.pkg disk0:/anyconnect-win-4.8.01090-webdeploy-k9.pkg
copy /noconfirm disk1:/anyconnect-macos-4.8.01090-webdeploy-k9.pkg disk0:/anyconnect-macos-4.8.01090-webdeploy-k9.pkg
copy /noconfirm disk1:/VPN_client_profile.xml disk0:/VPN_client_profile.xml
copy /noconfirm disk1:/anyconnect-linux64-4.8.01090-webdeploy-k9.pkg disk0:/anyconnect-linux64-4.8.01090-webdeploy-k9.pkg
copy /noconfirm disk1:/Management_client_profile.xml disk0:/Management_client_profile.xml
copy /noconfirm disk1:/asdm-7131.bin disk0:/asdm-7131.bin

From there on, I could do a live failover and watch the little “Active” light move between the physical ASA firewalls, with all traffic flowing uninterrupted. Mission accomplished.

macOS TFTP server

I have never really found a good TFTP server for macOS. Isn’t it funny that macOS is so widely used by network people, yet there isn’t a decent TFTP server?

Well, there is. macOS has one built in, with no GUI, but that’s fine as long as you know how to use it. It’s disabled by default, but you can start and stop it with the following commands.

### Start TFTP
sudo launchctl load -F /System/Library/LaunchDaemons/tftp.plist

### Stop TFTP
sudo launchctl unload -F /System/Library/LaunchDaemons/tftp.plist

### Check if it's running (no output means it is not running)
netstat -atp UDP | grep tftp

The TFTP daemon serves the /private/tftpboot folder, so copy the file there and then set the correct permissions on it.

### Copy file to tftp folder
cp FILENAME /private/tftpboot
### Set permissions for the folder and files within
chmod -R 766 /private/tftpboot

There is a gotcha with the TFTP daemon: you can't upload a file to it if that file does not already exist in the folder. To work around it, create an empty file and set the permissions on it. Your devices will then write their data into the pre-created file.

### Create the file
touch /private/tftpboot/FILENAME
### Set permissions
chmod -R 766 /private/tftpboot
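If you expect uploads from many devices, for example config backups, pre-creating the target files can be scripted. A minimal Python sketch, assuming hypothetical file names and the same 766 mode used above; you will likely need to run it with sudo to write into /private/tftpboot.

```python
import os
from pathlib import Path


def precreate_upload_targets(tftp_root: str, filenames: list[str], mode: int = 0o766) -> list[Path]:
    """Create empty placeholder files so the TFTP daemon accepts uploads into them."""
    created = []
    for name in filenames:
        target = Path(tftp_root) / name
        target.touch(exist_ok=True)  # leaves existing content alone, only updates mtime
        os.chmod(target, mode)       # explicit chmod, since touch() is subject to the umask
        created.append(target)
    return created
```

For example, `precreate_upload_targets("/private/tftpboot", ["switch01-confg", "router01-confg"])` before starting the backups.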

Shrink VMDK disk

I have always thought that a VMDK could only grow, and that has been my default response to colleagues who expanded a disk too much. Sure, a storage vMotion can reclaim unused space in a thin disk, but the “down arrow” for disk capacity never works. But when someone mentioned that he had shrunk disks a couple of times, I decided to investigate.

The official VMware KB isn’t much help, and some people discuss it on StackOverflow. But then I found an older post from 2016 that seems to have found an approach, so that’s what we are going to test.

Disclaimer:

This is not supported in any way; use it at your own risk. If you want a supported solution, VMware Converter in a V2V manner is more or less the only way. If you still want to try out the method, be sure to have a valid backup! And by backup, I don't mean a VMware snapshot.

Not supported:

From the VMware documentation, it seems shrinking disk is not allowed under the following circumstances:

  • The virtual machine is hosted on an ESX/ESXi server. ESX/ESXi Server can shrink the size of a virtual disk only when a virtual machine is exported. The space occupied by the virtual disk on the ESX/ESXi server, however, does not change.
  • The virtual machine has a Mac guest operating system.
  • You preallocated all the disk space to the virtual disk when you created it.
  • The virtual machine contains a snapshot.
  • The virtual machine is a linked clone or the parent of a linked clone.
  • The virtual disk is an independent disk in nonpersistent mode.
  • The file system is a journaling file system, such as an ext4, xfs, or jfs file system.

The test scenario:

I have a Windows 2019 VM. Here is the process I want to try out:

  1. Expand VMDK disk in vCenter
  2. Extend the disk in the guest OS using diskpart
  3. Shrink the disk in the guest OS using diskpart
  4. Calculate the new size in sectors
  5. Edit the VM's *.vmdk descriptor with the newly calculated sector count
  6. Storage vMotion the VM to another datastore
  7. Check that the VM is still OK

Walkthrough:

We start off with the VM. It's Windows 2019, and the original disk size is 40GB.
The disk is now extended by 5GB.
Looking from the ESXi host, we can see the disk is also showing 45GB.
Inside “win2019.vmdk” we can see the “extent description”. This is the number we have to change after the guest OS filesystem has been shrunk.
Here we see the disk has been extended to 45GB and then the volume shrunk by 10GB.

Calculating the “extent description”:

So there is now 10GB of free space that we can shrink the VMDK by.

A virtual disk described as monolithic and flat consists of two files. One file contains the descriptor. The other file is the extent used to store virtual machine data.

Consider our existing extent:
RW 94371840 VMFS "win2019-flat.vmdk"
This means that the file win2019-flat.vmdk is 94371840 sectors × 512 bytes/sector = 48318382080 bytes = 45GiB (about 48318MB) in size.

Let’s calculate the new value from GB to sectors.

36GB × 1024 (MB) × 1024 (KB) × 1024 (bytes) / 512 bytes per sector = 75,497,472 sectors

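Before proceeding, we need to power off the VM. While the VM is running, the .vmdk descriptor is loaded into memory, so even if we edit it now and start a storage vMotion, our changed value will just be reverted. With the VM powered off, change the extent line in win2019.vmdk to RW 75497472 VMFS "win2019-flat.vmdk" and then start the storage vMotion.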
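The sector arithmetic and the descriptor edit can be sketched in Python. This minimal sketch assumes a monolithic flat descriptor with a single `RW <sectors> VMFS "<file>"` extent line, as in this example, and refuses to change anything if it finds more or fewer than one such line. Only run something like it against a copy of the descriptor, with the VM powered off.

```python
import re

SECTOR_BYTES = 512

# Extent line of a monolithic flat descriptor, e.g. RW 94371840 VMFS "win2019-flat.vmdk"
_EXTENT = re.compile(r"^(RW\s+)\d+(\s+VMFS\s)", re.MULTILINE)


def gib_to_sectors(gib: int) -> int:
    """GiB -> number of 512-byte sectors."""
    return gib * 1024 ** 3 // SECTOR_BYTES


def set_extent_sectors(descriptor: str, new_sectors: int) -> str:
    """Rewrite the sector count on the single RW ... VMFS extent line."""
    updated, count = _EXTENT.subn(rf"\g<1>{new_sectors}\g<2>", descriptor)
    if count != 1:
        raise ValueError(f"expected exactly one RW ... VMFS extent line, found {count}")
    return updated


print(gib_to_sectors(45), gib_to_sectors(36))  # prints: 94371840 75497472
```

Make sure the new size is not smaller than the shrunk partition inside the guest, otherwise the guest filesystem will be cut off.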

Letting storage vMotion do its magic.
After booting the VM, the disk is now shrunk, and we still have a working guest OS.

Conclusion

It worked: we were able to add more space to the VM, extend and then shrink the guest OS filesystem, calculate the number of sectors for the .vmdk descriptor, and let storage vMotion do its magic and make the VMDK physically smaller.

I have also tried this in a couple of real-life scenarios where people had added 4TB too much. In those cases it is sometimes easier to shrink the disk than to move files around.

VCD – Find free external IPs

Finding free public IPs in Cloud Director backed by NSX-V is not as easy as it should be. Some people will tell you to ping the scope and see what’s responding, but pinging is not a reliable way of finding free IPs, since not every device responds to ICMP messages.

Somewhere along the line, I found a guy on the VMware forum who had posted a script that uses the PowerCLI module to query VCD and return the IPs that are not allocated to an Edge. I have been using the script quite a bit since. His blog is no longer available, but the code is still on the forum.

Now it’s also available here on the site, with a bit more explanation of how to connect and use the function. I have been using it with NSX-V as the backend, but haven’t tried it with NSX-T as the network backend yet.

### Install PowerCLI module
Install-Module -Name VMware.PowerCLI

### Import PowerCLI Cloud module
Import-Module -Name VMware.VimAutomation.Cloud

### Connect to Cloud Director instance with your credentials
Connect-CIServer -server <VCD_URL>

Function Get-FreeExtIPAddress([String]$extnetName){
    function  Convertto-IPINT64  () { 
    param ($ip) 
    
    $octets = $ip.split(".") 
    return [int64]([int64]$octets[0]*16777216 +[int64]$octets[1]*65536 +[int64]$octets[2]*256 +[int64]$octets[3]) 
    } 
    
    function  Convertto-INT64IP() { 
    param ([int64]$int) 
    
    return (([math]::truncate($int/16777216)).tostring()+"."+([math]::truncate(($int%16777216)/65536)).tostring()+"."+([math]::truncate(($int%65536)/256)).tostring()+"."+([math]::truncate($int%256)).tostring() )
    } 
    $extnet = Get-ExternalNetwork -name $extnetName
    $ExtNetView = $Extnet | Get-CIView
    $allocatedGatewayIPs = $extnetView.Configuration.IpScopes.IpScope[0].SubAllocations.SubAllocation.IpRanges.IpRange | ForEach-Object {
        $startaddr = Convertto-IPINT64 -ip $_.StartAddress
        $endaddr = Convertto-IPINT64 -ip $_.EndAddress
        for ($i = $startaddr; $i -le $endaddr; $i++) 
        { 
            Convertto-INT64IP -int $i 
        }
    }
    # Expand the first static IP pool into a list of addresses. Working with integers
    # also handles pools that span several third octets (e.g. x.x.1.250 - x.x.2.10).
    $poolStart = Convertto-IPINT64 -ip $extnet.StaticIPPool[0].FirstAddress.IPAddressToString
    $poolEnd = Convertto-IPINT64 -ip $extnet.StaticIPPool[0].LastAddress.IPAddressToString
    $ips = for ($i = $poolStart; $i -le $poolEnd; $i++) { Convertto-INT64IP -int $i }

    # IPs already allocated in the external network
    $allocatedIPs = $ExtNetView.Configuration.IpScopes.IpScope[0].AllocatedIpAddresses.IpAddress

    # Keep only the IPs that are neither sub-allocated to an Edge nor already allocated
    $freeIps = $ips | Where-Object { $allocatedGatewayIPs -notcontains $_ -and $allocatedIPs -notcontains $_ }
    return $freeIps
}

### Find the names of external networks
Get-ExternalNetwork

### Find free IPs using function from above
Get-FreeExtIPAddress -extnetName <vcd-net>

You can help yourself by copying and pasting the code snippet into either PowerShell ISE or Visual Studio Code. Since you need to install a module, you need to run it with elevated rights. If you get a red error message when importing the module, it’s probably because of the execution policy; you then need to run the command below, which allows remotely signed scripts to be executed.

Set-ExecutionPolicy RemoteSigned

Getting the names of the external networks with “get-externalnetwork”
Using the function to find available IPs in the selected external network
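At its core, the function does set arithmetic: every address in the static pool, minus the ranges sub-allocated to Edges, minus the individually allocated IPs. Here is a minimal Python sketch of that logic with the standard ipaddress module. The pool and allocations are hypothetical placeholders for what the function reads from VCD. Converting addresses to integers is what makes pools that cross a third-octet boundary come out right.

```python
import ipaddress


def expand(start: str, end: str) -> list[ipaddress.IPv4Address]:
    """All addresses from start to end, inclusive."""
    first = int(ipaddress.IPv4Address(start))
    last = int(ipaddress.IPv4Address(end))
    return [ipaddress.IPv4Address(i) for i in range(first, last + 1)]


def free_ips(pool, edge_ranges, allocated):
    """pool: (first, last); edge_ranges: [(start, end), ...]; allocated: [ip, ...]."""
    used = {ip for start, end in edge_ranges for ip in expand(start, end)}
    used.update(ipaddress.IPv4Address(ip) for ip in allocated)
    return [str(ip) for ip in expand(*pool) if ip not in used]
```

For example, `free_ips(("10.0.0.254", "10.0.1.1"), [], [])` returns all four addresses, including 10.0.0.255 and 10.0.1.0, across the third-octet boundary.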

Change Veeam job repository

Moving jobs to another repository can be time-consuming. If you want to keep the existing job, you need to disable it, move the data to the new location, point the job to the new location and then enable the job again.

I found the other approach, creating new jobs, easier. You can do it in the GUI, but with 200+ jobs it takes some time. Instead, I wrote a small script that finds all jobs on the old repository and clones them to the new one.

The script also looks at the retention setting and writes into the old job's name that it can be deleted after x days. If you have very long retention, this approach is not feasible, and you will probably have to move the data and point the existing job to the new location.

Hope somebody else can use it 🙂

Add-PSSnapin -Name VeeamPSSnapIn
Connect-VBRServer -Server <VBRSERVER>

$Jobs = Get-VBRJob | where {$_.IsScheduleEnabled -eq $true} | where {$_.FindTargetRepository().name -eq "dc2sveeamrepo01-scaleout"} | where {$_.IsRunning -eq $false}

# Select the first 40 jobs and list them.
$remainingJobs = $Jobs | select -first 40
$remainingJobs | select name

foreach ($job in $remainingJobs)
{
    # Work out the new job name before the old job gets renamed
    $jobName = "$($job.Name.Split("_")[0])"

    # Rename the old job so it shows when it can be deleted, then disable it
    $oldjob = "$($job.Name) delete after $($job.BackupStorageOptions.RetainCycles) days"
    $job.Info.CommonInfo.name = "$oldjob"
    $Job.update()
    Disable-VBRJob -Job $oldjob

    # Clone the old job onto the new repository, then enable and start the clone
    Copy-VBRJob -job $oldjob -name $jobName -Repository "dc2sveeamrepo02-scaleout"

    Enable-VBRJob -job $jobName

    sleep 5

    start-vbrjob -Job $jobName -RunAsync

    write-host "$jobName : has been cloned and is now started...." -ForegroundColor Green
}
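Before letting the script loose on 200+ jobs, it can be handy to preview what it will do. Here is a minimal Python dry-run sketch of the same selection and naming rules. The `Job` class and its fields are hypothetical stand-ins for what Get-VBRJob returns (you would feed it an export of your job list), and "delete after x days" assumes retention is set in restore points on a job that runs once a day.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical stand-in for the fields the PowerShell script reads from Get-VBRJob
    name: str
    repository: str
    schedule_enabled: bool
    running: bool
    retain_cycles: int


def plan_moves(jobs, old_repo, new_repo, batch_size=40):
    """Return (renamed old job, clone name, target repo) tuples for the next batch."""
    eligible = [j for j in jobs
                if j.schedule_enabled and not j.running and j.repository == old_repo]
    plan = []
    for job in eligible[:batch_size]:
        clone_name = job.name.split("_")[0]  # same naming rule as the script
        retired_name = f"{job.name} delete after {job.retain_cycles} days"
        plan.append((retired_name, clone_name, new_repo))
    return plan
```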
 

NSX API – DLR L2 bridging

Here is a script for mass creation of DLR L2 bridges. I had to bridge a couple of hundred VLANs to VXLANs, and while it might have been faster to create them by hand, I would not have learned anything.

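The script reads from a CSV file containing all my info, then loops through the entries, creates a distributed port group for each and then creates an L2 bridge. The VXLANs (logical switches) had been created prior to this operation. If you run it as a .ps1 file, define the create-nsxl2bridge function (shown after the loop) before the loop that calls it, since PowerShell only knows a function once its definition has run.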

$csv = Import-Csv "D:\temp\VLAN.csv" -Delimiter ";"
Import-Module PowerNSX
get-module -name vmware* -ListAvailable | Import-Module

$cred = get-credential
connect-viserver -server -Credential $cred

foreach ($net in $csv) {
    $vdportgroup = ("zitmit-$($net.acl)").ToLower()

    $exists = Get-VDSwitch -Name "DSMpls01-EX" | Get-VDPortgroup -Name $vdportgroup -ErrorAction SilentlyContinue
    if (!$exists) {
        Get-VDSwitch -Name "DSMpls01-EX" | New-VDPortgroup -Name $vdportgroup -VLanId $net.mitvlan -NumPorts 2
        # Look up the portgroup that was just created (not a hard-coded name)
        $created = Get-VDSwitch -Name "DSMpls01-EX" | Get-VDPortgroup -Name $vdportgroup
        if ($created) {
            Write-Host -ForegroundColor Green "Portgroup created: $vdportgroup"

            $vdportgroupId = ($created.Id).Replace("DistributedVirtualPortgroup-","")
            $vdportgroupName = $created.Name

            create-nsxl2bridge -aclname $($net.acl) -dvportGroup $($created.key)
        }
    }
    else {
        Write-Host -ForegroundColor Yellow "Portgroup has already been created: $vdportgroup"
        #Get-VDSwitch -Name "DSMpls01-EX" | New-VDPortgroup -Name $vdportgroup -VLanId $net.mitvlan -NumPorts 2
    }
}

Function create-nsxl2bridge {
    param(
        [string]$aclname,
        [string]$dvportGroup
    )

    # Login info
    $nsxUsername = 
    $nsxPassword = 

    # Allow all SSL protocols
    $AllProtocols = [System.Net.SecurityProtocolType]'Ssl3,Tls,Tls11,Tls12' 
    [System.Net.ServicePointManager]::SecurityProtocol = $AllProtocols

    # Connect to NSX manager
    $connection = Connect-NsxServer  10.1.70.5 -Username $nsxUsername -Password $nsxPassword -WarningAction SilentlyContinue
    $virtualwire = Get-NsxLogicalSwitch | Where-Object { $_.name -match "$aclname" -and $_.name -notmatch "lan" }

    if ($virtualwire.count -gt 1) {
        $message = "More than one logical switch matched, using the first one - $aclname"
        write-host $message -ForegroundColor yellow
        $message | Out-File C:\log\create-nsxl2bridge.txt -Append
        $virtualwire = $virtualwire[0]
    }
    elseif (!$virtualwire) {
        $message = "virtualwire was not found: $($virtualwire.objectId) - acl: $aclname"
        write-host $message -ForegroundColor yellow
        $message | Out-File C:\log\create-nsxl2bridge.txt -Append
        return
    }

    # Edge info
    $edgeId = "edge-1120"
    $Type = "application/xml"
    $Header = @{"Authorization" = "Basic " + [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($nsxUsername + ":" + $nsxPassword)) }
    $nsxUri = "https://10.1.0.4/api/4.0/edges/$edgeId/bridging/config"

    # Getting edge config
    $currentL2Config = $null
    $currentL2Config = Invoke-RestMethod -Uri $nsxUri -Headers $Header -Method GET -ContentType $Type

    # Check if already there
    foreach ($z in $currentL2Config.SelectNodes("//name"))
    {
        if ($z.'#text' -match $aclname ) {
            write-host "Already exists: $aclname" -ForegroundColor yellow
            return
        }
    }

    # Add extra xml node to currentconfig
    $handler1 = $null
    $handler1 = $currentL2Config.CreateNode('element', "bridge", '')
    $attr = $currentL2Config.CreateNode('element', "bridgeId", '')
    $attr.InnerText = "$null";
    $handler1.AppendChild($attr)
    $attr = $currentL2Config.CreateNode('element', "name", '')
    $attr.InnerText = "$aclname";
    $handler1.AppendChild($attr)
    $attr = $currentL2Config.CreateNode('element', "virtualWire", '')
    $attr.InnerText = "$($virtualwire.objectId)";
    $handler1.AppendChild($attr)
    $attr = $currentL2Config.CreateNode('element', "dvportGroup", '')
    $attr.InnerText = "$dvportGroup";
    $handler1.AppendChild($attr)
    
    # Remove nodes from existing XML
    $currentL2Config.SelectNodes("//virtualWireName") | ForEach-Object { $_.ParentNode.RemoveChild($_) }
    $currentL2Config.SelectNodes("//isSharedNetwork") | ForEach-Object { $_.ParentNode.RemoveChild($_) }
    $currentL2Config.SelectNodes("//dvportGroupName") | ForEach-Object { $_.ParentNode.RemoveChild($_) }

    # Add the newly created node to existing XML
    $currentL2Config.bridges.AppendChild($handler1)

    # PUT edge config
    $respons = Invoke-RestMethod -Uri $nsxUri -Headers $Header -Method PUT -ContentType 'application/xml' -Body $currentL2Config
    write-host "L2 Created: $($virtualwire.objectId) - acl: $aclname" -ForegroundColor Green
}
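The XML handling in create-nsxl2bridge (check for an existing name, strip the read-only fields, append a new bridge element) can be tested offline before touching the NSX manager. Here is a minimal Python sketch with xml.etree.ElementTree. The element names are taken from the script, while any sample payload you feed it is your own rather than real NSX output. Like the script, it treats an existing bridge whose name contains the new name as a duplicate, so acl-1 would also match an existing acl-10.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Read-only fields that the script removes before PUTting the config back
READ_ONLY_TAGS = {"virtualWireName", "isSharedNetwork", "dvportGroupName"}


def add_bridge(config_xml: str, name: str, virtual_wire: str, dvport_group: str) -> Optional[str]:
    """Return the updated <bridges> XML, or None if a bridge with a matching name exists."""
    root = ET.fromstring(config_xml)

    # Same duplicate check as the script: substring match on every <name> element
    if any(name in (node.text or "") for node in root.iter("name")):
        return None

    # Strip read-only fields from the existing bridges
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in READ_ONLY_TAGS:
                parent.remove(child)

    # Append the new bridge with the same child elements the script creates
    bridge = ET.SubElement(root, "bridge")
    for tag, text in (("bridgeId", ""), ("name", name),
                      ("virtualWire", virtual_wire), ("dvportGroup", dvport_group)):
        ET.SubElement(bridge, tag).text = text
    return ET.tostring(root, encoding="unicode")
```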