Storage DRS recommendations – with PowerCLI

Many things can happen when you let Storage DRS run fully automated. If you have it on from the beginning it will probably only give you good things. But enabling it on a large storage space imbalanced cluster might be a bit too risky.

Many things that Storage DRS is not aware of. Like your storage underneath running out of space on pool/aggregate or the operations is too IO heavy to run within business hours.

Call me a wimp, but in this case, it seems better to be in control and apply the recommendations little by little. But having to use the GUI is a pain, you need to go into Storage Cluster > Monitor > Storage DRS > Recommendations. And from here you need to override the selections and uncheck the boxes so you can run smaller batches of Storage vMotions.

I will just use VMware PowerCLI cmdlets…

Well, unfortunately not all of vSphere API is exposed through PowerCLI cmdlets, but after a bit of googling it seemed quite easy to call the SDK API directly from within PowerShell

One post that came to my attention where containing most of the code needed.

Solution:

I’m not that much into what the ServiceInstance or StorageRessoruceManager is. But I expect it to be the API instantiated by PowerShell where you then have each operation from where you can find the functionality that you are looking for.

 # DSC you want to work with
$dscName = 'DatastoreCluster'

# Get DSC info
$dsc = Get-View -ViewType StoragePod -Filter @{'Name'=$dscName}

# Get Service Intance
$si = Get-View ServiceInstance

# Get the StorageResourceManager
$storMgr = Get-View -Id $si.Content.StorageResourceManager

# Refresh SDRS Recommendation on DSC
$storMgr.RefreshStorageDrsRecommendation($dsc.MoRef)

# Update dsc object with fresh recommendation data
$dsc.UpdateViewData()

# Filter on reason for storage balance. Select only 40 VMs.
$balance = $dsc.PodStorageDrsEntry.Recommendation | Where-Object {$_.Reason -eq "balanceDatastoreSpaceUsage"}  | Select-Object -First 40

# Do a run of each VM and start the storage vMotion process
foreach($vm in $balance){
   $message = "Moving VM: {0} to datastore: {1}" -f $(get-vm -id $("VirtualMachine-"+$($vm.Action[0].Target.Value))).name, $(get-datastore -id $vm.Action[0].Destination).name
   write-host $message -ForegroundColor Green
   $storMgr.ApplyStorageDrsRecommendationToPod($dsc.MoRef,$vm.Key)
} 

Conclusion:

I was expecting to use some PowerCLI cmdlets to make my granular balance of the storage cluster. Unfortunately, that did not exist.

But from the great community, I found how to use the vSphere API through PowerShell and in the end got the functionalty I was looking for.

Maybe there is an easier way to do the same, if so, let me know. Until next time I have a bit of vSphere SDK googling to do.

Massdelete files

Not sure what happened, but thousands of small files piled up in the path below.

C:\ProgramData\Microsoft\Crypto\RSA\S-1-5-8

My google-fu wasn’t sufficient enough to find out what files in that folder actually did. An article from Kofax said that just deleting files could be troublesome. But also found another article that stated that he deleted all files older than 30 days and hasn’t had problems yet. So I dare to do the same. The command was found on a blog, so all credit goes to this guy.

forfiles /D -30 /C "cmd /C attrib -s @file & echo @file & del @file"

forfiles is a very nice command that iterates through the files in a folder according to its parameters. /D -30 iterates through all files more than 10 days old. attrib -s takes off the System attribute, which is needed for DEL to work. The echo is there so you can see that it is doing its job.

The PowerShell equivalent would be:

Get-ChildItem -Path C:\programdata\Microsoft\Crypto\rsa\S-1-5-18 -Include *.* -Recurse | foreach { $_.Delete()}

Extend disk with LVM

Here is a quick walkthrough showing you how to expand an LVM volume or partition in Linux by first resizing logical volume followed by resizing the file system to take advantage of the additional space.

Note: In this example, we are working in Ubuntu, some commands may differ in different Linux distributions.

DISCLAIMER: Make sure that you have proper backups in place before starting out with the resize procedure on your VMs! Create any backups necessary to ensure that if something goes wrong you can always go back to a previous working state. Losing any information or wiping out your disks is all your fault if this happens. Using this procedure is all at your own responsibility.

If you are unsure about LVM and its components I will suggest you read a bit upon it. There are tons of articles, for example on Digital Ocean.

Process

This process can be easy to do with LVM as it can be done on the fly with no downtime needed, you can perform it on a mounted volume without interruption. In order to increase the size of a logical volume, the volume group that it is in must-have free space available. It goes as follows.

  • Add more space from the hardware level. Either raid controller, or hypervisor.
  • Resize your partition to contain the extra space
  • Resize the PV in LVM
  • Expand the LV in LVM
  • Resize filesystem

To view the free space of your volume group, run pgdisplay command as shown below and look at the “Free PE / Size” field, in this case non-free.

root@www:~# pvdisplay  
--- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               server1-vg
  PV Size               99.00 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              25343
  Free PE               0
  Allocated PE          25343
  PV UUID               cgPMYF-PkeW-1iaS-FxUZ-Ky9r-Zoa8-ktpEk5

Since we have added more space from the hardware level need to grow the partition. This is done by deleting it and create a new one that is starting from the same sectors but where the new partition uses more sectors than before. In the output beneath we can see how to extend the partition.

root@www:~# fdisk -l /dev/sdb
Disk /dev/sdb: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xb97da1a0

Device     Boot Start       End   Sectors Size Id Type
/dev/sdb1        2048 207620095 207618048  99G 83 Linux
root@www:~# fdisk /dev/sdb

Welcome to fdisk (util-linux 2.27.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): d
Selected partition 1
Partition 1 has been deleted.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-209715199, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-209715199, default 209715199): +99.9G

Created a new partition 1 of type 'Linux' and of size 99.9 GiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Re-reading the partition table failed.: Device or resource busy

The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).

root@www:~# partprobe

The disk partition is now extended. We need to inform LVM to grow its PV

root@www:~# pvresize /dev/sdb1
  Physical volume "/dev/sdb1" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized

Now we can verify that there is available space to grow LV.

root@www:~# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdb1
  VG Name               server1-vg
  PV Size               99.88 GiB / not usable 3.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              25568
  Free PE               225
  Allocated PE          25343
  PV UUID               cgPMYF-PkeW-1iaS-FxUZ-Ky9r-Zoa8-ktpEk5

Lastly, we grow the LV and extend the filesystem.

root@www:~# lvextend -l +100%FREE /dev/mapper/server1--vg-root
  Size of logical volume server1-vg/root changed from 196.51 GiB (50307 extents) to 197.39 GiB (50532 extents).
  Logical volume root successfully resized.

root@www:~# resize2fs /dev/mapper/server1--vg-root
resize2fs 1.42.13 (17-May-2015)
Filesystem at /dev/mapper/server1--vg-root is mounted on /; on-line resizing required
old_desc_blocks = 13, new_desc_blocks = 13
The filesystem on /dev/mapper/server1--vg-root is now 51744768 (4k) blocks long.

Conclusion

We have now successfully expanded a file system and corresponding LVM logical volume without any downtime. This was done by first expanding the partition of the disk, then the logical volume and finally performing an online resize of the file system.

No servers were harmed doing this procedure 🙂

macOS TFTP server

I have never really found a good TFTP for macOS. Is it funny that macOS is much used by network people but there isn’t a decent TFTP server?

Well, there is. macOS has it built-in, no GUI though. But that’s also fine, as long as you know to use it. It’s disabled by default, but you can start and stop it with the following commands.

### Start TFTP
sudo launchctl load -F /System/Library/LaunchDaemons/tftp.plist

### Stop TFTP
sudo launchctl unload -F /System/Library/LaunchDaemons/tftp.plist

### Check if its running (no process means it not running)
netstat -atp UDP | grep tftp

The TFTP daemon uses the /private/tftpboot folder so we are going to copy the file there. Then set the correct permissions on the file.

### Copy file to tftp folder
cp FILENAME /private/tftpboot
### Set permissions for the folder and files within
chmod -R 766 /private/tftpboot

There is a gotcha with the TFTP daemon, which is you cant copy a file to the TFTP daemon if that file does not already exist there.  To work around it you can just create a file and set the permission for it. Then your devices will just send data into the pre-created file.

### Create the file
touch /private/tftpboot/FILENAME
### Set permissions
chmod -R 766 /private/tftpboot

Change Veeam job repository

Having to move jobs to another repository is something that can be time-consuming. If you have to do it with the existing job you will need to disable the job, move the data to the new location, point the job to the new location and then enable the job again.

I found the other approach to creating new jobs easier. You can do it in the GUI, but when having 200+ jobs it could take some time. Instead, I did a small script to list all jobs located on the old repo.

The script also looks a the retention point and writes into the old job that is can be deleted after x days. If you have very long retention then this way is not feasible and you will probably have to move data and point exing job to the new location.

Hope somebody else can use it 🙂

Add-PSSnapin -Name VeeamPSSnapIn
Connect-VBRServer -Server <VBRSERVER>

$Jobs = Get-VBRJob | where {$_.IsScheduleEnabled -eq $true} | where {$_.FindTargetRepository().name -eq "dc2sveeamrepo01-scaleout"} | where {$_.IsRunning -eq $false}

# Select first 15 jobs and list them afterwards.
$remainingJobs = $Jobs | select -first 40
$remainingJobs | select name

foreach ($job in $remainingJobs)
{
$oldjob = "$($job.Name)delete after $($job.BackupStorageOptions.RetainCycles) days"
$job.Info.CommonInfo.name = "$oldjob"
$Job.update()
Disable-VBRJob -Job $oldjob

$jobName = "$($job.Name.Split("_")[0])"

Copy-VBRJob -job $oldjob -name $jobName  -Repository "dc2sveeamrepo02-scaleout" 

Enable-VBRJob -job $jobName

sleep 5

start-vbrjob -Job $jobName -RunAsync 

write-host "$jobName : have been cloned and is now started...." -ForegroundColor Green
}
 

NSX API – DLR L2 bridging

Here is a script for mass DLR L2 bridge creation. I had to bridge a couple of hundred VLAN to VXLAN, and while it was maybe faster to create it by hand I would not have learned anything.

The script is reading from a CSV file where I have all my info. Then loops through the entries and create a distributed port group and then initiates an L2 bridge. The VXLAN had been created post to this operation.

$csv = Import-Csv "D:\temp\VLAN.csv" -Delimiter ";"
Import-Module PowerNSX
get-module -name vmware* -ListAvailable | Import-Module

$cred = get-credential
connect-viserver -server -Credential $cred

foreach ($net in $csv) {
    $vdportgroup = ("zitmit-$($net.acl)").ToLower()

    $exists = Get-VDSwitch -Name "DSMpls01-EX" | Get-VDPortgroup -Name $vdportgroup -ErrorAction SilentlyContinue
    if (!$exists) {
        Get-VDSwitch -Name "DSMpls01-EX" | New-VDPortgroup -Name $vdportgroup -VLanId $net.mitvlan -NumPorts 2
        $created = Get-VDSwitch -Name "DSMpls01-EX" | Get-VDPortgroup -Name "zitmit-acl-10344"
        if (!created) {
            Write-Host -ForegroundColor Green "Portgroup created: $vdportgroup"

            $vdportgroupId = ($created.Id).Replace("DistributedVirtualPortgroup-","")
            $vdportgrpupName = $created.Name

            create-nsxl2bridge -aclname $($net.acl) -dvportGroup $($created.key)
        }
    }
    else {
        Write-Host -ForegroundColor Yellow "Portgroup have allready been created: $vdportgroup"
        #Get-VDSwitch -Name "DSMpls01-EX" | New-VDPortgroup -Name $vdportgroup -VLanId $net.mitvlan -NumPorts 2
    }
}

Function create-nsxl2bridge {
    param(
        [string]$aclname,
        [string]$dvportGroup
    )

    # Login info
    $nsxUsername = 
    $nsxPassword = 

    # Allow all SSL protocols
    $AllProtocols = [System.Net.SecurityProtocolType]'Ssl3,Tls,Tls11,Tls12' 
    [System.Net.ServicePointManager]::SecurityProtocol = $AllProtocols

    # Connect to NSX manager
    $connection = Connect-NsxServer  10.1.70.5 -Username $nsxUsername -Password $nsxPassword -WarningAction SilentlyContinue
    $virtualwire = Get-NsxLogicalSwitch | Where-Object { $_.name -match "$aclname" -and $_.name -notmatch "lan" }

    if ($virtualwire.count -gt 1) {
        $message = "Something could wrong - $aclname"
        write-host $message -ForegroundColor yellow
        $message | Out-File C:\log\create-nsxl2bridge.txt -Append
        $virtualwire = $virtualwire[0]
    }
    elseif (!$virtualwire) {
        $message = "virtualwire was not found: $($virtualwire.objectId) - acl: $aclname"
        write-host $message -ForegroundColor yellow
        $message | Out-File C:\log\create-nsxl2bridge.txt -Append
        return
    }

    # Edge info
    $edgeId = "edge-1120"
    $Type = "Accept: application/xml"
    $Header = @{"Authorization" = "Basic " + [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($nsxUsername + ":" + $nsxPassword)) }
    $nsxUri = "https://10.1.0.4/api/4.0/edges/$edgeId/bridging/config"

    # Getting edge config
    $currentL2Config = $null
    $currentL2Config = Invoke-RestMethod -Uri $nsxUri -Headers $Header -Method GET -ContentType $Type

    # Check if already there
    foreach ($z in $currentL2Config.SelectNodes("//name"))
    {
        if ($z.'#text' -match $aclname ) {
            write-host "Already exists: $aclname" -ForegroundColor yellow
            return
        }
    }

    # Add extra xml node to currentconfig
    $handler1 = $null
    $handler1 = $currentL2Config.CreateNode('element', "bridge", '')
    $attr = $currentL2Config.CreateNode('element', "bridgeId", '')
    $attr.InnerText = "$null";
    $handler1.AppendChild($attr)
    $attr = $currentL2Config.CreateNode('element', "name", '')
    $attr.InnerText = "$aclname";
    $handler1.AppendChild($attr)
    $attr = $currentL2Config.CreateNode('element', "virtualWire", '')
    $attr.InnerText = "$($virtualwire.objectId)";
    $handler1.AppendChild($attr)
    $attr = $currentL2Config.CreateNode('element', "dvportGroup", '')
    $attr.InnerText = "$dvportGroup";
    $handler1.AppendChild($attr)
    
    # Remove nodes from existing XML
    $currentL2Config.SelectNodes("//virtualWireName") | ForEach-Object { $_.ParentNode.RemoveChild($_) }
    $currentL2Config.SelectNodes("//isSharedNetwork") | ForEach-Object { $_.ParentNode.RemoveChild($_) }
    $currentL2Config.SelectNodes("//dvportGroupName") | ForEach-Object { $_.ParentNode.RemoveChild($_) }

    # Add the newly created node to existing XML
    $currentL2Config.bridges.AppendChild($handler1)

    # PUT edge config
    $respons = Invoke-RestMethod -Uri $nsxUri -Headers $Header -Method PUT -ContentType 'application/xml' -Body $currentL2Config
    write-host "L2 Created: $($virtualwire.objectId) - acl: $aclname" -ForegroundColor Green
}

Veeam repository with local account

For the matter of security, I consider it a good idea to isolate the Veeam repository server from Active Directory. So that a compromised domain admin account or other can not gain access to the repository.

But when wanting to do add the repository to the VBR its failing and saying “Access Denied”.

Alright, a bit of googling and found a short and precise article from another guy having solved this problem.

What was the solution?

Open regedit on the repository server and navigate to following

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System

Here you add a DWORD with the name of “LocalAccountTokenFilterPolicy” and value of “1”. This fixes the problem and without rebooting.

### The PowerShell way
if((Test-Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System')){New-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System' -Name 'LocalAccountTokenFilterPolicy' -Value '1' -PropertyType DWORD} 

Now you can add the repository server to the VBR. I always forgot where to find the info for the reg hack, so now it’s here so future Jesper can find it 🙂

APC 7920 PDU Console

Short post – needed to reset password on a APC PDU. Only way to do this was getting a console cable, of cause this cable is a proprietary cable of APC and not within my reach when you need it.

So found the pinout and did a little DIY. Pinout is as in the table

APC RJ12 PinDB9 Pin
1 Not used1
2(GND) Yellow5 (GND)
3 Green2 (RX)
4 Red3 (TX)
5 (GND) White5 (GND)
6 Not used1

RJ12 pluged into PDU and connected to jump wires. White and yellow is GND. Green
DB9 male console cable. White is GND pin5. Gray is TX pin3. Black is RX pin2.

Press and hold the reset button of the APC PDU. Wait for Orange LED to blink and press reset again. Check console for login each sec. When it respons you have 30 sec to login with apc/apc and reset the password.

Killing a virtual machine the esxcli way

Rarely I run into a ghost VM. I can’t do anything with the VM from vCenter or local UI for the ESXi host. It looks like it’s powered off but in fact, it’s still running in a sort of ghost state. You can vMotion all VMs of and reboot that hosts that the ghost VM is on. Sometimes its standalone hosts and then killing the VMs world with esxcli is easier.

  1. Connect to the ESXi host with ssh
  2. Get a list of all running VM worlds
esxcli vm process list

3. Identify the world from the output and take note of the World ID. From here we will kill the world. Start with type soft and then escalate it if it doesn’t work.

esxcli vm process kill --type=[soft/hard/force] -–world-id=ID

VM should now be killed, the VMX files are unlocked and you can manage the VM with the GUI tools again. If it didn’t work then you are left with the option to reboot the host containing the ghost VM.

NSX 6.3.6 to 6.4.5 – Controller problem encountered

NSX upgrades can be a delicate thing to upgrade, even though everything is in its finest shape.

After we successfully have upgrade the NSX managers we proceeded with upgrading of the NSX Controllers. We did pre-check and issued command “show control-cluster status” and it looked fine, upgrade to 6.4.6 went well and we could vMotion VMs around after the controller was booted. But post-checks was not ok, the “show control-cluster status” did not return as expected and we where not confident to proceed with the host upgrades.

After some trouble shooting we found that the /var/log partition on 2/3 of the controllers where full. Without any other evidence we concluded that this was the problem. After some google-fu we didn’t find any KB or blogs on how to purge logs.

But we found out that we could get into a engeering mode that would give us shell access. Long store short, we did the following:

1. https://kb.vmware.com/s/article/2149630 to gain shell access on manager
1.1 password is IAmOnThePhoneWithTechSupport
2. Extracting root passwords for controllers with /home/secureall/secureall/sem/WEB-INF/classes/GetNvpApiPassword.sh controller-nn
3. Loged into each controller, and issued : debug os-shell and thereby gain root shell access.
4. Deleted /var/log/syslog.1 on each node.
5. Rolling restart of controllers and after they booted they all joined the cluster.

 

After this we got the status as we wanted. In the mean while we had create a case with VMware support and the supporter was on a remote session with us. We told him what we have done, we verified that the controlleres was health and they where.

Next step, VIB upgrade on the hosts.

Good commands to know:

show process monitor
show controller list all
show control-cluster status

Edit: This article from VMware have the exact problem we encountered. We also contacted VMware Support, but before they where able to assist us we had the problem solved. 🙂
https://kb.vmware.com/s/article/59509

Process of getting the root password for controllers.

Tomcat9 and java12 on FreeBSD

Quick post, had to make Tomcat9 and Java12 work together. The procedure is as follows:

1. pkg install tomcat9 (it will also install java8)
2. pkg install openjdk12

Now and edit /etc/rc.conf with a parameter to start tomcat on boot and set the tomcat java_home.

tomcat9_enable="YES"
tomcat9_java_home="/usr/local/openjdk12"

And not to part that took me a long time to figure out. In Java12, there is no longer a feature that tomcat is using in its startup parameters. But if you remove that from the init script you are able to start it up. The line is: Djava.endorsed.dirs=’/usr/local/apache-tomcat-8.0/endorsed’ \

command="/usr/local/bin/jsvc"
command_args="-java-home '${_tomcat_java_home}' \
        -server \
        -user ${_tomcat_catalina_user} \
        -umask ${_tomcat_umask} \
        -pidfile '${pidfile}' \
        -wait ${_tomcat_wait} \
        -outfile '${_tomcat_stdout}' \
        -errfile '${_tomcat_stderr}' \
        -classpath '${_tomcat_catalina_home}/bin/bootstrap.jar:/usr/local/share/java/classes/commons-daemon.jar:$
{_tomcat_catalina_home}/bin/tomcat-juli.jar${_tomcat_classpath}' \
        -Djava.util.logging.manager=${_tomcat_logging_manager} \
        -Djava.util.logging.config.file='${_tomcat_logging_config}' \
        ${_tomcat_java_opts} \
        -Djava.endorsed.dirs='/usr/local/apache-tomcat-8.0/endorsed' \<<<<<<<<<<<< Remove this line!!!
        -Djava.endorsed.dirs='${_tomcat_catalina_home}/endorsed' \
        -Dcatalina.home='${_tomcat_catalina_home}' \
        -Dcatalina.base='${_tomcat_catalina_base}' \
        -Djava.io.tmpdir='${_tomcat_catalina_tmpdir}' \
        org.apache.catalina.startup.Bootstrap \
        ${_tomcat_pipe_cmd}"

run_rc_command "$1"

After this, you are now able to boot tomcat9 with java12 🙂

Getting all domains from Office365 tenants

Mail spoffing etc. is a big problem, there are technologies that can help, but many domain owners have not yet implemented them. To help our customers we have started to monitor and see if the SPF, DKIM and DMARC policies have been implementened, and if not we can help 🙂

Our own spamfilter solution have a button that gives you an export over all the domains, nice and easy, but Office 365 CSP portal doesnt.

So there is a quick script to help with that. Next post will hopefully contain the checkscript for if the domain have implemented SPF, DKIM or DMARC.

$tenantIds = Get-MsolPartnerContract -All | Select-Object TenantId

foreach ($tenantid in $tenantIds)
{
    $domains = Get-MsolDomain -TenantId $tenantid.TenantId
    $customer = Get-MsolCompanyInformation -TenantId $tenantId.TenantId


    foreach ($domain in $domains)
    {
        if($domain.Name -match 'microsoft')
            {
            }
             else {

                $data = @(
                    [pscustomobject]@{Domain=$domain.Name;Customer=$customer.DisplayName}
                )
                $data | Export-Csv -Path C:\temp\domainsInO365.csv -Append
             }
    }

}

NSX Edge PowerShell manipulation

This is from a VMware support experience. A customer could not change DNS server parameters of the NSX Edge IP Pool. But actually is was a problem due to a bug in VCD 9.5, where a Edge XML config was missing some tags and therefor not being able to validate the XML when VCD post the edited XML config back to NSX manager.

I have attached VMware support answer in the bottom of the post.

Script will get all edges from the NSX manager, then you find the correct one and fill into the next part of the script. Then you get the XML down to a file on your local machine, you then edit the file and put in the missing tags and lastly PUT the XML backup NSX manager. After this operation, it works from the GUI again.

# Import credential module and login information
$ReturnObj = import-credentials vmwareSSO
$nsxUsername = $ReturnObj.Username
$nsxPassword = $ReturnObj.Password

# Other variables
$tempFile = "C:\temp\edge-747_jvr.xml"

# Allow all SSL protocols
$AllProtocols = [System.Net.SecurityProtocolType]'Ssl3,Tls,Tls11,Tls12' 
[System.Net.ServicePointManager]::SecurityProtocol = $AllProtocols

Add-Type @"
    using System;
    using System.Net;
    using System.Net.Security;
    using System.Security.Cryptography.X509Certificates;
    public class ServerCertificateValidationCallback
    {
        public static void Ignore()
        {
            ServicePointManager.ServerCertificateValidationCallback += 
                delegate
                (
                    Object obj, 
                    X509Certificate certificate, 
                    X509Chain chain, 
                    SslPolicyErrors errors
                )
                {
                    return true;
                };
        }
    }
"@


[ServerCertificateValidationCallback]::Ignore();

# Getting all edges
$Type = "Accept: application/xml"
$Header = @{"Authorization" = "Basic " + [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($nsxUsername + ":" + $nsxPassword))}
$nsxUri = "https://10.1.10.4/api/4.0/edges"

[xml]$edges = (Invoke-WebRequest -Uri $nsxUri -Headers $Header -Method GET -ContentType $Type).Content
foreach ($edge in $edges.pagedEdgeList.edgePage.edgeSummary)
{
    $edgeInfo = "name: {0} - ID: {1}" -f $edge.name, $edge.objectId
    $edgeInfo
}

# Getting specefic edge config
$edgeId = "edge-747"
$Type = "Accept: application/xml"
$Header = @{"Authorization" = "Basic " + [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($nsxUsername + ":" + $nsxPassword))}
$nsxUri = "https://10.1.10.4/api/4.0/edges/$edgeId"

(Invoke-WebRequest -Uri $nsxUri -Headers $Header -Method GET -ContentType $Type).Content | out-file $tempFile

# PUT edge config after edit
$Type = 'application/xml'
$Header = @{"Authorization" = "Basic " + [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($nsxUsername + ":" + $nsxPassword))}
$nsxUri = "https://10.1.10.4/api/4.0/edges/$edgeId"
$edgeConfigAltered = Get-Content $tempFile

$respons = Invoke-WebRequest -Uri $nsxUri -Headers $Header -Method Put -ContentType 'application/xml' -Body $edgeConfigAltered
# Statuscode 204 is accepted
$respons.StatusCode

From Support:
– The issue you are seeing is a known issue 9.5.
– Like I mentioned in the previous email, this is due to missing elements from the xml.
– From the xml in the logs, I could see there are 52 NAT rules on that edge.Correct me if I am wrong. The following 2 rules had the elements missing

<natRule>
    <ruleId>196726</ruleId>
    <ruleType>user</ruleType>
    <action>dnat</action>
    <vnic>0</vnic>
    <originalAddress>IP</originalAddress>
    <translatedAddress>IP</translatedAddress>
    <dnatMatchSourceAddress>any</dnatMatchSourceAddress>
    <loggingEnabled>false</loggingEnabled>
    <enabled>true</enabled>
    <description>RULE</description>
    <protocol>tcp</protocol>
    <originalPort>3417</originalPort>
    <translatedPort>3478</translatedPort>
    <dnatMatchSourcePort>any</dnatMatchSourcePort>
</natRule>
<natRule>
    <ruleId>196727</ruleId>
    <ruleType>user</ruleType>
    <action>dnat</action>
    <vnic>0</vnic>
    <originalAddress>IP</originalAddress>
    <translatedAddress>IP</translatedAddress>
    <dnatMatchSourceAddress>any</dnatMatchSourceAddress>
    <loggingEnabled>false</loggingEnabled>
    <enabled>true</enabled>
    <description>RULE</description>
    <protocol>tcp</protocol>
    <originalPort>3416</originalPort>
    <translatedPort>3234</translatedPort>
    <dnatMatchSourcePort>any</dnatMatchSourcePort>
</natRule>

I have attached the file with the list of all the NAT rules seen from the logs if you need to cross-verify.

Plan:
– To fix the issue,please follow https://kb.vmware.com/s/article/67193

If you have any further questions,let me know.

Have a good evening,

Best regards,

Deepthy

Make a clone of VMs to NAS – The PowerCLI way

Quick post, had a customer that yearly wants a clone of their VMs, copied to a NAS, and then shipped to customers HQ. The owner of the company put this as a requirement. Fair enough. I have almost always done the clone of the VMs by GUI, in the start, this was easy because they only had 5 servers, but they now have more. So this time I wanted to try and script it instead. It took me some extra time, but in the end, I think it’s worth it. My PowerShell skills are not great, still learning so bear with me.

# Variables
$vcenter = "<IP or hostname>"
$cluster = "<name of cluster that contains the servers>"
$nfsIP = "1.2.3.4"
$nfsMount = "/nfs"

# Getting VMware PowerShell Modules
Get-Module -Name vmware* -ListAvailable | Import-Module

# Connect to vcenter
Connect-VIServer -Server $vcenter -User <username>

# Mount NAS 
get-cluster $cluster | get-vmhost | new-datastore -nfs -name NAS -path $nfsMount -nfshost $nfsIP

# More Variables
$ds = get-datastore NAS
$tempHost = "esx74.domain.tld"
$vms = Get-VM -Name customer*

# Copy all 
foreach ($vm in $vms)
{
    new-vm -name "$($vm.name)-clone" -VM $vm -Datastore $ds -vmhost $tempHost
    Remove-VM -VM "$($vm.name)-clone" -DeleteFromDisk:$false -Confirm:$false -RunAsync
}

# Remove datastore from hosts again
Get-Cluster $cluster | Get-VMHost | Remove-Datastore -Datastore $ds

FreeNAS installation failed – Operation not permitted

Ran into an annoying problem, have been having this problem multiple times in the past, and never remember what the fix is. So now it’s on the blog for the next time that I need it. Use the shell option in the FreeNAS installer, the disks that I wanted to install onto was ada0 and ada1.

1. sysctl kern.geom.debugflags=0x10
2. dd if=/dev/zero of=/dev/ada0 bs=512 count=1 && dd if=/dev/zero of=/dev/ada1 bs=512 count=1

This will wipe out the sectors that keep the partition schema and afterward, you can install FreeNAS without problems.

ESXi – physical memory population

Had en interesting problem where a ESXi host only showed it had 30GB of memory, but the motherboard was populated with 6*8GB modules. In earlier versions of ESXi 5.5< it was possible to use dmidecode to show how the physical hardware was populated. But since 6.0> that have been removed.

The new command to find those kind of information are now “smbiosDump”

smbiosDump | grep -A 6 'Memory Device'

You can also just run smbiosDump without any paramenters and you get a hole lot of information to crawl through.

Timedrift – domain controller and clients

Customers domain controllers where both virtual and due to CPU congestion it seems that time had been drifting.

So it was 5 minutes behind, and so was the clients. The fix is as follows.

1. Find the DC that have the PDC role.

netdom query fsmo

2. Issue the follwing command to sync the time with some of the pool.ntp.org servers.

w32tm /config /syncfromflags:manual /manualpeerlist:"0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org /reliable:yes /update "

3. After the time on the PDC again is correct, then issue following on the other domain controllers, that are not PDC.

w32tm /config /syncfromflags:domhier /update

4. let the clients resync there time, either wait for it to happen or issue the following

w32tm /resync

Freebsd – Going from stable to release

I outdated FreeBSD 10.1-Stable server needed to be updated for it to install packages again. Problem was, it was deployed from stable, i normally never use stable because it not production ready, its a development branche. But this server was stable and here are the steps to get it to a release train.

1. Update the source tree on the server.

cd /usr/src && rm -rf * && svnlite switch https://svn.freebsd.org/base/releng/10.4 /usr/src/

2. Follow the link below. Only change i did was to “-j 6” when making so that it used multiple cores.
https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html

After the merge-ui command you just choose “later” for all the promt with merge of config files.

And volia, The server is updated to FreeBSD 10.4-release. Then you can update with the binary process, freebsd-update upgrade -r 11.2-RELEASE.

In case you need to know more about the stable vs. release, here is a link i found en the FreeBSD forum.
http://srobb.net/release.html

P2V of a FreeBSD

Was looking for someone that had done a P2V of FreeBSD, but me google skills was not good enough. So here it goes.

1. Make a VM with disk a bit larger then the source phycial machine.
2. Boot the VM on a FreeBSD live cd
3. Give it a IP address

ifconfig vmx0 inet 10.0.2.59 netmask 255.192.0.0

4. Make NC listen for any input on port xxxx ( )

nc -l 6666 | dd bs=16M of=/dev/da0

5 on the physical maskine you run DD and pipe it to NC.

dd bs=16M if=/dev/ad0 | nc 10.0.2.59 6666

6. wait for it to finish….

Ceph – slow recovery speed

Onsite at customer they had a 36bays OSD node down in there 500TB cluster build with 4TB HDDs. When it came back online the Ceph cluster started to recover from it and rebalance the cluster.

Problem was, it was dead slow. 78Mb/s is not much when you have a 500TB Cluster. So what to do?

There a several settings within Ceph you can adjust. Here are the two settings that worked for me.

osd max backfills:
Description: The maximum number of backfills allowed to or from a single OSD.
Default value: 1

I set it to 8, and the recovery went to 350Mb/s. Set it to 16 and recovery was 700Mb/s, but clients where also affected. So 8 was a more moderat setting.

ceph tell 'osd.*' injectargs '--osd-max-backfills 16'

osd recovery max active

Description: The number of active recovery requests per OSD at one time. More requests will accelerate recovery, but the requests places an increased load on the cluster.
Default value: 3

Set it up a notch to 4.

ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'