Home » Archives for Mischa Buijs

Author: Mischa Buijs

Synology DS1618+ Memory Expansion

After replacing the Synology DS1515+ with a Synology DS1618+ last year it was time for another investment in the Synology DS1618+. Overall it is a great device that is running my VMware iSCSI storage for my ESXi Hosts but based on some metrics the memory was experiencing some issues, so it was time for a memory expansion!

The reason why I am expanding my memory is physical memory swap usage. Based on my monitoring tooling the system is swapping to disk and when that happens the storage latency is increasing extensively on the iSCSI volumes (2500 ms / 2.5 seconds latency dips). The hypervisor and virtual machine survive but they don’t experience it as a good thing ;).

After a good session with a Synology Engineer on VMworld 2019 Europe, he explained that the storage latency I am experiencing multiple times a day must be caused by the swapping to disk and refreshing the read-cache in the physical memory. Synology is using physical memory as a read cache to boost performance by default.

Synology Statement

Here is the official statement from Synology surrounding performance and memory: “Memory usage remains high because the system stores frequently accessed data in the cache, so the data can be quickly obtained without accessing the hard disk. Cache memory will be released when the overall memory is insufficient. High swap space usage indicates insufficient system memory, and will also affect the system performance. You can view the rate of swap in and swap out by choosing Swap from the drop-down menu on the top.”

To clarify, my Synology DS1618+ is only running iSCSI storage with two volumes with both SSD drives in RAID1. The only services that are enabled are SSH, SNMP and off course iSCSI. The machine has no other purpose!



Memory Swap

The metric here is showing the usage of memory swap, the value 100% means it is completely empty, so no swap usage. The value 0% means that all swap is allocated/completely full.

As you can see in the graph there was always some swap activity going on in the last months around 92%. On 03-09-2020 / 03-10-2020 I installed the 32 GB DDR4 memory in the system and it is a steady 100% (so no swap in use).

Memory Expansion

I bought the following kit from Crucial 32GB Kit (2 x 16GB) DDR4-3200 SODIMM with the following part nr “CT2K16G4SFD832A“. This is the suited memory for the DS1618+ as you can verify on the Crucial website. The memory configurator tool can be found over here: Crucial Advisor Tool.

Luckily expanding the memory in a Synology DS1618+ is quite easy! I created a brief write-up and some photos are located below.

Procedure:

  1. Poweroff the VMware workload.
  2. Poweroff the Synology NAS.
  3. Remove the DS1618+ from his rack/shelf.
  4. Flip the device, the memory hatch is located on the bottom.
  5. Remove the two screws.
  6. Open the hatch.
  7. Remove the original memory.
  8. Install the new memory.
  9. Close the hatch.
  10. Install the two screws.
  11. Install in the rack.
  12. Power on the system (the first time booting will be longer than normal. The DS1618+ is performing a memory check, in my case, it took about 15 minutes).
  13. Power on workload.

Source

Here is the some additional links surrounding the memory expansion:

Share this:

ProLiant ML10 v2 CPU Swap

In this blog post, I am talking about the HPE ProLiant ML10 v2 home lab servers that I have been using for the last three years. I had some performance issues related to the processor with the number of virtual machines and containers running on the little ML10 v2 servers. So it was time for a CPU Swap!

On the internet, there are a lot of speculations on which CPUs are supported in the HPE ProLiant ML10 v2. So that is why I did this blog post.

The servers were originally bought with Intel® Pentium® Processor G3240 CPUs. This was the smallest CPU available at the time. At first, I was looking at the Intel Xeon E3-1220 v3 CPUs but I decided to buy the Intel® Core™ i3-4170 Processor on Ebay.com for a couple of bucks. The choice was related to the pricing difference and the amount of power usage.

I can confirm that both HPE ProLiant ML10 v2 servers detected the i3-4170 CPUs without any issues. The systems are running 24×7 and the CPU temperature is around fourth to fifty degrees with the fans running on their lowest operating mode.

Comparison

As you already figured out the G3240 is a slow CPU compared to the i3-4170. So it was a well worth invested upgrade it for about 40 euro’s for both CPUs in total.

The hypervisor (VMware ESXi) and workload performance improved drastically. Because of the additional instruction sets like AES-NI and clock speed. So it was a good investment at least in my opinion.

Here is a comparison provided by the Intel ARK website. Click here for the link.



Screenshot(s)

Here are some screenshots of one of the HPE ML10 v2 server that was upgraded with the new CPU. As you can see the screenshots are from the HPE Integrated Lights-out or in short (iLO). The first screenshot is of the new CPU that is detected, the second one is the memory configuration and the third screenshot is the operating temperatures after running a couple of days with the workload.

Result

As you can see the Intel i3-4170 CPU is working without any issues in the ML10 v2 server. Currently, they have been running for about 100 days without any reboot. So I can confirm they are stable and do not overheat! The CPU swap is successful!

Notes:

  • I use stock cooling.
  • I do not use a modified BIOS.

Thanks for reading and see you next time!

Share this:

Dell EMC VxRail NSX-T Considerations

Currently, I have been involved in a Dell EMC VxRail design & deployment with VMware Cloud Foundation on Dell EMC VxRail. There were some noticeable items that you need to consider when using the Dell EMC VxRail as your hardware layer in combination with VMware NSX-T as a network overlay. So it was time to write down the items that I have learned so far surrounding the VxRail NSX-T considerations.

This blog post is focused on the NSX design considerations that are related to the physical level when using the Dell EMC VxRail hardware.

At first, I am going to talk about VMware NSX-V because a lot of customers are already running Dell EMC VxRail in combination with NSX-V and need to move to NSX-T in some time.

VMware NSX-V

In case you are already using Dell EMC VxRail with VMware NSX-V. Your physical NIC configuration would in most cases look like one of the following:

  • Scenario 01: Dual port physical NIC – 10 Gbit
  • Scenario 02: Dual port physical NIC – 25 Gbit

The default configuration that I see in the field at this moment is based on a single dual-port card with either 10 Gbit or 25 Gbit. This is for fine for VMware NSX-V but not for his replacement…



VMware NSX-T

When using Dell EMC VxRail with VMware NSX-T you are required to use four physical NICs! This is because of the limitation surrounding the Dell EMC VxRail software that makes a “PowerEdge server” a “VxRail server”.

The first official Dell EMC statement from there VMware Cloud Foundation on VxRail Architecture Guide: “NSX-T based VI WLD will require additional uplinks, whatever uplinks were used to deploy the VxRail vDS cannot be used or the NSX-T N-VDS“.

The second official Dell EMC statement from there VMware Cloud Foundation on VxRail Architecture Guide: “Note: NSX-T will use the next two available vmnics that are both the same speed for every node in the cluster“.

So this leaves us with three scenarios provided by Dell EMC for the VxRail nodes:

  • Scenario 01: Quad-port physical NIC
  • Scenario 02: Quad-port physical NIC (two ports used) with dual-port physical NIC
  • Scenario 03: Dual-port physical NIC with dual-port physical NIC.


Advise

Dell EMC VxRail is the only hardware platform currently on the market that requires four physical NICs to operate with NSX-T. This means you have to make sure your hardware and datacenter are capable of supporting this requirement. You need to make some choices surrounding the physical network cards, network capacity and datacenter rack space.

So let’s start with my list of VxRail NSX-T considerations!

Physical Network Card

When you are at a point of buying the Dell EMC VxRail solution, buy at least a quad-port NIC configuration. Personally, I prefer the double dual-port NIC setup. As shown here below:

I prefer this hardware setup because of the hardware redundancy created by two cards with there separate chips and PCIe slots. This reduces the change of losing all your network connections when a physical NIC dies.

Another recommendation should be to buy physical NICs that support 25 Gbit. It is a minimum price difference and will make the setup more future proof.

Top of Rack (TOR)

As discussed in the last paragraph: when you move to VMware NSX-T you are forced to use four physical NICs in each VxRail node. After installing the card you need to make sure you have enough physical ports in your Top of Rack switches/Leaf switches.

At the customer where I am currently working, they are forced to increase there Top of Rack switches capacity from two ports per server with NSX-V to four ports per server with NSX-T. This meant a full redesign of there datacenter rack topology and network topology. The spine switches were also not able to connect with that amount of leaf switches.

Keep in mind: This is only required of course when you are running a decent amount of servers per rack. In the customer case, they are running 32 VxRail nodes per rack. This means they require at least 128 physical switch ports per rack without uplink ports counted.

Here is an overview of the scenarios as just described, the first is the NSX-V scenario and the second the NSX-T scenario.

Near future

I know that VMware & Dell EMC are currently working on a solution for the VxRail hardware but time will tell. At this point keep your eyes open when moving from NSX-V to NSX-T with Dell EMC VxRail. Customers how are deploying greenfield also need to be aware that they need additional network capacity.

So that wraps up my VxRail NSX-T Considerations blog post. Thanks for reading my blog post and see you next time!

Share this:

VCSA 6.7 Out of Space (SEAT)

Today I was greeted by the following error message when logging into the VMware vCenter Server also known as VCSA: “Could not connect to one or more vCenter Server systems: https://%fqdn%:443/sdk“. So it was time for a quick write-up on how to resolve this issue.

The issues were already present a couple of hours earlier based on monitoring and logging. For example, Veeam Backup & Replication tried to perform a backup but failed because there were no vSphere Tags available. Veeam Backup & Replication generated the following message “Tag Backup SLA – Bronze is unavailable, VMs residing on it will be skipped from processing.“.

I’m running a VMware vCenter Server as in a VCSA 6.7 appliance and it has an embedded Platform Services Controller. The exact version of the appliance was at the moment of the issue “6.7.0.32000 – Build 14070457“.

Could not connect to one or more vCenter Server systems:

At first glance everything looks fine, the web-interfaces are online, authentication is working but after login, the following message appears “Could not connect to one or more vCenter Server systems: https://%fqdn%:443/sdk“. None of the pages are displaying any content. Here is a screenshot:



After performing a simple reboot nothing had happened, the result was the same. So it was time to dig deeper. Luckily the reboot did trigger a new event in the Appliance Management Page (5480). It appeared that the /storage/seat disk had filled up. The alert that popped-up was “File system /storage/seat is low on database storage space. Increase the size of disk /storage/seat or decrease the data retention.” Here is a screenshot:



Increasing Disk Space

After finding the error message it appeared to be an easy fix. Here is an overview of the commands I used. The commands are also usable for expanding one of the other VCSA virtual disks.

Keep in mind: before increasing disk capacity make sure you have a backup or snapshot available.

It this case we are going to expand this /storage/seat volume. The seat volume is responsible for Stats, Events, Alarms, and Tasks (SEAT) for VMware Postgres (Database).

# Step 01: Connect with the vCenter Server with an SSH Session (use for example Putty).

# Step 02: Login with the root account (root/your-password).

# Step 03: Enable the shell
shell

# Step 04: Run the command to verify the current disk space:
df -h

# Step 05: Increase disk capacity with the Host Client because the vCenter Web-interface is not working ;) (see screenshots)

# Step 06: Run the disk expansion command, the expected output should be: VC_CFG_RESULT=0
vpxd_servicecfg storage lvm autogrow

# Step 07: Verify the disk again, the disk should be bigger!
df -h

# Step 08: Reboot the VCSA
reboot

# Step 09: Verify the working of the VCSA Appliance after reboot.

Here is a collection of screenshots of me performing the procedure.

Conclusion

VMware made it easy for the system administrators to identify the issue and quickly expand the virtual disk from the vCenter Appliance. This is a huge improvement compared to the past. The only thing you need to watch out for is the number of virtual disks connected to the VCSA. If you do not watch out you could expand the wrong disk.

The reason that the disk filled up was caused by two things in my case. 1) I created and destroyed lots of virtual machines in the days before the incident. 2) The VCSA is configured as a tiny footprint so that is why the disks are relatively small.

So this was the write-up! If you got any comments or questions please respond in the section below.

Share this:

VMware ESXi Start a Virtual Machine from Shell

In this blog post, we are going to talk about VMware ESXi and the ability to start & stop a virtual machine from the ESXi shell.

Recently, I ran into a situation where vCenter Server was powered-off manually and the ESXi host that was responsible was not able to open the VMware Host Client to start VMware vCenter with a single mouse click. So it was time to figure out how to boot a virtual machine from the VMware ESXi shell.

I was lucky because SSH was enabled on the ESXi Host so I was able to connect and log in with the root account, but then I ran into the issue… Which command do I need to power on a virtual machine? I knew for sure that it was possible but it took me some time to find the right commands.

So based on that experience, it was time for a quick write-up to show you how to boot a virtual machine from the shell with VMware ESXi. To complete the article I also added the commands for powering off.

Note: The environment was running on vSphere 6.7 Update 2. So all commands are valid for vSphere 6.7 and probably older versions of VMware vSphere.



Start a Virtual Machine from Shell

Here is a step-by-step procedure for booting a virtual machine from the VMware ESXi shell.

# Step 01: Connect with SSH (for example Putty).

# Step 02: Login as a user with root privileges

# Step 03a: View ESXi host virtual machine inventory
vim-cmd vmsvc/getallvms

# Step 03b: View ESXi host virtual machine inventory with filter
vim-cmd vmsvc/getallvms | grep %VMname%

# Step 04: Write-down the VMid, in my case:
183

# Step 05: Verify the current power status
vim-cmd vmsvc/power.getstate %VMid%

# Step 06: Power-on virtual machine
vim-cmd vmsvc/power.on %VMid%

# Step 07: Command has been executed the virtual machine will be power-on. To verify you can use:
vim-cmd vmsvc/power.getstate %VMid%

Screenshots

Here are the screenshots of performing a VMware vCenter virtual machine startup from the VMware ESXi shell.


Stop a Virtual Machine from Shell

Here is the procedure for stopping a virtual machine from the VMware ESXi Shell.

# Step 01: Connect with SSH (for example Putty).

# Step 02: Login as a user with root privileges

# Step 03a: View ESXi host virtual machine inventory
vim-cmd vmsvc/getallvms

# Step 03b: View ESXi host virtual machine inventory with filter
vim-cmd vmsvc/getallvms | grep %VMname%

# Step 04: Write-down the VMid, in my case:
183

# Step 05: Verify the current power status
vim-cmd vmsvc/power.getstate %VMid%

# Step 06: Power-on virtual machine
vim-cmd vmsvc/power.off %VMid%

# Step 07: Command has been executed the virtual machine will be power-off. To verify you can use:
vim-cmd vmsvc/power.getstate %VMid%


Conclusion

You can easily perform this procedure if you know the right commands. There is not a lot of new information available about the vim-cmd command but I added the source(s) below.

Share this:

ITQ Transform! 2019

On the twenty-sixth of September 2019, it was time for our yearly event! For the third time in a row under the name ITQ Transform! This year the theme of the event was “Empower your Business”.

The event was located in Fort Voordorp near Utrecht in Holland. A central spot in the country with enough parking space to accommodate all attendees.

For the attendees, there was choice enough to select an appropriate session. The sessions were grouped into three main categories:

  • Accelerate your cloud journey
  • Empower the Digital Workspace
  • Modernizing your Apps

There were also multiple guest speakers how we’re responsible for the opening and closing keynotes:

  • Spencer Pitts – VMware
  • Wouter Sliedrecht – Pivotal
  • Robert Doornbos – F1 driver


TeraSky & ITQ

The biggest announcement that was made on ITQ Transform was about TeraSky & ITQ. TeraSky is a premier VMware Partner that is located in Israel and ITQ is an enterprise VMware partner that is located in the Benelux.

Combining there effort they join forces to help customers around the world. The company will be called SKY-TQ. Here is the announcement video that was shown on ITQ Transform:

ITQ Transform Session

This year I did a customer case session with a VMware Cloud Provider called ViaData. It was a combined effort with Jitse Hijlkema about the first Cloud Provider Pod that was deployed in the Benelux. Because of our experience with the product, it was time to talk about it in the form of a knowledge sharing session.

For those who are interested here is the presentation of our slide deck. The drawback is that the entire event was in Dutch… so we are sorry if you are an English viewer.

View Slide Deck:

SDDC Based on Cloud Provider Pod
Transform! 2019 – Jitse & Mischa – SDDC based on Cloud Provider Pod


ITQ Transform! Showcase

Here are some photos of the ITQ Transform! 2019 event. You can click on the image for the full-color version.

Note: Not all pictures are very clear… somehow my Pixel 3 phone didn’t like the lighting setup.

For those who are interested in the past, I also wrote a blog post about ITQ Transform! 2017.

Share this:

vCloud Director Appliance Keystore Password

Recently I was deploying vCloud Director (vCD) at a customer and I was experiencing some issues surrounding the Java Keystore password that is configured out of the box (OOTB) in the vCD Cell.

Before digging into the topic, let’s talk about the latest vCloud Director releases. Since the release of vCloud Director 9.5, the software is available as a virtual appliance. In the releases before vCloud Director was released as an installable package for on a Linux Distribution. So a lot has changed related to the vCloud Director software package.

Personally, I really like the vCloud Director appliance since it reduces the complexity and the deployment time at the customer. So that is a good thing!



Why Keystore Access

To install an SSL certificate on a vCloud Director Cell you need access to Java Keystore for replacing the default certificate. The default certificate installed is a self-signed certificate, so not usable for a production environment with a cloud management portal available in the outside world.

So to install a new SSL Certificate that has been issued by a third-party certificate authority we need access to the Java Keystore from the vCloud Director Cell.

The problem is there is no password listed in any of the manuals or online documentation. So that is not really ideal…

After a lot of searching and trying we came up with the following solution for vCloud Director 9.5 and vCloud Director 9.7.

vCloud Director 9.5 Keystore Password

To verify the default Java Keystore Password on your vCloud Director 9.5 cell you need to run the following commands:

  1. Open an SSH Session with your vCloud Director Cell with for example Putty.
  2. Login with your root account.
  3. Run the following command:
cat /opt/vmware/etc/isv/firstboot | grep 'keystore-password'

Here are two screenshots of the full code and the output from the command above.

In all the cases I have tested so far the standard password is “akimbi“. Not really the most save solution in my opinion. You can also create a new Java Keystore to work around the problem and to set a selfdefined password.



vCloud Director 9.7 Keystore Password

I also tested the same operation on a vCloud Director 9.7 Cell and the default password has been removed from the firstboot script. If you look closely at the firstboot code the entire script is overhauled and the script now responsible for certificate generation is “generate-certificates.sh“.

The root password configured at deployment is now the default password for the Java Keystore on the appliance. Keep in mind: When changing the root password the old password is still valid for the Java Keystore but not as login. Also, this is not a very ideal scenario in my opinion.

Here are two screenshots of the code:

Final word

This concludes my blog post about the vCloud Director Java Keystore Password. I wasted more time on this subject than expected… so it was time for a proper write-up for the vCommunity.

Share this:

VMware ESXi SATADOM Boot Device

In the blog post, I am showing you how to deal with SATADOM boot devices in your ESXi Hosts. Recently I replaced my SD cards with SATADOMs in all my ESXi Hosts in my HomeLab. This blog post is about my experience and configuration that was required for HPE ProLiant servers.

SD Card

In the past, I always used SD cards in my VMware ESXi servers as a boot media but overtime SD cards would wear out of fail. This is of course not ideal but the costs of replacing an SD card are quite low compared to 2.5-inch drives for example. So a nice alternative is a SATADOM, a fast, cheap and more reliable solution.

Here are some screenshots of my Home Lab environment with a failed SD card. The ESXi Host is still fully operational but has lost its boot device. In most cases, you can reboot the ESXi Host and it will work for about three days and the issue is back.



SATADOM

So after a couple of failures over the years, it was time to replace the SD cards with a SATADOM. The installation is quite simple but you need to verify some stuff… some SATADOMs use external power and some receive their power from the SATA connector (please verify this before buying).

In my case, I bought a SATADOM with an external power source. This because my ProLiant servers do not have SATA Ports with a power feature. I ended up buying a Delock SATA 6 Gb/s Flash Module 16 GB vertical (part nr 54655) in a webshop in Holland.

The “biggest” issue I encountered was configuring the BIOS in a way that the device was correctly detected. Here are the screenshots related to the BIOS settings and SATA port used on the motherboard. It appeared that the ML10v2 expected the SATADOM to be connected to port 5, on other ports, it was not working or it was not detected by VMware ESXi.

Here is a recording of the HP ProLiant ML10 v2 booting from SATADOM after a successful ESXi installation. Compared to the SD card the boot time has been reduced with 50%. Speed is of course always nice to have but how many times do you boot an ESXi host in a production environment? On the other hand… it could be very useful for a Lab Environment that is not running 24×7 and you boot your ESXi Hosts on a daily basis.

HP ProLiant ML10 v2 – Booting from SATADOM


VMware vSAN Requirements

So let’s look at the official requirements for VMware vSAN when using a SATADOM as boot media. Note: based on the amount of physical memory installed in your ESXi Host the requirements change!

  • When you boot a vSAN host from a SATADOM device, you must use single-level cell (SLC) device. The size of the boot device must be at least 16 GB.
  • If the memory of the ESXi host has 512 GB of memory or less, you can boot the host from a USB, SD, or SATADOM device.
  • If the memory of the ESXi host has more than 512 GB, consider the following guidelines.
    • You can boot the host from a SATADOM or disk device with a size of at least 16 GB. When you use a SATADOM device, use a single-level cell (SLC) device.
    • If you are using vSAN 6.5 or later, you must resize the coredump partition on ESXi hosts to boot from USB/SD devices. For more information, see the VMware knowledge base article at http://kb.vmware.com/kb/2147881.

Sources

Here is a list of sources I used for writing this article.

Share this:

pfSense: arpresolve: can’t allocate llinfo for X.X.X.X on emX

After some network maintenance, my virtual pfSense firewall started to cause some major issues and parts of the network started to stop functioning. In the end, it came down to the following error message (arpresolve: can’t allocate llinfo for X.X.X.X on emX). Because this wasn’t the first time… it was time to do a proper blog post about this issue.

In this post, I’m going to describe the issues and solution to get your network back up and running.

Environment

Here is an overview of my current environment where the problem occurred. I created a basic image of the setup:

Basic Network Overview
Basic Network Overview

Not very spectacular, just two routers/firewalls connected with a interconnect network and they use the BGP Routing Protocol. On both sides, you have a set of VLANs connected to the router with a trunk. A so-called router on a stick topology. The reason I use the BGP routing protocol is related to my daily job. The BGP Routing Protocol is kind of the preferred one for VMware NSX.



Problem

The problem started when… I upgraded my physical Cisco Firewall with new firmware and IOS. The interconnect network between pfSense and the Cisco stopped working completely. At first, I wasted about 1.5 hours on the physical Cisco firewall instead of the pfSense appliance. Because it looked to me that the IOS update was causing issues.

I was completely wrong… the Cisco Firewall was running without any issues… but pfSense had developed a new feature…

When I looked in the pfSense interface and went to the “System Logs > System > General” there was a serious error message. The pfSense kernel reported the following issue (arpresolve: can’t allocate llinfo for 192.168.80.253 on em8). Here is a screen capture of the message:

pfSense kernel message: (arpresolve: can’t allocate llinfo for X.X.X.X on emX)

It was unable to learn any new ARP entries in the interconnect network. This was causing the Cisco ASA & pfSense appliance to not form a BGP relationship.

In the case of sending a simple ping, it was not possible. Here is a basic diagram of the issue. All networks on both sides are functioning but the two routers are not able to talk and exchange BGP routes.

Basic Network Failure
Basic Network Failure


Solution

In the end, there are two solutions available for solving the problem:

  • Option 01: Restart the entire pfSense appliance > Problem solved!
  • Option 02: Deactivate interface and activate interface:
    • Connect with Putty to the pfSense appliance.
    • Activate the shell.
    • Deactivate interface (ifconfig em8 down)
    • Activate interface (ifconfig em8 up)
    • Problem solved!

The root cause is not completely clear to me… In total, I encountered this issue for about ten times. This is what I figured out so far:

  • The problem occurred for the first time after installing the package OpenBGPd on pfSense.
  • The ARP issue is only triggered when BGP neighbour states change, not always but sometimes.
  • The issue only occurs on the interconnect network… all other networks just work.

Article Update Juli 2019

After some frustration, I finally found a temporary workaround. After restarting the interface I have a couple of seconds to enter a static ARP entry before it stops working or does not allow me to execute the command:

  • Create two SSH sessions (one for restarting the interface and one for creating the static ARP entry).
    • Session 01:
      • Deactivate interface (ifconfig em8 down)
      • Activate interface (ifconfig em8 up)
    • Session 02:
      • Create static arp entry (arp -s hs-fw01.home-server.local 28:6f:7f:02:45:15)
  • This should fix the problem, in my case, it is working now for about 1 month.

Anybody experiencing the same problems? Anybody who has a definitive solution? Please comment below :)!

Share this: