ASIX USB NICs and a Massive Home Lab Performance Improvement

My home lab ESXi cluster is made up of Intel NUCs. They’re a popular option for home labs, largely because they squeeze a good amount of performance and memory capacity into a small, quiet form factor. The biggest downside is that they have only one onboard NIC and no PCIe expansion slots. In very light applications a single NIC may not be a problem, but in my case it’s an issue because my storage is all iSCSI-based, so I need as much throughput as I can get.

Intel NUCs are a great combination of size, quiet operation, and performance — perfect for a home lab

That means using USB NICs for additional interfaces. VMware doesn’t include drivers for the most popular USB NICs out there, but there is a solution: USB Network Native Driver for ESXi, a VMware Fling by Songtao Zheng and William Lam. (By the way, William Lam’s blog is absolutely indispensable for home labbers in particular, but also for all things VMware. Check it out here: https://williamlam.com/) The driver was first released as a Fling (according to VMware, Flings “are apps and tools built by our engineers and community that are intended to be explored”) in 2019, but it has a history that goes back years longer, and it currently supports chipsets from ASIX, Realtek, and Aquantia.

The Fling is hugely helpful to home labbers in general and me in particular, and overall Songtao and William have done a great job. I absolutely don’t intend anything else here to be taken as criticism of them or their effort; just an interesting issue I happened to run into and perhaps the beginnings of a means to address its underlying cause.

My initial build — and a hint of a problem

It’s been quite some time, at least 6 or 7 years, since I first built my NUC-based lab, so my recollection here is a little fuzzy. But as best as I can remember, there were some compatibility issues at the time with the ASIX-based adapters, so I went with Realtek-based adapters of various brands. I needed at least 3 additional interfaces, one each for vMotion, iSCSI, and vSAN (which I experimented with but no longer use — more on that below).

Given that the NUCs of those generations had only 4 USB-A ports in total and I already had two allocated — one to a USB thumb drive for the OS and one to a keyboard and mouse — in some cases I used USB hubs with built-in NICs to give me more ports. Of course I knew that was less than ideal, but given my use case (I had planned to run a few VMs, nothing major, and just do some tinkering) I wasn’t too worried about throughput.

At the time, I was running vSAN as my primary storage, with iSCSI on my NAS as secondary. I could tell from the health checks built into vSAN that I definitely had some issues with throughput, but I chalked it up to the fact that my use of USB NICs was nowhere near optimal. Over time, though, I ran into multiple issues with vSAN. First, because I was not using a UPS, even minor power dips would cause at least one or two of the hosts to reboot, and my data would get corrupted in a way that never seemed to happen to the datastores that ran on iSCSI. Second, I suspected at the time that the corruption problem was likely exacerbated by poor throughput, but again I just attributed it to the overall deficiency of my gear.

After I set vSAN aside and moved to iSCSI exclusively, I again could tell that my network performance was problematic. Around this time, I got the sense that there was something more at play than just the generally suboptimal performance of USB NICs. But as is so often the case, I just didn’t have time to really dive into it and do some real investigation.

Increasing workloads reveal a real issue

When I first set up my lab, I only ran a few basic services: two Domain Controllers, a pair of VMs that ran DHCP and a few other services, a vCenter, and a couple of basic utility VMs, all of which ran on three physical hosts. But over time, the areas I was interested in researching and labbing expanded, and gradually, so did my lab. I still plan to detail my lab in a later post, but it now consists of 6 hosts in total, and I run a Kubernetes cluster and a whole bunch of other services on a total of 28 VMs. All storage is still over iSCSI, now to two NASes.

Of course, more workload means more changes, and more frequent ones. One thing that became clearer and clearer was that my biggest performance bottleneck was storage: I frequently pushed my compute and memory limits, but those were addressed by adding hosts. What really showed something was wrong were storage vMotions. They would take hours.

When I started to look into the issue, it was clear that reads were performing about as I’d expect — usually between 40 and 60 MB/s — but writes were performing terribly: they’d rarely hit 10 MB/s, and even when they did, they would only sustain that speed for a few seconds at a time. Of course there are a number of plausible explanations for somewhat slower writes, but a difference that significant made me strongly suspect that something other than just mediocre gear was at play. But again, other priorities kept me from really getting into it and properly troubleshooting.
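If you want to put rough numbers on that kind of read/write gap yourself, one quick-and-dirty approach is to time a large sequential write and read from inside a guest VM whose virtual disk sits on the iSCSI datastore in question. The sketch below is just that: a minimal example, not how I originally measured things. The file path, test size, and chunk size are assumptions you’d adjust for your own setup, and the read figure will be inflated by the guest’s page cache unless the test file is larger than the guest’s RAM.

#!/usr/bin/env python3
# Rough sequential write/read throughput check, run from inside a guest VM
# whose virtual disk lives on the iSCSI-backed datastore under test.
# TEST_FILE, CHUNK, and TOTAL are assumptions; adjust for your own setup.
import os
import time

TEST_FILE = "/tmp/nic_throughput_test.bin"   # assumed path on the iSCSI-backed disk
CHUNK = 4 * 1024 * 1024                      # write/read in 4 MiB chunks
TOTAL = 2 * 1024 * 1024 * 1024               # 2 GiB total

def write_test():
    buf = os.urandom(CHUNK)
    start = time.monotonic()
    with open(TEST_FILE, "wb") as f:
        written = 0
        while written < TOTAL:
            f.write(buf)
            written += CHUNK
        f.flush()
        os.fsync(f.fileno())                 # make sure the data actually hit the disk
    return TOTAL / (time.monotonic() - start) / (1024 * 1024)

def read_test():
    # The guest's page cache can inflate this number; a test file larger
    # than the guest's RAM gives a more honest figure.
    start = time.monotonic()
    with open(TEST_FILE, "rb") as f:
        while f.read(CHUNK):
            pass
    return TOTAL / (time.monotonic() - start) / (1024 * 1024)

if __name__ == "__main__":
    print(f"write: {write_test():.1f} MB/s")
    print(f"read:  {read_test():.1f} MB/s")
    os.remove(TEST_FILE)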

Stumbling across a solution

It was completely by accident that I happened upon concrete proof that there was a problem — and a solution for it. I recently added a new host to my cluster. Just like with my other hosts, I saw a good deal for a barebones 10th-gen i7 on eBay and grabbed it. I already had all the other components (RAM, storage, etc.) but had completely forgotten that I didn’t have the same NICs for it that I do for my other hosts. As luck would have it, though, I did have two StarTech dual-port USB Ethernet adapters. I don’t know when or why I got them, but I was happy to remember they were in my technology junk box. So I built the new host just like my others — using kickstarts that I may detail in another article — but with the StarTech NICs.

My tech junk box (and my slightly problematic penchant for hanging on to old computer components and electronics) comes through for me for once

After adding the host, I also decided to move some things around a little bit, rearranging some storage. That involved a number of storage vMotions, and much to my surprise, vMotions executed on the new host were substantially faster than on the existing hosts: operations that I’d gotten used to taking upwards of an hour were done in minutes. Looking at the interface statistics on the NASes, I noticed the write operations were executing at almost the same 40-60 MB/s as the reads. I knew right away that it had to be the NICs: I already had a 10th-gen i7 host that was identical to the new one except for the NICs. I’d finally zeroed in on the problem.

Even though they’re a little pricey, I immediately ordered 10 additional StarTech NICs for my other hosts (fortunately I was able to get 3 used for a slightly lower price). I rebuilt my existing hosts with the new NICs and man, the difference is night and day. I don’t have any real empirical data to demonstrate this (perhaps I’ll see if I can collect some), but the practical conclusion couldn’t be clearer. Everything is faster: vMotions and Storage vMotions are significantly faster, my VMs generally perform better, container create operations in K8s are snappier, a pesky issue with NAS backups causing some containers to fail went away… everything experienced huge leaps in performance.

Conclusion and next steps

The most likely culprit? The StarTech NICs use ASIX chipsets, specifically the ASIX AX88179. Some basic searching led me to this exchange between Songtao Zheng, one of the contributors to the Fling, and another user of the Fling. I’m not sure if the issue they’re discussing there was ever resolved.

Also, in thinking back on it, I seem to recall that some of the old NICs might not have exhibited the symptoms the others did. So to check on that, and hopefully come up with some data to help address whatever the underlying issue might be, I plan to test each of the old NICs.
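As a starting point for that testing, it helps to have an inventory of which driver each physical adapter on each host is actually bound to. Below is a rough pyVmomi sketch along those lines; the vCenter hostname and credentials are placeholders for my lab, and the exact device and driver names reported for the Fling’s USB NICs may vary by driver version.

#!/usr/bin/env python3
# Inventory of physical NICs and the driver each is bound to, across every
# host in vCenter. The vCenter hostname and credentials are placeholders.
# Requires pyVmomi (pip install pyvmomi).
import atexit
import ssl

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()       # lab only: skip cert validation
si = SmartConnect(host="vcenter.lab.local",
                  user="administrator@vsphere.local",
                  pwd="changeme",
                  sslContext=ctx)
atexit.register(Disconnect, si)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)

for host in view.view:
    for pnic in host.config.network.pnic:
        speed = pnic.linkSpeed.speedMb if pnic.linkSpeed else 0
        print(f"{host.name:24} {pnic.device:8} driver={pnic.driver:12} "
              f"mac={pnic.mac} link={speed} Mb/s")

view.Destroy()

Comparing that output against per-host transfer speeds should make any per-adapter or per-driver pattern easy to spot.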

In the meantime, I think it’s safe to conclude that if you’re in the market for USB NICs to use with ESXi, you’re probably safer going with ASIX-based adapters like the StarTech, which also have the added benefit of being dual-port. 40-60 MB/s isn’t exactly anything to write home about, but it’s sure better than what I was seeing before. Hopefully this info keeps you from experiencing poor performance like I did.
