I moved into my home just over two years ago. Luckily for me, it came prewired for networking. However, most of it is CAT5e, vintage unknown. For most of my time here on 80Mbit Internet, that wasn’t a problem. I happily set up a 1GbE capable network and got on with it. All was fine until nine months ago, when fibre was run direct into my home, bringing me 3Gbit Internet. Naturally, this meant I had to immediately upgrade my entire network to 10GbE capable, because… why not. Alas, this wasn’t the best judgement on my part and if I could do it all again, I’d have gone with 2.5GbE; it’s significantly cheaper, more efficient and less fussy. That aside, all was well… until three months ago.
TL;DR: Had lots of packet loss. Went nuts. Don’t run copper cable for long “hub” runs; run fibre. When diagnosing a fault, start methodically from both ends and work your way into the middle common denominator.
Fault finding for the impatient
I’d be working from my home office and all seemed well, but then I started to notice that the machine I work on daily wasn’t negotiating with the network properly when it came back from sleep. Naturally, I thought this was a software issue, so I updated it and carried on.
When the issue persisted some weeks later, I decided it must be a hardware issue. My first thought was to swap out the machine I used; I did, but the problems persisted. I even tried manually negotiating my network down to 1GbE, to no avail.
I then fixated on the idea that it must be an issue with the CAT5e run I was on, given I was pushing 10Gbit over it (out of spec), and over a run of around 30 meters too. This is where the sequence of fun calamities began.
If in doubt, blame the cable and close your eyes
At this point, naturally, I’d ruled out absolutely everything else immediately and jumped straight to “The old CAT5e cable is crap”. I bought 100 meters of “External direct burial CAT7” (don’t get me started on that; if you’re reading this, just buy CAT6a) and planned how I was going to run it from point A to point B in the least invasive way possible. Two days, 80 meters of cable, 30 meters of conduit, lots of drilling, polyfilla, more drilling and pulling cable later… I’ve got myself a brand new future proof cable run. I painstakingly terminate both ends (did I mention CAT7 was a bad idea?) and plug it in with hopes of stability and speed.
My network now negotiates at 1000BaseT instead of 10000BaseT… huh? I test the cable, buy a new more advanced tester, re-terminate the cable, arguably worse but more compliant than before… result? 1GbE… oh dear. I’m now frustrated. I email the cable manufacturer, who send me the Fluke test report; the cable must be fine. I then start to doubt myself: did I snag it somewhere pulling it so much? Is it running past a flux capacitor? Is it too long? Nothing helps… then, when re-terminating it for the third time, I notice the SFP+ to RJ45 connector it’s plugged into is super hot. After running iperf3 repeatedly, I noticed the speed would drop off over time; I’d sort of expected the opposite after a slow start.
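A minimal sketch of how that drop-off could be put into numbers rather than eyeballed, assuming iperf3 is run with its JSON output flag (-J) so the per-interval throughput can be read back. The function names and the one-third split are made up for illustration; this is one way to look at it, not the only one.

```python
import json
import subprocess
from statistics import mean


def interval_rates(report: dict) -> list:
    """Pull per-interval throughput (bits/s) out of an iperf3 JSON report."""
    return [iv["sum"]["bits_per_second"] for iv in report["intervals"]]


def dropoff_ratio(rates: list) -> float:
    """Mean throughput of the last third divided by the first third.

    Roughly 1.0 means steady; well below 1.0 means the link slowed
    down as the test went on (for example, as an adaptor heated up).
    """
    if len(rates) < 3:
        raise ValueError("need at least three intervals")
    third = len(rates) // 3
    return mean(rates[-third:]) / mean(rates[:third])


def run_iperf3(server: str, seconds: int = 60) -> dict:
    """Run an iperf3 client test and return the parsed JSON report.

    Assumes iperf3 is installed and a server is listening on `server`.
    """
    out = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)


# Hypothetical usage (the address is a placeholder):
#   report = run_iperf3("192.0.2.10", seconds=120)
#   print(f"drop-off ratio: {dropoff_ratio(interval_rates(report)):.2f}")
```

Running it a few times back to back, with the adaptor cold and then warm, would show whether the ratio gets worse with heat.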
OK, let’s sort that out. I cooled it and it didn’t help. I then googled it for known faults and nothing came up… After some inspection of my original purchase (thinking the adaptor may be faulty), I notice it’s only rated for 30 meters, and it also uses 2.7 Watts… I bought a replacement adaptor, rated for 100 meters, that uses 1.6 Watts, hoping for less heat and more stability. I plug in the cable, it negotiates immediately at 10000BaseT; things are looking good.
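If the machine on the end of the run happens to be Linux (an assumption; the same check looks different on other systems), the negotiated speed can be read from sysfs instead of squinting at link lights. A rough sketch, with the interface name and function names invented for the example:

```python
from pathlib import Path
from typing import Optional


def negotiated_speed_mbit(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the negotiated link speed in Mbit/s, or None if unknown.

    Linux exposes this in <sysfs_root>/<iface>/speed. Reading it can fail
    or give a non-positive value when the link is down.
    """
    try:
        speed = int((Path(sysfs_root) / iface / "speed").read_text().strip())
    except (OSError, ValueError):
        return None
    return speed if speed > 0 else None


def describe_link(iface: str, expected_mbit: int = 10000,
                  sysfs_root: str = "/sys/class/net") -> str:
    """One-line summary comparing negotiated speed with what was expected."""
    speed = negotiated_speed_mbit(iface, sysfs_root)
    if speed is None:
        return f"{iface}: no link or speed unknown"
    if speed < expected_mbit:
        return f"{iface}: negotiated {speed} Mbit/s, expected {expected_mbit} Mbit/s"
    return f"{iface}: negotiated {speed} Mbit/s, as expected"


# Hypothetical usage ("eth0" is a placeholder interface name):
#   print(describe_link("eth0"))
```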
Predictably, it’s always the TGTBT cheap thing that breaks
A couple of days of work later and sadly, the problems persist. I cannot believe it. I’ve changed a cable and the SFP+ adaptors, even the patch cables going from the wall to the switch and from the switch to my machine. Nothing’s helped. It then dawns on me… I’ve looked at everything, changed everything, except the dumb cheap switch I fixed to the underside of my desk. Out of sight. Out of mind.
I then take out the uplink and plug it directly into my work machine, then run mtr, ping and iperf3 for 24 hours. This time I’m convinced it’s working. After everything, it wasn’t the out of spec cable, and it wasn’t the SFP+ adaptor (yet both of those things could have done with being upgraded to standard anyway). It wasn’t the patch cables, and it wasn’t the expensive Ubiquiti switches I have everywhere. It was the £80 QNAP cheap 2.5GbE/10GbE unmanaged switch I fixed to the underside of my desk, suffering resets constantly. I googled it; I’m not the only one. It’s under warranty, but it looks like replacements suffer the same. With that switch gone and a Ubiquiti one in its place, we are back to equilibrium and normality. Lesson learned. Time and money unnecessarily spent, but not wasted. Alas, I should have just run fibre given the length and topology, and also debugged the issue properly; I think I forgot about the cheap rubbish switch under my desk.
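A long soak test like that is easier to read if the results are summarised as bursts of loss rather than one overall percentage, since a switch that keeps resetting shows up as repeated runs of dropped probes. A small sketch of the idea, assuming a Unix-like ping that accepts -c; the function names and the gateway address are placeholders, not anything from my setup.

```python
import subprocess
import time


def probe(host: str, timeout_s: float = 5.0) -> bool:
    """Send one ICMP echo via the system ping; True if a reply came back."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def loss_percent(results: list) -> float:
    """Percentage of probes that got no reply."""
    if not results:
        return 0.0
    return 100.0 * results.count(False) / len(results)


def outage_bursts(results: list) -> list:
    """Lengths of each run of consecutive failed probes, in order."""
    bursts, run = [], 0
    for ok in results:
        if ok:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:
        bursts.append(run)
    return bursts


def monitor(host: str, count: int, interval_s: float = 1.0) -> list:
    """Probe `host` `count` times, sleeping `interval_s` between probes."""
    results = []
    for _ in range(count):
        results.append(probe(host))
        time.sleep(interval_s)
    return results


# Hypothetical usage: one probe a second for 24 hours.
#   results = monitor("192.0.2.1", count=24 * 60 * 60)
#   print(loss_percent(results), outage_bursts(results))
```

Lots of similar-length bursts at intervals points at something rebooting in the path; scattered single drops look more like a marginal cable or adaptor.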
Now simply waiting for 40GbE networking to waste my time all over again for something I don’t need :)
Sequence of events
- Noticed network card not coming up
- Changed my machine
- Changed my long network cable run
- Re-terminated cables
- Re-terminated cables
- Re-terminated cables
- Changed my SFP+ to RJ45 adaptors
- Changed the cheap nasty switch under my desk
- Cable was fine, SFP+ adaptor was fine, machine was fine
- Everything now has an upgrade as a result and should be good for a while