By: Yogesh Ranade, Principal Director of Product Management
This is the second blog in a series to dive deeply into the features of the SmartZone Operating System.
One of the biggest challenges in Wi-Fi network management is accurately troubleshooting and identifying network connectivity issues. These issues can range from Wi-Fi connectivity, client authentication and authorization to IP address assignment and packet forwarding and routing. Regardless of where the actual problem may reside, the most typical user reaction is “Wi-Fi doesn’t work.”
The Ruckus Wireless SmartZone system is the largest, most highly scalable controller available today and has been widely deployed by both service providers and enterprises for over five years. We have seen such issues reported against the SmartZone system, but after troubleshooting we realized that, on many occasions, the connectivity problems had little to do with Wi-Fi. Instead, they were related to other systems on the network.
There are many possible causes: network configuration, portal systems, RADIUS server and DHCP server misconfigurations, or scaling limits on backend infrastructure. Many IT administrators and service provider operators rely on traditional logs, alarms and events across these disparate network systems to analyze and troubleshoot such issues. Others use next-generation cloud software, like Splunk, which lets them aggregate logs from all these systems and then run smart queries to identify the problem. These tools do a great job, but not all customers have the capital budget or the technical expertise to manage these add-on systems.
There are several challenging scenarios to consider, including:
- Wi-Fi radio connectivity.
- Type of SSIDs and authentication models configured. The call flows vary significantly between open WLANs with portal-based systems, PSKs, and 802.1X. With open WLAN and portal-based systems, clients are assigned an IP address before being redirected to a portal page for authentication and authorization of service. In PSK and 802.1X handshakes, the client first goes through the authentication and authorization process before being assigned an IP address. In addition, different EAP methods may be applied across different SSIDs and venues, which results in different authentication steps.
- State machine transitions across disparate network stacks.
- Client roaming considerations: when a client roams from one access point (AP) to another, the roam may result in full authentication and authorization and/or IP address management.
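The ordering difference between the authentication models above can be sketched in a few lines of Python. This is only an illustration of the call-flow ordering described in the list, not SmartZone code: the `AuthModel` enum, the `expected_call_flow` function and the stage names are all hypothetical, and the stages are deliberately coarse.

```python
from enum import Enum


class AuthModel(Enum):
    """Hypothetical labels for the three WLAN types discussed above."""
    OPEN_PORTAL = "open WLAN with portal"
    PSK = "PSK"
    DOT1X = "802.1X"


def expected_call_flow(model: AuthModel) -> list[str]:
    """Return coarse, ordered connection stages for an authentication model."""
    association = ["802.11 authentication", "802.11 association"]
    dhcp = ["DHCP address assignment"]

    if model is AuthModel.OPEN_PORTAL:
        # The client gets an IP address first, then is redirected to the portal.
        return association + dhcp + ["portal redirect", "portal login"]
    if model is AuthModel.PSK:
        # The four-way handshake must succeed before DHCP can start.
        return association + ["four-way handshake"] + dhcp
    if model is AuthModel.DOT1X:
        # The EAP method in use changes the detail of the EAP exchange step.
        return association + ["EAP exchange", "four-way handshake"] + dhcp
    raise ValueError(f"unknown authentication model: {model!r}")


if __name__ == "__main__":
    for model in AuthModel:
        print(f"{model.value}: " + " -> ".join(expected_call_flow(model)))
```

A diagnostic tool can compare the stages it actually observes for a client against an expected sequence like this one, which is why the authentication model has to be known before a failure can be placed correctly.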
Rendering all these transitions effectively, applying them across a large number of zones (venues), and matching the scale of the SmartZone platform (30,000 APs) was an extremely challenging undertaking.
After many hours and days of brainstorming, our R&D team has delivered an intuitive and simple UX design for this highly complex process. The tool asks the operator for the MAC address of the client that is experiencing network connectivity issues. The system then tracks the client through all of its internal processes and databases while rendering all the interactions involved in a visual ladder diagram. Most importantly, the system indicates where the problem may potentially reside. In addition, the system shows all the APs that actually “hear” that client and which AP the client is actually associated and connected with. All the associated radio metrics (RSSI, SNR and latency values) are displayed in a clear, visual format.
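As a rough illustration of the idea (and not the SmartZone implementation), the Python sketch below renders a list of client/AP interactions as a text ladder diagram and marks the first step that failed. The `Event` class, its fields, and the `first_failure` and `render_ladder` helpers are hypothetical names invented for this example, as is the sample event list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """One observed interaction between the client and the AP (hypothetical)."""
    sender: str      # "client" or "ap"
    message: str     # short description of the frame or exchange
    ok: bool = True  # False if this step failed


def first_failure(events: list[Event]) -> Optional[int]:
    """Return the index of the first failed step, or None if all succeeded."""
    for index, event in enumerate(events):
        if not event.ok:
            return index
    return None


def render_ladder(events: list[Event], width: int = 44) -> str:
    """Draw each event as an arrow between a client column and an AP column."""
    rows = ["Client".ljust(width - 2) + "AP"]
    for event in events:
        label = f" {event.message} ".center(width - 3, "-")
        # Arrows point away from the sender.
        arrow = f"|{label}>|" if event.sender == "client" else f"|<{label}|"
        if not event.ok:
            arrow += "  <-- FAILED"
        rows.append(arrow)
    return "\n".join(rows)


if __name__ == "__main__":
    # Made-up trace for a client that entered the wrong PSK.
    trace = [
        Event("client", "authentication request"),
        Event("ap", "authentication response"),
        Event("client", "four-way handshake", ok=False),
    ]
    print(render_ladder(trace))
    print("first failing step:", first_failure(trace))
```

Reporting only the first failed step keeps the operator's attention on where the connection attempt stopped making progress, since later stages such as DHCP never start once an earlier stage fails.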
Below are some snapshots that demonstrate this valuable tool.
The tool provides mechanisms to pick a client to troubleshoot as shown below:
In the example below we see the client’s Wi-Fi probe requests on both 2.4 GHz and 5 GHz, with a higher SNR on 5 GHz (channel 36). I told the client to “forget” the SSID, which starts the “VCD Engine” and shows a re-authentication request from the client to the AP and the AP responding. I then connected the client and typed in the PSK; you can see the client complete a four-way Wi-Fi handshake with the AP, followed by a successful DHCP request and response.
In redoing the test, I mistyped the PSK. We see the failure, which prevents the L2 connection, so no DHCP handshake occurs. We also see the client move from 5 GHz to 2.4 GHz, but the PSK is still wrong and the connection eventually fails.
Ruckus plans to continue enhancing this new capability by providing additional use-case coverage and adding more features, with the goal of a well-rounded and effective Visual Connection Diagnostics troubleshooting tool. Reach out to me (email@example.com) with any features that you think would be useful.