4 things your network operation team should know about your access networks | NETSCOUT
| Application Note |

4 things your network operation team should know about your access networks

Over the last decade, businesses have become increasingly reliant on connected associates and customers in planning and conducting their business. At the same time, the popularity of cloud-based Internet of Things (IoT) devices that conduct machine-to-machine (M2M) communication has created a love-hate relationship with the IT operation team, because many of these devices may be operating on the network without the team’s knowledge, posing potentially devastating security and operational risks. The switched network that ultimately connects all of these things has evolved and is becoming more complex from both a configuration and a security point of view. This white paper discusses the state of the switched network and the key attributes that the IT operation team should have visibility into. It also discusses best practices that help the team maintain control of the switched network more efficiently, improve collaboration within the IT team, and keep connected people and things up and running.

Today’s switched networks

The role of the switched network has changed since it was first conceived in the early nineties. With more mobile users and more bring-your-own-device (BYOD) products being deployed, the switched network plays a more important role in connecting increasingly diverse end devices while maintaining the security of the corporate network. Let’s look at the four key functions of today’s switched networks that the network operation team must maintain control of, along with suggested best practices:

| Key function | Key attributes to manage |
|---|---|
| Connectivity | Offering Power over Ethernet, duplex and speed to the device, and the link control mechanism to facilitate it. |
| Authentication & Addressing | Device and user authentication mechanism, and the address and access provisioning service. |
| Routing | The switch and VLAN topology, and packet-routing services such as DNS, gateway and NAT that get IP packets from the client to its target. |
| Efficiency | Network path bandwidth, packet loss, delay and jitter characteristics that affect transmission efficiency and hence user experience. |

How a switch network works together to connect and provision access to connected devices

Connectivity

Power connectivity
Power over Ethernet (PoE) has become a popular way to power end devices because it reduces the cost of deployment and maintenance. Many networking devices, such as access points, VoIP phones and, more recently, IoT devices, are powered almost exclusively by PoE. PoE power levels are defined by the IEEE 802.3 standards, and devices are classified by their voltage and wattage. There are two kinds of PoE devices:

  1. Power Sourcing Equipment (PSE) provides power on the Ethernet cable. In new deployments, the PSE is usually the switch and is commonly referred to as endspan. A PoE injector, called midspan, can be placed between a non-PoE switch and the PoE-powered device as a retrofit. Based on the PoE standard that the PSE supports, it falls into a PoE Type, 1 – 4 (see Table 1). PSEs can operate in two modes: a Mode A PSE supplies power on pairs 1-2 and 3-6 of the 4-pair UTP cable, and a Mode B PSE uses the spare pairs 4-5 and 7-8. It is important to note that the PSE determines the mode in which power is offered. The standard does not require a PSE to support both Mode A and Mode B.
  2. Powered Device (PD) is a device powered by a PSE, and thus consumes energy. An 802.3af- or 802.3at-compliant PD must be able to support BOTH Mode A and Mode B. Based on the wattage that the PD draws, it falls into a PoE Class, 0 – 4 (see Table 2).

| PoE Type | Common name | Related standard | Pairs used | Max. power to PSE port | Max. power to PD |
|---|---|---|---|---|---|
| 1 | PoE | 802.3af | 2 | 15.4W | 12.95W |
| 2 | PoE+, PoE Plus | 802.3at | 2 | 30W | 25.5W |
| 3 | 4-pair PoE, PoE++, UPoE# | 802.3bt* | 4 | 60W | 51W |
| 4 | Higher-power PoE | 802.3bt* | 4 | 100W | |

Table 1: PoE PSE Types
#: UPoE is a Cisco proprietary classification reference in their “Digital Ceiling” Solution.
*: 802.3bt is a proposed IEEE standard that is scheduled to be ratified in early 2018.

| PoE PD Class | Type / Standard | DC voltage at PSE | DC voltage at PD | Min. power from PSE port | Power used by PD |
|---|---|---|---|---|---|
| 0 | 1 / 802.3af | 44-57V | 37-57V | 15.4W | 0.44 – 12.95W |
| 1 | 1 / 802.3af | 44-57V | 37-57V | 4.0W | 0.44 – 3.84W |
| 2 | 1 / 802.3af | 44-57V | 37-57V | 7.0W | 3.84 – 6.49W |
| 3 | 1 / 802.3af | 44-57V | 37-57V | 15.4W | 6.49 – 12.95W |
| 4 | 2 / 802.3at | 50-57V | 42.5-57V | 30W | 12.95 – 25.5W |

Table 2: PoE PD Classes

The 802.3 standard defines LLDP as the protocol that a PD uses to tell the PSE which class it belongs to, so that the PSE can provision the right voltage and current. However, some PoE devices on the market were released before the standard was ratified and use proprietary protocols, such as the Cisco Discovery Protocol (CDP). Not all PoE devices are fully standards-compliant, so compliance must be checked.

What could go wrong:
The challenge for the network operation team is that as more PDs of different classes are deployed on the network, the power budget of the PSE and the interoperability between PD and PSE need to be managed and understood. In addition, not all PoE implementations are standards-compliant, and not every existing cabling system is capable of supporting PoE.

Symptom: Cannot get power
Possible causes:
  1. Cable fault:
      a. Open/short
      b. 2-pair cable used with a Mode B PSE
  2. PSE and PD not compatible; different Type or Mode
  3. PSE port not enabled to deliver PoE
  4. PSE and/or PD are not fully standards-compliant (e.g. the PD does not present the 25 kΩ signature resistance on the powered pairs or does not support both Mode A and Mode B)
  5. PSE does not have enough power budget to power up all the PDs connected to it

Symptom: Intermittently drops off
Possible causes:
  1. Cable fault:
      a. Too long (>100m)
      b. Too much resistance
  2. PSE does not have enough power budget to support all connected PDs running at full power consumption (e.g. when motorized security cameras are scanning)

Best Practices:

  1. Train your staff to understand how PoE works.
  2. Read the equipment specifications carefully and deploy only standards-compliant devices. Avoid non-compliant midspan PSEs, such as Ethernet Y-cables (already a no-no) or so-called “8-port PoE Passive Splitters” that simply bolt a 48VDC supply to all the “idle” pairs.
  3. Document the wattage of PSE and PDs.
  4. Check when changing and adding PDs to ensure that the PSE can support all PDs connected to it.
  5. Offer standardized tools and procedures for the team to validate the health of PD, PSE, and cable during deployment and troubleshooting (e.g. verify that the voltage and wattage at the PD side is available and meets requirement).
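
As a rough illustration of best practices 3 and 4, the sketch below adds up a worst-case power allocation per port and compares it with the PSE budget. The per-Type wattages come from Table 1; the function name, the 370 W budget and the device mix are hypothetical, and real switches may allocate power per class or per LLDP request rather than per Type.

```python
# Worst-case power a PSE port must source, per PoE Type (values from Table 1).
PSE_PORT_WATTS = {1: 15.4, 2: 30.0, 3: 60.0}


def check_power_budget(pse_budget_w, pd_types):
    """Return (allocated_watts, headroom_watts, ok) for a list of PD Types.

    pse_budget_w: total PoE budget of the switch, from its datasheet
                  (hypothetical input for this sketch).
    pd_types:     PoE Type (1, 2 or 3) of each connected PD.
    """
    allocated = sum(PSE_PORT_WATTS[t] for t in pd_types)
    headroom = pse_budget_w - allocated
    return round(allocated, 1), round(headroom, 1), headroom >= 0


if __name__ == "__main__":
    # Hypothetical switch with a 370 W budget feeding
    # 12 Type 1 phones and 8 Type 2 cameras.
    print(check_power_budget(370, [1] * 12 + [2] * 8))
```

Running a check like this whenever a PD is added or changed makes a budget shortfall visible before devices start dropping off under load.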

Link connectivity

The other consideration for connected devices is the linking process between the device and the network. First, the cable between the end device and the switch must be able to support the link. Most structured cabling systems today require all four pairs to be connected and certified, with a length of <100 m, during deployment. That is sufficient to support all networks up to 1Gbps. The table below shows the minimum cable standard required to support different types of deployments. During an upgrade, it is important to re-certify the cabling system so that wear and tear or undocumented changes do not cause issues.

| Standard | Certification level | Pairs used |
|---|---|---|
| 10BASE-T | Cat3 | 1-2 & 3-6 |
| 100BASE-T | Cat5 | 1-2 & 3-6 |
| 1000BASE-T | Cat5 | 1-2, 3-6, 4-5, 7-8 |

The linking process is negotiated between the end device and the switch to establish the speed, duplex and cable pairs used for data communication. It has become less of an issue because auto-negotiation is now the default setting on both switch ports and Network Interface Cards (NICs), so interoperability is usually maintained and well understood. The following table shows the hit-and-miss situation when either the NIC or the switch is manually set to a specific speed and/or duplex. The rule of thumb is that if one side is forced, the other side needs to be forced to the same setting; when one side is auto-negotiating, the other side should be as well. Even when a link is established with one side set to auto and the other not, it is highly likely that the auto side will periodically try to re-negotiate, causing temporary loss of link. As the price of 10G-capable switches drops, more switches with 1/10G ports are being deployed. In most cases, a 1/10G switch port does not support half duplex or auto-negotiation, so the switch port and the NIC settings must match for a connection to be established.
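
The rule of thumb above can be written down as a small configuration check. This is only a sketch for auditing documented settings: `PortSetting` and `check_link_settings` are hypothetical names, and the function reports what the rule predicts, not what a particular NIC or switch will actually do.

```python
from typing import NamedTuple, Optional


class PortSetting(NamedTuple):
    auto: bool                      # True when auto-negotiation is enabled
    speed: Optional[int] = None     # forced speed in Mbps (ignored when auto)
    duplex: Optional[str] = None    # forced duplex: "full" or "half"


def check_link_settings(nic: PortSetting, switch: PortSetting) -> str:
    """Apply the rule of thumb: both sides auto, or both forced identically."""
    if nic.auto and switch.auto:
        return "ok: both sides auto-negotiate"
    if nic.auto != switch.auto:
        return "risk: one side auto, other side forced"
    if (nic.speed, nic.duplex) == (switch.speed, switch.duplex):
        return "ok: both sides forced to the same speed/duplex"
    return "fail: forced settings differ"


if __name__ == "__main__":
    print(check_link_settings(PortSetting(True), PortSetting(False, 100, "full")))
```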


Link result for 10/100/1000Mbps Ethernet Switch and NIC based on link settings

What could go wrong:

Symptom: Cannot link (no link light)
Possible causes:
  - Cabling fault: open or short on the transmit pair
  - Wrong fiber SFP used: singlemode vs multimode
  - Mismatch in link setting between switch and NIC

Symptom: Less than optimal link speed/duplex and intermittent reconnects
Possible causes:
  - Either the NIC or the switch was set to auto-negotiate while the other was set to a fixed rate for a 10/100/1000Mbps link
  - Cabling fault: split pairs

Best Practices:

  1. Always use auto-negotiation for NICs and switch ports on 10/100/1000Mbps ports. On a 1/10G switch port, hard-code the speed needed.
  2. Document switch port settings and the structured-cabling path used and, more importantly, make the information easily accessible to all team members.
  3. Offer an easy way to check the current switch port link configuration, either directly using LLDP or via a management system. The best way is to have a passive tool that can be connected inline between the NIC and the switch to observe the link capability offered and the speed/duplex that the pair settled on.


Switch Port can be tested for PoE against a PoE Type/Class and display the TruePower™ available to the PD Class.

Link test shows the link capability of the switch port.

Inline analysis showing the speed/duplex advertised and used between the switch and the device.

Authentication

Before a device can start communicating with other devices on the network, it needs to go through an authentication process that serves three purposes: security, address provisioning and access provisioning. Authentication allows an authorized device to access the network and prevents a rogue device from connecting to it.

In the past, only Wi-Fi devices required authentication, while cable-connected devices were pretty much plug and play. With the proliferation of IoT devices, device authentication has become more important than ever. There are many authentication mechanisms, but the most commonly used is based on 802.1X and RADIUS, coupled with the DHCP service. An authentication process such as the one based on 802.1X involves a minimum of three parties:
  1. Supplicant: the element that wants to be able to access the network, such as the security camera.
  2. Authenticator: the element through which the supplicant may access the network, such as the switch or the Wi-Fi access point.
  3. Authentication server: contains the information used to decide whether a supplicant may access the network resources. It is typically a server running the RADIUS protocol.

The authentication mechanism can be based on the MAC address of the device, a user account such as a guest password for the guest SSID for BYOD, or a private certificate programmed on a smart card of a security camera. The example below shows how a fixed-location security camera gets authenticated. In this case, the EAP protocol is used for added security, as commonly seen during Wi-Fi end-device authentication.


In the above example, the authenticator serves as the proxy that relays the authentication request to the authentication server. Only after the device is authenticated can it send a DHCP request to the local DHCP server to obtain an IP address. It is important to note that authentication and the pool of IP addresses allocated need to be in sync. Take Wi-Fi user authentication as an example:


After the guest BYOD device is authenticated to the network via the Guest SSID, the AP is set up to send its traffic onto VLAN 1, while corporate users connected to the Company SSID are sent to VLAN 101. These VLANs need to be set up on the switch for the WLAN networks, and each VLAN needs to reach a local DHCP server via the Layer 2 broadcast mechanism so that an IP address can be provided to the device. In some cases, a DHCP protocol bridge, such as the Wi-Fi controller, can be used to forward DHCP requests from clients on different VLANs to a single DHCP server. Typically, the IP address pools for the VLANs are mutually exclusive, as shown below, so that clients belonging to different groups can access their distinct sets of network assets:

| User Group | SSID | VLAN | IP Address Pool | Accessible assets |
|---|---|---|---|---|
| Guests | Guests | 1 | 10.10.10.1-10.10.11.255 | Limited Internet Bandwidth, Guests Printer |
| Corporate Users | Company | 101 | 20.10.10.1-20.10.19.255 | Internet, Corporate VPN, Corporate Servers, Printers… |
| Security Camera | | 201 | 20.10.20.1-20.10.21.255 | Video Servers and Storage |
| Network Admin | NetAdmin | 301 | 20.10.30.1-20.10.30.127 | Wi-Fi controller, Switch/Router management ports |
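
Whether the documented pools really are mutually exclusive can be checked mechanically. The following sketch copies the address ranges from the table above into a list and uses Python's standard `ipaddress` module; the function names are hypothetical and the data layout is only one way to hold the documentation.

```python
from ipaddress import IPv4Address
from itertools import combinations

# (user group, VLAN, first address, last address) copied from the table above.
POOLS = [
    ("Guests", 1, "10.10.10.1", "10.10.11.255"),
    ("Corporate Users", 101, "20.10.10.1", "20.10.19.255"),
    ("Security Camera", 201, "20.10.20.1", "20.10.21.255"),
    ("Network Admin", 301, "20.10.30.1", "20.10.30.127"),
]


def overlapping_pools(pools):
    """Return pairs of group names whose address ranges intersect."""
    clashes = []
    for a, b in combinations(pools, 2):
        a_lo, a_hi = IPv4Address(a[2]), IPv4Address(a[3])
        b_lo, b_hi = IPv4Address(b[2]), IPv4Address(b[3])
        if a_lo <= b_hi and b_lo <= a_hi:
            clashes.append((a[0], b[0]))
    return clashes


def group_for(address, pools=POOLS):
    """Return (group, vlan) for a leased address, or None if outside all pools."""
    ip = IPv4Address(address)
    for name, vlan, lo, hi in pools:
        if IPv4Address(lo) <= ip <= IPv4Address(hi):
            return name, vlan
    return None


if __name__ == "__main__":
    print(overlapping_pools(POOLS))
    print(group_for("20.10.20.15"))
```

During troubleshooting, `group_for` gives a quick answer to whether the address a client received matches the VLAN its credential should have placed it in.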

In addition to assigning an IP address to the device, DHCP can offer other key information that is critical to the end device’s operation. For example, a VoIP phone receives, via DHCP option code 66 (TFTP server) or 150 (VoIP Configuration Server), the IP address of the configuration server that holds the call manager address and the SIP port# to use.
The table below shows commonly used DHCP options and their code #:

| DHCP Option Code | Description |
|---|---|
| 1 | Subnet mask (must be sent before the router option, option 3, if both are included) |
| 3 | Router |
| 6 | DNS servers, should be listed in order of preference |
| 15 | DNS domain name, should be listed in order of preference |
| 44 | WINS server (NetBIOS name server) |
| 45 | NetBIOS datagram distribution server (NBDD) |
| 46 | WINS/NetBIOS node type |
| 47 | NetBIOS scope ID |
| 51 | Lease time |
| 66 | TFTP server name (RFC2132), or carried in the sname field (RFC2131) |
| 150 | IP address(es) of VoIP Configuration Server(s) [has precedence over option 66 (RFC5859)] |
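
To make the option codes concrete, here is a minimal sketch of how the options field of a DHCP reply is laid out: each option is a one-byte code, a one-byte length and then the value, with code 0 as padding and code 255 as the end marker (RFC 2132). The function names are hypothetical, the input is assumed to be the options field after the magic cookie, and truncated input is not handled.

```python
import ipaddress


def parse_dhcp_options(raw: bytes) -> dict:
    """Split a DHCP options field into {code: value bytes}."""
    options = {}
    i = 0
    while i < len(raw):
        code = raw[i]
        if code == 255:          # End option
            break
        if code == 0:            # Pad option has no length byte
            i += 1
            continue
        length = raw[i + 1]
        options[code] = raw[i + 2 : i + 2 + length]
        i += 2 + length
    return options


def as_ipv4_list(value: bytes) -> list:
    """Decode a value made of consecutive 4-byte IPv4 addresses."""
    return [str(ipaddress.IPv4Address(value[n : n + 4]))
            for n in range(0, len(value), 4)]


if __name__ == "__main__":
    # Hand-built sample: subnet mask, router, two DNS servers, option 150.
    sample = bytes([1, 4, 255, 255, 254, 0,
                    3, 4, 10, 10, 10, 1,
                    6, 8, 10, 10, 10, 2, 8, 8, 8, 8,
                    150, 4, 10, 10, 10, 5,
                    255])
    for code, value in parse_dhcp_options(sample).items():
        print(code, as_ipv4_list(value))
```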

What could go wrong:

Symptom: Cannot get IP address
Possible causes:
  - Authentication issue: wrong end-device setting (authentication protocol, wrong certificate), or no end-device configuration on the authentication server
  - No IP address available: not enough addresses in the IP address pool
  - Network issue: DHCP server not reachable from the VLAN

Symptom: Wrong IP address
Possible causes:
  - Network issue: wrong VLAN assigned
  - A wrong or rogue DHCP server offers an IP address when multiple DHCP servers are present

Best Practices:

  1. Document the VLAN configuration on switches, switch-to-switch uplink ports, VLAN to user group/address correlation, and DHCP provisioned for each VLAN/broadcast domain.
  2. Make the documentation accessible to team members responsible for setting up and troubleshooting the switch networks.
  3. Have a standardized test workflow and procedures that allow any team member to verify the switch configuration from the client location to the proper VLAN and address.
  4. Have tools that provide visibility into all DHCP responses from the network, so that rogue DHCP servers can be detected and the IP address and options provided to the client can be verified for each user credential.
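
As a sketch of the last practice, the check below compares the servers that answered a DHCP request with the documented list. The `offers` structure is hypothetical (for example, filled in from captured DHCP offers, where the server identifier is carried in option 54), and the addresses are made up for illustration.

```python
def find_rogue_dhcp(offers, authorized_servers):
    """Return server addresses that answered but are not on the approved list.

    offers:             list of dicts such as {"server_id": "10.10.10.2",
                        "offered_ip": "10.10.10.57"} (hypothetical structure).
    authorized_servers: addresses of the documented DHCP servers.
    """
    approved = set(authorized_servers)
    seen = {offer["server_id"] for offer in offers}
    return sorted(seen - approved)


if __name__ == "__main__":
    offers = [
        {"server_id": "10.10.10.2", "offered_ip": "10.10.10.57"},
        {"server_id": "192.168.0.1", "offered_ip": "192.168.0.23"},
    ]
    print(find_rogue_dhcp(offers, ["10.10.10.2"]))
```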


OneTouch AT supports 802.1x with EAP to emulate that of the user.

Determine if multiple DHCP responses are received and what were the parameters offered.

Gain visibility of VLANs configured on switch ports, and other status, such as utilization and # of devices connected.

Routing

Once an end-device obtains an IP address and key configuration information, it can communicate with other devices on the network. Routing is the fundamental mechanism that the network uses to connect IP-based devices across privately and publicly owned networks. A number of fundamental network services facilitate this function, and the network operation team should be aware of them:

| Routing element | Description |
|---|---|
| VLAN | Virtual LAN is a Layer 2 mechanism that allows switches to group end-devices and switch ports into a broadcast domain. |
| Router | A router is a device that joins networks together and routes traffic between them. A router will have at least two network interface cards (NICs), one physically connected to one network and the other physically connected to another network. Some routers can be configured to allow traffic only on certain well-known ports. Applications that run on protocols with special ports will require a configuration change to open those ports. |
| DNS | A domain name server, also called a DNS server or name server, manages a massive database that maps domain names to IP addresses. When you enter a URL into your web browser, the default DNS server uses its resources to resolve the name into the IP address of the appropriate web server. |
| NAT | Network Address Translation allows a single device, such as a router, to act as an agent between the internet (or "public network") and a local (or "private") network. This means that only a single or a few recognized IP addresses are required to represent an entire group of devices with unrecognized IP addresses. |

The way these services work together can be summarized in the following examples:

Client communicating with a server on the intranet

  1. Client knows the name of the server
  2. Client sends a query to the default DNS IP (a parameter from DHCP)
      a. If the default DNS is not on the same IP subnet, the client sends the query to the default router on its VLAN
      b. The router forwards the query to the router port connected to the DNS IP subnet, and the VLAN ID will most likely change
  3. DNS server replies with the IP address of the intranet server, via the router if needed
  4. Client sends a connection request to the intranet server’s IP address, again via the router if needed
  5. Router forwards the connection request to the router port connected to the intranet subnet, and the VLAN ID will change

Client connecting to the internet

  1. The first four steps are the same as when communicating with an intranet server, except that the server name may be a website entered in a web browser
  2. Router forwards the IP packet to the router port connected to the internet link
  3. If NAT is used, the NAT device changes the client’s source address to a recognizable public address before forwarding to the internet link
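
The two decisions that recur in these steps, "same subnet or via the router?" and "rewrite the source address or not?", can be sketched with Python's `ipaddress` module. All addresses and function names below are hypothetical, and the NAT function shows only the source-address swap, not the port and state tracking a real NAT device performs.

```python
from ipaddress import ip_address, ip_interface, ip_network


def next_hop(client_cidr: str, default_router: str, destination: str) -> str:
    """Return the address the client hands the packet to."""
    client = ip_interface(client_cidr)          # e.g. "192.168.1.57/24"
    if ip_address(destination) in client.network:
        return destination                      # same subnet: deliver directly
    return default_router                       # otherwise go via the router


def translate_source(source: str, inside_network: str, public_ip: str) -> str:
    """Source-NAT sketch: inside addresses are replaced by the public one."""
    if ip_address(source) in ip_network(inside_network):
        return public_ip
    return source


if __name__ == "__main__":
    # DNS server on another subnet: the query goes to the default router.
    print(next_hop("192.168.1.57/24", "192.168.1.1", "192.168.5.10"))
    # Outbound packet: the private source is replaced by the public address.
    print(translate_source("192.168.1.57", "192.168.0.0/16", "198.51.100.7"))
```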

What could go wrong:

Symptom: All users on the same VLAN cannot connect to an intranet server
Possible causes:
  - Incorrect IP address or default DNS failed
  - Router not reachable or failed
  - Broken or oversubscribed VLAN trunk path

Symptom: All users on the same VLAN cannot connect to the internet
Possible causes:
  - Router port or link to internet down
  - Router not reachable or failed
  - DNS not reachable or failed
  - NAT failed

Symptom: Some applications cannot run
Possible causes:
  - Router may have blocked the protocol port that the application requires

Symptom: VoIP call does not work
Possible causes:
  - Call Manager not reachable?
  - DHCP VoIP configuration server information not available or misconfigured?

Best Practices:

  1. During installation, have standardized tools and procedures so that technicians can spot-check accessibility and the path to the local router and critical servers, both intranet and internet, from a VLAN edge using each credential.
  2. Document the correct default router and DNS IP address that should be provisioned to the client for each user/device credential, for reference during troubleshooting. Make the information accessible to the team.
  3. For troubleshooting, have tools that can show the traceroute and switch path used, and note whether the DNS process passes or fails when accessing assets beyond the local subnet/broadcast domain.
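
Where a dedicated tester is not at hand, a basic reachability check of this kind can be approximated with the standard `socket` module. The sketch below times name resolution and the TCP handshake separately; the function name is hypothetical, the host and port in the usage line are placeholders, and a failure surfaces as a `socket.gaierror` (DNS) or `OSError` (connect) exception.

```python
import socket
import time


def tcp_connect_test(host: str, port: int, timeout: float = 3.0) -> dict:
    """Time name resolution and the TCP handshake separately, in milliseconds."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    dns_ms = (time.perf_counter() - start) * 1000
    family, socktype, proto, _, sockaddr = infos[0]

    start = time.perf_counter()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(sockaddr)      # raises OSError if unreachable or refused
    connect_ms = (time.perf_counter() - start) * 1000
    return {"address": sockaddr[0], "dns_ms": dns_ms, "connect_ms": connect_ms}


if __name__ == "__main__":
    # Placeholder target; substitute a critical intranet server and port.
    print(tcp_connect_test("localhost", 80))
```

Keeping the two timings separate helps tell a slow or failing DNS lookup apart from a slow or unreachable server.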


Conduct TCP connect to verify DNS resolution, connectivity to server and response time of both.

Verify default Gateway/Router is reachable.

Determine the switch paths between the switch port and the target device.

Efficiency

With connectivity, authentication and routing verified, the last but not least important thing is to ensure that the network delivers application traffic efficiently. Several key network factors can affect the user experience of an application:

  1. Available bandwidth, which can affect class-of-service provisioning, especially on WAN links, as well as the amount of load the network can carry.
  2. The network path used, which can affect transit latency as well as available bandwidth.
  3. Smart devices, such as load balancers and WAN accelerators, that may re-engineer the application transaction.

Since the design of the network largely dictates these factors, the network operation team’s responsibility is to validate that the network is able to support the application, both before the network is deployed, when it carries no load, and after it is deployed for use. The most commonly tested parameters are information rate (IR) or bandwidth, jitter, delay and packet loss. The three most commonly used network test approaches are iPerf, IETF RFC2544 and ITU Y.1564. The table below shows how the three tests compare:

| | RFC2544 | iPerf | Y.1564 |
|---|---|---|---|
| Frame type | UDP only | TCP, UDP | TCP, UDP |
| Key network tests | Information rate, delay, and data loss. Jitter is optional | TCP: information rate; UDP: information rate, delay, jitter and data loss | Information rate, delay, jitter and data loss, CBS, and EMS |
| Main tunable parameters | IPv4, DSCP, TOS and VLAN; seven frame sizes (bytes): 64, 128, 256, 512, 1024, 1280, 1518; same port# for send and receive | IPv4 or IPv6, DSCP, TOS and VLAN; TCP: total bytes sent, MTU/MSS, TCP window size, and file to send; UDP: user-defined frame size; different port# for send and receive | IPv4 or IPv6; Layer 3 tag: MPLS, 802.1p, 802.1ad, DSCP and COS; stream profile: MTU, CIR, EIR, EMIX; different port# for send and receive |
| # of simultaneous connections | One | Multiple | Multiple |
| HW platform | Professional test equipment | Windows/Linux/Unix-based computer | Professional test equipment |
| Benefits | Simple configuration for maximum bandwidth | TCP and UDP test; multi-stream test; free under BSD license | TCP and UDP test; multi-stream test; short test time |
| Disadvantages | UDP only; requires dedicated HW | Transmission rate slave to NIC driver; command-line UI | Complex configuration not typical in enterprise LAN; requires dedicated HW |

Of these three test approaches, RFC2544 was the first to be used and is still the most widely used. It has been sufficient to validate end-to-end network performance. iPerf has been gaining popularity in the network engineering community because of its ability to perform bandwidth tests with TCP flows and its low cost of deployment. Y.1564 is used mainly for metro-network link testing, where an SLA is a must. It has not been widely adopted in the enterprise.
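
As a rough illustration of the iPerf approach, the sketch below assembles a client command line for iperf3, assuming iperf3 is the variant in use, is installed on the test computer, and has a server (`iperf3 -s`) running at the far end. The function name and the server address are hypothetical; the options used are the iperf3 client options for target (`-c`), duration (`-t`), parallel streams (`-P`), JSON output (`-J`), UDP (`-u`) and target bitrate (`-b`).

```python
def iperf3_client_cmd(server, udp=False, rate=None, seconds=10, streams=1):
    """Build an iperf3 client command line as an argument list."""
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-P", str(streams), "-J"]
    if udp:
        cmd.append("-u")          # UDP test reports jitter and loss
    if rate:
        cmd += ["-b", rate]       # target bitrate, e.g. "10M"
    return cmd


if __name__ == "__main__":
    # Hypothetical far-end address; a UDP stream at 10 Mbit/s for 10 seconds.
    print(" ".join(iperf3_client_cmd("192.0.2.10", udp=True, rate="10M")))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` so that the JSON report is stored with the rest of the test documentation.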

What could cause an application to be slow?
When a user complains that network performance is poor, there are a few questions to ask to determine whether the network is the issue:
  a. What application is affected? Is it real-time voice/video or data traffic?
  b. If it is not a corporate application, are the application streams contained within the corporate network?
  c. How many clients were affected? What is the relationship between these clients?

Symptom: All users of only one intranet application experienced slow performance
What may be the problem:
  - Application or server has an issue
  - The network leading up to the application server(s) is bad

Symptom: All users of one internet application experienced slow performance
What may be the problem:
  - Internet application issue
  - Internet application flow blocked

Symptom: One user’s experience with an application was bad
What may be the problem:
  - Client device or account configuration
  - Client-to-network connectivity issue, especially if connected over Wi-Fi

Symptom: A few users in the same VLAN experienced bad performance
What may be the problem:
  - VLAN-to-application network path issue
  - VLAN group provisioning issue

Best practices:
During Deployment:

  1. Run a network performance test on the end-to-end links of critical paths, up to the maximum bandwidth of the weakest link and against the SLA requirement of that link. If no SLA parameters are available, the following guideline can be used: one-way end-to-end delay <150msec, jitter <100msec and packet loss <1%.
  2. Document test result of the link for future reference.
During Troubleshooting:
  1. If it is a TCP application, try running a TCP Connect test to the server. If the test completes 100% with little delay, it is very likely that the server itself is the issue, not network delay. The next step is to confirm that packet loss on the network path between the client and the server is acceptable, which eliminates the network as the cause. Typically, packet loss should be <1% at the maximum information rate required by the application. If the server is outside the corporate network, you only need to verify up to the point where the stream leaves the network.
  2. If the application is real-time voice/video, some VoIP phones can report jitter and packet-loss statistics for the call. Otherwise, you can run an RFC2544 or iPerf test against the target end-point. Typical voice and video require one-way end-to-end delay <150msec, jitter <40msec and packet loss <1%, at a UDP stream rate close to that of the voice/media stream.
  3. Document all test results. If the network is not the cause, try to capture the application transaction at both ends: near the client and near the server. The best approach is to capture the traffic using an inline TAP or a SPAN/mirror port.
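
The guideline limits quoted in these best practices can be turned into a simple pass/fail check on measured values. The sketch below is one possible shape: the names are hypothetical, and jitter is computed here as the mean absolute difference between consecutive one-way delays, which is an assumption made for simplicity and not necessarily the calculation a given test tool uses.

```python
from statistics import mean

# Guideline limits quoted in the best practices above.
LIMITS = {
    "general": {"delay_ms": 150, "jitter_ms": 100, "loss_pct": 1.0},
    "voice_video": {"delay_ms": 150, "jitter_ms": 40, "loss_pct": 1.0},
}


def mean_jitter_ms(delays_ms):
    """Mean absolute difference between consecutive one-way delays."""
    if len(delays_ms) < 2:
        return 0.0
    return mean(abs(b - a) for a, b in zip(delays_ms, delays_ms[1:]))


def evaluate_path(delays_ms, sent, received, profile="general"):
    """Return {metric: (measured value, passed)} for the chosen profile."""
    limits = LIMITS[profile]
    measured = {
        "delay_ms": max(delays_ms),
        "jitter_ms": mean_jitter_ms(delays_ms),
        "loss_pct": 100.0 * (sent - received) / sent,
    }
    return {name: (value, value < limits[name]) for name, value in measured.items()}


if __name__ == "__main__":
    # Hypothetical measurements: four delay samples, 1000 packets sent, 995 received.
    print(evaluate_path([20, 25, 22, 30], sent=1000, received=995))
```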



Exercise connectivity to application server and verify DNS resolution, routed response time of both.

Ensure performance of the wired link to the router with the OneTouch AT’s performance test. Measure upstream and downstream throughput up to 1Gbps as well as loss, jitter and delay.

Capture packets inline between the switch and device, and apply filter to store relevant info to SD-RAM.

Getting it all together

For the network team to be efficient in supporting today’s switched networks, the team needs to stay up to date on the technologies that make it all possible. It is also critical for team members to share information effectively, not only documented knowledge such as the network configuration, but also on-site information gathered during troubleshooting or deployment. Despite best efforts, not all members of the team have the same skill level. There are many freeware and other tools on the market, but not all team members have the knowledge to use them or to interpret and share the test results. Freeware is also notorious for lacking documentation and test reports that can be easily shared. The ability to save and share real-time information about the network not only improves collaboration between teams during troubleshooting, it also serves as evidence when third parties, such as service providers, need to be called in to resolve a problem caused by them.


NETSCOUT Handheld Network Test Tools not only give network operation teams the means to gain visibility; the family of tools also offers two key attributes that help the team be more effective.

1. Automated tests to support programmable, standardized test procedures.
The tools offer an AutoTest that will provide visibility to all four aspects of the switch network with the push of a button with user programmable Pass/Fail limits and automated reporting. Three choices are available that offer different levels of detail and depth of testing:


| AutoTest features | LinkSprinter | LinkRunner | OneTouch AT |
|---|---|---|---|
| Connectivity – PoE | Type 1 | Type 1 and 2 with TruePower | Type 1 and 2 with TruePower |
| Connectivity – Link | 10/100/1000Mbps Copper | 10/100/1000Mbps Copper or Fiber | 10/100/1000Mbps Copper or Fiber and up to 802.11ac |
| Connectivity – Switch ID | LLDP/CDP reports switch name/port# | LLDP/CDP reports switch name/port# | LLDP/CDP reports switch name/port# |
| Authentication | | 802.1x/EAP | 802.1x/EAP |
| Address | DHCP & Static | DHCP & Static | DHCP & Static |
| VLAN ID | | | |
| Routing | Gateway, Ping 1 IP device with DNS resolution | Gateway, Ping 10 IP devices with DNS resolution | Gateway, Ping, TCP connect, EMAIL, FTP, IGMP, WEB test for user definable # of devices |
| Efficiency | Response time for Ping test | Response time for Ping test | Response time for route test; RFC2544 up to 1Gbps |
| Notable Tools | View test result over Wi-Fi from Mobile App; Powered by PoE or AA battery; Distance to fault | Tone Generator for cable tracing; Wiremap for cable testing; Distance to fault | Packet Capture; Inline PoE & VoIP Analysis; Device Discovery & inventory report; Remote Control; Distance to fault |

Table: Key feature comparison of NETSCOUT Handheld Network Test Tools for switched networks

2. Collaborative workflow with a cloud portal for test result storage and sharing.

To facilitate visibility and collaboration across the network operation team, all NETSCOUT handheld test tools share a cloud-based result and report management database called Link-Live. It is a free cloud-based service that supports automated upload of test results from all handheld tools. During network deployment, a progress report can easily be generated showing the switch ports tested each day, their link speed and duplex distribution, and PoE test results. During troubleshooting, previous test results from a switch port can be compared against current test results to identify changes quickly.

Fig: Link-Live result dashboard showing summary of test results

Fig: Expanded test results showing detail information

Fig: Summary Report from Link-Live showing progress from the perspective of test results over a time period

Conclusion

Switched networks have evolved from simply connecting devices to the network to supplying power, authenticating devices and users, and routing their traffic effectively and automatically. The network operation team needs to keep its knowledge of the adopted technology current, and needs to know how to gain visibility into the construction of its switched network and make changes to it, especially around the edge, where devices and users are constantly moving and new M2M devices are being added. A best practice that standardizes test procedures and the sharing of information, from design and configuration to real-time, on-location status, will improve the overall efficiency of the team. NETSCOUT handheld network test tools offer best-in-class test features and automated test procedures that allow the network operation team to be efficient and take control of the 4 key aspects of switched networks.