Categories of Work in IT

I recently found myself listening to the audiobook “The Phoenix Project” by George Spafford, Gene Kim and Kevin Behr. The plot takes you on the journey an IT Operations Manager has to endure when he is abruptly promoted to VP of IT Operations. In his new role, the main character identifies weaknesses in IT procedures and observes how the demands from the business make the IT department look like a business roadblock.

Part of the book introduces a board member who provides a lot of insight and compares IT to a manufacturing production line, mentioning concepts such as the “Theory of Constraints” and “WIP” (Work in Progress). This is my favorite part of the book because it defines the four categories of IT work, and the comparison to manufacturing is very accurate.

By no means do I claim to be an expert on any of the management procedures explained in the book, but I have been a first-hand witness to the four types of work and how they impact a business:

1. Business Projects are the ones with a direct link to the business. These projects get the highest priority since, in most cases, they generate revenue for the organization.

2. Internal Projects are handled internally by IT and driven by IT initiatives. Examples include a network equipment refresh or remediation, the migration to a new email platform or the installation of a new datacenter.

3. Operational Changes are the work done to keep systems running or to enable/disable functionality based on changes in business requirements. The novel defines a change as follows:

a ‘change’ is any activity that is physical, logical, or virtual to applications, databases, operating systems, networks, or hardware that could impact services being delivered.

4. Finally, the number one enemy of productivity: Unplanned Work. This type of work puts all other work on the back burner and results in heroic battles to “fight fires” caused by outages, along with loss of productivity in other areas of the organization. In general, this is recovery work, taking IT away from meeting its goals.

In my experience, when IT work is not “in tempo”, meaning the production line has a backlog or work is not flowing properly, problems and outages happen. Once the constraint or bottleneck is identified, the pace is recovered, although in some cases this requires halting all projects to focus on freeing up the bottleneck. Not addressing the constraint results in more unplanned work and more load on the constraint.
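The constraint idea can be made concrete with a small model. This is a minimal sketch with made-up numbers, not something taken from the book: it treats IT work as a serial line in which every item passes through each station, so the slowest station (the constraint) caps the whole line. The station names and weekly capacities are hypothetical.

```python
def find_constraint(capacity):
    """Return (station, items_per_week) for the slowest station in a serial line."""
    station = min(capacity, key=capacity.get)
    return station, capacity[station]


def planned_throughput(capacity, unplanned):
    """Capacity left for planned work once unplanned work takes its share at each station."""
    remaining = {s: max(c - unplanned.get(s, 0), 0) for s, c in capacity.items()}
    return find_constraint(remaining)


# Hypothetical stations and weekly capacities (items per week).
line = {"intake": 20, "build": 12, "change_review": 6, "deploy": 10}

print(find_constraint(line))                            # ('change_review', 6)
print(planned_throughput(line, {"build": 5}))           # ('change_review', 6)
print(planned_throughput(line, {"change_review": 2}))   # ('change_review', 4)
```

In this model, unplanned work that lands on a non-constraint station leaves overall throughput unchanged, while the same amount of work on the constraint reduces it directly. That is why freeing up the bottleneck comes first.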

When IT is not a solutions provider, it is seen as the organizational constraint. If IT does not get in front of the business by identifying current and future technological needs, the other business units will. Unfortunately, when other business units determine the IT solutions, in a lot of cases those solutions don’t fit supported models and/or are incompatible. IT must provide solutions to the business, not problems. IT can do this through engaged leaders working as a team with the other business units to identify areas of opportunity and to scale systems according to business demands. When IT leaders focus on single-minded solutions that are not in line with business needs, they generate unplanned work, which delays Business Projects and results in even more unplanned work. This can also give the false impression of IT capacity issues.

IT can also be its own worst enemy. In the spirit of improvement and in an effort to be cutting edge, parts of IT move forward with unplanned implementations, deviating from the predetermined growth plan. When this happens, antiwork rears its ugly head in the form of unplanned tasks, unplanned remediation and unplanned scaling to prop up poorly operating services. An IT organization should have a growth plan, put together to meet the current and future demands of the business. Any changes to the overall architecture of this plan must be carefully reviewed and communicated.

As you have seen, my perspective always puts emphasis on Business Needs and Business Requirements. Technology for technology’s sake can create a lot of unplanned work, which shows up as missed business objectives. This applies when IT is a service to an organization. When the business IS technology, then technology for technology’s sake, in the form of innovation, works, but this is a very specific case. In most cases, IT is a service an organization needs in order to create business outcomes.

In my experience, when IT develops a workflow control mechanism, it becomes a more efficient and effective business service. This in turn lets the organization focus on growth and new opportunities.

“We can’t work on the strategic when we haven’t mastered the tactical, we can’t work on the tactical when we haven’t mastered the operational”

I highly recommend this book to anyone in IT. The book also gives anyone outside of IT a good perspective on the challenges that IT teams encounter almost universally.

It appears that at Interop 2017 there was a discussion precisely about how IT works and how it must change, comparing data centers to factories instead of museums. I found this in a blog post on PacketPushers. The comparison and the image presented show that for IT to move fast and react quickly to business needs, data center operations need to move to an industrial model.

The goal of any factory is to operate a production system with the following characteristics:

  1. Speed
  2. Controlled costs (predictable cost is a key factor)
  3. Consistent quality (preferably high quality, but consistency is key)

Link to the blogpost on PacketPushers.net

ESX to NFS Store Connection Troubleshooting

As network engineers, we get involved in all kinds of issues that appear on the surface to be communication issues but are, underneath, configuration or system issues.

In this post I will go over an instance in which an ESX host was unable to mount an NFS store and everything appeared to point to the network as the issue.

The ESX servers have several VMkernel IP addresses, but only the management address has the default gateway. Therefore, the management address is the one used to communicate with IPs outside of the ESX networks.

For this particular case the IP Address of the NFS target is 10.232.213.102 and the IP Address of the ESX Server is 10.231.222.14.
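The route selection described above can be sketched with Python's standard ipaddress module. This is an illustration under assumptions: the post does not show the subnet masks or the address of the 9000-MTU vmknic, so the /24 masks and the vmk1 address below are hypothetical. A destination on a directly connected subnet leaves through that vmknic, and anything else follows the default route on vmk0.

```python
import ipaddress

# Hypothetical vmknic layout: the /24 masks and the vmk1 address are assumptions.
VMKNICS = {
    "vmk0": ipaddress.ip_interface("10.231.222.14/24"),  # management, holds the default route
    "vmk1": ipaddress.ip_interface("10.231.223.14/24"),  # hypothetical jumbo-frame vmknic
}
DEFAULT_ROUTE_VMKNIC = "vmk0"


def egress_vmknic(destination):
    """Return the vmknic used to reach `destination`: longest-prefix match on
    directly connected subnets, otherwise the vmknic that owns the default route."""
    dest = ipaddress.ip_address(destination)
    best_name, best_prefix = DEFAULT_ROUTE_VMKNIC, -1
    for name, iface in VMKNICS.items():
        if dest in iface.network and iface.network.prefixlen > best_prefix:
            best_name, best_prefix = name, iface.network.prefixlen
    return best_name


print(egress_vmknic("10.232.213.102"))  # NFS target is off-subnet, so the default route (vmk0) wins
print(egress_vmknic("10.231.223.50"))   # directly connected to the hypothetical vmk1
```

Under these assumptions, the NFS target is reached through vmk0 and its 1500-byte MTU, regardless of any jumbo-frame vmknic on the host.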

The NFS network target was a NetApp. The following ports were identified in prior NFS connections and by looking at online documentation:

TCP/UDP 111 – RPC Bind.
TCP/UDP 635 – NFS Mount.
TCP/UDP 2049 – NFS Server Daemon.

The base topology has a firewall with a security rule in place that allows the NFS communication as well as ICMP. Below is the base topology:

ESX-NFS-TSHOOT-2

In order to validate the NIC configuration, we enabled SSH on the ESX host. To do this, open the vSphere Client, select the host and navigate to Configuration > Software > Security Profile. Click the SSH label and, if the daemon is not running, click “Options” and then “Start”. Click “OK” to close the dialog box and “OK” once more to close the Security Profile configuration window.

ESX-NFS-TSHOOT-1

We then opened a terminal client and used SSH to connect to the host using IP 10.231.222.14.

The commands listed below were used:

vmkping -I vmk0 -s 1472 10.232.213.102
This command sends an ICMP ping to 10.232.213.102 (the NFS target) using vmk0 as the source, with an ICMP payload of 1472 bytes: the 1500-byte MTU minus 28 bytes of overhead (20-byte IP header plus 8-byte ICMP header).

-I Parameter is used to specify the outgoing source interface.
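-s Parameter is used to specify the number of ICMP data bytes to send. This can be helpful when the MTU size is in question. For a jumbo frame configuration, use 8972, since adding the 28 bytes of overhead results in a 9000-byte packet. When testing MTU, also add -d to set the Don't Fragment bit; otherwise an oversized ping can be fragmented and still succeed.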
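The size arithmetic above can be captured in a small helper. It assumes an IPv4 header without options (20 bytes) and an 8-byte ICMP echo header. The 14-byte Ethernet header is added only to show the resulting frame size, which is also where the 9014-byte snap length used later in this post comes from. The function names are illustrative.

```python
IPV4_HEADER = 20      # bytes, IPv4 header without options
ICMP_HEADER = 8       # bytes, ICMP echo header
ETHERNET_HEADER = 14  # bytes, untagged Ethernet II header (FCS not included)


def vmkping_payload(mtu):
    """ICMP data size (the -s value) that fills an IP packet of exactly `mtu` bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def frame_size(mtu):
    """Ethernet frame size, excluding FCS, for a full-size packet at this MTU."""
    return mtu + ETHERNET_HEADER


for mtu in (1500, 9000):
    print(mtu, vmkping_payload(mtu), frame_size(mtu))
```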

RESULT: vmkping was successful, confirming routed reachability and the firewall rule for ICMP.

esxcfg-vmknic -l
This command lists the vmknics, that is, all the VMkernel interfaces, including management, with their IP addresses, subnet masks and MTU.

-l Parameter is used to list the vmknics.

RESULT: In the image below you can see the management vmknic as vmk0 with an MTU of 1500 as well as another vmknic configured with an MTU of 9000.
ESX-NFS-TSHOOT-4

esxcfg-route -l
This command displays the network routes. In the output you can see that the default route is tied to the management vmknic.

-l Parameter is used to list the route entries.

RESULT:
ESX-NFS-TSHOOT-3

nc -z 10.232.213.102 2049
This command uses the Netcat utility to test connectivity to an IP address on a specific port number.

-z Parameter specifies that nc will only check whether a service is listening on the port, without sending any data. For TCP, nc completes the handshake and then immediately closes the connection.

RESULT: There was no output and the command timed out, indicating that the TCP connection was not successful.

nc -uz 10.232.213.102 2049
This is the same command as before, but the -u parameter makes the test use UDP instead of TCP.

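RESULT: The output showed a successful port verification test over UDP. Keep in mind that UDP is connectionless, so nc reports success as long as no ICMP port-unreachable message comes back; on its own, this result does not prove that the NFS daemon is answering.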

ESX-NFS-TSHOOT-5
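For hosts where nc is not available, the TCP half of this test can be sketched with Python's standard socket module. Like nc -z, it completes the TCP handshake, sends no data and closes the connection. There is deliberately no UDP version: UDP has no handshake, so a UDP "success" only means that no ICMP port-unreachable message came back. The function name is illustrative.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Rough TCP equivalent of `nc -z host port`: complete the three-way
    handshake, send no data, then close the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False


# Example against the NFS target from this post:
# tcp_port_open("10.232.213.102", 2049)
```

Given that nc timed out in this case, the function would be expected to return False here as well.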

Our firewall logs showed that the connection attempts over both UDP and TCP were allowed. This ruled out routing and the firewall as possible causes, but it was not sufficient proof that the network was not the problem in this communication flow.

To further validate that the network was operating properly and that the NFS communication was not being blocked, we researched a way to perform a packet capture directly on the ESX host in question.

VMware provides a “tcpdump” utility to perform packet captures. On ESXi, the command to run the utility is “tcpdump-uw”, and we used it in the following way:

tcpdump-uw -vv -i vmk0 -s 9014 -w /var/tmp/NFS-PCAP-02222018.pcap 'port 2049 or port 111 or port 635'

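-vv Parameter requests more verbose output (a fuller protocol decode) than a single -v. According to information I found online, a single -v gives less detailed verbose output, but I couldn’t get the verbose output to work. When -w is used, tcpdump writes the packets to the file instead of decoding them on screen, which likely explains this.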

-i Parameter is used to indicate the interface to capture on.

-s Parameter is used to indicate the snapshot length, or snaplen: the number of bytes captured from each packet. Since in this case we could be dealing with jumbo frames, we specified 9014 (a 9000-byte MTU plus the 14-byte Ethernet header).

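port is not a command-line parameter but a filter primitive that matches a source or destination port number. The “and” and “or” operators are allowed, but keep in mind that joining ports with “and” only matches packets whose source and destination ports are exactly those ports, so chaining several ports with “and” results in no data being captured. Use “or” instead.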

-w Parameter is used to specify the file that stores the captured data. For our use case we stored the file in the /var/tmp/ folder.
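The filter and snap length rules above can be folded into a small helper that assembles the command line. This is a hypothetical convenience function, not a VMware tool: it joins the ports with "or", sizes the snaplen as the MTU plus the 14-byte Ethernet header, places -w before the filter expression, and quotes the filter so the shell passes it as one argument.

```python
import shlex

ETHERNET_HEADER = 14  # bytes; the snaplen has to cover the whole frame, not only the IP packet


def capture_command(vmknic, mtu, ports, outfile):
    """Assemble a tcpdump-uw command line that captures traffic on any of `ports`."""
    # "or" keeps every packet that uses one of the ports; chaining "and" would
    # require one packet to carry all of them, which captures nothing here.
    bpf = " or ".join(f"port {p}" for p in ports)
    argv = ["tcpdump-uw", "-vv", "-i", vmknic, "-s", str(mtu + ETHERNET_HEADER),
            "-w", outfile, bpf]
    return " ".join(shlex.quote(arg) for arg in argv)


print(capture_command("vmk0", 9000, [2049, 111, 635], "/var/tmp/NFS-PCAP-02222018.pcap"))
```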

To stop the capture we used CTRL-C.

The output looked as follows:
ESX-NFS-TSHOOT-6

In this case you can see that we captured 9 packets. To retrieve the file, we used SCP to copy it from the ESX host.
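Before opening the file in Wireshark, a quick sanity check of the retrieved capture can be done with Python's standard struct module. This sketch assumes the file is in the classic libpcap format that tcpdump writes by default, not pcapng. It reads the global header and counts the packet records, which for this capture should match the 9 packets that tcpdump-uw reported.

```python
import struct

# Classic libpcap magic numbers as they appear on disk (microsecond and nanosecond variants).
LITTLE_ENDIAN_MAGIC = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
BIG_ENDIAN_MAGIC = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def pcap_summary(stream):
    """Return (snaplen, linktype, packet_count) for a classic pcap binary stream."""
    header = stream.read(24)
    if header[:4] in LITTLE_ENDIAN_MAGIC:
        endian = "<"
    elif header[:4] in BIG_ENDIAN_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    # version_major, version_minor, thiszone, sigfigs, snaplen, linktype
    _, _, _, _, snaplen, linktype = struct.unpack(endian + "HHiIII", header[4:24])
    count = 0
    while True:
        record = stream.read(16)  # ts_sec, ts_usec, incl_len, orig_len
        if len(record) < 16:
            break
        _, _, incl_len, _ = struct.unpack(endian + "IIII", record)
        stream.read(incl_len)     # skip the captured packet bytes
        count += 1
    return snaplen, linktype, count


# Usage after copying the file off the host with SCP:
# with open("NFS-PCAP-02222018.pcap", "rb") as f:
#     print(pcap_summary(f))
```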

Once we had the file, we opened it in Wireshark and identified the following:

  • The TCP handshake completed successfully on TCP port 111 (RPC Bind).
  • The portmap lookup completed successfully and identified 635 as the port to use for the mount operation.
  • There was one retransmission during the portmap operation.
  • The TCP handshake was successful for port 635.
  • The “mount” operation was attempted over port 635, and the NFS target replied with an error message: “ERR_ACCESS”.
  • There was a retransmission of the error packet.

Below is a screenshot of the capture:

ESX-NFS-TSHOOT-7
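The error in the capture comes from the status field of the MOUNT reply. Assuming NFS v3 with MOUNT v3, which the use of a separate mount port (635) suggests, the status values are defined in RFC 1813 as mountstat3. The mapping below is a reading aid. The key point for troubleshooting is that any non-zero status is an answer from the server, so the network has already delivered both the request and the reply. The triage messages are my own shorthand, not part of the RFC, and Wireshark's display names can differ slightly from the RFC names (the RFC spells the access error MNT3ERR_ACCES).

```python
# mountstat3 values from RFC 1813 (MOUNT protocol version 3).
MOUNTSTAT3 = {
    0: "MNT3_OK",
    1: "MNT3ERR_PERM",
    2: "MNT3ERR_NOENT",
    5: "MNT3ERR_IO",
    13: "MNT3ERR_ACCES",
    20: "MNT3ERR_NOTDIR",
    22: "MNT3ERR_INVAL",
    63: "MNT3ERR_NAMETOOLONG",
    10004: "MNT3ERR_NOTSUPP",
    10006: "MNT3ERR_SERVERFAULT",
}


def triage_mount_reply(status):
    """Map a mountstat3 code to (name, where to look next)."""
    name = MOUNTSTAT3.get(status, f"UNKNOWN({status})")
    if status == 0:
        return name, "mount succeeded"
    if status == 13:
        return name, "server refused this client: check the export permissions on the NFS target"
    # Any other error is still a reply from the server, so the network delivered it.
    return name, "request reached the NFS target and was rejected: check its configuration"
```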

Based on this message, we were able to determine that the issue was located at the NFS target. Verifying the configuration on the NetApp controller showed that the permissions were not properly configured to allow the connection from the ESX server.

Once the permissions were corrected the NFS mount operation was successful.

Below are the online resources we used to identify this issue:

https://kb.vmware.com/s/article/1003967 
https://communities.vmware.com/thread/474503
http://www.virten.net/2015/02/esxi-network-troubleshooting-commands/
https://kb.vmware.com/s/article/1031186
http://docs.netapp.com/ontap-9/index.jsp?topic=%2Fcom.netapp.doc.dot-cm-nmg%2FGUID-49D0B88F-42CF-4766-A688-1C77A0AE8BD5.html
http://pubs.vmware.com/vsphere-6-5/index.jsp?topic=%2Fcom.vmware.vcli.getstart.doc%2FGUID-C3A44A30-EEA5-4359-A248-D13927A94CCE.html

 

SMTP Load Balancing using F5

Load balancing an SMTP gateway can be tricky if the platform used is not well understood. In this post we will go over the iterations of load balancing I had to go through to protect an SMTP gateway that had two servers.

The Problem: Critical vulnerabilities were identified with SSL v3, TLS v1 and TLS v1.1. Our SMTP gateway has a web portal for end-users to manage their spam, and it also uses TLS over SMTP port 25 (not identified during initial discovery). The SMTP gateway did not provide a method to address the TLS vulnerabilities, so they had to be addressed in the network.

The original DNS configuration had the MX records pointing to three A records, two of which pointed to the same IP address.

SMTP-LOADBALANCING-1

The firewall configuration in place performed a direct translation from public IP to server DMZ IP: PUBLIC IP1 translated to the NODE1 IP and PUBLIC IP2 translated to the NODE2 IP.

SMTP-LOADBALANCING-2-1

Whenever TLS was used, either for HTTPS or SMTP, the vulnerability was hit because of the weak ciphers configured on the NODE 1 and NODE 2 servers.

Solution: Place the SMTP gateway servers behind an F5 Application Delivery Controller and proxy all connections for HTTPS (TCP 443) and SMTP (TCP 25).

In our initial discovery of the SMTP gateway operation, we identified that two servers were present and, due to a lack of documentation, it was presumed that both served HTTPS and SMTP traffic. An additional requirement was that we could not modify public DNS records, since that could result in extended outages due to DNS propagation.

NOTE: In addition to the F5 Application Delivery Controller, a firewall was present for public NAT and additional threat prevention.

The first iteration of the solution utilized the following:

  1. An HTTP/HTTPS virtual server. We used an iApp to create the virtual server, load the certificates and create the redirection from HTTP to HTTPS. SSL bridging was used.
  2. A manually created SMTP virtual server.
  3. The original NAT for PUBLIC IP2 was retained to make sure some form of mail delivery continued while the transition to the load-balanced virtual server took place.
  4. The same server pools were used for HTTP/HTTPS and SMTP.

SMTP-LOADBALANCING-3

The solution was immediately rolled back when the following issues were identified:

  1. NODE 1 and NODE 2 did not have the same capabilities: NODE 2 could not serve HTTP requests.
  2. NODE 2 also allowed STARTTLS over SMTP. This required SSL bridging in order to also protect the SMTP communication to NODE 1 and NODE 2.
  3. Both NODE 1 and NODE 2 required outbound DNS and SMTP access to be able to send email out.

The next steps to address the issues identified were the following:

  1. Create a new pool that included only NODE 1 in order to handle the HTTPS requests.
  2. Add STARTTLS support to the SMTP virtual server. This required loading the certificates and attaching the client SSL profile and server SSL profile to the virtual server. In addition, an SMTPS profile was tied to the virtual server.
    1. New SMTPS profile with Activation Mode set to “Require”.
       SMTP-LOADBALANCING-4
    2. New SMTPS virtual server configuration, including the addition of the SSL profiles and the SMTPS profile.

Once source network address translation (SNAT) was added for DNS and outbound SMTP, the configuration ended up as follows:

SMTP-LOADBALANCING-6

 

This configuration was considered stable enough to remain in production, but another set of issues was identified that needed to be resolved.

  1. SMTP STARTTLS connections were not working properly on the SMTP virtual server. The connection was not completing, and since STARTTLS was required, clear-text connections were failing as well.
  2. The SMTP gateways were not able to perform IP reputation checking. Since “SNAT Automap” was used, the connection source IP was being masked as the floating IP address of the F5.

To solve the issues above the following was done:

  1. F5 Support recommended removing the server SSL profile. This way TLS is offloaded rather than bridged: connections to the SMTP gateways are unencrypted, while connections to the F5 virtual server are encrypted.
     SMTP-LOADBALANCING-7
  2. SNAT Automap was removed from the virtual server and Source Address Translation was set to “None”. In addition, the SMTP gateway servers had their default gateway set to the F5 floating IP.
     SMTP-LOADBALANCING-8
  3. The last change was to the STARTTLS Activation Mode option in the SMTPS profile, which was set to “Allow” instead of “Require”.
     Blog-Image-11
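To verify from outside what the SMTP virtual server offers after these changes, a probe can be sketched with Python's standard smtplib and ssl modules. This is a testing aid under assumptions, not part of the F5 configuration. The host in the usage line is a placeholder. Certificate verification is disabled on purpose, because the goal is only to see whether STARTTLS is advertised in the EHLO reply and which TLS version the virtual server negotiates.

```python
import smtplib
import ssl


def starttls_advertised(ehlo_text):
    """True if an EHLO reply body (one extension per line) lists STARTTLS."""
    return any(line.strip().upper().split()[:1] == ["STARTTLS"]
               for line in ehlo_text.splitlines())


def probe_starttls(host, port=25, timeout=10):
    """Return the negotiated TLS version (e.g. 'TLSv1.2'), or None if STARTTLS
    is not offered. Certificates are not verified: this is a protocol check only."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        code, reply = smtp.ehlo()
        if code != 250 or not starttls_advertised(reply.decode(errors="replace")):
            return None
        smtp.starttls(context=context)
        return smtp.sock.version()


# Usage against a placeholder public address for the SMTP virtual server:
# print(probe_starttls("203.0.113.25"))
```

With Activation Mode set to “Allow”, a client that never issues STARTTLS can still deliver mail in clear text, so a None result from this probe and a working clear-text session are not contradictory.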

The change to the Source Address Translation setting helped with the IP reputation issue but broke direct communication to the SMTP gateway servers, since their default gateway was changed to the F5. This configuration was rolled back.

The solution to the default gateway issue was the configuration of an F5 IP Forwarding Virtual Server.

The function of an IP Forwarding Virtual Server is to forward IP traffic for which the F5 does not have a more specific virtual server (IP and port) configured. This allows the F5 to handle communications from the nodes to the rest of the network as their default gateway.

Below is the configuration of the IP or L3 forwarding virtual server:

SMTP-LOADBALANCING-9

A new issue came up after configuring the L3 forwarding virtual server and configuring the SMTP gateways to use the F5 as their default gateway: some of the communications were still failing, and we could no longer manage the SMTP gateways over the network.

Doing research, we found the following:

If a different router exists on any directly connected network, you may need to create a custom fastL4 profile with “Loose Initiation” & “Loose Close” enabled to prevent LTM from interfering with forwarded conversations traversing an asymmetrical path.

Link to F5 Configuring IP Forwarding Virtual Servers

The new custom FastL4 profile was configured as shown below and applied to the L3 forwarding virtual server; this profile has the “Loose Initiation” and “Loose Close” settings enabled.

SMTP-LOADBALANCING-10

SMTP-LOADBALANCING-11
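A toy model can show why strict connection tracking breaks this path. It is not how the F5 works internally and only illustrates the idea behind the Loose Initiation setting. In the hypothetical scenario below, an administrator's SYN reaches the SMTP gateway through the other router, so the F5 (now the gateway's default gateway) only sees the replies. Without loose initiation, a packet for an unknown flow that is not a bare SYN is dropped. Loose Close is not modeled, and the host names are hypothetical.

```python
def forwarded(packets, loose_initiation=False):
    """Return the packets a stateful forwarder would pass.

    packets: list of (src, dst, flags) tuples, flags being a set such as {"SYN", "ACK"}.
    Strict mode only creates state for a bare SYN; loose mode creates state for any
    TCP packet, so the return half of an asymmetric flow survives."""
    table = set()
    passed = []
    for src, dst, flags in packets:
        key = frozenset((src, dst))  # one entry covers both directions
        if key not in table:
            starts_flow = flags == {"SYN"}
            if not (starts_flow or loose_initiation):
                continue             # unknown flow and not a SYN: dropped
            table.add(key)
        passed.append((src, dst, flags))
    return passed


# Only the gateway's replies traverse the F5; the SYN took the other router.
return_path = [("smtp-gw", "admin-pc", {"SYN", "ACK"}), ("smtp-gw", "admin-pc", {"ACK"})]
print(len(forwarded(return_path)), len(forwarded(return_path, loose_initiation=True)))
```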

Once all these elements were in place, the SMTP gateways were able to communicate properly in their load-balanced and secure configuration. The final diagram of the communication flow ended up looking as shown below:

SMTP-LOADBALANCING-12

 

Aruba VIA VPN

In the past, VPN solutions were limited to connecting to an enterprise network; nowadays the requirements are changing. Diverse profiles matching specific use cases per user group, and above all mobility, have increased the requirements placed on VPN solutions.

In general, VPN clients use your primary Internet connection as the transport over which an encrypted “data barrier”, or tunnel, is established with the other end of the communication. If either peer does not agree with the authentication parameters, the tunnel is not established. When a tunnel is successfully established, the communication inside it is encrypted and any attempt to tamper with the data is detected.

 

anyconnect-on-router.jpg

Most VPN software clients are IPSec or SSL, with SSL being the preferred method because it uses a protocol that is normally allowed through networks. The VPN client depends on the flavor of firewall the organization uses: Fortinet firewalls use FortiClient, Cisco has its AnyConnect client, Palo Alto has its Global Protect client, etc.

The majority of us in IT, and our end-users, have experienced anomalies or difficulties running these software clients. The reason for the difficulties sometimes boils down to the user having to manually establish a connection and interact with a piece of software that may be too abstract for them to understand.

In the past, the answer for making the end-user interaction with the VPN easier was to use automated logon scripts that had the VPN software establish the connection without the user needing to open the client. This becomes problematic when corporations don’t have a standardized remote device policy, and with BYOD clients.

The first VPN Client I am going to talk about is VIA from Aruba/HP.

vpn-1-revised_580x350.jpg

The Aruba/HP VIA offering appears to be exactly what a lot of people in my field have been looking for: a zero-touch user experience. The end-user does not have to know that he needs to establish a tunnel, what a tunnel is, what client version he is running or what type of Internet connection he is using. It is all pre-configured and managed by a centralized controller. The user simply powers on the workstation and begins to work, which helps them focus on their job and not on troubleshooting a VPN connection. The client even automatically selects the best Internet connection to use to establish the tunnel.

VIA is multi-platform, supporting iOS, Android, Windows and macOS. VIA also offers a hybrid IPSec/SSL tunnel with military-grade encryption: whenever forming an IPSec tunnel fails due to connection restrictions, the client uses SSL as the transport to establish the IPSec tunnel.
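The fallback behavior described above can be modeled as a simple decision. This is a hypothetical sketch, not Aruba's implementation. It assumes that native IPsec needs UDP 500 (IKE) and UDP 4500 (NAT traversal) to be reachable and that the SSL fallback rides on TCP 443. The `reachable` probe is a stand-in for whatever checks the real client performs.

```python
# Hypothetical model of "try native IPsec, fall back to IPsec carried over SSL".
IPSEC_PORTS = [("udp", 500), ("udp", 4500)]  # IKE and IPsec NAT traversal
SSL_PORT = ("tcp", 443)                       # assumed SSL transport port


def choose_transport(reachable):
    """Pick a tunnel transport. `reachable(proto, port)` reports whether the
    current network lets that traffic out to the VPN concentrator."""
    if all(reachable(proto, port) for proto, port in IPSEC_PORTS):
        return "ipsec"
    if reachable(*SSL_PORT):
        return "ipsec-over-ssl"
    return None  # no usable transport on this network


# A guest network that only allows web traffic:
print(choose_transport(lambda proto, port: (proto, port) == ("tcp", 443)))
```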

The architecture is simple, requiring services already present in an Aruba/HP network, such as AirWave, ClearPass and a Mobility Controller.

Like many other VPN clients, VIA recognizes whether it is on the enterprise network or outside on an untrusted network, and based on the network type it determines how it should connect. This can all be made transparent to the user, so for them the experience is as if they were always on the enterprise network.

Compared with Cisco AnyConnect and Palo Alto Global Protect, the VIA client offers a very easy-to-use interface.

Below you can see the connected client: a big green (connected) or gray (disconnected) indicator and, underneath it, the type of connection being used.

One of the trade-offs of a zero-touch client is the lack of additional features such as malware protection and local web inspection, but this may be offset by a policy controller and a centralized traffic management approach.

In summary, the VIA solution has been well liked by our end-users due to the simplicity of the interface. With ClearPass, it has offered a very easy-to-navigate method of troubleshooting authentication events. In combination with an Aruba/HP network, the client gives the end-user a very good experience.

For more information visit:

HP Aruba VPN Services

Cisco AnyConnect VPN

Palo Alto Global Protect