Monthly Archives: February 2016

SDDC Network Monitoring

Thank you Sasha Velednitsky and Hsien-chung Woo from NetFlow Logic for contributing this post!

Monitoring Network Metadata in Real Time

Network devices are rich source of information about the network’s traffic, in the form of NetFlow, sFlow, or IPFIX formats. This metadata is voluminous and most valuable for operational and security purposes.

You get the best insights when the data are captured and analyzed in real time. This is where the data processing engine in NetFlow Integrator comes in. It can process hundreds of thousands of these records per second. Users can apply a myriad of solutions to understand the health and robustness of their networks, as well as the imminence of security threats. The results of NetFlow Integrator processing and analytics are then visually displayed via vRealize Log Insight.

Most network management tools use LLDP or CDP protocols (designed for topology discovery) to reveal network device connectivity, and do not identify the actual network traffic. On the other hand, NetFlow Integrator’s analytics are based on real network traffic. A useful analogy: if you are driving within a city, a city map will be helpful. However, it is much better to have both a map and a depiction of the traffic congestion, so you can navigate more efficiently.

SDDC Monitoring

One of the biggest operational concerns for IT Operations and SDDC Administrators is the lack of visibility between the virtual and physical networking layers — how to trace and troubleshoot connectivity issues. Typically, SDDC management tools monitor virtual network devices, such as vSphere Distributed Switch (VDS), Distributed Logical Routing, Distributed Firewall, Edge Services Gateway, and others. What if a performance degradation or outage is caused by physical device failures or overloading?

How do we know where virtual network traffic is encapsulated, and how it traverses the physical network?

Legacy tools break down at the virtual to physical boundary. Lacking correlation between logical and physical networks leads to longer time to resolution, and unacceptable outage time frames for many customers.

For complete visibility you need to collect and analyze flows from both virtual and physical devices. Luckily, most vendors support some sort of flow generation technology (Cisco  – NetFlow, Juniper – jFlow, Dell, HP, Arista, Brocade – sFlow, VDS – IPFIX).

Configure all of your flow-capable exporters, such as Top of Rack switches, core and aggregation switches, routers, and virtual switches (e.g. as VDS or Open vSwitch) to send NetFlow/sFlow/IPFIX to NetFlow Integrator for visibility of virtual and physical networks.

Network Counters

NetFlow Integrator accepts network flow data, applies algorithms to the data to extract the information needed to address desired use cases, converts the processed data to syslog, then sends that useful information to other systems for visualization. The granularity of these counters is configurable.

Network bandwidth is typically consumed by a relatively small number of users or applications. With NetFlow Integrator and Log Insight, SDDC administrators can identify which applications are using the most network bandwidth. Log Insight dashboards, shown below, provide this information by source IP, destination IP, ports and protocols.

1 2 3

Micro-segmentation enables organizations to divide SDDC logically into segments, and to implement security groups and firewall rules down to workload levels.

East-West network traffic patterns by application ports and protocols enable administrators to plan and implement micro-segmentation using VMware NSX.

As NetFlow Integrator receives flow information from physical network devices, it reports network bandwidth consumption by each physical network device interface. The following counters are provided:

  • Traffic In Rate (Bytes/sec)
  • Traffic Out Rate (Bytes/sec)
  • Relative load %
  • Packets In Rate (Packets/sec)
  • Packets Out Rate (Packets/sec)
  • Relative Packets Rate %

Virtual traffic is encapsulated at Virtual Tunnel End Point (VTEP). For each VTEP the following counters are provided:

  • Traffic In Rate (Bytes/sec)
  • Traffic Out Rate (Bytes/sec)
  • Packets In Rate (Packets/sec)
  • Packets Out Rate (Packets/sec)
  • Flow count

Advanced Analytics

Application performance and availability could also be impacted by a variety of factors, such as DDoS attacks. Sophisticated DDoS attacks are notoriously difficult to detect on a timely basis and to defend against. Traditional perimeter-based technologies such as firewalls and intrusion detection systems (IDSs) do not provide comprehensive DDoS protection. Solutions positioned inline must be deployed at each endpoint, and are vulnerable in case of a volumetric attack. Typically, solutions require systems to run in a “learning” mode, passively monitoring traffic patterns to understand normal behavior and establishing a baseline profile. The baseline is later used to detect anomalous network activity, which could be a DDoS attack. The building of these baselines takes days or weeks, and any change in the infrastructure makes a baseline obsolete, resulting in many false positives.

In contrast to systems relying on the baselines, NetFlow Logic’s Anomaly Detection – Traffic solution is based on flow information analysis. Thus it is not susceptible to volumetric flood attacks. Additionally, since it does not rely on baseline data collection, NetFlow Logic’s anomalous traffic detection solution can be operational 15-20 minutes after deployment.

1

NetFlow Logic’s solution is based on statistical and machine learning methods and consists of several components, each analyzing network metadata from a different perspective. Results of these analyses are combined and a final event reporting decision is made. The result of this “collective mind” approach is the reduction of false positives.

Monitoring changes to VMware vSphere Template

Template is a common features used by many VMware Administrators. There are articles such as this on how to manage the version. So I will cover something that I could not find in google, which is how you prove to auditor that your templates have not been modified by unauthorised person. If a template has been modified, you want to know who did it.

The good thing is there are only a few things you can change to a template. The bulk of the changes require the template to be converted into a VM. The changes you can make to the templates are shown below:

0 what vCenter captures

You can see that you can rename the template, change the permission, and convert it into a VM. All these are tracked in vCenter. This means a log analysis tool can visualise it better for you.

Let’s see who rename it. Perform a text search on template and renamed. You can see an example below.

1

As most changes on template require the conversion into a VM, let’s see who converted a template into a VM, and vice versa. Log Insight already has a field for it, so it’s a matter of specifying it. Choose the field, and specify that it should contain mark*

summary of changes

In the above, I only have 1 template that I changed. You can see that it captures information such as who did it, what time, to what template and in which cluster.

If you want to see only the changes to VM, you can filter the field further, as shown below.

who made the VM a template

Hope you find it useful in entertaining, I mean assuring, your auditor team.

Announcing VMware Performance & Capacity Management, 2nd edition

Img-5235

I have been waiting for a long time to be able to post this. The book started around Dec 2014, when the writing of the 1st edition was complete and the publisher did a cut off date for changes. I knew many items were not covered. It was 1.0 after all.

Fast forward to February 2016, and we have revamped the content. From both the amount of effort, and the resultant book, this to me is more like 2.0 than 1.1. Page wise, it is 500+ pages, doubling the 1st edition. You can see the structure of the second edition in this link. I have tried to codify the knowledge I have into a structured process.

It’s a surprise how much things changed in just 14 months. I certainly did not expect some of the changes back in Jan 2015!

  • Major improvement in monitoring beyond vSphere.
    • vRealize and its ecosystems have huge improvement in Storage, Network, Application monitoring. This includes newer technology such as VSAN and NSX.
    • Many adapters (management packs) and content pack were released for both vRealize and Log Insight. I’m glad to see thriving ecosystems. Blue Medora especially have moved ahead very fast.
  • Rapid adoption of NSX and VSAN, that I had to add them. They were not plan of the original 2nd edition.
  • Rapid adoption of VDI monitoring using vRealize. I had to include VDI use cases.
  • Adoption by customers, partners and internal have increased.
    • In the original plan, I wasn’t planning of asking any partners to contribute. So I’m surprised that 2 partners agreed right away.
    • It is much easier to ask for review, as people are interested and want to help.
  • vSphere 6.0 and 6.0 Update 1 were released.
    • Since the book focus on Operations (and not architecture) and monitoring, the impact of both releases is very minimal.
    • Not many counters changed compared with vSphere 5.5
  • vRealize Operations had 6.1 and 6.2 releases. Log Insight has many releases too.
    • Again, this has minimal impact, since the book is not a product book.

Release Notes of 2nd edition

The existing 8 chapters have expanded and reorganized, resulting in 15 chapters. It now has 3 distinct parts. The 3 parts are structured specifically in that order to make it easier for you to see the big picture. You will find the key changes versus the 1st edition below.

More complete

  • More explanation on Performance and Capacity Management.
  • Elaborates the Performance SLA concept as it has resonates with customers (from engagements, events and blogging)
  • Add Network monitoring, with focus on NSX
  • Add VSAN monitoring.
  • Add Horizon View monitoring. Practical tips like this.
  • Incorporate monitoring that is better done via VMware Log Insight
  • Add application-level monitoring. I asked Blue Medora to contribute as they know this better than I do.

More practical, less theory.

  • Move contents that are more theoretical to the back.
  • Add more examples, and structure them so readers can see the relationship.

Easier to read

  • Less of long sentences, long paragraphs or complex tables.
  • More bullet points.
  • Break long chapters into smaller chunks.
  • Add more white space on places where it’s full of text.
  • More diagrams, to complement explanation.
  • Lighter words. Friendly chat among friends, not formal research paper.
  • Add humour.
  • Add adult picture. Ok, this is not a good idea.
  • Clear picture. Some pictures were too small.
  • Clearer heading and layout. The style heading 2 is relatively too big to the text.

Fix title

  • The book is actually not just for vRealize Operations user. It’s for the broader VMware team. This is more of a vSphere book than vRealize book.
  • This would also make the darn title shorter 🙂 Yes, in future this will evolve to just SDDC Operations Management.

What does not change?

  • It remains focus on Performance and Capacity. I’m not adding Configuration, Availability, Security, etc.
  • The book will also remain a solution book, and not a product book. There is already a great product book here by fellow CTO Ambassador.

I will provide as many free information as possible, that the Publisher allows me. We are looking at early April publication, so in the meantime, here is what they have made available. When they have officially released it, I’d add more information, such as proper acknowledgement to those who have made the book possible, and certainly discount code.

If you want to publish a review in your blog or LinkedIn, I'll link you with Packt.

I hope you find it useful. Any correction and suggestion, let me know at Twitter or LinkedIn.

Capture

PS: No, please don’t ask me about the 3rd edition. Right now I need a break 🙂 and to spend time with family! Below is wife, my 2 girls and my 1st niece.

11057344_1674393329473505_6520074326172583340_o