Architecting Log Insight as a Log Management Platform

In this blog, I shared the reasons why your VMware environment needs a proper Log Management Platform (LMP). An enterprise LMP needs to meet certain criteria before it can manage your VMware platform.

This blog shows how you can architect Log Insight to help you manage a large-scale, distributed VMware environment.

Let’s look at the requirements, and this time we will show how Log Insight meets each of them.

Global visibility

  • It is common for a business to have branches. A branch can be a small remote site with just 2 ESXi hosts, or a large data center with multiple vCenter servers.
  • You want to be able to see all the logs in one place. This makes it easier to query and analyze too.
  • Log Insight can be deployed as a Forwarder at the remote site. All it does is forward the entries.
  • Log Insight can forward to different servers using different rules. This is useful because the Master sites are active/active across 2 data centers.
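
To make the idea of rule-based forwarding concrete, here is a minimal sketch in Python. It is not Log Insight's configuration format or API. The rule names, the host names and the "category" field are all hypothetical, chosen only to illustrate how one Forwarder can send the same event to more than one destination.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ForwardRule:
    name: str
    matches: Callable[[Dict[str, str]], bool]  # filter applied to each event
    destination: str                           # hypothetical host name


# Hypothetical rules: every event goes to both main sites,
# and audit events also go to a third destination.
RULES = [
    ForwardRule("all-to-dc1", lambda e: True, "loginsight-dc1.example.com"),
    ForwardRule("all-to-dc2", lambda e: True, "loginsight-dc2.example.com"),
    ForwardRule("audit-only", lambda e: e.get("category") == "audit",
                "loginsight-audit.example.com"),
]


def destinations_for(event: Dict[str, str],
                     rules: List[ForwardRule] = RULES) -> List[str]:
    """Return every destination whose rule matches the event."""
    return [rule.destination for rule in rules if rule.matches(event)]


if __name__ == "__main__":
    print(destinations_for({"category": "audit", "text": "user logged in"}))
    print(destinations_for({"category": "performance", "text": "high CPU ready"}))
```

In this sketch a performance event reaches the 2 main sites, while an audit event reaches all 3 destinations.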

Remote site traffic should be compressed and encrypted.

  • The WAN link is typically a shared, constrained resource, so compressing the traffic gives you a higher chance that your syslog arrives at the HQ safely. The Log Insight Forwarder compresses the syslog entries and uses its own proprietary protocol to send them to HQ.
  • Log Insight also provides protection when the link is temporarily down. It holds the logs locally.
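
The behaviour described above (compress before sending, hold locally when the link is down) can be sketched in a few lines of Python. This is only an illustration of the pattern. Log Insight uses its own proprietary protocol, so the use of zlib, the class names and the in-memory queue here are all assumptions made for the example.

```python
import zlib
from collections import deque
from typing import Callable, Deque, List


class BufferingForwarder:
    """Compress each batch and keep it locally until it has been sent."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send
        self._pending: Deque[bytes] = deque()

    def forward(self, lines: List[str]) -> None:
        payload = zlib.compress("\n".join(lines).encode("utf-8"))
        self._pending.append(payload)
        self.flush()

    def flush(self) -> None:
        # Send oldest batches first; stop at the first failure.
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return  # link is down, keep the backlog for the next attempt
            self._pending.popleft()

    @property
    def backlog(self) -> int:
        return len(self._pending)


class FakeWanLink:
    """Stand-in for the WAN link to HQ, for demonstration only."""

    def __init__(self) -> None:
        self.up = True
        self.received: List[List[str]] = []

    def send(self, payload: bytes) -> None:
        if not self.up:
            raise ConnectionError("WAN link is down")
        self.received.append(zlib.decompress(payload).decode("utf-8").split("\n"))


if __name__ == "__main__":
    link = FakeWanLink()
    forwarder = BufferingForwarder(link.send)
    link.up = False
    forwarder.forward(["esxi01 vmkernel: event A", "esxi01 vmkernel: event B"])
    print("backlog while link is down:", forwarder.backlog)
    link.up = True
    forwarder.forward(["esxi02 hostd: event C"])
    print("backlog after link is back:", forwarder.backlog)
```

A real forwarder would spool to disk rather than memory so that the backlog survives a restart. The sketch keeps it in memory for brevity.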

Disaster should not result in loss of visibility.

  • The LMP becomes a critical piece of your infrastructure. As a result, you want to protect it with DR. Using replication technology results in an active/passive architecture. It also means you are relying on another technology, which has no awareness of the LMP.
  • Log Insight allows you to have an application-level active/active setup. We achieve this by getting the Forwarder to forward to both Global DC Log Insight instances. The 2 main Log Insight instances are completely independent of each other.
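
Here is a small Python sketch of why forwarding to two independent instances protects visibility. The site names and the sender functions are hypothetical. The point is only that delivery to each site is attempted separately, so a failure at one site does not stop the other from receiving a full copy.

```python
from typing import Callable, Dict


def forward_to_all(event: str,
                   senders: Dict[str, Callable[[str], None]]) -> Dict[str, bool]:
    """Try every site independently and report which ones received the event."""
    results: Dict[str, bool] = {}
    for site, send in senders.items():
        try:
            send(event)
            results[site] = True
        except ConnectionError:
            results[site] = False  # this site is down, the others are unaffected
    return results


if __name__ == "__main__":
    dc1_store = []

    def send_to_dc2(event: str) -> None:
        raise ConnectionError("DC2 is unreachable")  # simulated disaster

    outcome = forward_to_all("esxi01: host connection lost",
                             {"dc1": dc1_store.append, "dc2": send_to_dc2})
    print(outcome)
    print(dc1_store)
```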

The LMP should scale to thousands of ESXi Hosts.

  • You achieve this via clustering. Log Insight has built-in active/active clustering. In the architecture below, the Main Log Insight instances are all clustered.
  • The local Forwarders are not clustered, as their role is simply to forward data. They do not hold weeks of data (1 week is more than sufficient, as you should not have 1 week of downtime). Also, you do not log in to the Forwarders, so they do not have to handle queries. Some queries, especially against a large data set, are resource intensive.
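
If you want a rough feel for how much disk a Forwarder needs to hold 1 week of data, the arithmetic is simple. The Python sketch below is a back-of-the-envelope estimate only. The event rate and average event size are hypothetical inputs that you need to measure in your own environment, and the result covers raw event data without any indexing or file system overhead.

```python
def forwarder_storage_gb(events_per_second: float,
                         avg_event_bytes: int,
                         retention_days: int = 7) -> float:
    """Raw storage in GB (10^9 bytes) needed to hold events for the retention period."""
    seconds = retention_days * 24 * 60 * 60
    return events_per_second * avg_event_bytes * seconds / 1e9


if __name__ == "__main__":
    # Hypothetical remote site: 500 events per second, 200 bytes per event.
    print(round(forwarder_storage_gb(500, 200), 2), "GB for 7 days")
```

With these hypothetical inputs, 500 x 200 x 604,800 seconds comes to about 60.48 GB for the week.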

The LMP should handle special users or use cases.

  • One use case for logs is Audit and Compliance. Your vSphere environment provides a wealth of information that auditors or the security team want to see. Unlike the rest of the data in your vSphere environment, this data needs to be kept for years.
  • Most data in vSphere only needs to be kept for weeks. Take performance or availability data. After 4 weeks, the data is unlikely to be relevant. If a problem can wait for weeks, then it’s not really an issue 🙂
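
The two retention needs above can be expressed as a simple policy table. The Python sketch below is an illustration only and not a Log Insight setting. The category names are hypothetical, and the 3-year figure for audit data is an assumption, since the actual period depends on your compliance requirements.

```python
from datetime import date, timedelta
from typing import Dict

# Hypothetical retention policy per category of log data.
RETENTION: Dict[str, timedelta] = {
    "audit": timedelta(days=3 * 365),   # assumption: kept for years
    "operations": timedelta(weeks=4),   # performance and availability data
}


def is_expired(category: str, event_date: date, today: date,
               retention: Dict[str, timedelta] = RETENTION) -> bool:
    """True when the event is older than the retention period of its category."""
    return today - event_date > retention[category]


if __name__ == "__main__":
    logged = date(2024, 1, 1)
    today = date(2024, 3, 1)
    print("operations expired:", is_expired("operations", logged, today))
    print("audit expired:", is_expired("audit", logged, today))
```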

So… what does the architecture look like? Below is an example.

Log Insight - Overall Architecture

This architecture results in a lot of Log Insight instances. This is where vRealize Operations comes in. You can create a custom group for all your Log Insight VMs. You can then manage and monitor them as a group.

I hope you like the above architecture. The next question will be, how do you test it? Below is a sample setup you can use in the lab to validate the architecture.

Lab setup

I hope you found it useful. Here is a great write-up by Manny Sidhu explaining his experience. The customer says it best!