Improving A Security Operation Center With Splunk
A longtime FedData customer wanted to modernize their existing Security Operations Center (SOC) to provide a better user experience for their analysts and to improve their ability to detect and respond to security events throughout the enterprise. The customer was flexible in their approach but had selected two companies as the foundation for the modernization: Splunk and Nutanix. FedData was already a partner with both companies and had Subject Matter Experts on staff to guide the customer through this modernization effort.
Our first task was to understand the existing environment and identify the areas that needed improvement. This involved interviewing the customer to capture their existing pain points as well as reviewing the existing technology deployments and their effectiveness. The list of pain points was long but centered on the limited capability of the existing SIEM solution and its lack of flexibility in answering questions about the current state of the enterprise environment. The existing SIEM had been deployed over a decade earlier, when ingesting terabytes of data per day was not a requirement. As the years went by and the amount of data being collected skyrocketed, the deployment began to show its age. While still functional, it did not allow the analysts to answer all of their questions. As a result, analysts had to use several tools to piece together a chain of events, which was both time consuming and required experience with multiple technologies.
The limit on the amount of data the existing SIEM could analyze restricted the SOC analysts' view of the enterprise. The analysts wanted more information about the environment to determine whether an event was real or a false positive. Lack of access to these critical data feeds led to wasted effort, as analysts spent significant time researching an event only to determine it was a false positive.
After a complete review of the customer's event management platform, FedData identified several areas that needed improvement to ensure an optimal deployment of the new SOC Splunk instance. The existing logging infrastructure could not continue to scale to the additional data sources required to give the SOC analysts a complete picture of the enterprise. Using that infrastructure to feed Splunk would be challenging given its limited ability to duplicate source data feeds and its limited transport mechanisms. Adding new data sources was technically difficult whenever new collection mechanisms had to be used. Organizational silos also posed roadblocks, since several groups were responsible for routing events to the various analytical platforms.
The existing SOC analytical infrastructure was difficult to manage. Little automation was in place, so routine tasks such as patching and configuration management consumed a significant portion of the limited system administrators' time. With the environment expected to quadruple in size, the operations and maintenance methodology was critical. The customer knew the environment would require additional staff but was not in a position to quadruple the current staff size. Automation was one of the keys to making this deployment successful.
Solid requirements are the key to successful projects. Instead of focusing on the size of the Splunk deployment, we focused on why we were deploying Splunk. Our initial requirements discussions covered the high-level requirements: the types of expected output, the data sources involved in generating those outputs, and which parts of the organization owned those data sources. We also quickly realized that a SOC analyst workflow was needed, which was a perfect fit for the Splunk Enterprise Security app. The customer also had organizational requirements covering various technical and operational areas, including security, that needed to be built into the deployment from the start. Knowing these requirements up front let us incorporate them into the initial design, limiting the risk of retrofitting them after the system was in production. With our requirements defined, we developed a solution architecture to meet each identified requirement.
Our solution combined several major technologies into a single comprehensive whole. It started with using Nutanix to virtualize the entire infrastructure. Not only was the Splunk deployment going to be virtualized; the rest of the customer's security team's infrastructure was also going to be migrated to Nutanix. Several Nutanix clusters were deployed across multiple data centers to provide the base for all of the security infrastructure. FedData worked with Nutanix to ensure all of the identified functional and security requirements were met. Special consideration was given to the Nutanix infrastructure underlying Splunk: both Nutanix and Splunk were consulted to arrive at the best possible configuration and ensure Splunk would work as intended.
The operating system of choice for this customer was Red Hat Enterprise Linux (RHEL). RHEL would be the base OS for the Splunk deployment and for several of the other security tools riding on the Nutanix platform. Given the significantly larger number of RHEL instances that would need to be managed once Splunk was deployed, FedData recommended incorporating Red Hat's Identity Management product into the RHEL environment. This allowed better integration with the Microsoft Windows infrastructure while keeping many administration tasks within a familiar RHEL environment. Red Hat's Ansible product was also recommended to automate the deployment, as well as ongoing operations and maintenance, to help limit the burden on the support staff. FedData supported this in-depth integration effort with staff knowledgeable in each technology domain to ensure optimal execution.
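As an illustration of the kind of automation involved, the patching and Identity Management enrollment described above could be expressed as an Ansible playbook along these lines. This is a hypothetical sketch, not the customer's actual playbook; the host group, IdM server name, domain, and the `idm_enroll_password` variable are all illustrative assumptions.

```yaml
# Hypothetical sketch: baseline a new RHEL node for the Splunk
# environment. Group names and IdM details are placeholders.
- name: Baseline a new RHEL Splunk node
  hosts: splunk_indexers
  become: true
  tasks:
    - name: Apply the latest security errata
      ansible.builtin.dnf:
        name: "*"
        security: true
        state: latest

    - name: Install the Identity Management client
      ansible.builtin.dnf:
        name: ipa-client
        state: present

    - name: Enroll the host in Identity Management
      ansible.builtin.command:
        cmd: >-
          ipa-client-install --unattended
          --server=idm.example.com
          --domain=example.com
          --principal=admin
          --password={{ idm_enroll_password }}
        creates: /etc/ipa/default.conf
```

Running this against an inventory group keeps each new RHEL instance patched and enrolled identically, which is what lets a small administration team keep pace when the environment quadruples in size.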
To address the identified data collection issues, FedData recommended deploying a centralized event collection architecture. This "broker" would allow a variety of data sources to be sent to a central collection tier capable of scaling from tens to hundreds of terabytes of data per day across multiple data centers. The "broker" also controls event flow to the various analytical environments currently deployed, including Splunk. This allows the analytical tools to achieve high availability using their own best practices instead of relying on backend replication that may not be possible with every toolset. Another benefit of the "broker" architecture is that the organization can monitor the source and destination of each event and make intelligent decisions about which analytical environment is the proper destination, or whether multiple environments should receive the event. For tools licensed by ingest rate, this can be quite helpful in limiting license fees; it can also reduce storage costs by limiting the number of times an event is stored across the enterprise.
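The routing decision at the heart of the "broker" can be sketched in a few lines. This is a minimal illustration of the concept, not the customer's implementation; the `ROUTES` table, source names, and destination names are all hypothetical.

```python
# Hypothetical sketch of the "broker" routing logic: each event is
# tagged with its source, and a routing table decides which analytical
# platforms receive a copy. Sending a feed only where it is needed
# limits per-ingest license fees and duplicate storage.

# Routing table: data source -> destination platforms (illustrative).
ROUTES = {
    "windows_security": ["splunk"],              # analyst-facing only
    "firewall": ["splunk", "archive"],           # analysis + retention
    "netflow": ["archive"],                      # retention only
}

def route_event(event):
    """Return the destinations for an event, plus an audit record so
    the organization can monitor each event's source and destination."""
    destinations = ROUTES.get(event["source"], [])
    audit = {"source": event["source"], "destinations": destinations}
    return destinations, audit

# A firewall event is duplicated to both Splunk and the archive tier.
destinations, audit = route_event({"source": "firewall",
                                   "msg": "deny tcp/445"})
```

Because every event passes through this one decision point, adding a new analytical platform is a routing-table change rather than a new collection pipeline, which is what breaks down the organizational silos described above.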
These technologies all help the SOC analysts protect the environment by keeping the Splunk platform available. Each technical component has its purpose in this deployment, but it is the system as a whole that enables the SOC analysts to do their job. Every day, the customer expects Splunk to identify and track incidents with the Enterprise Security app and help the SOC analysts protect their valuable data. Splunk is the core platform that makes this possible, and FedData helps its customers deploy these technologies securely while meeting their operational requirements.