You’ll learn what the functions of a security operations center (SOC) are, so you have a clear overview of what your job demands and the key roles and responsibilities you’ll be fulfilling.
Alert and Monitor
SOC analysts might receive thousands of alerts every minute from many different network devices and appliances, across many protocols and activity types. Handling all of this by hand is impossible, but a centralized repository makes it manageable.
Security Information and Event Management (SIEM) software aggregates and analyzes information from many different sources across the entire infrastructure. So, instead of going through every security appliance’s console, you get one centralized place to find the alerts and react as necessary. A SIEM not only centralizes the information, but also performs deep analysis on it to find hidden patterns and determine whether your company is under attack.
For example, a single activity on a firewall might not look suspicious or might not raise an alert. However, when this activity is combined with other actions registered by the antivirus, the intrusion prevention system (IPS), the data loss prevention (DLP) system and many other devices, it could show that malware has infected the system.
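The correlation idea above can be sketched in a few lines of Python. This is only an illustration, not how any particular SIEM product works: the `Event` fields, the five-minute window and the three-source threshold are all assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    source: str          # reporting appliance, e.g. "firewall" (hypothetical label)
    host: str            # machine the event refers to
    timestamp: datetime
    message: str


def correlate(events, window=timedelta(minutes=5), min_sources=3):
    """Return hosts where at least `min_sources` distinct appliances
    reported events inside a single time window."""
    by_host = defaultdict(list)
    for event in events:
        by_host[event.host].append(event)

    flagged = {}
    for host, host_events in by_host.items():
        host_events.sort(key=lambda e: e.timestamp)
        for i, first in enumerate(host_events):
            # Events that start at `first` and fall inside the window.
            in_window = [e for e in host_events[i:]
                         if e.timestamp - first.timestamp <= window]
            sources = {e.source for e in in_window}
            if len(sources) >= min_sources:
                flagged[host] = sorted(sources)
                break
    return flagged


t0 = datetime(2024, 1, 1, 12, 0)
events = [
    Event("firewall", "pc-17", t0, "outbound connection to unusual port"),
    Event("antivirus", "pc-17", t0 + timedelta(minutes=1), "heuristic detection"),
    Event("ips", "pc-17", t0 + timedelta(minutes=2), "signature match"),
    Event("firewall", "pc-02", t0, "outbound connection to unusual port"),
]
print(correlate(events))  # {'pc-17': ['antivirus', 'firewall', 'ips']}
```

On its own, the firewall event on `pc-02` is not flagged. The same kind of firewall event on `pc-17` is, because two other appliances reported on that host within the window.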
As you can imagine, a SIEM is only as good as the information it receives. Users sometimes have the misperception that a SIEM, by itself, is good enough. However, if you don’t feed the SIEM logs from the appropriate tools, you’ll be blind to whatever those tools detect. This is known as a “gray area”. As an example, consider a SIEM that receives logs from the firewall only, and not from the antivirus or the IPS. If the antivirus detects an activity, the SIEM will never see it and you’ll never receive the alert. This is why it’s very important to have a clear understanding of the company’s network and of which appliances are actually reporting to the SIEM.
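One simple way to look for such gaps is to compare the list of appliances you expect to exist with the list of sources the SIEM is actually receiving logs from. The sketch below assumes you can obtain both lists somehow (for instance, from an asset inventory and from the SIEM’s source configuration); the names used are hypothetical.

```python
def find_unmonitored(inventory, reporting_sources):
    """Return the appliances in the inventory that send no logs to the SIEM."""
    return sorted(set(inventory) - set(reporting_sources))


# Hypothetical data: only the firewall is forwarding logs.
inventory = ["firewall", "antivirus", "ips", "dlp"]
reporting_sources = ["firewall"]

print(find_unmonitored(inventory, reporting_sources))
# ['antivirus', 'dlp', 'ips']
```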
The only problem with SIEM technology is that, even though it makes alerts easier to manage, you’ll still get many alerts about many different kinds of attacks. Some of them need immediate attention and some do not.
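As a minimal sketch of what “needs immediate attention” can mean in practice, the snippet below orders a queue of alerts by severity. The severity labels and the `Alert` fields are assumptions for illustration; real products use their own scales and scoring.

```python
from dataclasses import dataclass

# Hypothetical severity scale: lower number means handle sooner.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Alert:
    alert_id: str
    severity: str
    description: str


def triage(alerts):
    """Return the alerts sorted so the most severe come first.
    Unknown severities are placed at the end of the queue."""
    return sorted(
        alerts,
        key=lambda a: SEVERITY_ORDER.get(a.severity, len(SEVERITY_ORDER)),
    )


queue = [
    Alert("A-1", "low", "port scan from the internet"),
    Alert("A-2", "critical", "ransomware behavior on a file server"),
    Alert("A-3", "medium", "repeated failed logins"),
]
print([a.alert_id for a in triage(queue)])  # ['A-2', 'A-3', 'A-1']
```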
To deal with all of this, we have orchestration software along with well-defined playbooks. Security Orchestration, Automation and Response (SOAR) software helps you with threat and vulnerability management, security incident response and security operations automation. It brings order to the alert chaos, so you can see what steps come next and which actions need immediate attention. That said, a SOAR is useless without a list of playbooks. In simple words, a playbook is a well-defined set of steps and roles that gives every member of an organization a clear understanding of their responsibilities before, during and after a cybersecurity incident. Playbooks can have any format you want: a file describing the steps, a workflow image, a combination of both, and so on.
As you can see, a SIEM, a SOAR and a well-defined list of playbooks can really help with the incident response process. By using all of them together, you’ll be able to collect threat-related data from a range of sources (in this case, it could be only from the SIEM) and automate responses to many threats and alerts.
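To make the idea of a playbook more concrete, here is a minimal sketch that represents one as plain data: an ordered list of steps, each assigned to a role. The alert type, roles and steps are invented for the example and are not taken from any specific SOAR product or published playbook.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    role: str     # who performs the step
    action: str   # what they do


@dataclass
class Playbook:
    name: str
    steps: list = field(default_factory=list)


# Hypothetical playbook catalogue, keyed by alert type.
PLAYBOOKS = {
    "phishing": Playbook(
        "Phishing email",
        [
            Step("SOC analyst", "Confirm the report and collect the email headers"),
            Step("SOC analyst", "Check whether any recipient opened the link"),
            Step("IT operations", "Block the sender and remove the email from mailboxes"),
            Step("SOC lead", "Document the incident and notify affected users"),
        ],
    ),
}


def next_steps(alert_type):
    """Return the numbered steps for an alert type, or an escalation note
    when no playbook has been defined for it."""
    playbook = PLAYBOOKS.get(alert_type)
    if playbook is None:
        return ["No playbook defined: escalate for manual review"]
    return [f"{i}. [{s.role}] {s.action}"
            for i, s in enumerate(playbook.steps, start=1)]


for line in next_steps("phishing"):
    print(line)
```

Keeping the playbook as data rather than code means the same steps can be rendered as a checklist, a workflow diagram or an automated task list.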
For the incident response activities, you also need to establish and follow a clear incident response process. This process might vary from company to company, but here is a common set of stages:
Preparation: This might be the most important step in the process, as it entails creating the strategy to tackle an incident. Some common activities performed at this stage are:
- Creating incident response policies and procedures
- Establishing communication paths
- Creating a continuous improvement plan and tests
Identification: This is where the SIEM, SOAR and playbooks come in handy, as this is where you need to assess the alert and determine whether it’s a false positive or actual malicious activity.
Containment: At this stage, you try to minimize the threat and its impact. This can be done by reducing the paths a threat can take (e.g. disconnecting the affected machines from the network) or by applying specific countermeasures.
Eradication: This is where you eliminate the threat once and for all, for example by reformatting the affected systems or restoring them to a known unaffected state (e.g. a previous snapshot of a virtual machine).
Recovery: Sometimes, companies need to move their operations to another location because of the incident. This stage is where you bring everything back to the original operational state.
Lessons learned: This is where you document everything, from beginning to end. This way, you’ll have a knowledge base if the incident happens again.
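The stages above can be sketched as a small state machine that only lets an incident move forward one stage at a time. The stage names used here (preparation, identification, containment, eradication, recovery, lessons learned) are the ones commonly used for this six-stage model and are an assumption of this example, as are the class and method names.

```python
from enum import IntEnum


class Stage(IntEnum):
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    LESSONS_LEARNED = 6


class Incident:
    """Tracks which stage a single incident is in."""

    def __init__(self, title):
        self.title = title
        # Preparation happens before any incident exists, so a new
        # incident starts at the identification stage.
        self.stage = Stage.IDENTIFICATION
        self.history = [self.stage]

    def advance(self):
        """Move to the next stage; stages cannot be skipped."""
        if self.stage is Stage.LESSONS_LEARNED:
            raise ValueError("incident has already reached the final stage")
        self.stage = Stage(self.stage + 1)
        self.history.append(self.stage)
        return self.stage


incident = Incident("Malware on a workstation")
incident.advance()
print(incident.stage.name)  # CONTAINMENT
```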
Now that you have all the information from the SIEM, SOAR and other tools, it’s time to take a deep dive into the problem. This process is commonly known as root cause analysis (RCA). An RCA leads you to the first or main factor that caused a problem, which should be permanently eliminated to avoid future occurrences. In other words, the RCA determines the core issue that triggers the entire cause-and-effect chain that ultimately leads to the problem(s) or incidents.
Each problem might require different actions, such as escalating to a threat intelligence analyst, following a specific playbook, creating a report, and so on. If you found a malicious or suspicious file during the investigation, you can rely on free online tools such as:
These tools will provide valuable information about the suspicious file you detected.
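Many online file-analysis services let you search by a file’s cryptographic hash, so a common first step is to compute the hash locally rather than uploading the file itself. The sketch below uses Python’s standard `hashlib` module; the function name and the chunk size are choices made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file` on the suspicious file (ideally on an isolated analysis machine) gives you a value you can search for without sharing the file’s contents.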
As you enter the cybersecurity world, you’ll realize that it is impossible to win this battle all by yourself. You can and should collaborate with other teams around the world to take advantage of all the intelligence you can get. One place to do this is an Information Sharing and Analysis Center (ISAC). ISACs are organizations that provide a central resource for gathering information on cyber threats. They also allow two-way sharing of information between the private and the public sector about root causes, incidents and threats, as well as sharing of experience, knowledge and analysis.
You can use a framework such as VERIS to safely share information about an incident (anonymizing critical information, of course). The Vocabulary for Event Recording and Incident Sharing (VERIS) is a set of metrics designed to provide a common language for describing security incidents in a structured and repeatable manner.
You can also collaborate with and rely on local and global CERTs and CSIRTs. A computer security incident response team (CSIRT) is a team dedicated to performing the steps we just described. Hence, these teams hold valuable information about incidents, and you can take advantage of that knowledge base to apply countermeasures or handle alerts. The terms CSIRT and CERT were used interchangeably in the past. However, Carnegie Mellon University registered CERT as a trademark, so a team now needs the university’s authorization to use the name.
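As a rough illustration of anonymizing critical information before sharing, the sketch below replaces sensitive fields in an incident record with salted hash tokens. The record layout and field names are hypothetical and are not the VERIS schema; a real submission should follow the framework’s own documentation. Also note that hashing values drawn from a small set (such as IP addresses) is only weak pseudonymization.

```python
import hashlib

# Hypothetical field names that must not leave the organization as-is.
SENSITIVE_FIELDS = {"victim_name", "internal_ip", "analyst_email"}


def anonymize(record, salt):
    """Return a copy of the record with sensitive values replaced by
    short salted-hash tokens; other fields are copied unchanged."""
    shared = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            raw = (salt + str(value)).encode("utf-8")
            token = hashlib.sha256(raw).hexdigest()[:12]
            shared[key] = f"anon-{token}"
        else:
            shared[key] = value
    return shared


incident = {
    "victim_name": "Example Corp",
    "internal_ip": "10.0.0.15",
    "action": "malware",
    "discovery_method": "antivirus alert",
}
print(anonymize(incident, salt="keep-this-secret"))
```

The same input and salt always produce the same token, so repeated reports about the same asset can still be linked to each other without revealing what the asset is.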
Last but not least, you need to recover from the incident. This might entail a minor task, such as closing the alert, performing an AV scan or rebuilding the affected computer, or it might entail following a large disaster recovery plan.
A disaster recovery plan (DRP) is a process that describes how an organization can quickly resume work after an unplanned incident. A DRP is an essential part of a business continuity plan (BCP). It is applied to the aspects of an organization that depend on a functioning IT infrastructure. A DRP aims to help an organization resolve data loss and recover system functionality so that it can perform in the aftermath of an incident, even if it operates at a minimal level.
The DRP is purely focused on technology, while the BCP is focused on the business. As an example, consider a company that has lost the ability to sell online because its web server is down. The BCP can dictate that sales continue over the phone, a messaging app, email, etc. The DRP, on the other hand, aims to restore the web server’s availability so that online sales can resume. This is why having both is very important for incident response and recovery.
As a SOC analyst, you’ll see many alerts about different events every day. You must know what tools are available to you, how to use them and how to take the necessary action to mitigate the incident.