<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Risks on ~bhavya</title>
    <link>https://bhavyatiwary.com/categories/risks/</link>
    <description>Recent content in Risks on ~bhavya</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <lastBuildDate>Fri, 01 May 2026 05:09:15 +0530</lastBuildDate>
    <atom:link href="https://bhavyatiwary.com/categories/risks/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>AI Security Agents in the SOC: Co-Worker or Co-Risk?</title>
      <link>https://bhavyatiwary.com/posts/ai-in-soc/</link>
      <pubDate>Fri, 01 May 2026 05:09:15 +0530</pubDate>
      <guid>https://bhavyatiwary.com/posts/ai-in-soc/</guid>
<description>As the number of AI agents in corporations increases, we can&amp;rsquo;t expect cybersecurity to be the only domain left behind. Many VC-backed startups like Simbian have launched, and Google has announced a partnership with Wiz for cybersecurity agents.
Firstly, what is an AI agent, and how is it different from our plain ol&amp;rsquo; automation? Automation follows a set of predefined rules to complete tasks—fast, consistent, and predictable. AI agents, on the other hand, operate with autonomy: they can reason, adapt, and make decisions based on dynamic inputs.</description>
      <content:encoded><![CDATA[<p>As the number of AI agents in corporations increases, we can&rsquo;t expect cybersecurity to be the only domain left behind. Many VC-backed startups like Simbian have launched, and Google has announced a partnership with Wiz for cybersecurity agents.</p>
<p>Firstly, what is an AI agent, and how is it different from our plain ol&rsquo; automation?
Automation follows a set of predefined rules to complete tasks—fast, consistent, and predictable. AI agents, on the other hand, operate with autonomy: they can reason, adapt, and make decisions based on dynamic inputs. For example, suppose the goal is to parse a document and then send a mail with the extracted data. With an automation script, we first need a parsing algorithm, then have to define the words to extract using key-value pairs or regular expressions, and finally supply an exact mail template with its substitutions. With an AI agent, we use a model focused on extraction and simply define the expectations, like writing the mail in a formal tone or keeping it concise; the agent uses its autonomy to draft and send the mail.
Both the AI agent and the automation script still need access to the mailbox, permission to read the document, permission to send the mail, and authentication credentials. It&rsquo;s pretty cool to see how far NLP algorithms have come, but also a bit terrifying.</p>
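<p>The automation side of that contrast can be sketched concretely. The invoice text, field names, and regexes below are invented for illustration; a real script would match its own document schema, and the agent variant would replace all of these hand-written rules with a goal description like &ldquo;write a concise, formal mail with the extracted data.&rdquo;</p>

```python
import re

# Hypothetical invoice text -- an assumption for illustration only.
DOCUMENT = "Invoice No: 4711\nCustomer: Acme Corp\nAmount Due: 199.00 EUR"

# Automation: every field is a hand-written rule (key/regex pair).
RULES = {
    "invoice_no": r"Invoice No:\s*(\S+)",
    "customer":   r"Customer:\s*(.+)",
    "amount":     r"Amount Due:\s*([\d.]+ \w+)",
}

def extract(text: str) -> dict:
    """Deterministic extraction: the same input always yields the same output."""
    return {key: re.search(pattern, text).group(1).strip()
            for key, pattern in RULES.items()}

# ...and the mail body is an exact template with substitutions.
TEMPLATE = ("Dear {customer},\n"
            "invoice {invoice_no} over {amount} is attached.\n")

fields = extract(DOCUMENT)
mail_body = TEMPLATE.format(**fields)
```

<p>Every rule and the template are fixed up front; the agent version trades this predictability for autonomy.</p>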
<p><strong>Challenges we may see if we use SOC-AI agents:</strong></p>
<ol>
<li>One of the key challenges AI researchers are working on is explainability. According to ISO/IEC TR 29119-11:2020, it is the level of understanding of how an AI-based system came up with a given result. Let me elaborate with an example: with a decision-tree algorithm or K-means, a researcher or developer can look at the input and make a reasonable guess as to why a certain output was derived. But with LLMs or facial-recognition algorithms like FaceNet, at no point can a researcher or expert say with certainty why a certain output was produced. You can try it yourself: ask Claude to write 5 sentences on any topic, like &ldquo;Tell me about the movie Jurassic Park.&rdquo; Can you say with certainty which 5 points it will cover? It could talk about the actors, the director, reviews, the CGI, or anything else.</li>
</ol>
<p>This is different from interpretability, which is the level of understanding of how the underlying (AI) technology works, and which we can explain: Claude is an LLM based on the Transformer architecture, and FaceNet uses a deep convolutional neural network to learn a mapping/embedding from a set of face images to a 128-dimensional Euclidean space, assessing the similarity between faces by the squared Euclidean distance between the images&rsquo; corresponding normalized vectors.</p>
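<p>As a rough sketch of that interpretability claim, the distance computation itself is easy to write down. The 4-dimensional vectors below are toy stand-ins for FaceNet&rsquo;s 128-dimensional embeddings, which in reality come out of the trained network rather than a hand-written list:</p>

```python
import math

def normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy stand-ins for embeddings of three face images (an assumption
# for illustration; real values come from the trained network).
face_a = normalize([0.9, 0.1, 0.2, 0.1])
face_b = normalize([0.88, 0.12, 0.21, 0.09])  # similar face -> small distance
face_c = normalize([0.1, 0.9, 0.1, 0.8])      # different face -> large distance

same = squared_distance(face_a, face_b)
diff = squared_distance(face_a, face_c)
# FaceNet thresholds this distance to decide whether two faces match.
```

<p>This part is interpretable; what no one can fully explain is why the network maps a particular image to a particular point in that space.</p>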
<p>So in the context of SOC agents, which have more autonomy, tool access, LLM access, and so on, how do we account for explainability in a way that is accessible to all analysts?</p>
<ol start="2">
<li>
<p>The idea that the system responsible for monitoring everything and generating alerts uses a mix of probabilistic and deterministic algorithms to derive its answers is a bit scary. The probabilistic algorithms come into play in the first half, where the agent assesses the telemetry for context: it applies behavioral baselines and knowledge graphs to assign risk scores and dynamically selects response paths based on asset criticality, rather than following fixed &ldquo;if X, then Y&rdquo; rules. No system is 100% accurate, and errors from a probabilistic system can have a huge impact on the business. It may not matter in the grand scheme of things, but has anyone noticed that since corporations like AWS and Microsoft started using AI agents for coding, there seem to have been more outages? Big organizations may be able to absorb the impact, but smaller orgs will face more challenges.</p>
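<p>A minimal sketch of that mix, with all thresholds, scores, and asset tiers invented for illustration: a deterministic signature check sits next to a probabilistic anomaly score, and a miscalibrated score on a critical asset can trigger a disruptive action that no fixed playbook would take.</p>

```python
# All hosts, scores, and thresholds here are hypothetical.
ASSET_CRITICALITY = {"laptop-042": 0.3, "db-prod-01": 1.0}

def rule_based(alert):
    """Deterministic half: fixed 'if X, then Y' logic, fully predictable."""
    return 1.0 if alert["signature_match"] else 0.0

def behavioral(alert):
    """Probabilistic half: stand-in for a learned anomaly model.
    In a real agent this is a model output, and it can be wrong."""
    return alert["anomaly_score"]  # 0.0 .. 1.0

def respond(alert):
    """Blend both signals, weight by asset criticality, pick a response path."""
    risk = max(rule_based(alert), behavioral(alert)) * ASSET_CRITICALITY[alert["host"]]
    if risk >= 0.8:
        return "isolate-host"   # disruptive action chosen dynamically
    elif risk >= 0.4:
        return "open-ticket"
    return "log-only"

# A miscalibrated anomaly score on a critical asset triggers isolation
# even with no signature match -- an error a purely deterministic
# playbook could not make.
critical = {"host": "db-prod-01", "signature_match": False, "anomaly_score": 0.85}
benign   = {"host": "laptop-042", "signature_match": False, "anomaly_score": 0.85}
```

<p>The same score produces opposite outcomes depending on the criticality weighting, which is exactly why an error in the probabilistic half is so hard to reason about afterwards.</p>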
</li>
<li>
<p>With AI being the IT-girl of the industry, it is also a prime target for attackers. Attempts to abuse AI agents or shared LLMs for attacks and malware propagation are very real. Imagine that the system you depend on to alert you about these incidents, exploits, and attacks itself gets compromised, whether through a shared model, agent-to-agent sprawl, or even a sci-fi-style rebellion.</p>
</li>
<li>
<p>Lastly, since AI agents have more autonomy, their mistakes are also unpredictable. As experts across domains note, an agent is generally reliable and rarely makes errors, but when it does, the error is hard to trace because the steps taken aren&rsquo;t logged. In a controlled environment with limited variables or just 2-3 use cases, this is still manageable, but in production, where the SOC system handles thousands of use cases, logs, attack signatures, and so on, things become challenging. There is also the butterfly effect: a small mistake, insignificant in itself, may lead to business outages or legal repercussions.</p>
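<p>One mitigation the traceability problem suggests is forcing every tool call the agent makes through a wrapper that records inputs and outputs, so a bad decision can be replayed afterwards. The tool names and agent steps below are hypothetical:</p>

```python
import datetime

AUDIT_LOG = []

def audited(tool_name, tool_fn):
    """Wrap a tool so each invocation leaves a timestamped trace."""
    def wrapper(**kwargs):
        result = tool_fn(**kwargs)
        AUDIT_LOG.append({
            "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "tool": tool_name,
            "args": kwargs,
            "result": result,
        })
        return result
    return wrapper

# Hypothetical tools the agent is allowed to use.
lookup_ip = audited("lookup_ip", lambda ip: {"ip": ip, "reputation": "bad"})
block_ip  = audited("block_ip",  lambda ip: f"blocked {ip}")

# The agent takes two steps; both end up in the trace.
verdict = lookup_ip(ip="203.0.113.7")
if verdict["reputation"] == "bad":
    action = block_ip(ip="203.0.113.7")
```

<p>With a trace like this, the question &ldquo;why did the agent block that host?&rdquo; at least has a reconstructable answer, even if the model&rsquo;s internal reasoning does not.</p>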
</li>
</ol>
<p>Then what is the solution? Analyst fatigue is real, and with fixed security budgets, any help is appreciated. The answer is human-in-the-loop: the analyst uses the SOC agent for summarization, data extraction, references, and alert generation, but at the end of the day the analyst makes the final call to dismiss or act on the alert.</p>
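<p>A minimal human-in-the-loop sketch, with invented action names: the agent may act alone on low-risk tasks, but anything disruptive is queued for the analyst&rsquo;s decision.</p>

```python
# Actions the agent may take without approval (an assumed policy).
AUTO_ALLOWED = {"summarize", "enrich", "open-ticket"}
PENDING_REVIEW = []

def dispatch(action, payload):
    """Route an agent-proposed action: execute it, or hold it for the analyst."""
    if action in AUTO_ALLOWED:
        return {"status": "executed", "action": action, "payload": payload}
    PENDING_REVIEW.append({"action": action, "payload": payload})
    return {"status": "awaiting-analyst", "action": action}

def analyst_decides(index, approve):
    """The final call stays with a human."""
    item = PENDING_REVIEW.pop(index)
    return {"status": "executed" if approve else "dismissed", **item}

r1 = dispatch("summarize", {"alert_id": 17})            # agent acts alone
r2 = dispatch("isolate-host", {"host": "db-prod-01"})   # held for review
final = analyst_decides(0, approve=False)               # analyst dismisses it
```

<p>The agent still removes the grunt work, but the disruptive decision never executes without a human signature.</p>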
]]></content:encoded>
    </item>
  </channel>
</rss>
