Powerful but Unpredictable – What Leaders Should Know About GenAI

Generative AI offers outsized opportunity, from faster development to broader security coverage, says Patrick Sullivan, SVP and CTO at Akamai. It also introduces novel risks by breaking traditional assurance models and opening new attack surfaces. The creative, non-deterministic nature of GenAI forces tough choices between innovation, predictability, and security – but Sullivan says a bedrock of first principles reduces the risk in the sector’s increasingly non-deterministic environment.

Transcription (edited for clarity)  

Elizabeth Heathfield, Chief Corporate Affairs Officer, FS-ISAC: Welcome to FS-ISAC's podcast, FinCyber Today. I'm Elizabeth Heathfield, Chief Corporate Affairs Officer at FS-ISAC. As generative AI moves from a cool, innovative approach to an absolute economic imperative, firms and teams need to learn to think not just about AI, but think like AI. Patrick Sullivan, SVP and CTO at Akamai, spent some time geeking out with me on how security teams can learn to harness the non-deterministic nature of AI tools.

Heathfield: Thanks so much for being here. I appreciate it. I'm super excited to talk to you about this because I know that we're both AI geeks. So let's talk about managing non-deterministic risk. Lay the ground rules, set the basics out here. What is non-determinism in GenAI models?

Patrick Sullivan, Senior Vice President and Chief Technology Officer, Akamai: Perfect. Yeah, I think it's important to set the table first. You know, I think when we look at the generative AI models that we're all so excited about, at their core, what they're doing is a lot of complex matrix multiplication and then trying to complete the next word with the most probable outcomes. Depending on how you configure things, it will be the most likely [word] or you could sample a less likely alternative.

But what that means and why we say it's non-deterministic is, if you run [matrix multiplication] with a set of inputs – and even though that model is completely static – there's no change to the system. The next time you run that, you're going to get a different set of results, right? And you run it again, a different set of results still.

In some ways, that's different from a lot of the systems that we traditionally have run. So there's a mindset shift that people need to wrap their heads around for things like security testing. If you have maybe a latent payload for pumped injection that doesn't detonate once, that does not mean that you have assurance that you're not vulnerable. Because the very next time you play back that exact same payload against the exact same application, it may detonate it, and you end up with a negative result. So I think it's important for people to wrap their heads around that.

And what I've found is that understanding that intellectually is insufficient. You really need to see this in action. You really need to have your fingers on the keyboard, your eyes on the monitor, and watch this phenomenon where you have a payload lie latent, and then you run everything identical, and then you see that have an impact. And then that helps you internalize what that means and the ramifications for security.

Heathfield: How should people be thinking about doing this, getting their heads around it? Especially at the enterprise level, right? I think it's important for people to know we’re not talking about chatbots. We're talking about more complex workflows, agentic workflows, and stuff like that.

Sullivan: Definitely. So I think the good news is that on any security team today for any [FS-ISAC] member organizations, I'm sure you have some eagles that are out there availing themselves of all of the abundant training that's available. And they're way ahead of this.

The area that leaders should focus upon is the baseline. How do you raise the baseline and make sure that everybody across the broader security compliance organization gets this enlightenment, right? And that includes everything, all the way from audit to your red team to your AppSec [application security] teams. Some of those folks are probably gonna be bored by the baseline training, but I think it's important to set a baseline. And maybe you do that with a workshop, an AI workshop, or an AI hackathon, where you get people very easily with their hands on and see that phenomenon. That seeing-is-believing phenomenon is important.

And then, somebody who's doing audit … what does it mean to have assurance? Just because a test showed green, how do you think about that in a broader sense? The next time you run it on a static system, might it be red the next time? You want everybody on board.

Heathfield: Okay. So we've set our table stakes. All right. So let's start and talk about some of the opportunities that you're seeing the financial sector leverage. Especially the opportunities with this new characteristic of GenAI.

Sullivan: Perfect. So I think we're seeing the business adopt AI technology at different paces in different organizations, right? Some leaders have come out and said, ‘We may be a mature organization, but we are not going to leave the opportunity of AI to the scrappy startups and fintechs, we're gonna avail ourselves of these opportunities.’ It's great to see that risk appetite.

But as they do that, one place that we may see organizations first deploy some of these AI tools is things like development co-pilots, right? We're starting to see some of the yield there with increased productivity from developers. There's a recent study that I read last week from one of our partners at Apiiro, where they said, as you develop GenAI co-pilots, a 4x increase in the number of updates to software as based on PRs that you commit. And then there may be a 10x increase on average across all of the various AST tools that are firing. A lot of those are duplicative, but in general, you're going faster, which is great for the business. But if the security team, AppSec in particular, doesn't find some way to automate, you're gonna get steamrolled by the business.

So that's an opportunity. I think another opportunity might just be coverage. We have so many alerts from all of our great tools firing. And if you read through breach reports, you often will see that there was a bad incident that happened. And it wasn't as though no tool fired an alert. It’s often [that] something fired over here, something else fired over there, but it just didn't make it above the threshold of a human analyst. So if we have more coverage, maybe from an AI system that can action some of those things, and drop that threshold of what gets more scrutiny, that's an opportunity for us as well.

Heathfield: So, okay, we've talked about some of the opportunities. Let's talk about some of the risks. What are the risks that you're seeing that organizations are having to contend with, and think about risk management possibly in a slightly different way than they had before with GenAI as they implement it? What are the risks that come along with that?

Sullivan: Sure, one is if we look at the AppSec domain, me personally, I’ve spent probably two decades dealing with AppSec risk. One of the core anti-patterns that we've seen when things go wrong with AppSec is when you commingle code or instructions with data, right? You want your data over here. Instructions over here, SQL instructions go in the database, that's a set of problems. Operating system instructions, if that gets interpreted, [there are a] different set of risks. And one of the things that we see is that with these GenAI applications, [people are] taking instructions and data and putting them in the blender, and it's a cocktail. That's problematic.

[That model is] the toxic trifecta. If you have external data – whether that's a query that's coming in or the training data that is entered into your system – plus internal data with a RAG or some other type of system that is accessing confidential data, where we have a lot of rules about how we preserve that confidentiality and who exactly gets access to that. Plus the third leg, which is a communication channel. API (Application Programming Interface) direct is sort of the worst case, but also access to things like email or software control modules, a public repo, Slack, any of these types of tools – when you amass those three in that toxic trifecta, that probably means you have some risk, right?

And you're going to need some really, really strong countermeasures to counterbalance that. If you can limit yourself to two of the three, that probably gives you a much better chance than if you see all three deployed.

Heathfield: Are you talking about this in reference specifically to tools that firms are availing themselves of that are built on top of the frontier models? Is that where the blender is happening?

Sullivan: It could be. The frontier models will give us that external data. So right off the bat, you have all kinds of external data, which could be manipulated by an adversary, right? If they know something that's in there, they can tickle that with an input that they generate. And then you combine that with the others, you find yourself in a difficult time. That's certainly part of the risk we're now exposed to.

Heathfield: How do you manage for the risk where it seems right, but it's not right? How do you control for that? We used to live more in an it's-on-or-it's-off, it's-black-or-it's-white kind of world. And now the model's returning something, and it all looks good, and it kind of checks out – should we be using more than one model? How should we be figuring out how we verify this?

Sullivan: Yeah, that's a tough one because I think these systems are often right, sometimes wrong, but always confident. There's a variety of techniques there. One would be a model that's supervising the model to look at variance from previous requests. Other systems, depending on the criticality, organizations say, ‘we're not ready to pull a human out of the loop yet. We'll have a human safeguard.’ To be fair, we often say in security, the human is the weakest link. So it is a little bit ironic that that's what we're falling back to. But those are some of the compensating controls.

But it's definitely something that we need to keep in mind, that we will see very confident responses that are very, very wrong at this stage in the maturity.

Heathfield: I want to talk about that in terms of this idea of how we know what failure is. And obviously there are evals. There's a whole world of trying to design basically AI checking AI. And then I've also seen this whole human-in-the-loop thing. It could actually end up taking the humans more time to verify it than it would have been if they didn't do it in the first place and didn't bring in the full new expensive system. There's a variety of ways to try to evaluate, to check, whether that's automated, whether that's human, or whatever. But we also have a lot of regulatory requirements. And how do you think that we should be thinking about keeping up with our just general GRC (Governance, Risk, and Compliance) requirements and everything like that while we're figuring all of this out, because … it’s gonna take a while.

Sullivan: For sure. You know, there's a tremendous velocity mismatch between the pace of innovation that you're seeing in this space – which is a faster half-life than anything I've seen in my career – and then the pace that we know that regulatory operates within. We certainly want to be empathetic to our friends in GRC who are managing this mismatch.

But at the same time, a lot of really, really strong regulation is based on first principles. And a lot of those things will carry forward into the space. You know, the basics – do you have a good asset inventory of where these technologies are deploying? We talked about deterministic versus non-deterministic. Certainly, it should be pointed out that if you know of the model and you have administrative control over that system, you can manipulate that. You can configure these systems, you [can] set the temperature to zero, you can force them to be deterministic.

And that might be an area where security and compliance, applying some tension to the business, could help. There are certain systems that may not need to be non-deterministic, right? If it's a quote-unquote boring application in a legal domain, or accounting, or some of these things, you don't need wild creativity, where maybe you want to favor predictability for an agent. There's other areas where some may argue that you want that wider sampling of less probable outcomes to generate what humans perceive as creativity from the systems. But I think asking the tough questions, and do you really need that level of non-deterministic behavior within a system, that should be challenged and a case provided.

Heathfield: Yeah, so you want to try to potentially make it as deterministic as you can. One thing I've also heard is people break down a task … this sub agent only does this task so you can actually trace when things break. Whereas if you just have one workflow of everybody doing like a zillion steps, it's very hard to see where something went wrong. So if you can break it up … into its tiniest component parts, then you're making it more deterministic, even though you're still having it all be automated.

Sullivan: Definitely. I think that's another great example. And that's an area where, again, some of the scrutiny that you get from security and compliance delivers a better outcome. That's almost like separation of duties, where you do some of those things in. We do have the benefit of having the bedrock of really, really strong first principles that we operate within. In some cases, treating the systems as though they're not different and imposing some of the traditional scrutiny will be helpful for us.

Heathfield: On the other hand, though, threat actors are leveraging the non-deterministic nature of this to be really opportunistic. They can spray and pray and hope that a creative outcome happens that we haven't thought of. If we get so deterministic that we limit the power, we're also potentially giving an advantage to the threat actors. Would you agree with that?

Sullivan: Yeah, I’ll go back to the AppSec example. You know, we traditionally have been building APIs and applications for a human to leverage in one way or another, guided by a search function. We may look forward and find that we're developing APIs and web applications to go feed an agent or a chatbot. If people are going to make a decision of which financial institution is most appropriate for them, given a set of conditions, maybe they're doing that within an agent or within a chatbot. And the way that we service those bots that inform the training, and the way that we respond to those agents, that may need different treatment to optimize for that use case, as opposed to what we do for a generic web visitor today.

That's the happy case, where we've got a valid customer behind some of this technology. Obviously, then you get into impersonators and people leveraging these same tools for nefarious means. And what we're going to have to deal with there is just more automation, more bots coming to our site, and wrestling with delegated identity. If I'm the legitimate owner of my identity, but I want to have an agent do things on my behalf, that's something that we're already starting to see, but we're at the very, very early days of that. That's going to present challenges for teams on the IAM side, on authorization. There’s a lot of work to be done there.

Heathfield: Yeah. This is something we talked about earlier in our panel, right? The new dimension of third-party and supply chain risk, or nth party risk. Because if it's agents talking to agents, our suppliers’ agents talking to other agents, there's no visibility at all into that. So you're like black boxes on top of black boxes, black boxes all the way down. So, how should we be thinking about that? We already had the challenge of managing supply chain risk. Now we're adding this whole other layer on top of it where we have even less visibility, you know? How should we be thinking about that?

Sullivan: The first challenge in front of us is that on the client side, if somebody's leveraging an agent, identifying it appropriately, giving it the right treatment, and validating that it is the delegated identity that we want to authorize – that's our first step. And then, on our back end, we're leveraging third parties that are then calling other parties; that’s how we manage our supply chain risk.

And I think one of the [FS-ISAC] member firms probably nailed that at the beginning of the year with Pat Opet's open letter to SaaS (Software as a Service) providers, really, really calling out some gaps in the ecosystem today. Some of that solution is going to come from SaaS providers providing more transparency of what's happening downstream. We're seeing adversaries forcing that. It was a very prophetic letter.

Heathfield: We've talked a lot about LLMs and that’s kind of the default, what everybody thinks about with GenAI, but you're also hearing a lot of chatter about the potential power of small language models. Especially when you have really highly specific use cases, maybe you don't need the ocean when a lake might work. Is that a potential path for managing some of this non-deterministic risk?

Sullivan: I think it is. I think this is another area where security is applying some tension to the business, asking those tough questions: ‘Do you really need a full LLM and all of the risk that goes along with that?’ Threat modeling against an LLM is absolutely crazy, right? Because you have to now think about, ‘Can I go ask a financial application how to build a WMD,’ right? And that could be part of your threat model, depending on what you train on.

As security teams challenge the business, as we talked about, [they are asking] ‘Do you really need to have a non-deterministic model, or could a deterministic model work in this scenario?’ And that's within the configuration control. Similarly, I think some scrutiny on ‘Do you really require an LLM, or would an SLM (small language model) suffice in this use case?’ may be another example of where security could guide the business to a better outcome for everybody, frankly. That's a question I see more organizations asking themselves. And I think that's a positive evolution that we're seeing.

Heathfield: Yeah, so there's a model. We've talked a little bit about temperature. There's also controlling the context, right? You don’t just let the context go on forever and ever, because that obviously degrades the quality that comes out. And the more output you get, the more inputs you have, the more expensive it gets. And then we obviously have the prompts, and maybe there are also ways of making those more predictable. There are ways, right? We have prompt templates, we have prompt libraries. If you make those more predictable, at least then the inputs that are going in are more regimented. So you may also be able to manage some of the non-determinism on all these different dimensions, right? One of the things that I am seeing is that we have to educate our people that there are all these different dimensions so that they know what the tools are. And it's not just one thing. What do you think?

Sullivan: Totally agree. If we look back historically, as we've embarked on new trends, unfortunately, we've seen the downside from a security perspective. When we started on the path of e-commerce, there was a whole wave of SQL injection. There were just entire databases flooding out web applications. [We learned] from that and [used] better behavior on how to inspect inputs to our application. And then when we went to cloud, we saw a similar set of patterns emerge. And I guess the hope here would be, as we embark on this trend, are there lessons that we can take from the introduction of some of these web applications hooked up to sensitive data on the backend? Or some of the lessons we learned with cloud? Can we learn from those prior experiences without having to see some organization really suffer, and then everybody else learn from their suffering? Hopefully, that's the case here, and we can reduce some of that risk before it leads to some really nasty breaches.

Heathfield: Okay, we've covered a lot of ground. If you could have one takeaway from this conversation that you would want people to leave with, what would that be?

Sullivan: You know, I think there are a lot of interesting points. I would say that the key is just ensuring that the information security team understands the basics. Back to, let's establish a baseline level of training across AI, hopefully hands-on, where people can kind of play around in a workshop setting or a hackathon. Because I think we have a lot of the first principles. And when we understand, we see these things, we'll know where to apply them. So I would say that's probably it.

FinCyber Today

FinCyber Today is a podcast from FS-ISAC that covers the latest developments in cybersecurity, contemporary risks, financial sector resilience and threat intelligence.

Our host Elizabeth Heathfield leads wide-ranging discussions with cybersecurity leaders and experts around the world who bring practical ideas on how to confront cyber challenges in the financial sector, improve incident response protocols, and build operational resilience.

Amid the clutter and noise, FS-ISAC Insights is your go-to destination for clarity and perspectives on the future of finance, data, and cybersecurity from C-level executives worldwide.

Patrick Sullivan

Patrick Sullivan is SVP, CTO of Security Strategy for Akamai. He joined Akamai's Security Team in 2005 and has been a leader working with leading enterprises to design security architectures and thwart...

Elizabeth is a storyteller at the intersection of technology and money. Layer in geopolitics and the criminal underworld and you get today's issues in cybersecurity for the global financial system. Crypto. Web...

3.0. Quantum. AI. Ransomware. Privacy. Regulation. Zero-days. Supply chain attacks. Developing new and diverse talent. How to protect the future of money. These are the topics Elizabeth asks top executives and experts in the field about on FinCyber Today.

Powerful but Unpredictable – What Leaders Should Know About GenAI

FinCyber Today

Patrick Sullivan

Hosted by Elizabeth Heathfield

Posts by Topic

WANT TO CONTRIBUTE?

Subscribe to Insights

FS-ISAC members around the world receive trusted and timely expert information that increases sector-wide knowledge of cybersecurity threats.

Site Map