Mastering Cloud Incident Response with Hilal Ahmad Lone

TLDR;

  • Having a defined Incident Response Process and Alignment among the team are key for a successful Incident Response Program
  • Monitor your MTTD and MTTR and use that to constantly improve your Incident Response Program.
  • When it comes to the Security of your organization, Open Source or Native security tools or vendor tools do not provide 100% coverage. It always needs customization as these lack the context.

Transcript

Host: Hi everyone, this is Purusottam, and thanks for tuning into the ScaletoZero podcast. Today's episode is with Hilal Ahmad Lone.

Hilal leads the Information Security Department at Razorpay, which is revolutionizing payment solutions for businesses. He possesses an extensive understanding of threat landscapes, risk management, and compliance. Hilal has demonstrated a consistent ability to implement and maintain effective security strategies. With over 16 years of experience, he brings in a robust technical background and expertise in network security, application security, data security, positioning him to guide Razorpay to continued success in an ever-changing threat environment.

Prior to Razorpay, he held positions as Senior Vice President and CISO at Dream11 Dream Sports and CISO at Traveloka as well.

Hilal, thank you so much for taking the time to join me.

Hilal: My pleasure, Puru!

Host: Okay, so before we start, do you want to maybe briefly share about your journey? How did you get into security and what excites you on a day to day basis?

Hilal: Yeah, so I started my career in security. I mean, it's the story that most folks actually who get into security have, right? So I started as a systems engineer, and did some networking stuff.

I actually started as an email security professional because at that time, email security was a major thing, right? So you had a lot of issues with phishing and spam and malware distribution through email.

So at that time, I found it really interesting how to kind of sanitize emails and all that. So I got very interested in it.

And then over the years, like I worked with quite a few companies, very great companies and worked in almost all the domains in security. So I worked as a security analyst, like a SOC analyst. I worked as an application security professional. I worked as a network security guy. I worked as a security developer. I worked as a security architect. And now I'm CISO.

So in my career I have basically seen all the dimensions that you could potentially see, including the GRC aspect, where you go through governance, compliance, and regulation, because most of the time that became part of my job.

I was very lucky and fortunate to work with diverse industries. If you look at my profile, I have probably not worked in two similar industries.

So all the companies that I worked at were in different verticals. That gave me richer insight into the security requirements in different landscapes, and it allowed me to have a broader vision of how the security requirements and the compliance requirements fluctuate, what dominates what, and what influences what.

So I think that was my journey in security. Yeah, it has been fantastic. I would not change anything about it. At the same time, I think it's very interesting because you always have something to solve in security. It never ends.

So the security journey is like this: you can never say that I'm done or I achieved something. There's constant learning, constant excitement, a constant flow of problems that we need to solve. And the challenges keep getting more interesting. There's never a dull moment. So I think that's what I love about security.

Host: I love that you have touched on all the different types of roles, and at the same time you have worked in different verticals, which gives you, as you said, a diverse set of knowledge that you can apply when securing organizations. I hope we can touch on some of these aspects today. Before we get into the security aspects, there is one thing that we ask all of our guests, and we get unique answers.

Hilal: Absolutely!

Host: What does a day in your life look like? So what does a day in Hilal's life look like?

Hilal: I wake up and have breakfast. I think that's the most interesting part. But of course, then you go through the different priorities that you have for the day, right? So you want to finish your meetings; there are endless meetings that you have to attend. Then you look at any operational challenges in the day-to-day activities that you need to resolve. You want to see the progress on the different projects that we have underway at the moment. You also want to meet with the people who are in charge of those projects and see exactly whether we are making progress on those.

It all involves different things: consultation with my peers, stakeholders, and my own team, and collaboration with different verticals and business units in the organization. I make sure that the strategic elements are actually progressing. That does not happen on a daily basis, but it does happen with quite a high frequency.

Apart from that, I think one of the most important aspects of my job is to make sure that my team has the necessary tools and the necessary privileges that they need to get things done. That's what the usual day looks like, but most of the time it's about ensuring that the team is focused, that they're making progress, and that at the same time we are also innovating in the team.

Host: So enabling your team for success. So that helps your organization to succeed as well. Love that. So yeah, with that, let's get started.

So today we are going to talk about incident response and detection engineering in the cloud and how emerging technologies are impacting it. So let's dive in.

So today most of the organizations have adopted cloud or are migrating their workloads to cloud.

Hilal: Absolutely. Absolutely.

Host: Considering the dynamic nature of cloud environments and the challenges it brings, planning plays an important role when it comes to incident response or detection. You spoke about making your team successful just a bit ago.

How do you structure and equip your incident response team so that they can effectively handle cloud-based incidents?

Hilal: So to begin with, I think it's incredibly important that the team is on board with the incident response process that we are going to adopt. That means that before we even think about doing incident response, the basic toolkit needs to be there.

Say, for example, you cannot do incident response if you are not able to effectively detect incidents that are going on. So you need a platform that can detect incidents, and that platform needs to be something that the team is comfortable with. At the same time, obviously, you need to have the playbooks that you are potentially going to use in an incident. Right.

And you also want to make sure that you have standard operating procedures, SOPs, designed for different kinds of incidents. And of course, the team needs to understand the preventive setup that we have, the reactive setup that we have, how investigations need to happen, and how the triage needs to happen. So that needs to be established.

So the incident response policy needs to be there. For example, if we are going to be escalating something, what are we going to escalate? And when are we going to escalate? Those things are extremely important. When you are starting out with incident response, I think some of these things are going to be incredibly important.

The SOPs, the playbooks, the platform that you're going to use, and the escalation policy that you're going to adopt: those will play a crucial role in terms of incident remediation.

So when you have a team completely in sync with you with respect to all of these things, then I think incident response becomes a lot easier.

Host: So I like how you structured it, right? You need to have alignment with the team from a process perspective. Then you have the playbook in place. Then you have the operating procedure in place so that you don't panic when there is an incident, but you have a defined process that you're following.

How do you approach developing this? And I know that it's not a one-time activity, right? You have to continuously maintain it. So how do you approach developing and also maintaining the incident response plan?

Hilal: So you obviously want to see how we are improving with respect to our mean time to detect (MTTD) or mean time to respond (MTTR) to an incident. You always want to see if you are improving on that. And then, if our mean time to respond is not actually coming down, we need to understand why it is not coming down. Is there any challenge in our process or in our tooling?

So we want to identify that. If tooling can improve our MTTD or MTTR, then obviously we will invest in that, or we'll probably build something that brings in automation with respect to incident response to bring it down.

If it is a process-level change that needs to happen, then the team provides feedback on what exactly is not working in the process. Is it that the response from our stakeholders is not fast enough, or is it that we are not getting enough information with respect to the incidents? There could be a lot of different things happening in the process, right?

So we want to make sure that that does not happen. Optimizing that incident response policy is incredibly important because it's something that is always going to matter with respect to responding to incidents.

You really want to be near real time when you are dealing with this, because if, for example, you have an incident and you don't know what to do for 15 to 30 minutes, that's enough time for an adversary to pivot within the organization and impact the systems. So that's very, very important to understand, because we want to make sure that those kinds of things don't happen.

So maintaining the incident response policy takes both the collaboration of the team and analyzing and assessing whether the current incident response policy is actually working for us, and whether we are going the right way with respect to reducing the time to detect or time to respond to incidents. I think that allows us to have a roadmap with which we can devise an effective incident response policy.
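
Editor's note: the MTTD and MTTR tracking described above can be illustrated with a minimal Python sketch. The incident records and field names below are hypothetical, and the sketch assumes MTTD is measured from the start of the activity to detection and MTTR from detection to resolution; teams define these intervals differently.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records (not from the episode): when the activity
# began, when it was detected, and when it was resolved.
INCIDENTS = [
    {"started": "2024-01-05T10:00", "detected": "2024-01-05T10:20", "resolved": "2024-01-05T12:20"},
    {"started": "2024-02-11T08:00", "detected": "2024-02-11T08:10", "resolved": "2024-02-11T09:10"},
]


def _minutes(start: str, end: str) -> float:
    """Minutes elapsed between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def mttd_minutes(incidents) -> float:
    """Mean time to detect: average of (detected - started)."""
    return mean(_minutes(i["started"], i["detected"]) for i in incidents)


def mttr_minutes(incidents) -> float:
    """Mean time to respond: average of (resolved - detected)."""
    return mean(_minutes(i["detected"], i["resolved"]) for i in incidents)


if __name__ == "__main__":
    print("MTTD (min):", mttd_minutes(INCIDENTS))
    print("MTTR (min):", mttr_minutes(INCIDENTS))
```

Computing these per month or per quarter gives the trend line Hilal refers to: if the numbers are not coming down, that is the cue to look at the process or the tooling.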

Host: So continuously evaluate yourself on the process. If you are improving, good; if you are not improving, see the areas where you can improve, and then keep maintaining your incident response plan accordingly. So for this, there are two parts, right? One you touched on is the tooling. The other is around the process part, right?

So do you see a need to work outside of your organization? Do you need outside help, or should you always do it in-house?

Hilal: Tooling at a certain level can be managed within the team. Of course, you need support from IT, DevOps, and engineering to make it effective, but most of it can be managed within the team. That's on us. The tooling is on the security team.

But the process is something that has stakeholders who are going to be external to the team. So you probably always have to work with other teams to respond to an incident, because we may not always have the information with respect to different applications or data or identities and things like that. So it's always important for us to reach out to other people to get information from them and have them help us respond to an incident.

That's why I think one of the things that we have done well is establishing an incident management team that has representation from other departments as well, which can be pulled in as and when it is required.

That way it is pretty streamlined, but at the same time, our process is heavily influenced by stakeholders as well. So sometimes they also have this critical role of disputing or disagreeing with the priority or the severity that we have set for an incident.

So they may or may not agree with it, or they may actually want to upgrade or downgrade the severity or the impact of the incident. That's something they bring in which is very valuable, because then we can align ourselves accordingly and respond to the incident with the severity and the priority that it actually requires.

Host: So what I'm getting from your answer is, even though security is the owner of some of these things, it's a very collaborative effort. It's not that the security team is doing it inside a black box and nobody knows what exactly is going on with the incident response tooling or the process.

Hilal: Absolutely! That's absolutely correct, right. Of all the incidents that we have to handle, some are straightforward. Say there's a malware infection; we can take action on that ourselves, right? But if there's a DDoS or something else that needs help from other departments, then they are the right people to get involved and respond to those incidents, because they have much better visibility into their own environments.

And that adds the agility and the efficiency needed to respond to an incident. So they are very critical to the success of the incident response process.

Host: Makes sense. So when you are setting up security programs, you have to define these incident response plans. And I believe SOC 2, HIPAA, CIS, and things like that recommend that you have an incident response plan in place.

So if I'm starting my security journey and setting up my incident response process, what are the top three areas that I should consider when defining my security controls, say around incidents or vulnerabilities?

Hilal: Now, there are a couple of things, right? So one is: what is the sweet spot in saying that this is going to be my reasonable time in which I respond to an incident? As you mentioned, some of the regulatory organizations may ask us to have a particular timeline to respond to an incident. For example, in India, we have a regulatory body that we have to inform within six hours of an incident.

So that is actually demanded by them, and that's something that we need to build into our process as well, to make sure that we respond to them in six hours. So that can influence the speed and the SLAs that we have for incident response.

But at the same time, we are also aware of the fact that whatever regulatory requirement might be there, it will not always be the best for my organization.

Six hours is too long for us to respond to an incident. Right. So when we are thinking about putting an incident response plan in place, it always depends on what the critical assets are that we need to protect and what kind of recovery they need. So it all boils down to having enough information about the different assets, particularly the critical assets in our organization, and how we can actually recover those.

So responding to events, and making sure that we have a reasonable enough timeline to respond to an incident, is very important. It can be influenced by a regulator, and that's actually important because it allows us to enforce those controls.

But at the same time, it's also important to have good SLAs at our own organization level as well, because that allows us to put in the necessary tooling for the security operations that we have. And it allows us to build programs that target the kind of excellence that we need in the incident response process.

So to summarize, I think the external influence is important; it actually gives us some guidance with respect to creating an incident response process. But at the same time, it should never be your North Star. Your North Star should be exactly what is required for your organization to recover from an incident, and how fast you are able to contain an incident. I think those are the guiding principles for any incident response plan.
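
Editor's note: one way to picture the point about regulatory versus internal timelines is a small deadline check. This is a sketch under stated assumptions: the six-hour window comes from the conversation, while the one-hour internal target and all function names are hypothetical.

```python
from datetime import datetime, timedelta

REGULATORY_WINDOW = timedelta(hours=6)  # six-hour reporting window mentioned above
INTERNAL_TARGET = timedelta(hours=1)    # hypothetical, stricter internal SLA


def deadlines(detected_at: datetime) -> dict:
    """Return the internal and regulatory deadlines for one incident."""
    return {
        "internal": detected_at + INTERNAL_TARGET,
        "regulatory": detected_at + REGULATORY_WINDOW,
    }


def sla_status(detected_at: datetime, now: datetime) -> str:
    """Classify an open incident against both deadlines."""
    d = deadlines(detected_at)
    if now > d["regulatory"]:
        return "regulatory breach"
    if now > d["internal"]:
        return "internal breach"
    return "on track"


if __name__ == "__main__":
    detected = datetime(2024, 3, 1, 9, 0)
    print(sla_status(detected, datetime(2024, 3, 1, 11, 0)))
```

The internal target is checked separately so that the team is alerted well before the regulatory window closes, which matches the idea that the regulator's number is a floor and not the North Star.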

Host: Makes a lot of sense. Now, one of the things that you touched on earlier, right? Like having playbooks and everything in place so that you don't panic when you are in an incident, like debugging an incident.

So effective incident response requires experience and also quick thinking. And you have been doing this for a while. You have been in the security space for a while, and you must have seen a lot of incidents as well.

Can you share an example of a security incident and how you dealt with it?

Hilal: Yeah, so one of the things that's important to understand is that even though I mentioned that you need to have playbooks in place, you can only have a finite number of playbooks. So unless you create some kind of automation that dynamically creates playbooks for you, you are not always going to be able to rely on playbooks.

So I mean, how many playbooks are you going to create? Like 100, 200, 500? Like you are always going to fall short.

So whenever we think of playbooks, they should be the first and the basic level of preparation for a response plan, meaning whatever is obvious and whatever is binary in nature: for example, yes or no, allow or deny, right? Something like that. So for those, you can have playbooks, but when you deal with advanced incidents or things like that, it's not always going to be possible to create a playbook for everything.

Just a word of caution: don't rely on playbooks all the time. Make sure that you have some automation and orchestration in place so that you can actually respond to the incident if you do not have a playbook handy.

So now, coming back to incidents. One of the incidents that I recall from way back was when one of our application servers was getting constantly hit by an application-layer DoS. You know, Slowloris and things like that. It was not exactly Slowloris, but a similar kind of incident happened to one of our application servers. Not recently; it was way back.

So at that time I was actually the incident responder. First of all, it was very difficult to detect what exactly was going on, because an application-layer DoS does not come across as a straightforward security incident. It actually impacts the system components like memory, CPU utilization, disk space, and all that.

So it makes the system unavailable for people to use, right? So we always tend to think that, OK, this is a system-level problem. This is an availability issue. The teams need to fix it on their end. And they would put in more memory. They would put in more CPUs. They would scale it more, right? But it doesn't come down, right? Because the more you put in, the more it consumes.

So it was a very unique way of malware impacting our systems. Then somebody said, okay, let's do a deep dive. So I got on a call with the developer and we started doing a system analysis. Okay, when did it start? We started drawing a trend line. It started at this time. Then we said, okay, what happened at that time? What were the actions taken at that time?

So the action taken at that time was that somebody installed a package, and that package did not come from the artifact repository that we use. It had come from a different source, unfortunately.

I think he had good intentions, to fix some things, right? But it turned out not to be the case. And that's when the system got infected with the malware. When we unpacked the package, we realized that it came from a very questionable source, right?

So then we analyzed it, and I took help from different tools to analyze the signatures of that particular application. It turned out that it was an application-layer DoS. So we reverted the system back to its last known good configuration.

And it took us more than like 12 hours to basically do all the debugging and all the investigation to figure out, okay, it was not a system level problem. It was actually a security problem.

And that incident I can never forget because there was no playbook for it, right? So we never thought like we'll have playbook for it, right? So that's when we had to kind of like do the trend line. We had to kind of figure out, okay, what were the changes made to the system and how did this happen?

Fortunately for us, it did not spread and we were able to contain that incident. So that was very interesting, because it did not have any known IOCs either. There were no indicators of compromise that you could look at and that would point us to a particular investigation method, right? So that was very interesting for me at that time.
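
Editor's note: the "trend line" investigation described above, finding when the problem started and then asking what changed around that time, can be sketched as a simple query over a change log. The log entries, field names, and two-hour lookback below are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta


def changes_before(onset: datetime, change_log: list, lookback=timedelta(hours=2)) -> list:
    """Return changes made within `lookback` before the onset, newest first."""
    window_start = onset - lookback
    hits = [c for c in change_log if window_start <= c["at"] <= onset]
    return sorted(hits, key=lambda c: c["at"], reverse=True)


if __name__ == "__main__":
    # Hypothetical change log for one server.
    log = [
        {"at": datetime(2024, 5, 2, 8, 0), "action": "config reload"},
        {"at": datetime(2024, 5, 2, 13, 40), "action": "package installed from external source"},
        {"at": datetime(2024, 5, 2, 16, 0), "action": "memory scaled up"},
    ]
    onset = datetime(2024, 5, 2, 14, 0)  # when resource usage started climbing
    for change in changes_before(onset, log):
        print(change["at"], change["action"])
```

Changes made after the onset, such as the scaling attempts, are excluded, which keeps attention on candidate causes and not on the reactions to the symptom.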

Host: So it's funny, right? It was a very small change. Like a developer with good intentions maybe updated a package. And that led to this incident, right? So which shows that there is no, it's not always clear that it is coming from one place, right? You have to have enough monitoring and all of those things in place so that you can figure out and also you can roll back if needed. As you said, like you had a previous artifact, you could roll it out while you were debugging what's going on, right? So that your production systems didn't come down.

So from this incident, what were some of the lessons that were learned?

And you mentioned that you created a playbook after that. What lessons did you learn from it that our audience can utilize?

Hilal: Yeah! We conducted a thorough RCA, a root cause analysis, for that, which is very important. And like you mentioned, we created a playbook that said, OK, if this kind of incident happens, these are the steps that need to be taken, and don't ignore any of those things.

But at the same time, what we also did was set up alerting. Earlier we had alerting where, if the threshold goes beyond 80, we get an alert. So then we also started setting up alerts, with automation in place, for incidents like that.

So for example, if there is a sudden burst in resource consumption on a server or application, we would immediately get notified about that, which means that somebody on the incident response team needs to take a look at it. Earlier it was not my team that needed to take care of that, right? Because it was considered an availability issue or a performance issue.

But then we categorized that as a security issue as well, meaning that when an incident like that happens, the engineering team as well as my team will get on the call. We created an SOP for that particular incident. And then we said, if something like this happens in the future, the incident response team has to be involved from the start so that we don't lose time. So the RCA, the creation of a playbook, the creation of an SOP: I think that is important after every incident.

And of course, make sure you note what worked for you. For example, as you mentioned, we had a last known good configuration available. So that worked for us. And how do we optimize and improve on that? Preparedness is also important.
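
Editor's note: the difference between the earlier static threshold alert and the later "sudden burst" alert can be sketched as follows. This assumes the "80" mentioned above is a utilization percentage; the burst factor and window size are hypothetical tuning values.

```python
from statistics import mean

STATIC_THRESHOLD = 80.0  # the "beyond 80" alert, read here as percent (assumption)
BURST_FACTOR = 2.0       # hypothetical: latest sample at least 2x the recent baseline


def classify(samples: list, window: int = 5) -> list:
    """Return the alert types triggered by the latest utilization sample.

    `samples` is a list of utilization percentages, oldest first.
    """
    latest = samples[-1]
    # Baseline is the mean of up to `window` samples preceding the latest one.
    previous = samples[-window - 1:-1]
    baseline = mean(previous) if previous else latest
    alerts = []
    if latest > STATIC_THRESHOLD:
        alerts.append("threshold")
    if baseline > 0 and latest >= BURST_FACTOR * baseline:
        alerts.append("burst")
    return alerts


if __name__ == "__main__":
    print(classify([20, 22, 21, 19, 20, 55]))
```

A jump from roughly 20 to 55 never crosses the static threshold, yet it is the kind of burst that, per the discussion, should pull in the incident response team alongside engineering from the start.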

Host: Yeah, these are some good suggestions, right? At the end of the day, these are basics which you need to follow: having your artifact repository so that you can always go back, having a clear communication plan, having a playbook, and things like that. Makes sense.

One thing that you touched on is around packages and things like that, right? You are a big believer in open source.

How do you scrutinize an open source library, let's say, if you want to use it or if you want to upgrade it? How do you evaluate whether you should use it or not? What parameters do you use?

Hilal: Yeah, so this is a very difficult question to answer, right? So one of the things that's important is, you know, people talk about creating a software bill of materials (SBOM) and things like that, right? That is to ensure that they have visibility into who is using what software and all that.

But I think what really works is when you validate packages as they are getting installed or downloaded. So you want to ensure that your developers actually know what is authorized and what is not authorized. You need to give them that education. You need to make sure that they do at least a basic hygiene check before they install or download any package.

Now the other thing that works really, really well for us is creating golden images for different kinds of software packages. So we have created a lot of golden images, and that works really well because you have hardened images, defined with baseline security, that you can actually deploy, and those do not necessarily have any kind of external weaknesses in them.

So if, for example, you want to upgrade, you still need to upgrade the golden image; you don't upgrade it right there on the server. So I think that works really, really well. That said, one thing that is extremely important to keep in mind is that you cannot potentially create a golden image for every single requirement.

So that's why you need monitoring as well, and testing as well, for the different kinds of resources that you use. Before the code gets committed, make sure that the libraries are scrutinized and sanitized. And you want to ensure that they're actually listed in the SBOM, or whatever is getting created, to make sure that the versioning is proper, that any upgrade is proper, and that it goes through different levels of scrutiny, say static code analysis and things like that. That actually works, right?

And particularly for libraries it works really well, because you have known signatures and known weaknesses in libraries, and it will not allow you to commit code that has vulnerabilities in it.

It's one of the most difficult things to do in security, to be honest: making sure that everybody uses appropriate security mechanisms when choosing a library. But these are some of the things that we can potentially do to reduce the risk that comes from third-party or open source software.
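
Editor's note: the "basic hygiene check" before installing a package could look something like the sketch below, which compares a requested package against an approved list by name, version, and checksum. The allowlist contents and function names are hypothetical; in practice the approved list might be derived from an SBOM or an internal artifact repository.

```python
import hashlib

# Hypothetical allowlist: package name -> (approved version, SHA-256 of the approved artifact).
APPROVED = {
    "examplelib": ("1.4.2", hashlib.sha256(b"trusted build").hexdigest()),
}


def validate_package(name: str, version: str, artifact: bytes) -> str:
    """Check a package against the allowlist before it is installed."""
    entry = APPROVED.get(name)
    if entry is None:
        return "reject: not on the approved list"
    approved_version, approved_digest = entry
    if version != approved_version:
        return "reject: version not approved"
    if hashlib.sha256(artifact).hexdigest() != approved_digest:
        return "reject: checksum mismatch"
    return "ok"


if __name__ == "__main__":
    print(validate_package("examplelib", "1.4.2", b"trusted build"))
    print(validate_package("examplelib", "1.4.2", b"tampered build"))
```

The checksum comparison is what would catch a package with the right name and version that was fetched from a different, untrusted source, as in the incident described earlier.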

Host: Yeah, I totally agree. Like when it comes to using open source, like majority of world software now runs on open source as well. So you are not away from some of these issues. You have to deal with it. And most of the supply chain issues are because of some of the open source dependencies. Maybe they are not well maintained or they are not, if there is a security vulnerability, it is not fixed quickly, and things like that.

And you highlighted on monitoring slightly. So to dig deeper onto it, like we reached out to Aseem Shrey and he had this question.

What do you suggest for continuous monitoring for AWS and multi-cloud organizations? And how effective has OSS been in that?

Hilal: So, I'm a big believer in open source software and data security, right? I could name a number of open source tools that we use for continuous monitoring and things like that. So, some of the unique things that we have done: for example, we use Falco for runtime security, or Kyverno for containers.

Some of these tools allow us to customize them to our own requirements. So we do a lot of engineering on these systems. When we get Falco, which is from Sysdig, or something like that, we do not just go with the base functionality that they offer. We analyze how it can be optimized, how it can be improved, and how it actually fits in our environment.

So when we put that tool in place, it gives good results to begin with. You get good monitoring, it alerts you, you have good integration with different systems, and you get started very well.

But at the same time, what happens is that you hardly get any of the contextual information that you really need. Say, for example, is the behavior normal or not? Is the communication that happens between different clusters normal or not? You normally do not get that kind of contextual information.

What we did was combine the capability that the open source software came with and the system components. For example, we leveraged eBPF for understanding if anybody is trying to export or exfiltrate any large amount of data, so we could actually stop that from happening.

So with Falco plus eBPF, we get good visibility. We get good contextual information about different things happening in our environment with respect to our runtime security. So that's constant security; we get constant monitoring applied on that. So that works really well. But it requires a lot of engineering. It requires a lot of effort from our side to customize it.

At the same time, there are other tools. A lot of the other tools that we use in different scenarios allow us to customize them and fit them to our needs. So we did a lot of work with Semgrep, and we did a lot of work with other libraries, independent bodies, things like that, to ensure that they work according to our requirements. So it's not like you will get one solution that will solve all your monitoring problems. That's a fool's errand.

But at the same time what happens is that like you get enough visibility across your most demanding workloads Where you need to have constant monitoring so you can get actually like those things done So if somebody had to ask me, okay, so but of course like it's not possible for a set ops team or an applications here team to keep Visibility across all of these tools and ensure that okay, we have good continuous monitoring on our environment particularly in the cloud.

So my answer is very simple. Let's figure out a way to basically centralize all of that. So create a data link or create something, a central repository where you can all send all this data and then put an analytics platform on top of that. Like you can use different tools. I mean, you don't have to use AWS tool or whatever.

Even though like I'm a big fan of native security right so meaning that whatever works for you is just adopt that for us the data lake works really well so we create queries on top of that data lake and then we create a dashboard so different open source platform again Grafana and things like that so meaning that like Even though you have different problems solved for different types of tools But at the same time you can still have a centralized platform that can be obviously it's like engineering intensive. It's a resource intensive.

And you may not get it right the first time, so that always happens. But there is a way to basically accomplish all of that through native security. You don't really have to buy very expensive tools to basically accomplish all of that. So in different scenarios, different tools, it's up to you, like what do you want to adopt? And what do you want to work on? And what is the motivation for your team to basically do all of that engineering and customization? But in the end, it really works. It really, really works.
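As a rough illustration of the centralization idea described above (send findings from different tools into one repository, then query it for dashboards), here is a minimal sketch. It assumes an in-memory SQLite table as a stand-in for the data lake; the record fields, the `normalize` helper, and the sample events are hypothetical and simplified, not the actual output format of any tool mentioned in this conversation.

```python
import sqlite3

# Hypothetical, simplified records. Real tool output carries many more fields.
runtime_alerts = [
    {"rule": "shell_in_container", "priority": "Critical", "host": "pay-api-1"},
    {"rule": "unexpected_outbound", "priority": "Warning", "host": "pay-api-1"},
]
code_findings = [
    {"check": "hardcoded-secret", "severity": "HIGH", "repo": "pay-api"},
]


def normalize(source, record, name_key, severity_key, asset_key):
    """Map a tool-specific record onto one shared (source, name, severity, asset) row."""
    return (source, record[name_key], record[severity_key].lower(), record[asset_key])


def build_store(rows):
    """Create the central table (stand-in for a data lake) and load all rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (source TEXT, name TEXT, severity TEXT, asset TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    return conn


def severity_counts(conn):
    """The kind of aggregate a dashboard panel might chart."""
    cur = conn.execute(
        "SELECT source, severity, COUNT(*) FROM events "
        "GROUP BY source, severity ORDER BY source, severity"
    )
    return cur.fetchall()


rows = [normalize("runtime", r, "rule", "priority", "host") for r in runtime_alerts]
rows += [normalize("code_scan", r, "check", "severity", "repo") for r in code_findings]
conn = build_store(rows)

for row in severity_counts(conn):
    print(row)
```

The point of the shared schema is that each new tool only needs its own small mapping into it, while the queries and dashboards on top stay unchanged.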

Host: The key point that I got from your response is customization. Whether it's open source or native tools or any tool that you bring in, the vanilla version will not provide you all the coverage. You have to customize because it lacks the context of your vertical, of your company, of your unique architecture, things like that. So yeah, that's a very good point that you have.

So I just want to pivot to the next topic, which is around emerging technologies. So we spoke about incident response and you have been in the industry for some time, you have seen different trends. There is a new shiny technology called Generative AI now and most folks are raving about it. And there is a constant battle between Google, Meta, OpenAI, Anthropic and others to build better models and also to provide better capabilities using it.

So now, can I use these generative AI models or capabilities to generate incident response playbooks or forensic analysis, things like that?

And would that solve all of my pain points from a security standpoint?

Hilal: First of all, nothing in the technology world right now can solve all my security problems. So no matter what you, I mean no matter what somebody brings to the table, it's not going to solve all my problems. So we should be very realistic about how it can help us.

So there are definitely a lot of good and effective ways we can use generative AI in security challenges, right? So one of the most effective ways to use it is to basically optimize your incident response process altogether, right?

So if you go back like six years, seven years or 10 years, right? So running a query on a SIEM used to take days, right? And correlation was always suspect. I never believed that it did any correlation.

All right, so why that is important is because correlation always happened on systems that were in scope and that were under monitoring or observability. So you have two systems, like an application or a network instance, and the idea is that a correlation between them would tell you that there is an incident or a security compromise over there.

So that never worked. I mean, I want to really basically see one person who said that correlation worked for him. So that was A to B correlation, B to C correlation, C to A correlation. So it did not work. So then people started doing auto correlation.

So auto correlation, it will basically create an inventory. And on the inventory, it will randomly basically create those correlations. So the point is that when we have those legacy systems, these are actually still prevalent. So the correlation that they do and the analytics that they provide is always going to be questionable.

I really feel that it's not just because of performance. I can live with this giving me a delayed response or whatever. But what I could not live with was basically the accuracy of those results. So it was always like, is this accurate enough? So that was always challenging because we were basically working on data that they used to parse and they used to enrich and they used to do all of that on the platform.

And it was always challenging to basically say that this platform really works for me in incident response. Now, how that has changed with the help of generative AI is, first of all, it has reduced the time it needs to do analysis. So a query that previously would take a day takes like minutes now, because it has good contextual information about the data. It works on that data in our format. So I get those results pretty quickly.

The other thing is that earlier, you had to have a person who knew SQL queries, YAML queries, and things like that to understand or run queries and get results, right? Now you don't need that. You can run natural language queries, and it basically helps. Being able to do this, anybody can do an analysis on an incident, which usually requires sophisticated skills.

So it has definitely brought a lot of efficiency, and it has brought a lot of agility and faster resolutions and things like that. And it has helped bridge the skill gap as well. But what it has not done, and I don't think it will be able to do, is replace the work you do in detection engineering, for example, because you have to do anomaly detection or you have to do behavioral analysis and things like that.

So for that, you need a very, very good way of interpreting the data, making sure that you do not just look at the data that's with you right now or in the past, but also see how the trends have changed over a period of time, how the threat levels have moved across different timelines. So for those kinds of scenarios, where you want to do behavioral analysis or any kind of advanced detection engineering that potentially involves more than one data set, or involves looking into the resources and things like that.

It does not actually help with that. And you also mentioned basically does it help us in actually creating playbooks or SOPs or forensics, whatever. It definitely does, but you cannot just take it and use it. You still have to basically customize that and figure out if this is really the correct one that you are going to use.

So you need to make sure that it is usable before you adopt it, that it can get the results that you actually want, and that you can implement it in your organization without causing issues. You still need to do those reviews to make sure that it doesn't cause more problems than solutions.

So there is benefit definitely, but at the same time, I would advise strong caution when actually using something like that in security incident response.
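To make the behavioral-analysis point above a little more concrete, comparing current activity against how a trend has moved over time, here is a minimal sketch of a baseline check. It assumes a simple z-score test over a trailing window; the function name `is_anomalous`, the threshold of 3.0, and the sample counts are illustrative assumptions, not a method described in this conversation, and real detection engineering would combine several data sets and far richer baselines.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it sits more than `threshold` standard deviations
    away from the mean of the trailing `history` window."""
    if len(history) < 2:
        # Not enough data to estimate a baseline.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical daily counts of failed logins for one service.
daily_failed_logins = [12, 15, 11, 14, 13, 12, 16]

print(is_anomalous(daily_failed_logins, 14))  # within the usual range
print(is_anomalous(daily_failed_logins, 90))  # far outside the baseline
```

A check like this only looks at one metric on one timeline, which is exactly why the human judgment described above is still needed to decide which baselines matter and how to interpret a deviation.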

Host: Yeah, so I hear the same thing when it comes to code generation as well, right? Like using these GenAI tools, you can generate code as well. But they also have the capability of hallucination, right? So it's always recommended that even though you get some code, you should test it out properly before you put it into a production system. So what you're saying is very much in alignment with that, right? And I totally agree on the natural language part.

Like earlier, if you wanted to query your database or query your SIEM systems, you needed to know the native language of that platform. But with GenAI, that challenge is gone in a way, right? Now you can write natural language and that gets translated, and then the system starts giving you the relevant output.

So we spoke about emerging technology. How do I stay up to date to understand what is happening, let's say in the threat landscape or in the incident response space? How do I stay up to date, and can I use GenAI for any of this?

Hilal: Maybe, I don't know. Like, GenAI works in that context. I have not used GenAI yet to stay up to date. I mean, I wish it could do that. I would definitely love to use it. But I think it's important that you are plugged into the right kind of communities where you get the information related to threat trends or emerging technologies as far as security is concerned, or anything innovative that's actually happening in the security world.

So I would rely a lot on some of the things that are really going to come from authentic sources who do not have a bias towards a particular product or an organization or a company. There's a lot of different things that you can potentially do to keep updated.

And I think the community works really well. So you should figure out, I mean, all of us should look for communities that are working a lot with respect to basically making sure that they get the right kind of news, the right kind of information or updates on security products, updates with respect to incident response processes and things like that.

So I think that will work, that works really well because in communities what happens is you get a lot of information from different sources that you do not necessarily have always access to. And you may not, somebody experiences something and then they share that. That actually is very valuable with respect to basically keeping a person up to date.

Then of course, the usual thing, like you read blogs. I really love the blog by Phil Venables, the Google Cloud CISO. He is super great at writing those things. I think it makes sense, at least to me, it makes a lot of sense. And then there are a lot of different other sources that provide you with necessary information.

I think we could read a lot of different white papers. I'm a big fan of, I mean, I read basically white papers and technical specs of different products, because that gives me a kind of hint on how the security world is actually moving. For example, if the tech specs actually include something related to, say, Elastic, I say, okay, Elastic is being used in security now, let me explore that.

Or it actually said that we are going to use a lot of GenAI in security incident response. So I would go and look it up. And I think those news feeds and amazing RSS feeds that actually come to you on a daily basis, those allow us to basically stay up to date.

And of course, being hands-on, there's no substitute for being hands-on. So you really want to explore that technology and get hands-on. That's the best teacher for you and that's the best way we can actually keep ourselves updated.

Host: Mm-hmm, totally. One of the things that we started with was how to build a team and things like that, what tools maybe your team needs and things like that.

So a follow-up to that is what skills and expertise do you generally look for when you are building a high-performing detection engineering team?

Hilal: Yeah, I think, see, one of the things that I believe is that it's not everybody's cup of tea, basically doing detection engineering or even being part of the Incident response team.

So I think it's one team that I believe requires really smart people, really, really smart people who think on their feet, are able to basically create something out of nothing, and are able to grasp technologies really fast.

And you need to have a lot of common sense. I cannot stress it enough, like how much common sense that they need. So meaning that they need to be street smart, making sure that, okay, and they need to have calm personalities, like you are going to be under pressure all the time. So how do you handle that pressure, making sure that you keep yourself calm and make other people calm as well.

So that's an amazing quality that you want to look for in a person. Of course, there is no substitute for technical expertise, particularly in detection engineering. You should know your stuff.

So for example, if you need to understand machine learning, you need to understand machine learning. You need to understand analytics, you need to understand analytics. You need to understand web servers, you need to understand web servers. So these are the things that are irreplaceable.

So not only does this person who is going to be working in detection engineering need to be really good at technology, but he also needs to have very strong behavioral characteristics, like making sure that you actually invoke the right kind of escalation, for example, because you need to know what to escalate and when to escalate something. You are going to be making good decisions in a snap.

And you may not always have the luxury of calling people and asking them for advice or recommendations on what to do in a particular scenario. So what is the best you can do at that time? You will be tested on that. So you need to be really mature about the decision-making process.

So all those things are extremely important when you're actually hiring teams for detection. I know it's a long list and long ask and sometimes it's tough to find people like that, but it's well worth the wait.

So for example, if you can find somebody like that, you can rest assured that, okay, he's going to give you great results. And just nurturing them, making sure that they have the right tooling, is always going to be a great, great fit for the organization and the team.

Host: Makes a lot of sense, and totally in agreement. Especially when it comes to detection engineering, you really need smart people to figure things out. One of the things that you highlighted was around stress and things like that, right?

So in this year's Gartner security conference, a key statistic was presented, which was that around 73 percent of CISOs and security leaders felt burnout at some level in their work life.

What's your take on this? How do you handle stress and burnout and any tips for first time security leaders?

Hilal: That's absolutely true, right? Because there are a lot of things that are riding on you, right? So it's a lot of stakeholders who expect you to be at your best every single time. All right, so as a CISO or a security leader, you cannot afford to be seen as somebody who does not take the company's security or the employees' security seriously.

So which in turn means basically a lot of things that are expected out of you. And you have to kind of manage your executives, your board and your stakeholders, but you also have to manage your team, and the team is going to be highly opinionated, particularly the security teams. So they have their own challenges and then you have your own.

So I think there's a lot of burnout in terms of this. I don't think it's the workload that's massive; it's more about the expectations of the role that cause the burnout.

So the other thing, I think, is how to deal with that. There are some good things that you can do. You can compartmentalize and prioritize the KPIs that you need to achieve. So that's one thing that you can do: compartmentalization of KPIs, number one priority.

The second thing you can also do is empower people around you. So empower the teams to basically take decisions. Empower the teams to basically make the right kind of calls when they need to. Give them the right time and up to date and give the right kind of support that they need.

So which means that most of the things that were going to consume your time are not going to consume your time anymore, because you have a very efficient team that you empower. You can be missing from those parts when you actually delegate a lot, and that allows you to have the flexibility and freedom to focus on things that are not going to be operationally heavy.

Alright, so you can focus on strategy. You can focus on vision, you can focus on educating people, you can focus on the branding of the team and things like that. So that's actually very important. So the compartmentalization, plus basically making sure that you empower your team.

Also, take time out for yourself and invest in your own growth as well. So for example, from time to time, do whatever helps you; for me, it's learning a new skill, for example. So I invest in that, and that actually keeps me grounded. At the same time, take every scenario on its own merit.

So never panic when you see an incident happening or something not going the right way. Take time to basically decide what you're going to do about it, right? The world is not going to come to an end if you cannot complete an investigation in one hour or two hours.

So there is time for you to basically prioritize that as well. So I think if you have the opportunity to adopt some of these things, the stress levels can be managed. But I do agree that there are a lot of high-stress situations.

Host: I love your advice. Often sometimes when you're in the moment, you are like, go, go, go. You need to figure things out. But you should also take a step back and analyze and plan your own personal growth, not just think about how to fix the next security incident.

So yeah, that's great advice. And that brings us to the end of the episode.

But before we end the episode, I have one last question, which is, do you have any recommendation, any reading recommendation, or a blog, or a book, or a podcast, or anything that you would want our audience to take away from this episode?

Hilal: Of course, particularly for leaders, and I think it will benefit everybody. I said earlier also that Phil Venables actually writes very, very good blogs, right? So amazing. I mean, most of his articles are amazing. They make a lot of sense to me. So I would strongly recommend actually following him and reading his stuff.

Then there is Bruce Schneier, all his books. I think you should read all his books. They are so full of wisdom, not just technology-wise, but also he explains things in a way that resonates with me. For example, he says, okay, what does a food crisis have to do with cryptography, for example. That's a very unique way of actually putting challenges and understanding them, and that's really, really great. Then of course, Bruce's newsletters are also great, and they're very thorough, very detailed.

I don't know how these guys get time to write that, but those are amazing. Then among the books, I've lately not read a book, but I did read basically some of the other books which are like around Salaf, Dalloc and all that for example.

We read books from Outshins All for example, very nice book.

Then a lot of different things. I mean, I read a lot of history. I don't think people actually really expect that. But I read, so I think it's a mix of technology plus self-improvement. Yeah, non-tech.

So you should diversify your reading habits. That way you get different kinds of context to life also. And I think from the tech side, Bruce and Phil Venables, if you follow those guys, you are set. You will be so wise after reading that stuff from those guys. Strongly recommend actually following.

Host: Okay. Thank you. Thank you for sharing that recommendation. When we publish the episode, we'll make sure to tag them so that our audience can start reading these blogs and books as well. So yeah, that brings us to the end of the episode.

Thank you so much, Hilal, for coming and sharing your insights with us. Thank you!

Hilal: Puru, it was my pleasure and thanks for asking the relevant kind of questions, right? So, and thanks for your knowledge as well. I think you are very accomplished and it was a privilege talking to you.

Host: Thank you so much for your kind words. Thank you. And to our audience, thank you so much for watching. See you in the next episode!

Hilal: Thanks a lot!