Unraveling the Mysteries of Privacy Engineering: A Deep Dive with Apoorvaa Deshpande

TLDR;

  1. Privacy engineering focuses on implementing privacy best practices for users and organizations. The core focus is users' privacy and the responsible handling of their data. Finding the balance between user experience and decision fatigue is key.
  2. Privacy by design enables teams to incorporate privacy engineering practices into each step of the SDLC, ensuring less rework, better prioritization, and fewer hard conversations with other teams. Communication plays a major role throughout this process.
  3. For GenAI data privacy, promising technologies such as secure training and inference and machine unlearning are being developed. For today, companies should invest in data sanitization, data governance, and synthetic data.

Transcript

Host: Hi, everyone. This is Purusottam, and thanks for tuning into the Scaletozero podcast. Today's episode is with Apoorvaa Deshpande. Apoorvaa is a senior privacy engineer at Google Cloud working on privacy design, analysis, and data governance for GenAI products. Prior to that, she was a tech lead at Snap Inc. with a focus on unlocking utility for feature teams while maintaining a high privacy bar.

Along with privacy reviews and analysis, she led the design and execution of several innovative privacy-enhancing technologies, and products at Snap. And prior to that, she completed her PhD in computer science in cryptography from Brown University.

Apoorvaa, thank you so much for taking the time and joining with us for the podcast.

Apoorvaa: No, thank you. Thank you for having me, Purusottam. Yeah, it's a pleasure.

Host: Absolutely. So before we kick off, anything else you want to add about your journey? What motivated you to get into privacy engineering, or how did you get into it, if you want to highlight that?

Apoorvaa: Yeah, that's actually a fun story. I was close to my graduation, finishing up my PhD, and I was thinking about what was next for me. Academia was definitely on the table, right? I was considering a few postdoc offers as well. But I was also curious about industry and how cryptography works in action in the real world. So I attended this conference called Real World Crypto, which, as the name suggests, is about how cryptography is applied in industry, in the real world. And there I attended a talk by Snapchat, by my then-future manager, about privacy engineering, about end-to-end encryption in fact. That was fascinating; it opened up a new world for me. I hadn't realized cryptography could be applied in such ways in industry, on such interesting problems at scale.

So yeah, that was kind of how I got connected with the team at Snap and I ended up choosing to go there after my PhD. And now in hindsight, like privacy engineering has been such a good fit for what I love and also what I'm good at in terms of expertise.

Host: Yeah, that's definitely a unique story, right? Generally, when it comes to cryptography, the first thing that comes to mind is maybe encryption and things like that. How do you map it to privacy, data privacy, user privacy, and then make that jump, that leap of faith that, yeah, maybe cryptography can be utilized in a better way in privacy as well? And I guess hearing that talk motivated you to get into the field, and you have been loving it, it seems.

So now, you have been working in the privacy engineering space for a while. What does your day-to-day look like?

Apoorvaa: Typically, the things I do vary a lot, but I would say they fall into four major buckets. And this is from my experience both at Snap and Google. One of the main buckets is privacy reviews and analysis.

So understanding products and features deeply, how they affect privacy, what the components are in terms of the data flow, and then thinking deeply about the different surfaces and what could go wrong, having that attacker mindset. That's one component.

The second is, once you identify what could go wrong, how do we fix it? What are the solutions? That's also, I would say, the most interesting bit to me: how can we solve those problems using privacy techniques like cryptography? That's where I get to use my research background and also keep learning.

And the exciting part there is to come up with new proposals and innovations; I've worked on a couple of patents and papers at Snap as well. So that's the second bucket: problem solving.

And the third is related: once you have a solution to that problem, how do you execute on it? How do you design and architect it with the right teams? How do you take that collaboration forward?

And the fourth, I would say, is communication, because privacy is such a cross-functional activity: writing and articulating your thoughts well, whether it's your proposals, your analysis, or design docs. And a heavy part of it is also just cross-functional meetings and alignment.

Host: I think communication plays a major role in every engineer's life, right? Especially in the case of privacy, I can imagine it touching every part of the organization. If you are a user-facing app, then there's the user experience, the data, and then there are legal aspects and things like that. So you have to work across different teams, and communication has to be one of the primary skills you need. So yeah, totally. That makes sense.

Apoorvaa: Exactly, and the privacy engineer sits in between all of these. They're almost playing the role of first understanding the requirements from product and engineering, what are we trying to do, then understanding the legal perspective, bringing in the user privacy perspective, and then using all of this to define the problem space, then the solution space, and then coming up with solutions.

Host: Yeah, totally. I hope we can touch on some of these areas today. We'll be focusing primarily on privacy engineering and privacy-enhancing technologies. One thing I want to ask before we get into these questions: do you call them P-E-Ts or PETs? How do you refer to them?

Apoorvaa: PETs, yeah, PETs is how they're usually referred to. So we could use that.

Host: So I'll also refer to them that way during the recording. To begin with, there is a growing trend of privacy being one of the fundamental parts of system design. Some examples are Google's privacy-first designs across its product suite, or Apple's initiatives that give users control over location, data sharing, and several other aspects of privacy, particularly in the GenAI era we are in. There are many more examples. But before we dive into some of these, maybe let's start with some of the fundamentals, since not many folks are familiar with privacy engineering.

So can you help our audience understand?

When someone sees privacy engineering, what does it mean?

Apoorvaa: Yeah, there are a couple of definitions of privacy engineering, and of course they have a lot in common. But the definition that I like, or personally go by, is that a privacy engineer should be a technical advocate within an organization for the organization's end users' privacy.

Privacy engineering, in turn, I feel, is the systems and processes that enable this advocacy. So let's understand what I mean by advocacy. Advocacy of user privacy is basically about how user data is treated, how interactions with users are handled inside the organization, and how the data is collected and used.

That goes back to what I described in terms of the day-to-day, right? The first part is understanding and reviewing features and understanding the attack surfaces; that's a major component of privacy engineering.

Then, in terms of systems, it's also about having the right annotations and labeling for data. And you should have a proactive approach, which can include red teaming to some extent: what could go wrong in terms of user data and privacy?

Then, as we talked about, how do we solve for this? If there are issues, or if we want to use data in a way that respects users' privacy, PETs are an option. So thinking about that, and having the infrastructure to design those solutions, is part of privacy engineering.

The other systems and processes involve having a mechanism or tooling to do these reviews and analyses, having infrastructure for problems that keep recurring, say deletion at scale, or PETs libraries that are often needed to solve problem statements, and also having a set of advocates or allies within the organization who understand the importance of privacy and keep it in mind when they are doing their work.

So that's how I think of privacy engineering as it has evolved now.

And from a user standpoint, privacy engineering is about thinking about whether we are being transparent with users, whether users understand how their data is used or processed, and whether they have enough agency, choice, and control over their data.

Host: I love how you highlighted both sides: within the organization, how you work with different teams and have advocates for privacy, and also, from a user's perspective, what my expectations should be of an organization when it comes to my own data, right?

Now, you highlighted some tooling and libraries. Do you have any examples you can share? What type of tooling or libraries might privacy engineers use?

Apoorvaa: It varies across organizations, but most, at least the fairly mature ones, have some tooling to manage the review process, whether it's something like Jira or internal bug- and ticket-tracking systems. They also have a specific documentation format for engineering designs and review notes, to record that this design, these features, have been approved by the privacy engineering team and legal.

So to manage that workflow, most places have some tooling. It could be custom, or there is standard tooling from some companies that people opt for as well.

Because privacy reviews are typically blocking, there has to be documentation, a paper trail, showing that a feature has been approved by both engineering and legal. So typically we need a formal way of recording all of that.

Host: When it comes to engineering and security, folks often look at security, even though it's an enabling function very similar to privacy, as a blocking function, right? And privacy takes it to the next level, where things have to be reviewed not just by security leadership but also by privacy leadership, and there are legal aspects that come into the picture.

Similar to privacy engineering, there are some additional concepts that are often used interchangeably even though they may mean different things, like privacy by design or privacy by default.

Do you see them as the same thing or do you see them as different things and how are they related to each other?

Apoorvaa: Yeah, so privacy by design is essentially a framework that, I feel, privacy engineering is trying to implement within an organization. The framework is about embedding privacy into the complete product development lifecycle and keeping a very proactive approach towards privacy: it shouldn't be an afterthought, but something that organizations and engineers think about at every step, right from the conception of a feature, and then have embedded in the design. That's one key aspect of the privacy by design framework.

Then the second is, as I mentioned, being responsible stewards of user data. The analogy I like is that it's as though a friend has trusted you with a secret, a sensitive piece of data.

That's how we should be thinking about user data: treating it with respect and using it responsibly, using it only for what you said you would use it for, deleting it afterward, and communicating that clearly to users. This is all part of privacy by design.

The other very interesting aspect which we were just touching upon is privacy by design talks about this positive sum or win-win approach, right?

So when I mentioned that privacy reviews are blocking, what I meant was that the approval has to happen before the feature is launched.

But in general, I know that privacy or security can be looked at as blockers overall. The framework itself, though, encourages everyone to think about what a win-win approach looks like, right? Not to look at user privacy in a silo.

And this is something I truly believe in and have followed personally. I don't believe in just saying no to teams, or "this is not a good idea." Sometimes it is not, but then it's my job as a privacy engineer to come up with a solution, to find something that works for everyone.

Host: Yeah. And I can see a lot of parallels with security; security leaders are also seen as the team of "no," but at the end of the day they are trying to ensure that whatever engineers are building is secure, right?

And they're enabling you to maybe write better code or write secure code. But I liked how you phrased that. I don't want to say no, rather I want to work together to build a solution so that it sort of covers both the requirements.

Apoorvaa: Right, sorry, go ahead.

Host: No, I was just saying, so that we can move forward while keeping privacy in mind as well.

Apoorvaa: A lot of times I tell the PMs and product folks I work with that privacy is a feature. Think of it as a product feature, because users are going to end up using the product more because of it if we market it the right way, and because users are increasingly aware of this. Apple, for example, relies on it heavily; their marketing is all about how private their products are.

So, yeah, that's like a business case as well for privacy.

Host: Yeah. Yeah. Once you have trust with a brand, then you would for sure come back to use their products or services.

Apoorvaa: Right.

Host: Yeah, you were adding something.

Apoorvaa: I was adding that privacy by default, the other term you mentioned, is also part of the privacy by design framework. But privacy by default, I would say, is more tied to a specific feature or product. For example, location being off by default is privacy by default. Messages being end-to-end encrypted is privacy by default, or your data being processed only on device is privacy by default. These are specific settings that I would call privacy by default.

Host: So how do you use the concepts of privacy by design? Like you highlighted, there are many such aspects, like privacy by default.

How do you use those concepts when you are, let's say, designing for privacy engineering, speaking with other teams, or presenting an idea? How does privacy by design help you with that?

Apoorvaa: Yeah, so as we spoke about, it has to be reflected in privacy engineering, or in the overall engineering, to a large extent. For example, the review tooling that we discussed enables privacy by design, right?

Because there's a framework in place to review and analyze. And then there's this notion of shifting privacy left. Privacy by design encourages organizations to have more collaboration and early discussions between product teams and privacy engineers. That happens a lot once you have a good relationship. At Snap, for example, I had good relations with the PMs, and they would come in advance and just say, hey, we are thinking about this, do you see any direct red flags or anything you would want to flag at this point?

That is the ideal thing, in my opinion: when teams come to you in the design phase, when they're just thinking about features. That's where we can have that collaborative space, rather than when things are already ready and we have to say, this is violating something, this is not how the data should be handled. Then it becomes more of a zone of conflict.

So when you can get things early in discussion, that's a win for everyone, I think, and for the organization.

Host: In security also, there is a huge movement of shift left, and I hear that in privacy too we're trying to shift left so that we start engaging with other teams sooner rather than later. So there is less rework and less of the conflict that you mentioned, right? If the product is already designed and developed and then during the QA cycle you find out there is a privacy challenge, you'll have to go back to the drawing board and have that conversation with the other teams' leaders, which brings friction. So it makes sense to start that dialogue with other teams early.

When it comes to building privacy-first applications or services, organizations generally utilize privacy-enhancing technologies, or PETs, from their toolbox. Maybe for our audience, can you highlight what PETs are?

Apoorvaa: Yeah, privacy-enhancing technologies, or PETs as they're called. I feel the term has come to include a bunch of technical solutions in one bucket, because the bucket itself is big, right? A lot of technologies fall into it. It is mainly about providing utility or value from data while preserving privacy, while not compromising on user privacy. That is the high-level purpose, and that is where PETs come into play.

At a high level, I would divide these technologies into two categories. One is execution privacy: maintaining privacy while we are operating on private, sensitive data. Into this bucket go multi-party computation (MPC), a cryptographic technique, then confidential computing, or even end-to-end encryption, because while you're sending messages, they are encrypted. MPC is a very fascinating cryptographic primitive. What it helps us do is this: you and I, or even more parties, decide that we want to compute some function on our data.

Let's say we want to find who has the maximum salary, or something sensitive, where we don't want to reveal the actual numbers, we just want an output. MPC allows us to do that computation without revealing anything about the input data beyond the output. This is now being explored, and even used, in machine learning settings, where one entity has a proprietary machine learning model, say for disease prediction in the medical space.

Another entity, let's say a hospital, has the sensitive medical records that we want to train on or get insights from.

Both inputs are very private and sensitive, not something people want to share in the clear. But MPC enables this computation so that in the end you have only the trained model.
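
To make the MPC idea concrete, here is a minimal additive secret-sharing sketch in Python. It computes a sum and average of salaries rather than the maximum from the example above (a secure maximum needs comparison protocols and is more involved), and the modulus, party count, and values are illustrative assumptions, not any production protocol.

```python
import secrets

# Toy additive secret sharing over a fixed modulus. Any two of the three
# shares look uniformly random and reveal nothing about the input.
MOD = 2**61 - 1  # illustrative modulus, large enough for the toy values below

def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n additive shares that sum to it mod MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

# Three parties want the total (and average) salary without revealing their own.
salaries = {"alice": 120_000, "bob": 95_000, "carol": 150_000}

# Each party secret-shares its own salary and hands one share to each party.
all_shares = {name: share(s, 3) for name, s in salaries.items()}

# Party j only ever sees the j-th share of every input and adds them locally.
partial_sums = [sum(all_shares[name][j] for name in salaries) % MOD for j in range(3)]

# Combining just the three partial sums reveals the total (and nothing else).
total = sum(partial_sums) % MOD
print(total, total / len(salaries))  # 365000 and the average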

Host: I love both the examples. The first one was super simple. The second one was a little technical, yeah, I love both the examples. Yeah, please continue.

Apoorvaa: So that's on the execution privacy bit, keeping things private or even secure, right? To some extent while you are operating on them.

The second category is output privacy. So when we want to actually release some insights, some statistics, making sure that they are still private. So differential privacy is a big one in that category.

In fact, differential privacy has this great property that with any output that is differentially private, you can do anything with it, post-process it in any way, and it will still maintain its guarantees.
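
As a minimal sketch of the idea, and of why post-processing is safe, here is a toy Laplace-mechanism count in Python. The dataset, query, and epsilon value are illustrative assumptions.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) using the inverse-CDF method (stdlib only)."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(records, predicate, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.
    A counting query changes by at most 1 when one record changes
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [23, 35, 41, 29, 62, 57, 33]  # toy "sensitive" dataset
noisy = dp_count(ages, lambda a: a >= 40, epsilon=0.5)

# Post-processing: clamping and rounding the released value is free --
# it cannot weaken the differential privacy guarantee.
published = max(0, round(noisy))
print(published)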

Another one that comes to mind from the cryptographic space is zero-knowledge proofs, where you maintain output privacy while making a statement. Zero knowledge is again a very fascinating technology; it's something I worked on in my PhD as well. It sounds very magical. It's basically: how can I prove something to you? How can I prove that I know a secret without actually revealing the secret?

For example, how can I prove that I solved, let's say, a very hard Sudoku puzzle? I want to convince you that I did solve it, but I don't want to give away the solution. This is what zero-knowledge proofs make possible.
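
A Sudoku proof is involved, but the flavor can be shown with a classic Schnorr-style proof of knowledge of a discrete log, made non-interactive via the Fiat-Shamir heuristic. This is a toy sketch with illustrative parameters; the prime, generator, and hashing choices here are assumptions, not production-grade settings.

```python
import hashlib
import secrets

# Prove knowledge of x with y = g^x mod p, without revealing x.
p = 2**127 - 1   # a prime modulus (toy choice)
g = 3            # a fixed base
q = p - 1        # exponents can be reduced mod p-1 (Fermat's little theorem)

x = secrets.randbelow(q)   # the prover's secret
y = pow(g, x, p)           # public statement: "I know the discrete log of y"

def challenge(t: int) -> int:
    data = f"{g}|{y}|{t}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove(secret: int) -> tuple[int, int]:
    r = secrets.randbelow(q)        # fresh randomness per proof
    t = pow(g, r, p)                # commitment
    c = challenge(t)                # challenge derived by hashing (Fiat-Shamir)
    s = (r + c * secret) % q        # response; on its own it reveals nothing about x
    return t, s

def verify(t: int, s: int) -> bool:
    c = challenge(t)
    return pow(g, s, p) == (t * pow(y, c, p)) % p   # check g^s == t * y^c

t, s = prove(x)
print(verify(t, s))   # True -- the verifier is convinced without ever seeing x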

Host: So can you share any successful case studies where PETs have been effectively deployed? You gave great examples, but are there any case studies you can highlight where they addressed privacy concerns?

Apoorvaa: Yeah, for differential privacy, for example, the main thing that comes to mind is that it's now used for the US Census outputs. The insights released from the US Census reports are now differentially private; they work with scientists who have expertise in that.

That's a really good step, because it's sensitive data about populations, and there are all these demographic cuts which could be sensitive. So that's a really privacy-positive place where this is being used. The other thing is federated learning, which is used for auto-complete on our smartphones.

A lot of the time we use the auto-complete feature in our keyboards. Federated learning is often the way those models are improved so that individual privacy is maintained.
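
Here is a minimal federated-averaging sketch to illustrate the pattern: devices train locally, and only the model updates are averaged centrally, so raw typing data never leaves the device. The toy linear model, data, and learning rate are illustrative assumptions, not any vendor's actual system.

```python
def local_update(weights, local_data, lr=0.1):
    """One pass of on-device training (toy linear model, squared loss)."""
    new = list(weights)
    for features, target in local_data:
        pred = sum(w * x for w, x in zip(new, features))
        err = pred - target
        new = [w - lr * err * x for w, x in zip(new, features)]
    return new

def federated_round(global_weights, clients_data):
    """Server broadcasts the model, then averages the returned local models."""
    local_models = [local_update(global_weights, data) for data in clients_data]
    return [sum(ws) / len(ws) for ws in zip(*local_models)]

# Three devices, each holding private (features, target) pairs that stay local.
clients = [
    [([1.0, 0.5], 1.0), ([0.2, 1.0], 0.0)],
    [([0.9, 0.1], 1.0)],
    [([0.3, 0.8], 0.0), ([1.0, 1.0], 1.0)],
]

weights = [0.0, 0.0]
for _ in range(5):
    weights = federated_round(weights, clients)
print(weights)  # the shared model improves; raw keystrokes were never collected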

Mozilla has a really nice implementation where they worked with a bunch of cryptographers on private aggregation: basically learning trends, like how many people are visiting certain websites, without revealing individual browsing history. They used a design built with secret sharing and, in fact, a version of zero-knowledge proofs.

So that's really something at scale, a very exciting implementation. And the last one I'll mention is actually my own research, which has been implemented at scale in WhatsApp. WhatsApp has end-to-end encrypted messaging, but to enhance that further, one of the questions that comes up, even with end-to-end encryption, is: how can we be convinced that we are really talking to the right person?

You show up in my contacts and my messages are end-to-end encrypted to you, but am I really talking to you? How can I get the assurance that there's no man in the middle, that I am really talking to the person I think I'm talking to? This is an extension of public key infrastructure, and it's again a place where a version of zero-knowledge proofs can be used to maintain an attestation of each user's public keys, which other users can verify. This is something WhatsApp enabled recently, I think maybe last year or so, and it's based on one of my papers from my PhD.

Especially for sensitive conversations, imagine you are in a war zone, or you want to reveal a sensitive story to a journalist; then you want to be very sure that you are talking to the right person. So you have this mechanism of verifying independently and then initiating the conversation.
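
The flavor of that independent verification can be sketched very simply: both sides derive the same short code from the pair of public keys and compare it over another channel. This is an illustrative toy only, not WhatsApp's actual key transparency design (which relies on the auditable attestations described above); the key values and code format are assumptions.

```python
import hashlib

def safety_code(pubkey_a: bytes, pubkey_b: bytes) -> str:
    """Derive a short, human-comparable code from two public keys."""
    material = b"|".join(sorted([pubkey_a, pubkey_b]))         # order-independent
    digest = hashlib.sha256(material).hexdigest()
    return " ".join(digest[i:i + 4] for i in range(0, 24, 4))  # short code to read aloud

alice_key = b"...alice's public key bytes..."   # placeholder values
bob_key = b"...bob's public key bytes..."

# Both users compute this locally; if a man-in-the-middle had swapped a key,
# the codes they read out to each other would no longer match.
print(safety_code(alice_key, bob_key))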

Host: I know you gave so many examples, but two stuck with me. One is auto-complete. I generally don't think about privacy being a factor there; I'm just typing some characters and it finds the likely word from a dictionary, right? That's what comes to my mind. So it's amazing to hear that privacy plays a huge role there. And the other thing...

Apoorvaa: Sorry, I'll just add why privacy is important there. Even if you are using something standard, you might have a specific style of using certain phrases or certain words, which your keyboard will learn, and you want it to learn that, because it's your style.

But the other aspect could be something like your address, which you end up sending to a bunch of people. So the keyboard could pick up that PII, which maybe you do want your own keyboard to remember, because you send it often, but it's not something that should go into the global model or any other place.

Host: And the second example you gave, around WhatsApp: I remember, I think last year, a couple of months after it was rolled out, when I updated the app it showed a message that, hey, you can verify the other person. And I was like, why would I need to verify? I know that I'm speaking with Apoorvaa, right?

But I see the point, right? In the example you gave, in a war zone, if you want to share a story with someone. Or even, let's say, in India particularly, a lot of organizations have started interacting over WhatsApp nowadays, right? You can send a WhatsApp message and connect with your doctor or something like that. In that case, you want to be doubly sure that you are sending it to the person you think you are, before you send any health-related data. So yeah, it's amazing that you worked on that and that it's been rolled out at that massive scale. Congratulations to you.

Apoorvaa: Thank you. It was a team effort. It was in fact a paper with four women authors, which I'm also proud of. And there have been follow-ups to that work as well by some of the folks at Meta and WhatsApp, and I'm very happy to see it being taken forward and implemented at scale.

Yeah, for any researcher, it's really nice to see their research being used at scale.

Host: Absolutely. So now I want to talk about the other side of privacy, right? As an organization, I have privacy engineering, I have to follow best practices, I have the PETs and things like that. But when it comes to users, it sometimes adds friction for them, right? One example could be Apple's App Tracking Transparency: while it gives power to the users, it disrupts advertisers significantly.

Do you have any examples for our audience of where privacy can add some of this friction to the user experience?

Apoorvaa: Yeah, App Tracking Transparency is actually a very good topic and something I've worked on extensively at Snap, specifically making the ad stack more private. That's an area I've thought about a lot.

Just so that everyone is on the same page about what we're talking about, I'll share some background on what App Tracking Transparency was and how it affected the space.

Basically, people with iPhones might remember, and even today you get these prompts, right? If you install a new app, or sometimes when an existing app updates, you get this prompt asking whether you want to allow the app to track or not.

Host: Locations, contacts, things like that.

Apoorvaa: This is specifically in the context of advertising. The prompt asks whether you want to allow the app to track you across other apps and websites. So what does that mean?

Let me start with some history, or background. On all our phones, iPhones as well as Androids, there is a device identifier specifically for advertising.

On iOS it's called the IDFA, the identifier for advertisers. On Android it's the Android Advertising ID. Phones have always had these, and this identifier has been very important in the ad space generally.

Because what happens is when advertisers advertise on social media, they want to learn how their campaigns are doing. Nike spends a bunch of money to advertise on Snapchat or Instagram.

They want to see how their campaigns are doing: how many people are clicking on the ads on Snapchat or Instagram and then actually going and buying their shoes. That's how they make their marketing decisions about how to spend. So it makes sense for them to understand these metrics, right? That's where the IDFA and these device identifiers were useful, because let's say you saw an ad for Uber and then you installed the app.

Wherever you saw the ad, that platform knows your ID, and once you install Uber or buy something from Nike, that app also knows your ID. That's how you can make the connection and learn that, okay, this user actually saw an ad and then installed the app or did something.

This is useful information for advertisers, so it makes sense that such an ID exists. But of course, users should have a choice in whether they want to allow it. Earlier the default was yes, and then Apple basically made the default no.

That was the change that happened. Defaulting to not allowing tracking is of course a positive thing for user privacy, but the way it was done was very sudden, and I think it affected small businesses a lot.

And sometimes, if you make a big decision like this very suddenly, it can have negative effects, because then people might be incentivized to use more invasive identifiers. So there is a risk of making privacy worse for users.

And that prompt, in general, is a challenge, right? Even if we are doing something that is positive for user privacy, how do we communicate that, and how do we present the choices correctly, so users can actually understand what's in their best interest? Even with those cookie pop-ups that we constantly see, most people find them annoying and don't really understand what's going on.

So that's really a gap.

Host: No, I totally agree. I'm again drawing parallels with security. Generally, every website you go to asks you to have a complex password. Unless you use a password manager, you end up writing it down somewhere, which defeats the purpose.

So finding that balance is often key. So a follow up question to what you highlighted is

How do you find the balance between the user experience and the security without compromising on either?

Apoorvaa: Yeah, so take this example, something I worked on extensively. The ad space is traditionally known to be intrusive for user privacy, because the first example that comes to mind when we think of user privacy is these creepy ads that track you across the internet. In fact, when I talk to some of my friends who are not in this space, I sometimes hear this defeated attitude that there's no privacy anymore.

I feel that's not great, first of all. We need to empower people: yes, digital privacy is not only possible, it's your right, and we need all organizations to uphold it and do a better job there.

Specifically, in the example I shared, advertisers want to learn how their campaigns are performing. The key insight is that they don't necessarily need to know that Purusottam saw this Uber ad and then installed Uber. They care about general trends or insights: okay, so many people in this region saw this ad today and then ended up installing. Those general insights are something we can actually enable through privacy-enhancing technologies.

For example, with MPC, which I mentioned, there are specific protocols such as private set intersection cardinality, or private join, where you don't have to reveal which person exactly did an action; you just learn aggregates. It's like a set intersection: the social media company has the set of users who saw an ad, and the advertiser, say Nike, has the set of users who actually purchased. We just want to find how many users are in common.

And then we could also use differential privacy on top of it, so that the output is not uniquely identifying, so the output also has that privacy guarantee. That's how PETs can be useful. So there are technical solutions, but we also need to educate users better so they really understand what the choices mean for them.
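
A minimal sketch of that idea, using a Diffie-Hellman-style private set intersection cardinality with Laplace noise on the released count. The group, user IDs, and noise scale are illustrative assumptions; production systems use vetted elliptic-curve groups and careful privacy accounting.

```python
import hashlib
import math
import random
import secrets

P = 2**127 - 1  # toy prime modulus

def h(item: str) -> int:
    """Toy hash-to-group: map an ID into the multiplicative group mod P."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

ad_viewers = {"u1", "u2", "u3", "u7"}      # platform: users who saw the ad
purchasers = {"u2", "u7", "u9"}            # advertiser: users who purchased

a = secrets.randbelow(P - 2) + 1           # platform's secret exponent
b = secrets.randbelow(P - 2) + 1           # advertiser's secret exponent

# Round 1: each side blinds its own IDs with its secret exponent.
masked_viewers = [pow(h(u), a, P) for u in ad_viewers]
masked_purchasers = [pow(h(u), b, P) for u in purchasers]

# Round 2: each side re-blinds the other's values; the advertiser shuffles so
# the platform learns only the count, not which of its users matched.
double_viewers = [pow(v, b, P) for v in masked_viewers]
random.shuffle(double_viewers)
double_purchasers = [pow(v, a, P) for v in masked_purchasers]

# H(u)^(a*b) matches on both sides exactly when the underlying ID matches.
overlap = len(set(double_viewers) & set(double_purchasers))

# Optionally add Laplace noise so the released count is differentially private.
def laplace(scale: float) -> float:
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

print(overlap, round(overlap + laplace(scale=1.0)))   # true vs. noisy count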

And in the ad space especially, which I've thought about a lot, there is potential to create that win-win approach I talked about and really subscribe to. Most users understand that they are using all these free services, and ads are inevitable for a free internet to some extent. Take me personally: last week I wanted to buy a new bottle or a new mug; I was looking for something. If you involve the user, okay, the user is actually looking for something in this space, then you can show those specific ads. It's a win-win, because the user really is looking to purchase something in that area, so you are actually providing value, rather than showing something that makes them think, how did you know I was looking for this? That creepiness erodes trust.

So that's just my opinion, and I think some apps are doing things around this. For example, Snap had this in settings: there's a way to choose what topics you're interested in. And even Chrome is doing that with the Topics API.

So there are efforts in this direction. So in general, my view is this is a space where you can involve the user more, be more transparent, and it is just better for everyone.

Host: One of the keywords you used is that users have choice nowadays, right? They can say, don't track me across all the websites I go to or across all the apps, like with Apple's app tracking system, and similarly on other platforms. With more choices, do you feel there is a decision fatigue setting in for users?

How do we design user-facing privacy features in a way that empowers them without the decision fatigue coming into play?

Apoorvaa: Yeah, definitely. Decision fatigue is a big thing in general, I feel, not just in the privacy space, simply because there's so much information out there; information overload inevitably causes that fatigue. So we should try to communicate the most important thing in a friendly way. There is an extreme where you just try to confuse people. I have seen that with a bunch of cookie pop-ups where the choices are really not clear, and even once you go in to make a selection, one of the options is still "accept all." I mean, why would I go in and choose if I wanted to accept all?

So there are extreme examples where the design is made to confuse users. So at least let's stop that and then make it more friendly for the user to understand, perhaps through a journey or through some sort of gamification.

Host: Yeah, I can relate to that cookie example. Nowadays, on any website you go to for the first time, you see three options: accept all, customize, or reject. And when you go to customize, for a user who is not in IT, it's very confusing, right?

So yeah, like decision fatigue would definitely come into picture.

Apoorvaa: Yeah, and now with generative AI and these agentic solutions, there's potential to enhance user privacy through that, right? You tell your preferences once and for all to an agent in your browser, and it takes care of it. You don't have to do this for every website.

Host: Hopefully soon, hopefully soon. So speaking of GenAI, I definitely wanted to touch on that topic. This year you were on a panel at BSides San Francisco, right? Where the topic was around combating GenAI privacy abuses.

And one of the things you highlighted is that LLMs, large language models, are primarily trained on past data, and sometimes users may not have consented to it.

One recent example is LinkedIn, which has started training on users' data. By default it's enabled; you can go and disable it, but by default they are using it for training. So where do you see the biggest privacy vulnerabilities in today's GenAI era?

Apoorvaa: Yeah, the main thing, as you pointed out, with GenAI and these huge language models is the amount of data that is needed for training, and we don't completely understand how those massive amounts of data are actually reflected in the outputs and in the usage of LLMs.

For example, the biggest privacy risk or vulnerability, which you might have heard of, is called memorization: LLMs are known to memorize training data verbatim and then spit it out in a query response. For example, if the training data contains something like "Alice lives in Santa Monica and her address or phone number is such and such," it's possible that the LLM memorizes that exact thing.

And in fact, there have been a lot of attacks that demonstrate this, and we really don't understand how LLMs behave. There was a paper, I think last year or maybe earlier this year, on how ChatGPT was attacked, and the prompt was pretty bizarre: a single word, "poem." They kept prompting it many, many times; I think the prompt was something like "keep outputting the word poem as many times as you can."

And eventually the LLM spat out some PII, an email address. It's really bizarre; we don't fully understand how LLMs come to output the data they do. So we have to be very, very careful about what data is used for training. That's really the big vulnerability. The other one is membership inference attacks: trying to find out whether a certain piece of data, a certain sample, was used in training a model.

More than for LLMs, this is problematic for specific medical applications, let's say. You shouldn't be able to figure out whether someone's sample was in the training data, which could reveal that they really had a particular disease. That's again very sensitive information.
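
To make membership inference concrete, here is a minimal loss-threshold sketch in the spirit of Yeom et al.-style attacks: a deliberately overfit model has noticeably lower loss on its training records, and an attacker thresholds that loss to guess membership. The dataset and model are toy stand-ins, not any real medical data or deployed system.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

# An unconstrained tree deliberately overfits, which widens the train/holdout loss gap.
model = DecisionTreeClassifier(random_state=0).fit(X_in, y_in)

def per_sample_loss(model, X, y):
    """Cross-entropy loss of the model on each individual record."""
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(p, 1e-12, None))

loss_members = per_sample_loss(model, X_in, y_in)        # records used in training
loss_non_members = per_sample_loss(model, X_out, y_out)  # records never seen

# Attack: guess "member" whenever the loss is below a threshold.
threshold = np.median(np.concatenate([loss_members, loss_non_members]))
tpr = np.mean(loss_members <= threshold)      # members correctly flagged
fpr = np.mean(loss_non_members <= threshold)  # non-members wrongly flagged
print(f"flagged as member: train={tpr:.2f}, holdout={fpr:.2f}")  # the gap is the leak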

Host: Absolutely, yeah, true. So let's say, as an organization, I'm using some data for training. You highlighted some of the vulnerabilities, right? Somebody can use prompts to tap into that memorization and sometimes extract sensitive information.

What other risks do you see organizations facing if they either don't build their own models properly, or they're using an external model and feeding it their context and things like that?

When you're building AI powered features, what other risks do you see organizations might face?

Apoorvaa: Yes, in terms of risks, basically, having some data sanitization is very important: having a good handle on your data. What is going into training? What is the lineage of that data? How are you sourcing it? How are you processing it before you send it for training?

Then there are ways in which even the training process itself can be more private; are you thinking about that? Once the model is trained, what kind of testing are you doing? This is again a new, evolving space that organizations should invest in: having privacy testing in place once their models are ready to be deployed. Similar to regular testing, I think privacy test suites are going to evolve, and organizations should invest in them. Then think about the inference chain too: can that be made more secure?

There is some work around that, where the complete query and response flow is, let's say, encrypted or runs within confidential compute, so users have an additional guarantee about how their sensitive queries are being processed. Those are a few things.
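
On the data sanitization point above, here is a minimal pre-training scrubbing sketch. Real pipelines use much richer detectors (NER models, deny lists, deduplication, policy-specific rules); the regex patterns and placeholder labels below are illustrative assumptions only.

```python
import re

# Scrub obvious PII patterns from text before it reaches a training pipeline.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace matched PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

record = "Contact Alice at alice@example.com or +1 (415) 555-0123 about her claim."
print(scrub(record))
# -> "Contact Alice at [EMAIL] or [PHONE] about her claim."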

Also, regulations are moving pretty fast in the AI space, so stay up to date. The AI Act, to some extent, talks about how your training data needs to be annotated and how there needs to be proper lineage around it.

Similarly, there are some new laws being enacted in California as well that are along these lines.

Host: So I think some of the things you highlighted are part of the privacy-enhancing technologies that should be followed, right? On top of many other things. But one thing we touched on slightly earlier is that many of these PETs are reactive, solving existing privacy concerns.

How do you see PETs evolving in the next, let's say, five to ten years?

Apoorvaa: Yeah, I see a lot of investment and research already in place, and there's of course potential for more interesting things to come out. Right now is really a good time to think about this, because these are open-ended problems. They are not solved. It's not like encryption, which has been standardized for so many years.

The right key sizes, everything is pretty standardized there. Whereas in the generative AI, or generally the AI privacy space, a lot of problems are very, very nascent and open-ended, which makes it exciting.

It's also the right time to think about it. Take something like differential privacy, which is now established as a mathematical framework: how do you apply those techniques when training models?

That's again a ripe area of research, because each setting is slightly different, and there's a lot that can be explored there. The test suites I talked about, there's a lot of potential to keep thinking and innovating in that space.
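
To hint at what applying differential privacy to training looks like, here is a conceptual DP-SGD-style update on a toy linear model: clip each example's gradient so no single record dominates, then add calibrated noise to the aggregate. The clip norm, noise multiplier, and data are illustrative assumptions; real training uses a privacy accountant to choose them for a target epsilon.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=256)

w = np.zeros(10)
clip_norm, noise_multiplier, lr = 1.0, 1.1, 0.1

for step in range(200):
    batch = rng.choice(len(X), size=32, replace=False)
    errs = X[batch] @ w - y[batch]
    grads = errs[:, None] * X[batch]                      # per-example gradients, shape (32, 10)
    # 1) Clip each example's gradient to bound its influence (sensitivity).
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # 2) Add Gaussian noise scaled to the clip norm, then average over the batch.
    noisy_grad = (grads.sum(axis=0) +
                  rng.normal(scale=noise_multiplier * clip_norm, size=10)) / len(batch)
    w -= lr * noisy_grad

print(np.round(w, 2))  # a rough, noisy estimate of the underlying weights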

Even within crypto, fully homomorphic encryption is another thing I'm very excited about becoming very, very efficient. Of course, FHE is established and things are progressing, and it is at a stage where it is feasible but not yet scalable.

There's a startup called Zama that is working heavily on this; they are trying to innovate down to the chip level to make FHE very, very efficient. So that's something I'm excited about.

Another very interesting line of work, which I'm also very excited about, is machine unlearning, which again seems magical: once a model has learned something, how do we make it forget it? This has a bunch of applications. Say we want to delete something; given that the training process for LLMs is so huge and expensive, we want to avoid retraining. So is there a way we can edit out, or make the model forget, certain aspects of the data it was trained on? It's a fascinating area of research. We don't have great solutions as of now, but I'm optimistic and hopeful that some cool ideas and research will emerge in that space.

So I think there's a bright future for PETs, and we can all be proactive about it together. It's a very exciting area, and I'd encourage more people to think about these problems. It's a very interesting space.

Host: So, machine unlearning, I had not thought about that much, but I can see the point, right? If you have spent millions on training and there was some incorrect data, or data you now need to remove, being able to take that out of the model could be very valuable.

One of the things that you, one of the keywords that you touched on is proactive, right?

Do you see any proactive privacy solutions that are close to being available to everyone?

Apoorvaa: Yeah, the ones I mentioned are probably still in the development phase, but soon enough, I think they will reach folks.

In terms of proactive approaches, sanitizing datasets is already used to a large extent; at least basic PII filtering has become standard. Then the use of synthetic data, that's another thing that is becoming pretty mainstream right now. And the other things are on their way to becoming mainstream, I would say.

Host: Yeah, I think I read somewhere, correct me if I'm wrong, that OpenAI was investing in generating synthetic data for their own model training, maybe earlier this year. I don't know if they have done it, whether they're able to generate synthetic data and use it for training, but I had read something about that. So hopefully that happens.

Apoorvaa: I don't think I read that exact news, but I'm sure synthetic data is an area everyone is investing in. And again, it's a very privacy-positive thing that we're not using real user data, any PII or anything like that. I would assume most tech companies are investing in this, which is really a privacy-positive step.

Host: Initiative, yeah, absolutely. So with that, we come to the end of the security questions.

We had reached out to some of our common friends and we have some questions from them. So the first question is from Trupti Shiralkar.

What are your views about striking balance between security, privacy, developer productivity and experience?

Apoorvaa: Yeah, as we discussed, there are real trade-offs, and I think communication is key: involving all the stakeholders from the beginning so there are no surprises. And then having this win-win, positive-sum approach be part of the organizational culture, that we are together solving something that is going to be better for the company, rather than security versus product or privacy versus product. Having that collaborative mindset, and a company that fosters it, is important.

And sometimes, yeah, the business is going to make a decision that privacy won't like, or the leadership is going to side with security, which product won't like. That's going to happen; it's part of the game. It's this concept of an infinite game: it's important to keep playing, and you'll have different outcomes every time. The important thing is that everyone has a seat at the table, and especially that privacy and security have a prime seat at the decision-making table. Then everyone should be satisfied that everyone's views were seriously considered and that we reached the solution that was best for the company at that point in time.

Host: Makes a lot of sense. The next question is: one of the key statistics presented this year at the Gartner security conference was that around 73% of CISOs and security leaders in the US feel burnout at some level in their work life. What's your take on this, and how do you handle stress? Any tips for other security leaders or engineers?

Apoorvaa: Yeah, burnout is real. Personally, in my experience, if I'm working on things that I'm really passionate about and driven by, that gives me more energy and keeps me away from burnout to some extent.

I can definitely see this happening, especially in the privacy and security space, because a lot of our work can be reactive. We didn't touch on incidents, but when incidents happen, that can be really stressful, with all the crisis management. So some of that is definitely going to happen at some point.

Personally, I try to balance it out with things that I really enjoy and am driven by. In fact, I like to take the initiative to do projects or pitch things that are exciting and that would be great for the organization. So if you are driving something that you are excited about, and that your leadership and company believe is good for the company, that can really help.

Host: That's a very good tip. It's like doing a side project that gives you joy, and if it's related to your work, it helps you balance some of the burnout aspects.

Yeah, so with that, we come to the end of the podcast. But before I let you go, one last question is, do you have any recommendation, like reading recommendation for our audience? It can be a blog or a book or podcast or anything, a research paper, let's say, anything that you have for our audience.

Apoorvaa: Yeah, since we were talking about the burnout aspect: one of the podcasts I like to listen to is Big Think by Adam Grant. It's more about thinking of work holistically. He has invited guests from all different spaces, sometimes tech, sometimes politics, and it's generally a good discussion about how to approach work and so on.

And my other pet project, pun intended, is just to understand how living a multi-dimensional life helps us, right?

As I mentioned, doing things that you are really passionate about, whether at work or outside of it, gives you a lot of energy and positivity in life. For example, I'm also a musician; I perform a lot, and that's a really important aspect of my life. I'm also a parent; I have two kids. And I'm a privacy techie. I'm passionate about all three of these things, and all of these identities together define me and make me better. I'm reading a lot about this, and in fact I would like to plug my own newsletter, which I've recently started, exploring these topics: how to have a multi-dimensional identity and how it helps us in our professional careers as well. It's called Polypaths, a term I've come up with. A polypath is someone who likes to pursue different paths in life and grows and excels at multiple things to have a holistic identity. I'm really passionate about that, so I'd encourage the audience to check out Polypaths; it's on Substack, and do subscribe if the content resonates with you. So yeah, thank you.

Host: Thank you so much for sharing that. Generally what we do with recommendations is tag them when we publish the episode, so we'll do that as well for Polypaths and the podcast you highlighted.

Apoorvaa: Perfect. Yeah, thank you so much, Purusottam, for having me. This was a really fun chat. I enjoyed it.

Host: Yeah, it was lovely to have you on the podcast as well. There were many things I was not aware of, or was not looking at from that lens, so thank you for highlighting them. Especially, some of the examples you gave were not very technical, which made them easy to understand. So yeah, thank you so much for that and for coming on the podcast.

Apoorvaa: Thank you, and thank you for having this podcast. I've seen other episodes as well, and this is a really nice community contribution. Kudos to you and your team.

Host: Thank you for the kind words. And to our audience, thank you so much for watching. See you in the next episode.