[00:00:08] Lauren Burke: Welcome to Women in Analytics After Hours, the podcast where we hang out and learn with the WIA community. Each episode we sit down with women in the data and analytics space to talk about what they do, how they got there, where they found analytics along the way and more. I'm your host, Lauren Burke, and I'd like to thank you for joining us.
Today, I am so excited to have Noemi Derzsy joining us. She has joined us for a very interesting conversation about network science, graphs, and how it can be applied to customer data and the uses around that to understand customer behavior.
So welcome, Noemi. Thank you so much for taking the time to join us.
[00:00:55] Noemi Derzsy: Thank you, Lauren. Thank you so much for having me. I'm very excited to be here with you.
[00:01:00] Lauren Burke: So just to start off, could you tell us a little bit more about your background and the path you've taken over the years?
[00:01:06] Noemi Derzsy: Sure. So my background started in physics. Back in high school, I was very into physics. I had a high school teacher who was always very supportive of taking us to physics competitions. And I liked it very much. And she was also this fierce female role model that I had very early on in high school. And she also became a school principal. She was leading all these physics competitions. And she was a great mentor throughout the years, especially early on. And I said, "okay, I want to pursue a physics career. I want to be like her."
But then I got a bit discouraged. I mean, the people around me tried to discourage me, but it didn't really work. Back in my home country, back then, with physics you didn't really have many opportunities to have a career. The only option was to work in academia, and that's it. Research opportunities were also limited. So with that in mind, my physics teacher said, you know, "stick to what you like, but also add to it something practical." And she said, "how about doing a dual major of physics with computer science, because that will give you the technical background and if nothing works out with physics, you still have this as a backup plan."
And that really shaped my career going forward because even now, I consider myself as a physicist and I use all my computer science programming skills as a tool. So I always consider it as something that helps me do my physics experimentations. And I really think that that was a very valuable feedback from her, because then I was able to do the dual major with Physics and Computer Science. Then I pursued a Master's degree in Computational Physics.
And that kinda led me to do a PhD in Theoretical Physics where I was working with my PhD advisor who was one of the only professors within the university who was excited to apply his physics knowledge to novel problems. So, for example, one of the things I was very interested in was how you can understand and model human behavior by obtaining real-world social data. So a data set with the social aspects and how you can apply your statistical physics knowledge and computational physics skills to model the patterns that you reveal from there and the different behaviors. So I found that very interesting because it was something that no one was doing within the university. So it was something new, and I always like exploring new things.
So I started working with him on my master's degree. And it was just so exciting because we got this Enron email communication data set, and then I wrote my master's degree thesis on that. And I was very excited about this fact that you can just take real social data, and then do some analysis on it. And then reveal some patterns and then just do a model which kind of reproduces the behaviors, which is like a descriptive model. And then, maybe go into a predictive model as well, to predict future behavior.
So then I said, "Okay, I'm gonna do a PhD program with my advisor." And that kind of led me, after I finished my PhD program, to Rensselaer Polytechnic Institute where I did a postdoc in an ARL, Army Research Lab, funded research center where they were also modeling complex systems and mostly focused on network science. Which was this emerging field of applied graph theory where you have all these complex systems and you try to understand the relationships between the elements by representing them as a graph.
And this was something that was very fascinating to me and even now, it's fascinating to people because basically all social networks that connect us in the online environment are based on these graph structures. So I always kind of worked on this without having a definition of this is data science. And this was something that was kinda shaping my path forward. Even though while I was pursuing this career, it was always unclear where the future will take me. So that was something that I, I always had in mind like, is this going to be helping me in the future? I don't know, but I enjoy what I'm doing, so I'll stick to it.
And it's kind of really worked out well because once I completed my postdoc position, I decided I don't want to stay in academia anymore. So I wanted to move to industry. And then I found this great fellowship program called Insight Data Science Fellowship, which takes PhD graduates and transitions them to industry. And that kind of opened up the whole new world of data science in industry in New York City for me. And that was very exciting.
And then I found this perfect job for my skillset and for what I want to do at AT&T Labs. And I joined the Data Science AI Research organization and then we got transitioned a couple of years ago into the Chief Data Office, and I've been there since.
[00:06:26] Lauren Burke: That's so cool. So even as a person in a data science role now, do you still consider yourself at the core to be a physicist?
[00:06:35] Noemi Derzsy: Yes, definitely. Whenever I approach things, I'm always thinking about it from a physicist's perspective, like, oh, it's a new thing. I wanna see how it works. I want to explore what's the driving mechanism behind it. So in this respect, yeah, I always think from a physicist's perspective.
[00:06:57] Lauren Burke: And so you, you really have a wide ranging background, like covering everything from physics to network science to computer science. And it seems like it comes back to you having an interest in understanding and modeling human behavior, which is very useful, especially in today's world where we have so much data and so many different ways to interact with each other.
So what led to that interest in human behavior and do you think it shaped the specific path into data science that you've taken?
[00:07:25] Noemi Derzsy: Yeah, it definitely has. So when I first started working on this, coming from a physics background, it was always, we're trying to understand macro environments, trying to model large-scale complex systems. But it always felt a bit cold to me, like you're understanding these complex systems, but there's nothing that relates you to those systems because it was like environment systems or particle systems.
But then when I learned that you can apply this to humans, that gave me the element that I can relate to and I find more fascinating. Especially because like social science is a whole different field which is focused on understanding human behavior. And it's very fascinating because it's such a complex behavior, right?
But there are certain things that you can really model well once you start digging into these complex systems. And I realized that you can apply statistical physics tools to understand these behaviors and to model them, I found that super fascinating and that kind of shaped my career going forward.
[00:08:38] Lauren Burke: That's so interesting. And so for those who are newer or unfamiliar with the area, what is network science and how is it used to understand customer behavior?
[00:08:48] Noemi Derzsy: Sure. Network science is basically applied graph theory. I often get the question, "What's the difference between networks versus graphs?" And basically when we talk about networks, we're talking about large scale graphs.
So the field emerged from graph theory, but in a more applied fashion. So once these real world data sets started coming into the world and people were starting to collect all these data about different information, about people, about biological systems, the infra systems, and so on. It became easier for researchers to see that the mathematical graph theory work, how it translates into understanding real complex systems.
And then network science is basically the applied graph theory, which is this interdisciplinary field that anyone can apply to any type of complex system of whatever nature. So for example, it is used in epidemic modeling, especially the past two years with the pandemic. It was widely used in many research papers. There have been many studies, based on human interaction, of how an epidemic spreads, what its dynamics are, and how you can model it and also mitigate it if necessary.
And then there's other aspects like, for example, infrastructure networks. You can think of airline traffic, land, water traffic. So all these can be represented as graphs and then you can also do all sorts of optimization on the traffic flow and so on.
So all these systems that surround us are complex systems that are interconnected, and whenever you have an interconnected system, you can just represent it as a network and then just go from there and use network science tools to basically analyze, model and predict behaviors.
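Illustrative aside (not part of the conversation): the idea of representing an interconnected system as a network can be sketched in a few lines of Python. This is only a minimal example under stated assumptions: the airport codes and routes are made up for illustration, the graph is stored as a plain dictionary of neighbor sets, and the two analyses shown (counting direct connections and finding the fewest hops with breadth-first search) are just two simple examples of what network tools can compute.

```python
from collections import deque

# Hypothetical route list; each pair is treated as a direct flight in both directions.
routes = [
    ("JFK", "ORD"),
    ("JFK", "ATL"),
    ("ORD", "DEN"),
    ("ATL", "DEN"),
    ("DEN", "SFO"),
    ("ATL", "MIA"),
]


def build_graph(edges):
    """Return an undirected graph as a dict mapping node -> set of neighbors."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees(graph):
    """Number of direct connections per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def fewest_hops(graph, start, goal):
    """Breadth-first search: minimum number of edges from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


graph = build_graph(routes)
print(degrees(graph))                    # direct connections per airport
print(fewest_hops(graph, "MIA", "SFO"))  # fewest flights between two airports
```

The same dictionary-of-sets structure would work for any system of entities and pairwise relationships; only the meaning of a node and an edge changes.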
[00:10:51] Lauren Burke: That's so interesting. So you're looking at the relationships between people and their interactions with other people, with companies, with other basically just entities. If you're thinking of like airline traffic and people traveling on different pathways to get to different destinations.
That's really interesting that you can take so many different relationships and so many different discrete interactions and combine them into something that helps you see a bigger picture.
[00:11:18] Noemi Derzsy: Exactly. So there are so many ways to represent complex systems through networks. And it gives you just infinite opportunities for research and analysis. And it's turned out to be a very important tool also within companies. There are many companies that kind of represent their customer base using knowledge graphs, for example, where they collect all this information that they have about customers into a big graph and then they can just run analysis on it through this knowledge graph tool that they're using.
[00:11:53] Lauren Burke: That's really awesome. And it's so cool learning about how far data and the capabilities of just what we can do with it have come over the years and how interdisciplinary work like this has really enabled us to do a lot of really interesting things.
But outside of the customer analysis side of things, do you think your background in physics and network science affects the way you approach analysis?
[00:12:16] Noemi Derzsy: Yes. So as a physicist, I'm always curious about exploring new data sets to identify patterns, behaviors, and to understand their underlying governing mechanisms. So I think the physicist curiosity is in me all the time whenever I embark on a project.
And as a network scientist, I always try to find relationships among elements in the data based on which I could represent that system as a network. Basically, I see everything as a network, which gives me definitely a different perspective on things and it gives me a different perspective and way of approaching an analysis.
Also, as a physicist, I feel that my curiosity sometimes leads me on a longer data analysis journey. Like people would probably jump into a modeling quicker, but I really like to take my time to explore the data that I have and to do EDA because of that curiosity of seeing if I can find any patterns, any behaviors. So I really like to do the data exploration.
There was a podcast a few years ago that I did, and I remember I mentioned there that I always feel like I'm in a crime novel, like in an Agatha Christie novel where I'm trying to find clues. And pretty much, when I'm going into an EDA process, I always go in very eager to see what's going on in the data, what I can learn from it, whether I can find any patterns, and where that will lead me.
So I was always saying it's like playing a detective, but in a safe environment.
[00:13:58] Lauren Burke: I absolutely love that comparison. I really like Agatha Christie, so
[00:14:02] Noemi Derzsy: Same.
[00:14:02] Lauren Burke: I think that's that's a really fun way to think about it. I've always kind of thought of physics as basically just an applied interest in learning about how the world works.
[00:14:11] Noemi Derzsy: Yes.
[00:14:12] Lauren Burke: So I feel like that makes a lot of sense. The way that you are approaching things with that curiosity and that kind of drive to really understand how it all works before you're ready to move on to modeling.
[00:14:23] Noemi Derzsy: Exactly.
[00:14:23] Lauren Burke: Which I think is a great quality in data science and in your current role as an Inventive Scientist.
[00:14:30] Noemi Derzsy: Uh, exactly it is.
[00:14:32] Lauren Burke: So you've been an Inventive Scientist at AT&T and AT&T Labs for a couple years now.
[00:14:37] Noemi Derzsy: Yes.
[00:14:38] Lauren Burke: So how does that differ, the inventive scientist role, from the typical data scientist role? And what kind of skills do you need?
[00:14:45] Noemi Derzsy: So the Inventive Scientist title originates from AT&T Labs, which is the research and development division of AT&T. And when I first joined AT&T, I joined the AT&T Labs research and I joined the Data Science AI Research organization. Then later on we got moved into the Chief Data Office, but our role and title remained the same. And basically this is doing data science, but it's a data scientist role where the data scientist will seek to provide novel solutions to problems and is actively involved in the scientific research space.
So basically it is data science and it's also more of a research data scientist position. Most of my colleagues with this title have a PhD. So there is this requirement to have research experience, in addition to data science experience.
[00:15:44] Lauren Burke: So it sounds like it's kind of similar to a research scientist role you might see in other organizations.
[00:15:51] Noemi Derzsy: Yes. It definitely translates to that. We have the opportunity and we are encouraged to publish research papers, to go to conferences, and to stay active in the scientific research space. And to collaborate with academia, which was really great for me when I joined because I always said that, okay, I'm done with academia, but I'm still open to doing every now and then some collaboration or research.
And I think this job gave me that ideal leaving the window open opportunity so that I can still have some research options during my career as a data scientist in industry. And that was very important for me. And it's been working out great.
[00:16:38] Lauren Burke: That's so awesome. It allows you to make that transition to industry while still kind of having that bridge to some of the things that you enjoyed about working in academia.
[00:16:47] Noemi Derzsy: Exactly. So I kind of get to keep exactly as you said, the parts I liked, which is the research and working on novel things. And then still explore new things within the industry.
[00:17:00] Lauren Burke: That's awesome. What are some of the exciting projects or things that you've published while working at AT&T?
[00:17:07] Noemi Derzsy: Well, I am involved in several very different projects within the company, and this is also one of the things I really like at this job. Is because I have the opportunity to study and work on very different types of projects using very different types of data and it helps me to analyze always new things and I never get bored.
So whenever people think of AT&T they just think about the telecommunications company. But there's so many different data sets that we have, which give opportunity for data scientists to come up with so many data science problems that they can work on. We have the cell tower data, we have the communications data, we have internet, 'cause we also provide internet service. We have that data. Then we have customer care communications data. We have live chat data, which is text data. Then up until this year, we also had TV data because we provided DirecTV. And then also WarnerMedia belonged to AT&T. So we had all this TV data information. So there are so many different opportunities to explore.
But one of the projects that I am currently working on and focused on is ensuring that bias and fairness are addressed in machine learning models within the company. And one of the exciting things about this project is that, because this bias/fairness concern is an emerging research field and there's more and more interest in the topic, we also identified a research opportunity through it. So I was able to actually again bring in my network science knowledge, and we proposed this graph-based approach to identify machine learning papers and projects with bias concerns. And that's one of the papers that we published last year. So that's one very novel way of looking at things. So even if you think about bias/fairness in machine learning, how would you apply graphs to it? And that is one of the examples where I used that.
And then other projects would be, for example, I'm using mobility data to study human behavior through this large scale anonymized customer data. And understanding how people move around to understand cell tower load is very important for us to know how to improve our services.
And another project I'm working on is about characterizing our mobile network and analyzing how its topology compares to other real social networks that are out there. This is also very crucial for us because this network topology of how people communicate, again, helps us understand these dynamical processes that are happening in our telecommunication network. And it also implicitly helps us improve the services to provide a better network and make sure that there's no congestion on the tower during communications, during emergency situations and so on. So this also has the network aspect to it, ties to it, which I really enjoy.
So yeah, as you can see, I have several different projects and they're all very different and there's always somewhere, some network involved.
[00:20:27] Lauren Burke: That's so cool. What's the specific type of network?
[00:20:30] Noemi Derzsy: So this is for the bias fairness one. So basically there, what we did was we took thousands of papers that were published in the machine learning research space. And then we created this graph of terms used in those research papers. And then based on that, we created this way of identifying potentially new projects or papers that would have any bias concerns based on the terms that they have inside them. And it's all based on this graph approach that we used for the modeling.
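Illustrative aside (not part of the conversation): the published method is not described here in enough detail to reproduce, so the following is only a toy sketch of the general idea of building a graph of terms from papers and using it to score a new paper. Everything in it is an assumption made for illustration: the tiny corpus, the seed terms, the co-occurrence definition of an edge, and the scoring rule are all hypothetical and should not be read as the approach used in the paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: each paper reduced to a set of terms.
papers = [
    {"face", "recognition", "demographic", "bias"},
    {"loan", "credit", "demographic", "fairness"},
    {"protein", "folding", "structure"},
    {"credit", "scoring", "bias"},
]

# Hypothetical seed vocabulary that marks a bias/fairness discussion.
SEED_TERMS = {"bias", "fairness"}


def cooccurrence_graph(docs):
    """Weighted undirected graph: edge weight = number of docs containing both terms."""
    graph = defaultdict(lambda: defaultdict(int))
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def bias_adjacent_terms(graph, seeds):
    """Terms that share an edge with at least one seed term (seeds excluded)."""
    adjacent = set()
    for seed in seeds:
        adjacent.update(graph.get(seed, {}))
    return adjacent - seeds


def flag_score(new_terms, adjacent):
    """Fraction of a new paper's terms that are bias-adjacent in the graph."""
    if not new_terms:
        return 0.0
    return len(new_terms & adjacent) / len(new_terms)


graph = cooccurrence_graph(papers)
adjacent = bias_adjacent_terms(graph, SEED_TERMS)

# Hypothetical new paper that never mentions bias or fairness explicitly.
new_paper = {"credit", "demographic", "model", "ranking"}
print(sorted(adjacent))
print(flag_score(new_paper, adjacent))
```

The point of the sketch is that a new paper can be scored from its neighborhood in the term graph even when it contains none of the seed terms itself. A real system would need a much larger corpus and a more careful scoring scheme.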
[00:21:11] Lauren Burke: That's a great thing to be focusing on, too. I know bias in algorithms, especially in AI based algorithms is such a huge interest that we're trying to develop policies around. So if you're working on stuff that can get ahead of that, that's really good to hear and it seems like you're going to be in a good place.
[00:21:31] Noemi Derzsy: Definitely, and I'm also very excited that our company recognizes the need for that and values it and highly prioritizes this type of work. So it is great to be involved in it and to kind of push the research forward.
[00:21:47] Lauren Burke: Definitely. And so you've actually, outside of work, you've been involved in a number of different initiatives and volunteer opportunities that have allowed you to work with a lot of different types of data, right?
[00:21:57] Noemi Derzsy: Yes. So before I joined AT&T I was doing the Insight Data Science Fellowship where what we do when they transition you from academia to industry, basically, they expect you to have all the skillset that you need. So in those seven weeks, what you do is you come up with a project idea, and then you create an MVP model, and then you finalize that model, and then you have a project readout to the companies that the fellowship is working with. So there I was working with some real data that was completely different.
During my transition from academia to industry, I also applied for this NASA Datanaut program, which had applications open. And it looked like an exciting opportunity to start working with open source NASA data and to meet people who have similar interests. This was an initiative to engage the community in exploring open source NASA data because they made publicly available thousands of data sets of different types like image collections and all sorts of different data sets. And they just wanted the community to start working on those data sets and see if they can find any insights.
And there I started working with their metadata, which describes their vast collection of openly available data sets. And using that metadata, I started working with NLP, so natural language processing, to make sense of these data descriptions and text data and to try to categorize them and group them and to kind of make sense of their dataset collection.
And I found that very exciting. And then I continued working with natural language processing going forward.
[00:23:49] Lauren Burke: That's awesome. And so with some of the data sets you were working with, were you able to apply network science?
[00:23:56] Noemi Derzsy: With those I wasn't, I probably, I would have been able to, but I was focused on doing something different. So with this one I kinda explored the natural language processing world, and then I found it very exciting.
And then later on, for example, with the project that we published now within AT&T with graph based approach to identify bias in machine learning models. There, for example, I combined both natural language processing and network science. So it really depends on the problem and how I want to approach it. But most of the time there's always gonna be some networks sprinkled in the solution for me.
[00:24:38] Lauren Burke: Right. I feel like even if you don't see it, there's always some way that whatever you're looking at is connected to something. If you make that connection for yourself, it will kind of make the picture a little bit more clear and help you better understand what's going on.
[00:24:52] Noemi Derzsy: Exactly. And I also noticed that people really like graphic presentations. So whenever you can just throw a network to show something, people just find it very intuitive to grasp what's going on. That's one of the other things I really like about network science is that it's so interdisciplinary. You can apply it on any complex system. And also people have intuition to understand it. It resonates with people.
[00:25:18] Lauren Burke: Right. It's visually appealing and easy to understand.
[00:25:22] Noemi Derzsy: Exactly. Of course there's an art and science behind it. How to make it visually appealing and clear. Because sometimes I keep saying that sometimes you just look at the network and it's a hairball, so it, it takes some skill to make sense of it and to kind of reveal something that is of value from that big hairball.
[00:25:44] Lauren Burke: Right. And you can probably scale it down, right? You can decide what level you want each specific edge to be like, right. You can decide how many people you want that to represent, how many different airlines going on a certain path.
[00:25:57] Noemi Derzsy: Exactly. So you can, for example, say that I have this huge social network of millions of people, but maybe I don't want to represent each person in that social network individually. Maybe I'm more interested in different groups of people. So for example, in Facebook you have these different groups with people having similar interests, sharing similar interests.
So you have these groups of dachshund owners, like I am. Then you have groups of people who like cats and so on. So maybe you just want to represent those and then you would represent that group of people as just one node. And then that way you scale it up exactly as you were saying. And then you might just look at those interactions among the groups.
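Illustrative aside (not part of the conversation): collapsing each group of people into a single node can be sketched as follows. This is a minimal example under stated assumptions: the names, friendships, and group labels are invented, and each person is assumed to belong to exactly one group, which real social platforms do not guarantee. The edge weight between two group nodes is taken to be the number of person-level connections that cross between them.

```python
from collections import Counter

# Hypothetical person-level friendships (undirected).
friendships = [
    ("ana", "ben"),
    ("ana", "cho"),
    ("ben", "dee"),
    ("cho", "dee"),
    ("dee", "eli"),
    ("eli", "fay"),
]

# Hypothetical group membership; each person belongs to exactly one group here.
group_of = {
    "ana": "dachshund owners",
    "ben": "dachshund owners",
    "cho": "cat lovers",
    "dee": "cat lovers",
    "eli": "hikers",
    "fay": "hikers",
}


def coarse_grain(edges, membership):
    """Collapse each group into one node.

    Returns (between, within):
      between: Counter mapping an unordered group pair to the number of
               person-level edges that cross between those two groups
      within:  Counter mapping a group to the number of edges inside it
    """
    between, within = Counter(), Counter()
    for a, b in edges:
        ga, gb = membership[a], membership[b]
        if ga == gb:
            within[ga] += 1
        else:
            between[tuple(sorted((ga, gb)))] += 1
    return between, within


between, within = coarse_grain(friendships, group_of)
for (g1, g2), weight in sorted(between.items()):
    print(f"{g1} -- {g2}: {weight}")
```

Six people and six friendships reduce to three group nodes and two weighted edges, which is the kind of simplification that makes a large "hairball" readable.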
[00:26:42] Lauren Burke: And that's an interesting way to kind of think about it too, right. Because the scale of just one person's journeys, one person's interactions might not be as valuable in terms of like what you can apply to a larger group of customers. But if you take customers and you can group them into similar behaviors, that helps you better understand where, like you said, where you should be working on improving cell service, adding more towers, things like that. That's really cool. I really enjoyed hearing about it so far.
[00:27:13] Lauren Burke: Um, are there any other, like new developments or technologies that you're excited about in the data science or even network science space?
[00:27:21] Noemi Derzsy: Well, as a data scientist, I'm definitely excited about the new technology of 5G that got rolled out. And all the opportunities that it brings and how it's going to speed up our connectivity and all the online interactions. So that is definitely something I am excited about.
And I am in general, excited about all AI-based technology that enables us to live better. I feel that oftentimes the focus is on the negative impacts that AI can have and people kind of forget about the ways that technology has improved our lives and continues to do so. And of course it's not a fail-proof environment, and that is part of our job as data scientists to ensure that we build these trustworthy systems.
So that's why I am also excited about the latest focus and interest in the data science space for building machine learning systems that are trustworthy and ensure that there is bias and fairness consideration to it.
[00:28:25] Lauren Burke: Right. I think we've kind of entered an "AI is a scary thing" phase. And if it's not handled correctly, if it's not set up and the models aren't trained correctly, it can be, and it is, a very scary thing.
But I think the work you are doing and others are doing, it does show that there are things being done to make sure that the AI technologies and tools we're developing aren't going to be scary, and they are going to be innovative solutions that add value to people's lives.
[00:28:54] Noemi Derzsy: Definitely. Yeah, I think the takeaway is basically that AI and technology brings so many advances, and it's so important to improve our lives, but it has to be done the right way. And that's why I think this focus also on ensuring that they're being done in the right way is very important. Because there's been a lot of push in the direction of making quick research advancements in the AI space.
But we also need to ensure throughout this journey that we always ensure that there is bias, fairness too. So it's something that it has to go together with the research, not be an afterthought once we reach a smart system, then we start thinking about bias. That's not how it's supposed to work.
We should be thinking along the way as we do the research. Glad to see that this has momentum now.
[00:29:53] Lauren Burke: I absolutely agree. So, kind of speaking of tools and resources, what is a resource that you feel like has helped you in your career that you think might help others listening?
[00:30:05] Noemi Derzsy: I am not exclusively sticking to one resource only when working. It really depends on what I am trying to achieve. Some people rely on only online courses. Others rely on only blog posts or just GitHub. But I like to use different resources for different things. The winning solution was always a balanced combination of all.
So if there is a theoretical detail that I want to dive into or some theoretical background of methodology or something, then I would rely on books or research papers. If I just want to learn about a new tool, like a hands-on quick learning that I can do, then I would do an online course.
And there's so many out there, especially now with the pandemic. There's been even more online resources made available, and I think that's great. And when it comes to coding and tips and tricks about coding best practices, I usually rely on GitHub and blog posts. So it's a combination of each.
But because we've been talking so much about network science, I think one resource that I can point people to, which they might find helpful, is Graphs Algorithms for Data Science. It's a newsletter where they also do blog posts regularly with hands-on Python notebooks to kinda show you how you can dive into the network science space and how you can think about it. And once you sign up for the newsletter, you can just get the weekly updates and new blog posts about networks. And you can also start thinking about networks the way I do.
[00:31:46] Lauren Burke: That's great. We'll definitely link that in our link section for everyone. Thank you so much for joining us. And just to finish off, how can our listeners keep up with you? Do you have any talks coming up or anything that we should be looking out for?
[00:31:58] Noemi Derzsy: I don't have any planned talks right now. But LinkedIn is probably the best way to reach me. I haven't been really active on social media lately, but I hope to get back to it soon. So LinkedIn is probably the best way. Usually if I have something coming up, then I will just post it there.
[00:32:17] Lauren Burke: Awesome. Well, thank you so much Noemi for joining us. I really appreciate you taking the time to speak with us and tell us more about network science, how it's used to understand customer behavior and some of the other exciting things you've been working on.
[00:32:32] Noemi Derzsy: Well, thank you so much for inviting me. This was a great conversation.