Dr. Tanya Berger-Wolf joins us to discuss the development of computational ecology, the latest in the field of conservation AI, and the new field of study that she has established, known as Imageomics.
About Dr. Tanya Berger-Wolf
Dr. Tanya Berger-Wolf is a Professor of Computer Science and Engineering, Electrical and Computer Engineering, and Evolution, Ecology, and Organismal Biology at The Ohio State University, where she is also the Director of the Translational Data Analytics Institute. Recently she was awarded a $15M US National Science Foundation grant to establish a new Harnessing Data Revolution Institute, founding a new field of study: Imageomics.
As a computational ecologist, her research is at the unique intersection of computer science, wildlife biology, and social sciences. She creates computational solutions to address questions such as how environmental factors affect the behavior of social animals (humans included).
Berger-Wolf is also a director and co-founder of the conservation software non-profit Wild Me, home of the Wildbook project, which brings together computer vision, crowdsourcing, and conservation. Wildbook was recently chosen by UNESCO as one of the top AI 100 projects worldwide supporting the UN Sustainable Development Goals. It has been featured in media including Forbes, The New York Times, CNN, National Geographic, and most recently The Economist.
Berger-Wolf has given hundreds of talks about her work, including at TED/TEDx, UN/UNESCO AI for the Planet, and SXSW EDU.
Prior to coming to OSU in January 2020, Berger-Wolf was at the University of Illinois at Chicago. Berger-Wolf holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. She has received numerous awards for her research and mentoring, including University of Illinois Scholar, UIC Distinguished Researcher of the Year, US National Science Foundation CAREER, Association for Women in Science Chicago Innovator, and the UIC Mentor of the Year.
- StripeSpotter (Google Code Archive)
- Science and Method (Book by Henri Poincaré)
- Gregor Mendel (Wikipedia page)
- Mendel's experiments (Article)
Follow Wild Me
[00:00:05] Lauren Burke:
Welcome to Women in Analytics After Hours, the podcast where we hang out and learn with the WIA community. Each episode, we sit down with women in the data and analytics space to talk about what they do, how they got there, where they found analytics along the way, and more.
I'm your host, Lauren Burke, and I'd like to thank you for joining us.
Today, I am so excited to have Dr. Tanya Berger-Wolf joining us. Tanya is a professor at The Ohio State University in the departments of Computer Science and Engineering; Electrical and Computer Engineering; and Evolution, Ecology, and Organismal Biology. She is also the director of the Translational Data Analytics Institute at OSU. She was recently awarded a 15 million dollar grant from the National Science Foundation to establish a new Harnessing Data Revolution Institute, founding a new field of study known as Imageomics. In addition, she is a director and co-founder of the conservation software nonprofit Wild Me, home of the Wildbook project, which brings together computer vision, crowdsourcing, and conservation. As a computational ecologist, Tanya's research is at the unique intersection of computer science, wildlife biology, and social sciences.
She has joined us today for an incredibly interesting conversation, and I hope you enjoy listening along.
Welcome Tanya. I'm so excited to have you here with us today, and I really appreciate you taking the time to join us.
[00:01:40] Dr. Tanya Berger-Wolf:
Thank you, Lauren. I'm really excited to be here.
[00:01:44] Lauren Burke:
Awesome. So, Tanya, you've had a really interesting career path so far. From where you got started studying computer science to your groundbreaking work with computational ecology, to the founding of an entirely new field of study. So just to start off, could you tell us a little bit more about your background and the path that led you to where you are today?
[00:02:05] Dr. Tanya Berger-Wolf:
It's a winding path that spans three continents, five countries, and many career choices that changed the path along the way. Because I was born in Lithuania, which at the time was still the Soviet Union. And I was a math geek, wanting to do math and wanting to be a math teacher, preferably high school, because both my parents were teachers. And also because, being Jewish in the Soviet Union at the time, not all career choices were open. So I kind of thought that was the pinnacle of achievement. And then I immigrated to Israel at the age of 18.
So I did my undergraduate in Israel, also starting out thinking that I want to be a math teacher, but just in case, let me also apply for this computer science thing. Along the way I found out that what I really like is computer science, more so than math. And I graduated with a dual degree in computer science and math, switching the order: I started as math and computer science, and graduated computer science and math.
And I was all ready-set-go to either work in industry or teach in high school, when my husband, by that point I was married, I got married very young, he's an ecologist, wanted to do a PhD. I did not want to see a course or a homework or a test ever again in my life. I was working in industry already at that point, and I also was working at a university supporting a college department.
We came to the US, to Urbana-Champaign, and I started auditing classes. That was a big mistake. I started auditing classes in computer science and realized that I want to keep going. And so within a year, after spending a year modeling animal populations for the Army Corps of Engineers, I started my PhD in very theoretical computer science.
But along the way, if you notice, I worked in an ecology department, also modeling the dispersal of seeds on islands. I worked for the Army Corps of Engineers modeling birds and the impact of various activities, especially at army bases, on endangered species. And then, talking to my husband and a lot of his colleagues, I walked away from many of these conversations and the things that I was doing with a feeling: oh, there's gotta be a better way of doing this than just writing a model with more parameters than birds in the population, running it 3,000 times, and getting 3,000 different answers of what happens if you run a road through the habitat.
So to me, it was very frustrating and the wrong way to go about things. But clearly it takes me a while to come to a realization. It took me towards the end of my PhD to think: oh, wait, I can think of better ways of doing this. I should be thinking of better ways of doing things rather than just complaining. By the way, the same thing happened when I was complaining about the women in engineering organization, and somebody said, well, do better, start your own.
So I stopped complaining and went to do a couple of postdocs that were very explicitly focused on figuring out what's a better way of answering questions in ecology with computational approaches. And that's where I started figuring out what to call this new field. Computational biology was kind of being born at the time. And I was like, well, I'm doing biology, but really I'm doing ecology. So probably computational ecology is the right name for this. For a while I used to track how many mentions of computational ecology there are on Google, you know, on the web. When I started out, it was two. Now there are millions.
So that is a very great thing to see. That something you come up with, a term that becomes commonplace. I'm hoping the same thing for Imageomics. And so, yes, that's where the intentional switch came. And so when I started a faculty position, I already was doing research and explaining to everybody I'm doing research in computational ecology.
[00:06:30] Lauren Burke:
That's so awesome. And I imagine as you're seeing computational ecology become a more and more widely used term, you're getting more and more people calling themselves computational ecologists as their first title, which I imagine is pretty awesome to see.
[00:06:44] Dr. Tanya Berger-Wolf:
Oh, yes, it is exciting. And even more exciting when I start seeing the term, and this was a while ago already, being used by people that had no idea who I was, which was great.
[00:06:57] Lauren Burke:
That's so cool. So when you got started with computational ecology, and sort of bridging that to your work in conservation AI, was that something you'd always had an interest in?
[00:07:11] Dr. Tanya Berger-Wolf:
Nope. I am a city girl. A city girl and a math geek. I grew up in cities all my life, other than the brief stint for the PhD in Urbana-Champaign, which is the smallest place I've ever lived. But even so, you know, I'm not a field person. Typically, when somebody asks, do you like animals, I'm like, yeah, I've had a cat all my life, you know? So my husband, who as I mentioned is an ecologist, studied bees and bugs and kept lizards for pets, which was really a challenge to our relationship at the beginning. He's the field person.
So when I started my second postdoc in Princeton, I by chance interacted with Dan Rubenstein, who's a behavioral ecologist focusing on the social behavior of animals, mostly zebras. And he and his students were asking questions about zebras that led to my work in developing methods for social network analysis of zebras. But, you know, at the level of abstraction at which we develop these algorithms, it really doesn't matter whether it's zebras, baboons, ants, or humans, you know, or brain cells for that matter.
But he kept on saying, you gotta see your data, you gotta see your data. And I'm like, nope, it looks beautiful in my CSV file. You know, I know what the zebras are doing and who is whose zebra friend. He's like, no, you gotta see your data in the field, it makes a lot more sense.
So he and his students dragged me out to Kenya, bugs and dust and all. And no electricity, in 2008. I can tell you many stories that people still laugh about, the level of my field inexperience, with bugs and dust and grass flies that fall through the mosquito net on your bed, which I thought were fleas. And I was nearly ready to leave Kenya right there and then.
Or the elephants rubbing up against the back of the wall. And not being able to go to the bathroom in the middle of the night because there are elephants between you and the bathroom. And learning that you're not supposed to be quiet when the elephants are around, but very loud. And so talking: “Dear elephants, I really need to go, could you please move away?” And you keep on talking while you're going to the bathroom, hoping that they don't come back. All these kinds of fun things.
But that really made a difference, seeing your data. It made a whole lot of sense. I finally understood what they meant when they were talking about fission–fusion societies and leadership. And how they form coalitions and why it matters that the pregnant females need more water, because it makes a lot more sense in that landscape when you actually see all of that. And so the questions that biologists were asking started to translate in my mind into computational abstractions and questions about data in a very, very different, fundamentally different way.
And so I insist now that all my students go to the field at least once. And we're teaching a course, in fact have been teaching a course, where we take computer science and biology students out to the field in Kenya, where they work on interdisciplinary projects. And it was in that course that one of the first projects kick-started this whole conservation and Imageomics direction.
Because while I was in the field, seeing my data, or seeing the zebra data, I asked one of Dan Rubenstein's field assistants, Rosemary Warungu, “How do you get the data on who is whose zebra friend? How do you actually get the social network information?” And so she showed me how she drives in the field every day and collects pictures. Every time she sees a zebra, she takes a very, very good picture, and it has to be very, very good, of the zebra from the right side, with the zebra very well centered and not obscured by anything. Each zebra. And then she takes the GPS.
And I was tired just listening, but you know, this is the field work. This is good. And then I said, “so how do you actually recognize?” And then she showed me this old program where you take a picture of the zebra and you click on the outline very carefully of that zebra, so that it can fit a 3D model of the zebra and match stripe for stripe, and then find a matching pattern.
And I was watching her do it. And about 30 seconds into it, the impatient engineer in me: “So how long is it gonna take?” And she's like, wait. Two minutes later, I'm like, this is taking forever, you know. Five minutes later, I'm like, this is nuts. There has gotta be a better way of doing it. This is insane. I was jumping up and down and was fuming, you know. And it took 20 minutes per zebra.
[00:11:58] Lauren Burke:
[00:11:59] Dr. Tanya Berger-Wolf:
Right. And so I was melting down at that point. But when I was saying that this is nuts and there's gotta be a better way of doing it, one of Dan's postdocs said, “Oh, Tanya, you always say that there's gotta be a better way of doing it. Do you think you can actually do better and put your brain where your mouth is?”
I'm like, “You wanna bet?” So I was like, two clicks. It should take two clicks and just a couple of seconds. That's what it should do. And we were about to teach this course, field computational ecology, where we would take students, as I mentioned, from different disciplines, throw them together in the field, and tell them to go and do something amazing, for the first time in 2010.
And I went to my then PhD student, Mayank Lahiri, and I said, “Hey Mayank, I just bet my reputation that we can identify individual zebras from photographs with two clicks.” Not that my reputation was worth a lot at the time, or now, but I thought it was worth a try. And he already had an idea, and that became his class project in that first course.
We published a paper. It captured everybody's imagination. StripeSpotter: you can identify individual zebras from photographs, you know, walking barcodes. It was on NPR, it was in National Geographic and other popular media. But the algorithm itself had nothing to do with computer vision.
It was a very kind of hacky approach. But it worked, and a proper computer scientist heard about it because of all the public media attention: Chuck Stewart from Rensselaer Polytechnic Institute. And he approached me at a conference and was like, “Hey, heard about the algorithm, looked at it. Great. We can do better.”
I'm like, fantastic, great. And so he and his student, John Crall, actually did better. They created the first proper computer vision algorithm for identifying individual animals from photographs. And this worked not only for zebras, but for anything striped, spotted, wrinkled, or notched. And when that was published in a very, very obscure computer vision conference, we were surprised that within two months we had about 70, 7-0, different requests: can you do my animal? Can you do my animal? And that was everything from snails in Hawaii to wild dogs in Africa and killer whales, all kinds of animals.
And I was like, wow, I guess this can be very useful, not only for an obscure computer vision paper. And so that's where we're like, okay, if this is to be truly used by people, researchers in conservation, where we saw real conservation demand, it has to be properly engineered. It has to be a system. It cannot be a command-line algorithm where you have to figure out how to upload pictures to a folder or something.
And so I spent about a year designing what a system would look like and looking for a good data management platform for this kind of data. And that's where we partnered up with the nonprofit Wild Me and Jason Holmberg, who at the time was a volunteer engineer designing this platform in his spare time.
And we kind of came together and created what is now the new iteration of Wild Me and the platform Wildbook, among other approaches. And once we started, it was absolutely clear not only that there is a huge demand, but that we were seeing only the super tiny tip of that demand, and how critical it was. Because biodiversity today has a data problem, a serious data problem.
And so while I'm not the person, as I mentioned, to go out in the field to this day and implement a lot of the conservation policy, and make sure that we have the right management, the right resources, the right habitat set aside for animals, and that they're protected and managed in the right way, the data problem is something that we can do something about. And so we then started intentionally looking: what is the data problem in biodiversity? What are the ways to address it? Where can we make the most impact, and how do we make it trustworthy? Because it's not enough to design a good algorithm. It's not enough to make sure that you are accurate, 96% accurate, on your test benchmark data set. If that's not gonna be used by the people who need to use it, at the time that they need to use it, for the things that they need to use it for, it's all useless.
[00:16:26] Lauren Burke:
That's a lot of very good points. I want to go back to Wild Me and Wildbook for a second. How does Wildbook work?
[00:16:34] Dr. Tanya Berger-Wolf:
So Wildbook is a platform that uses images as the source of information about animals. And this is animals marine as well as terrestrial, all kinds and sizes. We can get images from any source, whether it's scientists and field assistants collecting data in the field from field projects; camera traps, so those are motion-activated trail cameras; drone images and other autonomous vehicles, underwater, on the ground, or in the air; as well as tourists and volunteers taking pictures and either directly uploading them to the platform or just posting them on social media. And then we have bots that get images and videos from social media.
And then we have the whole image analysis pipeline behind it, which takes these images and finds the ones that contain animals. So that's detection. It puts a bounding box around each one and does all kinds of intermediate things like quality assessment and segmentation. But then it goes all the way, not only to species classification, as in Grévy's zebra, savanna elephant, humpback whale, or hawksbill turtle.
But the unique aspect of what we can do, based on all that previous research, is we can individually identify animals. Anything striped, spotted, wrinkled, notched, or even using the shape of a whale's fluke, or the dorsal fin of a dolphin, or the shape of an elephant's ear most recently.
And now we're adding algorithms that use facial identification for animals like bears and cats and primates, and pushing it as far as we can to use any biometric identification to identify individual animals. Because then we can track and count animals and map population ranges without putting collars or other sensors on them, right. To do it a lot less invasively and at scale, because we're using a different type and a different scale of data through images. These approaches give us access to those data.
And so Wildbook, as a platform, has a good web-based user interface, where you can, you know, upload images and do search queries based on geographic region, on species, on a particular timeframe, and so on and so forth.
But it also has all the image processing behind it, and it also has a good data management platform underneath, with very, very good data architecture. That allows not only these queries, but also good protection of these data. Because these are endangered species, and geotagged images of endangered species are highly valuable data for wildlife criminals, including poachers. And so we implement a pretty sophisticated access control mechanism, and it sits in a protected data enclave on the cloud. So those are the components of the platform that ultimately allow us to go from an image of an animal to its identified and annotated version, which is added to all the information about that animal.
And then we have also implemented a lot of connections to standard biological data sets and statistical tools.
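To make the pipeline Tanya describes concrete, here is a minimal sketch of its final step: matching a new sighting against a catalog of known individuals. Everything here is a hypothetical stand-in, the `Sighting` structure, the function names, and the toy Jaccard similarity on pattern features are illustrative, not Wildbook's actual code or API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Sighting:
    species: str
    pattern: frozenset                # stand-in for extracted stripe/spot/notch features
    individual_id: Optional[str] = None

def identify(sighting, catalog, threshold=0.8):
    """Match a new sighting against known individuals of the same species,
    using a toy Jaccard similarity on the extracted pattern features."""
    best_id, best_score = None, 0.0
    for known in catalog:
        if known.species != sighting.species:
            continue                  # only compare within the same species
        union = sighting.pattern | known.pattern
        score = len(sighting.pattern & known.pattern) / len(union) if union else 0.0
        if score > best_score:
            best_id, best_score = known.individual_id, score
    if best_score >= threshold:       # confident match: same animal as a known one
        sighting.individual_id = best_id
    else:                             # no good match: register a new individual
        sighting.individual_id = f"new-{len(catalog)}"
        catalog.append(sighting)
    return sighting.individual_id
```

In the real system the pattern features would come from the computer vision stages upstream (detection, segmentation, species classification), and candidate matches are typically confirmed by a human reviewer rather than accepted automatically.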
[00:19:56] Lauren Burke:
That's so fascinating that you can actually do facial recognition on animals. I've heard for animals like zebras, right, the stripe pattern is as unique as our fingerprints. But I did not know about the elephant ear recognition or any of those others you've mentioned. That is absolutely incredible, and it's really, really interesting work. I imagine it's very useful that you are able to have anyone take that picture, and then you don't have to rely on the field team, like you were saying earlier, to be the ones to go out and manually track or manually tag the animals. The animals probably appreciate it too, right?
[00:20:35] Dr. Tanya Berger-Wolf:
Well, yes. And in some cases, maybe not. Too many tourists swimming with whale sharks and trying to touch them.
But it does allow us access to data that previously was not used for science and conservation. And so, you know, for example, species like whale sharks are global species. An individual animal can travel 5,000 miles. But typically the conservation projects and research projects are focused in one particular place and rarely share information between, let's say, Mexico and the Philippines. And yet the sharks do travel these distances.
And so being able to reidentify an individual gives that connection between places and incentivizes data sharing. At the bottom of every Wildbook page for an individual, we list all the contributors of data to that specific individual: all the organizations and projects, and even citizen scientists, that have contributed. So people then see that this one individual has 12 different contributors. That incentivizes data sharing. And it also reveals these global travel patterns.
So one of the papers that I'm most proud of in my entire career is a paper that I'm not even a co-author on. It's a paper that was published in December 2017, the most comprehensive paper at the time on the biology of whale sharks. It talks about the migration patterns, the seasonality of those migration patterns, and other things. And that was only possible by using all of the data from Wildbook for whale sharks. It was coauthored by 37 authors, most of whom never met other than through the pages of Wildbook. It is technology that was intentionally designed to connect data pieces, to connect and incentivize sharing. To connect among data, people, data and people, geographic locations, and across different species. Right? So this to me is the power of opening up data sources and combining them in a good way.
The other thing that was amazing, specifically for whale sharks: we designed this intelligent agent, a bot, that scrapes publicly posted videos of whale sharks on social media, finds the ones that are of wild animals, and runs them through the image analysis to identify the individuals. So the bot sends everything off to the image analysis pipeline, and then the videos are added to the appropriate page of Wildbook for the appropriate individual.
And then the bot posts in the comments of the video on social media, “Hey, at two minutes, 46 seconds, we found this whale shark in your video. Here is everything we know about it.” And people respond. I mean, first it takes them a minute to realize that this is an AI posting, and then it's like, wow, this is amazing. And the first question that we get is “How can I help?” Right? Because they suddenly have a very personal connection. Something that they didn't even think was possible: that this video of swimming with a whale shark in Cancun or the Philippines or Madagascar, an amazing moment of connecting with nature, can actually help science and conservation. And typically they have many more photographs and videos that can help.
But even with just the ones that are publicly posted, what we also found out is that in the first year that we started collecting them, 2018, there were more sightings of whale sharks from these online videos than all the sightings from all the human contributors combined. Right? So these constituted more than half of the sightings for that year. And we're like, okay, they're probably repeated by humans, right? We're probably getting repeated sightings, not new information. No: 96% of those were unique sightings, not reported by any other source.
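The dedup question raised here, how many bot-collected sightings were already reported by another source, reduces to simple set arithmetic once each sighting is keyed by individual and date. The keys and numbers below are invented for illustration; only the idea matches the transcript.

```python
# Toy illustration: what fraction of bot-collected sightings are unique,
# i.e. not reported by any other (human) source? Sightings are keyed by
# (individual ID, date); all values here are made up.

def unique_fraction(bot_sightings, other_sightings):
    """Fraction of bot-collected sightings absent from all other sources."""
    return len(bot_sightings - other_sightings) / len(bot_sightings)

bot = {("WS-17", "2018-06-01"), ("WS-02", "2018-06-03"),
       ("WS-09", "2018-07-12"), ("WS-31", "2018-08-20")}
humans = {("WS-17", "2018-06-01"), ("WS-44", "2018-05-30")}

print(unique_fraction(bot, humans))  # 3 of the 4 bot sightings are new: 0.75
```

In practice the hard part is upstream: two reports only collide on the same key if the individual ID pipeline has matched them to the same animal.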
[00:24:52] Lauren Burke:
That's pretty cool.
[00:24:53] Dr. Tanya Berger-Wolf:
It is amazing. That means that the traditional sources of data were missing at least half of what we're seeing through this passive data collection approach. And the power of that, right? The ability to scale is incredible.
So one of our newest species, an addition to an existing Wildbook, is killer whales. So killer whales, orcas: their conservation status is determined by the International Union for Conservation of Nature (IUCN) Red List, which is the official conservation organization that determines the conservation status of species. And the official conservation status of orcas, of killer whales, is data deficient. I take that personally. I mean, these are iconic species. It's not easy to miss an orca, right? These are among the largest marine animals. This is not something small we're talking about. Iconic. How can we not know how many there are and what's their range?
So within just a couple of months of adding killer whales to Wildbook, thousands and thousands of sightings were reported. We hope that by the end of this year, or certainly very soon after, the species commission for killer whales for the IUCN Red List will have enough data from Wildbook to reassess the conservation status of the species. And then they will be data deficient no more. That's the power of the data that is already out there.
[00:26:34] Lauren Burke:
That's incredible that it not only has the impact to track individuals, but also has such a large-scale impact, allowing you to have enough data to maybe even change a conservation status. I feel like, especially if something is going toward extinction and you're able to identify more individuals and say there's less of a population than we originally thought, I can't imagine how impactful that can be to so many different species that are not even being considered for various lists, right? Because we just don't know how many are or are not out there.
[00:27:07] Dr. Tanya Berger-Wolf:
That's exactly right. And that's actually what happened with whale sharks. Based on the data from Wildbook, in 2016 the species commission for the IUCN Red List for whale sharks reassessed the status, and it changed. They changed it from vulnerable to endangered, and the population trend from stable to decreasing. Not because the species is doing worse, but because we have the right data.
[00:27:33] Lauren Burke:
Right. You know now.
[00:27:35] Dr. Tanya Berger-Wolf:
Yeah. We know now. And that has huge impact in terms of the policy, the resource allocation, the type of protection, the protocols, and so on and so forth. Right. So we can actually do something about it. And so it's both extremely rewarding as well as terrifying, the responsibility of knowing that thousands of lines of code, text, right, can change the fate of a species. That really, really makes me think through every method that we're using, every number that we're producing, every analysis that we're thinking of doing. Like, can we stand under that bridge? Can we be sure? And how sure, that this is the real answer?
[00:28:37] Lauren Burke:
Yeah, that's a very large amount of pressure. I imagine it helps the more people you can get involved. Right. If you can get more citizen scientists, more members of an open source community involved, I imagine that's something that could potentially be very impactful for determining that status, maybe helping avoid extinction for some of these species.
[00:28:58] Dr. Tanya Berger-Wolf:
We hope so. And the fact is that, you know, people can help by writing code, by doing analysis, by collecting data, by doing annotations, right, by helping create training data. There are thousands and thousands of volunteers out there doing different aspects of it. And of course more are also testing these algorithms, deploying them in the field, and using them to protect species through the various conservation organizations out in the field.
But this aspect also brings in participation from people who until now probably weren't thinking that they could participate in conservation and contribute. So yes, we're open source, open collaboration, protected data. We do not open the data, for two reasons. First of all, it's not ours. And second, it is endangered species data, which can be highly valuable for nefarious purposes.
But we've always had our source open and we're working towards creating a proper open source community so that people can contribute and revise and review and update the contributions. We are seeing already a little bit of that happening and the community is growing. We will support that community in the future in a more engaged way.
And we have always, always been open to collaborations. One of the very tangible outcomes of that is that, of the currently eight algorithms that are used within various Wildbooks to identify individuals for different species, I think four were designed and contributed by people who are not members of the core Wildbook and Wild Me team. And we've incorporated them into Wildbook. So the more the merrier. The problems are so urgent and so challenging, and there are so few people working in the field of AI for conservation, that every contribution is highly valuable.
[00:31:04] Lauren Burke:
Yeah, I imagine it's very difficult because it's such a niche area and you want as many people to contribute as possible, in any way that'll make an impact.
What are some of the challenges and limitations in developing conservation technologies specifically around some of these AI-based tools?
[00:31:22] Dr. Tanya Berger-Wolf:
There are many. The technical ones are that we typically don't actually have enough data for ground truth. We don't have good data, so it's all imperfect data. It's noisy, it's blurry, the lighting is not what a lot of the algorithms are trained on. So it's also what's called out of distribution.
Most of the algorithms that are being developed are trained on a very different kind of data. So once you deploy them on the images that we're seeing in this case, coming from the wild, they really don't perform well. So the problems are very challenging technically. We work at this very, you know, close pace of research-level development of approaches that are immediately deployed by engineers into the system. And they're the ones then bringing up the next challenge, and it works in this great partnership.
But the challenges are still big. There are technical challenges and research challenges, which is partially why we also started the whole new field of Imageomics. Imageomics is a lot bigger than just individual identification, or even detection, classification, and individual ID of animals from photographs.
The other challenge is the trustworthiness aspect I mentioned. Having good algorithms for, let's say, individual ID is only a small part of making AI trustworthy. Part of the problem is that we have a tendency to design algorithms and then hand them over: "Okay, now here's an algorithm. You can use it." And it's quite often a black box, right? So the first part of making AI trustworthy, especially in cases like this, is to make sure that it is designed with those for whom it is intended, not just for them. And that there is as wide a participation as possible of people in the design process, including the technical aspects of the design.
But for that to happen, we also need to build capacity in places like Africa and South America, where we have the highest biodiversity in the world, so that the ability to use this technology is not concentrated in very, very few places in the world. The second aspect of trustworthiness is the interpretability of the results, and the responsibility and transparency of the process.
The transparency of the process is being able to say: here are the assumptions, here are the confidence intervals, and here are the data on which, and the conditions under which, this approach has been trained. If you're outside of those conditions, maybe it cannot be applied. That's the transparency.
Interpretability is still very much cutting-edge research happening in AI, working towards explainable AI. I always say that explainable is a dream, interpretable is doable, and transparent is critical. So we have to do transparent, and we have to do it now. Interpretable is something we're working towards, and explainable, maybe, hopefully, one day.
And then the third part of trustworthiness is the do no harm.
And the do no harm, we have this excitement, oh my god, I have this new algorithm. Right. And it's improving accuracy from 93.6 to 93.7. Wow.
No, right. First of all, I don't care about that. People ask me, "So what's the accuracy of your animal identification algorithm?" and I'm like, wrong question. Because it really depends. It's not one algorithm, it's a pipeline. It depends on the species, on the conditions, on a lot of different things, and that's not the point. We're using individual ID for something, right? We're not using it to identify an individual and stop there. We're using it to estimate population size. We're using it to track individual animals. So that's where we need to ask: how accurately are we doing population size estimates using images with individual ID?
And does improving the accuracy from 93.6 to 93.7 make a difference in the final population size estimate? If it doesn't change the population size, or the conservation status, or the population trend, then why are we spending the resources? We're doing something that, with all the biases and assumptions that went into it, may not be better than a back-of-the-envelope calculation or what was there before. Why are we spending these actually very expensive resources? AI is expensive. The human resources are expensive, and AI uses rare metals for all the GPUs and everything else, plus huge energy and water consumption.
So this environmental cost, especially in the context of conservation, is hypocrisy, right? We need to ask ourselves whether what we're creating is better than what was there before. So let's do no harm.
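Her point about downstream sensitivity can be made concrete with a small sketch. The example below uses the Chapman-corrected Lincoln-Petersen estimator, a standard mark-recapture formula; all of the numbers are made up, and it assumes, simplistically, that imperfect ID only causes some true recaptures to be missed. It is an illustration of the argument, not Wildbook's actual pipeline.

```python
def lincoln_petersen(n1, n2, recaptures):
    """Chapman-corrected Lincoln-Petersen population estimate:
    Nhat = (n1 + 1)(n2 + 1) / (m + 1) - 1, where m is the number of
    individuals identified in both survey sessions."""
    return (n1 + 1) * (n2 + 1) / (recaptures + 1) - 1

# Hypothetical survey: 200 animals photographed in each of two sessions,
# with 50 true recaptures between them.
n1, n2, true_recaptures = 200, 200, 50

for accuracy in (0.936, 0.937):
    # Matches the ID pipeline actually finds at this accuracy level.
    matches_found = true_recaptures * accuracy
    estimate = lincoln_petersen(n1, n2, matches_found)
    print(f"ID accuracy {accuracy:.3f} -> population estimate ~{estimate:.0f}")
```

Under these made-up numbers, the two estimates differ by roughly a tenth of a percent, far smaller than the statistical uncertainty of the survey itself, which is exactly her point: what matters is the downstream estimate, not the pipeline metric.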
[00:36:53] Lauren Burke:
That's a really interesting position to be in. You're at the center of asking: are we at a good enough point? Is there a benefit to improving a little bit more, versus the cost it would take to take advantage of that opportunity?
The way you're approaching it is not just about the metrics, right? You're not trying to get the most accurate data point and put it on a website. You're trying to get the results that make the most useful impact, whether that's identifying a population or helping support the research against that species' extinction. It's a really interesting and unique approach.
[00:37:35] Dr. Tanya Berger-Wolf:
I hope it's less unique with time. Because we are talking now about, you know, human AI partnerships. And it truly has to be. It's not the human in the loop. It's not the use of AI. It has to be a partnership for many reasons. Not the least of it is that we're solving, in this case, societal problems. AI is both the ingredient of that solution and responsible for a lot of the impact of the outcomes of the approach.
Which is why a lot of this has to be a partnership, from determining what the question is, to what the method is, to how we measure the outcomes and the results. So I don't think that in many cases, other than in homework questions, we're going to be concerned with the accuracy of one algorithm.
In reality, we're concerned with the change in the outcomes of a policy, of an approach, of a protocol, of a decision that includes AI as an ingredient.
[00:38:46] Lauren Burke:
Right. I think the effect of the way we're using some of these algorithms, their effect on policies, on privacy laws, things like that, is becoming more and more of a focus in all areas that touch AI, that touch technology in general.
I know, you've talked before about animal data privacy laws, which I think is a very important aspect of this work you're doing.
[00:39:18] Dr. Tanya Berger-Wolf:
Yeah. Unfortunately, animal data privacy laws, or even policies, do not exist. And so one of the issues that we've raised, and keep on raising over and over again, is that it's not only individual data points about endangered species that have to be protected. You know, a geotagged image of an elephant from Kruger National Park posted on social media led poachers to that location within a couple of hours.
The problem is that aggregations of these data are now even more useful. For example, when Strava publicly posted its fitness-tracking data, some people started looking through the data from tourists on safari. You can see where they stopped, and with high probability they're looking at something interesting, right? So there's probably a rhino or a lion there. This is an example of a side channel that leaks information about endangered species: data availability and data analytics together make it possible to take advantage of those data, both for good and for not so good.
So how do we think about protecting those data? What does it mean to protect these data? It's obviously okay to say that there have been elephants on the continent of Africa in the last year; that level of aggregation, in both time and space, doesn't give a poacher enough information to go after an elephant. But saying that there is an elephant at this location right now is something very useful for poachers.
So somewhere in between is a good level, one that is not totally trivial but also doesn't harm the animals. We have to start thinking about where that level is, how it changes when we start aggregating data, what data, like fitness-tracker information or other sources, can leak information about endangered species, and what policies should be used for data protection.
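One minimal way to picture that "somewhere in between" is coarsening each sighting to a spatio-temporal cell before it is published. This is an illustrative sketch, not any platform's actual policy: the function name, grid size, and coordinates are all hypothetical, and real geoprivacy protections need much more than this, for example defenses against re-aggregation across many coarsened records.

```python
from datetime import datetime

def coarsen_sighting(lat, lon, seen_at, grid_deg=1.0):
    """Snap a sighting to a coarse spatio-temporal cell (a roughly
    1-degree grid square and a calendar month), so the published
    record no longer pinpoints an animal in space and time."""
    cell_lat = round(lat / grid_deg) * grid_deg
    cell_lon = round(lon / grid_deg) * grid_deg
    return (cell_lat, cell_lon, seen_at.strftime("%Y-%m"))

# A hypothetical geotagged sighting near Kruger National Park:
print(coarsen_sighting(-23.9884, 31.5547, datetime(2022, 3, 14, 9, 30)))
# -> (-24.0, 32.0, '2022-03')
```

Shrinking `grid_deg` or switching the timestamp to a day or an hour moves the record back toward "an elephant is here right now"; widening them moves it toward "there were elephants in Africa last year." The policy question is where on that dial each species sits.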
[00:41:47] Lauren Burke:
Yeah, that's something we're starting to see not just in conservation AI, but in every field of AI. I've seen a similar study about personal fitness devices, where the results were publicized and people were able to identify 95% of the adults just from that data, which is pretty crazy. And no one is hunting the adult people, hopefully. So it's worse that we can also do that with animals that are potentially facing extinction, right?
[00:42:21] Dr. Tanya Berger-Wolf:
Yes, but you're absolutely right, Lauren, that a lot of data governance policies actually exist for humans. We've had about 20 years now of trying to figure out what the right data governance laws are, so we're a little bit ahead of AI policy on data governance. We have some things in place for humans around privacy, data leakage, and security, and occasionally, yes, we find out that we're leaking data from unexpected sources. But we're at least more and more aware of it.
For AI, we are only starting to figure out what the right policy and the right guardrails are. The very first proposal for AI governance has just been put forth by the European Commission. Whereas GDPR has been in place for a while, the first proposals on AI policy are just now coming out. And I think it's an opportunity for many of us to get involved in the process and make sure that our opinions are heard, that we pay attention to the things that are coming out. There are states putting AI governance policies in place; California has its own version that protects consumers, for example, from the use of face recognition software, along with other policies governing the use of AI.
We're seeing, at the federal level, some medical uses of AI being governed, starting to put in place rules for when and how to use AI in, for example, medical diagnosis. I think we're going to see more and more of these proposals governing aspects of our lives where some of us are probably not even aware that AI is used, but it really is touching all of us.
[00:44:35] Lauren Burke:
So you were recently awarded a $15 million grant from the National Science Foundation for your efforts to establish a new scientific field of study known as Imageomics, which combines the use of computational tools and biological knowledge bases to analyze images of living things and help us better understand biological life processes.
What led you to create this new field and what kind of impact can we expect to see from it?
[00:45:04] Dr. Tanya Berger-Wolf:
So I'm not alone in creating this field. This is a great team of people, biologists and computer scientists. We came from different kinds of backgrounds and motivations, but for me, really, this journey of using images as a source of information about animals led to the point of: you know, we can do better.
So we're using images for conservation, right? We're using images as the source of information about who is in an image. Not only which species, but who that animal is. And that allows us to answer questions about population sizes, dynamics, and other things. The next step: can we really extract biological information directly from images, and what kind? Can we extract biological traits from images? You know, we've been looking at the world to understand the natural world for millennia, really, as humans. We've understood the world through observation. Literally by looking.
And more recently, you know, Poincaré wrote this book called Science and Method, in the early 20th century, which we use today as a foundation of what the scientific method is. He has this wonderful quote at the end of the book, which says the scientific method consists in observation and experiment.
"If the scientist had an infinity of time at his disposal, it would be sufficient to say to him, 'look, and look carefully.'" But since he has not time to look at everything, and above all to look carefully, and since it is better not to look at all than to look carelessly, he is forced to make a selection. That's something I say to all my students. The first question, then, is to know how to make this selection. And what all of computing, all of the technology, really allows us to do in the context of science is to look more carefully at more things. Including at nature, right?
We've gone from looking with human eyes to looking with digital eyes. We're looking at nature through microscopes. We're looking at it through binoculars and telescopes. And we're looking at it through satellite images. The technology changed the scale of what we're looking at, literally from cellular and molecular to planetary scale. And it changed the number of things we can look at and how carefully we can look at them.
And particularly with looking at nature over the years, instead of looking at images or just visual observations, we switched quite a bit to looking at genes and gene sequences, right?
Because with the advent of genomics, we really focused on analyzing the sequence, understanding its different parts, and focusing on the genotype and the phenotype, which is the collection of biological traits that make up an organism of a species.
We really focused on the path from genotype to phenotype: understanding which genes are responsible, maybe, for different diseases, or for different appearances, or even your ability to fold your tongue.
Did you know that the ability to fold your tongue, whether you can fold it into a little tube, is encoded in one gene? It's a trait, right? And it's a trait that you inherit from your parents, or don't.
So it's an example of an inheritable biological trait, and we know the genotype and how it's related to the phenotype. And so, more and more over the last several years, a couple of decades, we focused on this connection of phenotype to genotype by focusing on the genotype.
But current technology allows us to understand images at a level never possible before. And there is a huge availability of images, particularly of nature, animals, and plants, coming from all the sources I mentioned: scientific projects, citizen scientists, and more recently, the digitization of the biological collections of natural history museums.
There are millions and millions and millions of images out there, probably billions. Just in terms of statistics, one source, the largest citizen science platform for nature, iNaturalist, has more than 110 million observations today. That's insane. All over the globe, covering close to 400,000, so almost half a million, species out there. Way more, by the way, than the IUCN Red List covers.
So that's just one source, citizen scientists, and we have all the other ones. That is an incredible source of information that, until now, we've pretty much not used. We can look at many, many more things. And we can look more carefully, because computer vision and machine learning technology has also been developing. We now have methods for detection, for localization of objects in an image, for individual identification, as I mentioned. Classification galore. We can even do pose estimation, from which we can do behavior and environmental reconstruction of the scene.
So how do we combine the availability of data and the tools to analyze it, to really extract biological traits from images? Because that is the phenotype: that's what scientists are looking for when they observe the world, right?
We all may remember Mendel's peas. The shape of the pea pod, the color of the pea pod, the color of the pea, the shape of the flower, and so on. But traits, as characteristics of an organism, can describe its physiology, morphology, appearance, health, life history, demographic status.
And even behavior, like a possum playing dead when threatened, or the way birds weave nests. That's a behavioral trait. And health traits: the gait, the scarring, how thin or malnourished the animal is, and so on and so forth.
So can we extract all of this directly from images? That's what Imageomics is. Just like genomics before it, which is the field of going from sequences to biological insight using quantitative approaches, we envision Imageomics as the field of going from images to biological insight, specifically biological traits and the phenotype, using quantitative approaches.
And we're starting with computer vision and machine learning approaches that leverage the structure of biological information to constrain the machine learning models, so that the answers to things like classification and identification are actually interpretable and biologically meaningful. So we're also driving and pushing the development of explainable AI methodology.
[00:52:29] Lauren Burke:
That's extremely interesting and I can't wait to hear what else is coming up in the field of Imageomics because it sounds like there is a lot of potential impact from this field and a lot of potential benefits towards many of these species that are out there. And all of these images and all of these videos, that just haven't had anything applied to them to figure these things out. So that's really awesome stuff.
[00:52:54] Dr. Tanya Berger-Wolf:
We hope so. I mean, just to begin with, we're already seeing the promise of Imageomics, because the traits that humans use, even to classify species, to say that this is one species and this is another, come from a very limited tool: our own eyes. Only things that are visible to us, in our own visual spectrum, with our own model of what we consider important. We've evolved as a species to pay attention to very specific things, not the same things that maybe a bird is paying attention to, or a butterfly.
And recently, for example, there was a paper showing that in a species of moth, genetic color variation that is visible to predators, birds and other insects, is concealed from humans. Humans were not able to differentiate the moths, but machine learning algorithms could. So what we see as the biggest promise of Imageomics is that computers will see what humans miss.
It is that partnership of AI and human that will help us find traits that we have missed, that we have not been able to see, either because we don't have the hardware or because we didn't pay attention to them, and understand biological processes that were hidden from us because we could not see them. Computers will now help us look at more things, more carefully, so we can really understand life on Earth.
[00:54:26] Lauren Burke:
That's so incredibly interesting. And I'm definitely looking forward to following the future of Imageomics because it sounds like there is a ton of stuff happening and the future looks very, very bright for it.
Before we wrap up, the final question I like to ask everyone: what is one resource that has helped you at any point over the course of your career that you think might help others who are listening?
[00:54:53] Dr. Tanya Berger-Wolf:
Talking. Really. Talking to people and listening. Asking questions and listening to the answers. Having conversations that may not seem the most useful at this particular moment for the things I'm working on, but that turn out useful years down the road, or in a very different context that I did not anticipate at all.
So these relationships and connections are probably the most useful resource, and you make them through conversations, through reaching out, through just asking questions and listening. You know, I say that only about 10% of my conversation starters pan out into some kind of collaboration or project, but you need the other 90% for the 10% to come through. And you can have fun in the process.
And serendipity often is the most important factor in how the projects actually happen and which ones work and which ones don't and which ones go somewhere and become a whole new field of science.
[00:56:05] Lauren Burke:
Yeah, I feel like almost every conversation I have with someone in the field or in a different field, I learn something new. And you're right, even if it doesn't pan out, it's never a bad thing to learn something new from someone.
[00:56:17] Dr. Tanya Berger-Wolf:
Not only do I learn when I talk to people, but the ideas that keep on living in your brain may be ignited by a word, by a sentence, by something that you hear from somebody else. And it's fun.
[00:56:37] Lauren Burke:
Definitely agree. So where could our listeners keep up with you and learn more about your work?
[00:56:44] Dr. Tanya Berger-Wolf:
So both Imageomics and Wildbook are on Twitter and on LinkedIn. And that's the best way to keep up with what's going on.
So you can follow Imageomics at @imageomics on Twitter or Imageomics on LinkedIn.
And you can keep up with our AI for conservation nonprofit on Twitter at @WildMeOrg and on LinkedIn at Wild Me.
[00:57:14] Lauren Burke:
Awesome. We will include everything in the link section. So everyone should definitely go check those out.
I've personally looked at Wildbook and Wild Me, and it's incredibly interesting. Even if you are not yourself in AI, in conservation, or even in technology, it's really, really interesting stuff. And it gives you a glimpse of how you can contribute: your one image could make an impact, could lead to some insight that is beneficial.
[00:57:42] Dr. Tanya Berger-Wolf:
Yeah, and go on vacation or just walk around the backyard and take pictures. And, you know, if you are not an algorithm or technology developer, you can contribute to conservation by uploading your pictures. It doesn't even have to be with us. Connect with other platforms like eBird, iNaturalist. But most importantly, go out and look at nature.
[00:58:05] Lauren Burke:
Well, thank you so much for joining us today, Tanya. This was so interesting, and you shared so many incredible ideas with us.
And I really love that basically everything you've started, everything you've worked on, was because you saw something and said: I want to fix that. I want to be the one that makes this better.
[00:58:24] Dr. Tanya Berger-Wolf:
Thank you. I'm an engineer at heart, and I'm also lucky to have an incredible set of people around me who also want to fix things, who are brilliant and good at collaboration. So we can have a great team working on fixing all these things and coming up with new solutions.
[00:58:47] Lauren Burke:
That's great. A great team can only make you better. So thank you again.