Bellingcat’s 2023 Summer Fellow joins the podcast to discuss his research on AI and how well both humans and algorithms can differentiate between human and AI-created art. Get tips for recognizing AI-produced imagery, learn what the shortcomings of chatbots are and how it all fits into OSINT.
Dennis Kovtun is a freelance journalist with interest in OSINT and a 2023 Bellingcat Summer Fellow.
Dennis Kovtun
The turnaround time for an article or a radio news piece in traditional journalism is about a day, sometimes less. But in Bellingcat, it can take many months, many months to do this piece of research.
Shannon Ragan
Welcome to NeedleStack, the podcast for professional online research. I'm your host, Shannon Ragan.
Aubrey Byron
And I'm Aubrey Byron. Today we're discussing some recent tests by Bellingcat on using AI for OSINT.
Shannon Ragan
That's right. And joining us for that discussion is Bellingcat's summer fellow Dennis Kovtun to talk about the research he performed during the program. So, Dennis, could you start us off by telling us a little bit about the research you performed while at Bellingcat that inspired your tests?
Dennis Kovtun
So I was in Bellingcat for four months after I finished grad school. And what inspired me to do these tests, ironically enough, was my inexperience with open source research and open source journalism and all things OSINT. Because while I was aware and knew what Bellingcat was doing, what open source research was, it doesn't mean that I had a great degree of experience with it. And so I thought, could some of these tasks in open source research be automated? Things like geolocation, for example. Can artificial intelligence do it? Because AI is all the hype now: AI is going to take our jobs, AI here, AI there. AI is doing such cool things. And so I decided to put it to the test and see how well it coped with the tasks that I set it. And I've done two articles on it, two bits of research. The first one being when I tasked AI chatbots, Bing and Bard, to geolocate images. And the second one was when I fed a large number of AI generated images and real images into an AI image detector called AI or Not and asked it to identify whether they were real or generated by AI.
Dennis Kovtun
And from where I stand now, I was not super impressed with how AI was working on either of those two tests.
Aubrey Byron
And in what ways did the AI you tested excel and what were its biggest shortcomings?
Dennis Kovtun
I'll start with the AI detector. The AI image detector performed very well when it was given high quality images. So when I downloaded Midjourney images in large file size, about two to three megabytes in PNG format, then it had a very high success rate. In fact, pretty much a 100% success rate when it comes to AI images. And it also was pretty good at identifying real images, with some shortcomings. But these are ideal conditions, when you have a large file size and a high quality image. In real world conditions, you generally have an image that is not super high quality and that is really compressed. And so when I compressed these images, when I started feeding into the AI detector small files that were something like 300, 500 kilobytes, then it started making mistakes. I think that particular tool, AI or Not, does have room for development, because now it performs well only in very ideal conditions, really. But if it does manage to successfully extract data even from compressed images, then I think it will set itself apart from other tools, such as other AI detectors, because other AI detectors have not excelled in identifying even large file size, very high quality, ideal-conditions images.
Dennis Kovtun
As for chatbots, I honestly speaking can't truly say that they excelled in identifying locations of the images that I took. And again, the images that I gave chatbots, they can be very easily geolocated manually. I just wanted to test if chatbots could geolocate those images at all. And what I found is that chatbots try to replicate the steps that human researchers do, try to mimic them, imitate them, but they don't manage to do that particularly successfully. They don't see what's in the image, they don't have this capability to analyze what's in the image, and they tend to see things in the photographs that are not there and to imagine things that do not exist in the real world. For example, when I was tasking it to geolocate one of the images that I took in Edmonton, where I am right now, it stated that it was near a space research center. Well, I looked up where that space research center was, because the name of it looked a bit sketchy. And in fact, that particular facility does exist, but it exists in Vancouver, thousands of kilometers away from here.
Shannon Ragan
Oops, yeah, that was a fascinating article. I think the other one on using AI to recognize AI, you know, AI generated images was also fascinating. And that seems to be such a need right now with the volume of content being put out by AI in terms of AI recognition. How do you see this being incorporated in OSINT due to the shortcomings or reliability issues?
Dennis Kovtun
Well, one thing that for me at least when I finished those tests is my skepticism of AI tools increased significantly. But at the same time I think those tools definitely have their place because you can't manually identify every single AI image. Sometimes it may not be possible to identify AI images unless you are an expert in very particular field. For example, when I was testing AI or not tool, I generated in Midjourney several abstract paintings and to me they look like abstract paintings you see in your local museum. I'm not an abstract art specialist and so I would not be able to distinguish between. So that's one area where I think that goes beyond typical classical open source research or OSINT, the way we think about it now because Bellingcat is known for its Russia-Ukraine investigations, for example, but it goes beyond that. For example, if I'm investigating art fraud or something along these lines, then I want my AI detector to be accurate. Because at this point, and that's an area for future research, we don't even know if experts in those fields will be able to successfully identify AI generated images and that they will be able to distinguish the work of a human from the work of a robot.
Dennis Kovtun
That's one area, but also in more serious things. I read just a few days ago that for example, in Spain there is a pretty significant scandal now going on because schoolgirls, some schoolgirls had their faces superimposed onto AI generated doubles in pornographic videos. That is very concerning, that is very serious and so it's criminal conduct anyway. But I think it will have implications for legal profession. For example, if you do a criminal trial into what is effectively child porn and then identifying if it's a fake in which the person in question has been defamed, but not in fact has been filmed engaging in such activities, or it's it's this. From that perspective, I think AI detectors need to improve their accuracy. Absolutely. Because at this point of time, I do not know if there are other tools that do not use machine learning, that do not use these neural networks that can successfully identify such images. So from this perspective, I'm not thinking about it in terms of well, AI detector is yet another gimmick that generates lots of false positives. It's not a good thing that it generates lots of false positives at this point of time, particularly when we deal with compressed images.
Dennis Kovtun
As for geolocation, I am not sure at this point if AI chatbots are capable of successfully geolocating images, of working with images in their current form. Maybe at the next iteration, maybe when they are redeveloped, then maybe, because right now they have learned from the internet, which they scrape. That's how they learn, these chatbots. They learn the steps that open source researchers take, they understand how to mimic them, but they can't do it accurately, they hallucinate all the time and they require excessive prompting. When I gave them the images to geolocate, those images (a) were in ideal conditions and (b) can be geolocated manually. And third thing, I also gave chatbots the general location of the place already. I gave them the name of the city where I took it. I didn't just say, well, that's the image, I don't know anything about it, I want to know where it is. I gave chatbots the name of the city: that this image was taken somewhere in Edmonton, and this image was taken somewhere in Orsova. So it knew that, and that already considerably narrowed the search area. But when I tested aspects of how it was working with those images, for example, on one photo that I took, there is a building with several very visible logos on it, corporate logos.
Dennis Kovtun
And so I downloaded those logos from the internet and fed them separately into chatbots and asked them to identify what companies these logos belonged to. And neither Bard nor Bing were able to successfully identify those logos, which were very simple. These are big companies that those logos belong to. They're not at all obscure.
Shannon Ragan
Yeah, you think about how even just like a reverse image search of the logo which it's doing, it's telling you that it's doing this type of work. It's capable of doing it, but it doesn't necessarily do it for portions of images, maybe, but it's interesting where it comes up.
Dennis Kovtun
Corporate logos, it didn't identify them. It didn't identify them. It hallucinated all of them when I gave them all of them. 100% failure rate in corporate logo identification.
Aubrey Byron
Well, going back to sort of the AI or Not, did you yourself find any tells for differentiating human made images from AI?
Dennis Kovtun
Midjourney is pretty good at generating realistic images, I must say. I don't think we are right now still in the world where AIs generate strange shapes, where people have six fingers or an odd number of teeth or something like that. Honestly, I think AI has improved from that point. It's become better. But it doesn't mean that it's perfect in its imitation of real photographs, real paintings. I generated 100 images in Midjourney. After generating 100 images and looking at them for so long, honestly, I can kind of instinctually identify if the image has been generated by AI or not. Because Midjourney, while it's trying to imitate real things, it's trying to imitate real photographs, real paintings, real drawings, I think it still has a certain style that you become used to when you look at those images. So for anybody who just wants to train themselves kind of by rote to identify AI images, that's what I'd suggest they do. Generate 100 images in Midjourney. Spend enough time looking at them, and you will be in pretty good shape, I must say. But the tells for me, when I was looking at those images, were several things.
Dennis Kovtun
For old photographs, for imitations of older photographs, it was shiny eyes, for example. They're like beady, shiny eyes. And that was strange, as if light was being reflected off them. And of course, that's not what you'd see in a real photograph. And also in general, these imitations of old photographs, of paintings, for example, they look just way too perfect. In the real data set that I used, I used 20 paintings from the Renaissance period by great masters, for example, and I asked Midjourney to recreate those paintings, effectively. I took the descriptions of those paintings from the museums that owned them, and I fed those descriptions into Midjourney, so effectively recreated those paintings. And the way they looked, those imitations, they looked as if they were painted yesterday. There were no cracks. There was no darkened varnish on those paintings. There were no little imperfections. The way they looked was just very much perfect. And that's not what you'd see in real world conditions either. Even if the painting from hundreds of years ago is very well restored, it had a lot of work done on it and it received high quality scanning.
Dennis Kovtun
Still, it's not what you're going to see. That was the second thing for me when it came to paintings. As for photographs, there were several tells that we have been aware of for a while, and AI image generators still have not fully resolved this issue. For example, in the image of a city, just of the street, if you zoom in and look at, for example, the shop windows, you see that their titles are just written in gibberish. It's just a set of letters that is not found in any language, really. So that's another tell. I think people just need to examine those images pretty closely and think what may be off with those images. And the same goes for AI generated images of people, because those images looked, again, way too perfect. And it's not an insult, not an offense to anybody, but every single one of us, every single person has some imperfection in them. Whereas if you show, as AI does, as Midjourney does at this point of time, an individual who is extremely beautiful, with no imperfections whatsoever, with nothing about them that may betray the fact that they're real, well, then you may start thinking maybe that is a procedurally generated image based on those thousands, hundreds of thousands, maybe millions of photographs that Midjourney ingested and took these top traits out of.
Shannon Ragan
Yeah, it's interesting how much maybe we just don't even realize the context of our human intelligence that we bring to looking at images. There's just this little icky feeling that something doesn't feel right when you're looking at an AI generated image. Those little things, like you kind of have to put your art dealer hat on. Like, what about this isn't in the real world? What is not worn? Where is the dust and dirt and specks and grime and things like that? Yeah, put your human hat on to be good at it.
Dennis Kovtun
Absolutely. For me, looking at those images, particularly those that had people in them, I would really compare them to sort of uncanny valley feeling that you get when you look at not very well done CGI in movies. So anybody wanting to experience that sort of effect and learn how to critically look at such images and have this uncanny valley feeling and instinct, well, the best thing I can compare it to is Cats the Musical.
Shannon Ragan
I was going to say watch Polar Express.
Dennis Kovtun
You know, and so I think people would do well if they didn't buy into the hype of AI, because at the end of the day, what we have right now is not real artificial intelligence, it's neural networks, it's machine learning, it's deep learning. But these tools are not sentient. They operate according to the procedural software that they're based on. I think at this point of time, the whole term artificial intelligence is more marketing than what we in fact have.
Aubrey Byron
Yeah, well, and the problem with using it, as we found when we were testing it for writing, is that if you are skilled in what you're asking it to do, then the excessive prompting all of the time it takes is not really worth it. But if you're not skilled, it can be helpful, but you also won't be able to fact check it.
Dennis Kovtun
Yeah, well, when it comes to writing, for example, and that's one of the things that I discovered when I was doing the AI or Not article: the company that owns ChatGPT shut down their own AI detector because it gave so many false positives. It was so inaccurate. And I think the test that I've done on images and the older controversy over AI detectors that identify AI generated written text, I think it just shows that also from time to time, I think we need to give people more grace and not be too suspicious of people that they used AI to just cheat. Because we have heard that, for example, in academia, AI detectors became really controversial when students got accused of cheating and did not have a lot of recourse and basically were trying to prove that they're not a camel, so to speak.
Shannon Ragan
Yeah, putting faith in yet more AI to weed out cheaters. Nothing's perfect. You just have to use your best judgment.
Dennis Kovtun
Absolutely. It was with written assignments. But for example, if we use the case of academia, academic institutions offer fine arts programs. There were these controversies when students got accused of cheating on their essays, for example. Well, what prevents, right now, a fine arts student from being accused of using Midjourney to generate their term assignment? That instead of taking the images themselves or doing the painting themselves, they used Midjourney to do it? What prevents them from being accused that they used Midjourney to do that? I don't think there is any safeguard right now.
Shannon Ragan
Yeah, well, I wanted to pivot a little bit to talk about Bellingcat, obviously a notorious organization in some rights. We're big fans on the show. We discussed the We Are Bellingcat book at the end of last season. Could you just tell us a little bit of what it was like working for Bellingcat and doing this research with them?
Dennis Kovtun
The position that I got there was pretty unique. So internships and fellowships is something that Bellingcat is only starting to do. Hopefully, they'll do more of it. And just off the bat, to anybody who is listening, who is in grad school or upper year of undergraduate and is thinking of getting into open source research: write to them and offer them what you've got, and maybe you'll go from there. Because Bellingcat is in many ways a really unique organization, not just in terms of what they do, but also how they operate. I was working for them remotely the whole time. A lot of work in Bellingcat is kind of choosing your own adventure. You do what you think you're good at, and you do what you want to do, really, and you have time to do it. In four months, I wrote those two articles, but the reason my output seemed not super impressive, two articles in four months, is because such work is really labor intensive. You spend lots of time doing this research and lots of time putting it together, lots of time thinking about it, and you can do many things there.
Dennis Kovtun
The good thing about open source research is that it's a very unrestrictive term. You can do pretty much anything with open source as long as you already have what you need for it. It's stashed somewhere, somewhere in the corner of the internet, and just waiting to be found, to be discovered. So you don't need to do interviews, for example, for it. You don't need to do what people do in more typical journalism, which is what I'm doing now, for example. And I found the environment in Bellingcat very supportive of creative work. You are encouraged to ask for help, you are encouraged to exchange ideas with people. And I very much appreciated it. In some organizations, some more old fashioned organizations, shall we say, sometimes people, particularly if they're new ones, they're not encouraged to ask for help, and they're expected to know a great deal about this work, at least enough to do most of what they're tasked to do completely by themselves. But open source research doesn't work like this. It's constantly finding new things and it's constantly looking for new ways of doing things, and it's really being creative with what you've got and what you are working on.
Dennis Kovtun
I very much appreciated being able to write a Slack message to pretty much anybody in that organization and tell them: that's what I think, that's what I'd like to do. Any advice here? Any suggestions here? They very much appreciated that.
Aubrey Byron
Yeah, well, and I think in that industry too, which I also came from originally, not a lot of publications have the time or really just the money to let you spend a couple of months on a really deep dive investigative project. They just can't for that.
Dennis Kovtun
Absolutely. I work in traditional journalism now, and the difference is extraordinary. The turnaround time for an article or a radio news piece in traditional journalism is about a day, sometimes less. But in Bellingcat, it can take many months, many months to do this piece of research. But for a good reason. Because, as I said, such work is really labor intensive. And you spend time thinking about it. You spend time planning how you want to write it, how you want to communicate your findings. And it does take a lot of work, particularly if you go into uncharted territory with open source research, which you often do. The work that I've done on AI, not very many people did it before. I'm proud to say that what I've done with chatbots was something truly original. And so a lot of it is like academic research, particularly in the sciences, I'd say, where you conduct your investigation, where you conduct your research and you become an expert on it, an expert in a very niche thing, in a very niche field. But you become an expert on it because you know it best and because you've done the most work on it and you've done the original work on it.
Dennis Kovtun
That's the important bit.
Aubrey Byron
Yeah. I constantly find the blending of investigative journalism and OSINT really fascinating in this industry.
Shannon Ragan
Absolutely no surprise. We keep coming back to it. We love it. Well, as we wrap up, any parting thoughts for the audience, especially those that may be newer to OSINT or even self-taught?
Dennis Kovtun
Bellingcat has got a Discord that I strongly suggest that you join if you are interested in this type of thing. It's got a pretty large community of people who are just interested in this and who do OSINT as a hobby. I'd say for anybody interested in more professional work in open source research, that Discord should be your first port of call. I also would like to say that what I saw, what I witnessed, is that sometimes open source work can be pretty resource intensive as well. Not just time consuming, but also resource intensive. So Bellingcat has got a Patreon, and if you are interested in this type of stuff, then you may help them out a little bit if you want. But I'd like to return to people my age and people in my stage in life, so to speak. I'm 25 years old. I finished grad school in May this year. Thanks. And I'd say sometimes I think people who are interested in OSINT may be a bit intimidated by the complexity of it, by the amount of technical skills that is necessary to do this type of work. And I was fairly intimidated.
Dennis Kovtun
I was overjoyed when I got this research fellowship with Bellingcat, but at the same time I thought, I'll need to do geolocation, I'll need to do chronolocation, I'll need to do all of these things. I sort of understand how it works, but I have not really done this. So am I going to embarrass myself, or how am I going to make the best of it? I'd say, after I've done a bit of it for four months, that my fears were not exactly warranted, that you can find a niche, you can find an area in open source research that does not require you to have this complex technical skill set. And even with geolocations, if you start doing them, you're going to get better at them. I was pretty hopeless in the beginning, but I got better. My skill level at doing geolocations, for example, at this point of time, I'd estimate it as being about intermediate to higher, so I can geolocate most images at this point of time. Some images are more complex, of course, and more difficult to do. They have fewer attributes, but given enough time, I can work with them. I would really encourage people to get into it.
Dennis Kovtun
If you are interested in open source work, get on Discord, on Bellingcat's Discord, where people from time to time collaborate even on investigative research, and Bellingcat uses these collaborations on Discord to inform their own work, that is, professional published work. As for people who are finishing grad school or finishing upper year undergraduate and have something to offer to them, I don't think you can just be gung ho and say, oh, well, I graduated and I'd like to have an internship with you. You need to prepare for that, you need to work towards it. When I got a fellowship with Bellingcat, I was not completely new to open source intelligence. I've taken a training course with them, I've read about them, I knew what they were doing. So be really informed about this field, about this area of research journalism. But if you have something to offer to them, who knows? You might have a good chance of succeeding and working with that organization for a little bit, which I again strongly suggest, because it's very good for your resume, it's good for your portfolio, but it's also being able to say and knowing that you've truly done something really unique.
Dennis Kovtun
And not that many organizations offer that. If you work with your local news, for example, because I'm a journalist, I returned to local news now. After you've done a certain number of articles, for example, on say, local crime or local traffic or local happenings or whatnot, then it kind of becomes formulaic. Not to say that it's not important, it is, but there is a formula that you follow, and there is not much originality that you can find in those stories. With open source research, if you want to be original, if you want to do unique work, there is an avenue for that. There is an opportunity for that.
Shannon Ragan
Yeah.
Aubrey Byron
Well, that is a great note to end on. And thanks again, Dennis, for joining us today.
Dennis Kovtun
Thank you for inviting me very much. Appreciate it.
Aubrey Byron
And if you liked what you heard, you can view transcripts and other episode info on our website, authentic8.com/needlestack. That's Authentic, the number 8, .com slash needlestack. And be sure to let us know your thoughts on X @needlestackpod. We'll see you next time with more on the latest in OSINT. Stay tuned.