
Discussing the state of AI technology | Expert interview with Professor Siwei Lyu, Ph.D.

Siwei Lyu, Ph.D., is a professor of computer science and engineering at the University at Buffalo, The State University of New York. In this interview, he talks about the current impact of AI technology as well as what we can expect to see in the future.

------------------------------

At VERIFY, we're answering your questions about what's true and false. Visit our website at verifythis.com. If you've seen a viral claim online and you're not sure whether it's true, let us know! Email questions@verifythis.com or text us at 202-410-8808. Find us on social @verifythis.
Facebook: https://www.facebook.com/verifythis
Twitter: https://twitter.com/VerifyThis
Instagram: https://www.instagram.com/verifythis/

VERIFY


VERIFY: Thank you for joining us today. Could you please introduce yourself, give us a little bit of your background, and tell us what you'd like to be called?

Lyu: Thanks for having me. My name is Siwei Lyu, and I'm a professor at the University at Buffalo, State University of New York, in the Department of Computer Science and Engineering. My research is in artificial intelligence, machine learning, and computer vision, with a special focus on media forensics, which is the research field of exposing manipulated, synthesized, or otherwise falsified media, including images, audio, and video.

VERIFY: Thank you so much. What we're talking about today is artificial intelligence. Can you give us a definition of generative AI?

Lyu: Generative AI refers to artificial intelligence technologies that are able to produce realistic-looking media. This includes images, audio and voices, video, and, in this case, text. These algorithms are based on machine learning, so they are not manually designed; they are mathematical, computational models trained on large amounts of media files of the types I just mentioned. Once trained, they can be used to generate these different kinds of media.
VERIFY: And what's the most typical way people use it? I'm familiar with AI programs like Midjourney or ChatGPT, where somebody actually has to give prompts to tell the program what to do.

Lyu: Yes. Right now, ChatGPT and Midjourney are two of the most popular and widely used tools showcasing generative AI, but it takes many other forms. For instance, you can use generative AI models to create human faces that look like real people but belong to nobody alive. You can use algorithms to convert text to voices, and you can also convert a voice's style or identity, so that something person A said is re-spoken in person B's voice. We can also have videos where the subject is generated by generative AI models to impersonate a particular person. So there are many, many different forms of generative AI.

VERIFY: There are so many ways to create these images, especially, as you said, to impersonate someone. Are you concerned, or should a regular person be concerned, about the capability of AI to be used in those manipulative ways?
Lyu: Yes, absolutely. There are several levels of concern we should have. At the very personal level, the availability of this kind of technology just makes it easier for somebody to defraud us. We already get a lot of scam calls and spam emails these days; now think about adding a layer of impersonation on top, especially impersonating someone you are familiar with, and how dangerous that could be. Think about a teenager who gets a FaceTime-style video call, hears their parents' voices and sees their parents' faces, and it turns out that it was not really the parents; it was a predator trying to lure them to certain places or into doing things that harm them. Or think about a senior citizen who picks up a phone call from what sounds like a grandchild asking for money. How easily people could fall for these kinds of scams.

The second level is that this also pollutes our information system, in particular on social media. We read about falsified news stories; now add a layer of somebody's face and voice behind them. Instead of reading that somebody said something outrageous, you actually watch a video of exactly that person saying exactly what they were reported to say. Think about how much credibility that could add to the falsified story.

And the last layer is at the public level. When this technology becomes more economical and all this falsified media can be made at a massive scale, we will be flooded with low-quality, falsified media. We already see examples of fake news photographs generated by Midjourney and fake text created with large language models. Those could add a lot of trouble to our already overburdened information system. So at several different levels, this is a critical matter that we need to pay a lot of attention to right now.

VERIFY: I'm thinking of deepfakes, which we've talked about before; deepfakes are created all the time. Would generative AI be considered a layer above a deepfake? Because it's completely original; it's not created using somebody's existing likeness. What we see a lot with deepfakes now is a face superimposed on a real image or a real video. Is this a layer above that? Is this beyond deepfakes, essentially?
Lyu: It depends on how we define deepfakes. If we use "deepfake" in a broader sense to mean any media created by AI algorithms, I think this is a form of deepfake. But the new twist, especially starting from the end of last year to today, is that these generative AI models have become even more democratized, even easier to use. If we had had this discussion at this time last year, I would have said deepfakes were a threat on the horizon, but there was still a barrier for somebody to actually tune an AI model and use it to create all this falsified media. That person needed at least a powerful enough computer with the computational power to support it, and needed to know a little bit of programming, because he or she had to know where to get the code and how to get that code running on that computer, and then needed to gather data to train the model for whatever they wanted to do. So: computation, data, code, and model. The recent developments eliminate, or significantly reduce, those requirements. Just think about how we create images in Midjourney or Stable Diffusion: all we need is basically a sentence as a prompt, and then we can have a lot of images and refine the prompts based on our requirements. People with no programming background and no understanding of machine learning or AI can now start making deepfakes. That's what is changing the way this game is played, and it makes it even more critical to look at this problem seriously and come up with solutions.
VERIFY: What do you think are possible solutions? I'm actually really glad you brought up where we were, because my next question was going to be about where the technology was a year ago versus where we are today. As you mentioned, anybody can create this content now. I've been exploring Midjourney myself, just to educate myself, and I have hardly any programming skills. What do you think could be done, or should be done? Should Big Tech play a larger role in limiting how widely available this is? What do you think are possible solutions?

Lyu: There are several approaches that together could serve as a comprehensive solution to the problem, but let me start with something that I don't think will work well. When we see something bad, it's human nature for the first reaction to be: let's stop this, let's just not do it. I think that is actually not a good solution to this problem, maybe not even a solution at all. The reason is that the genie is out of the bottle. Technically speaking, it is very difficult to completely ban, or even limit, the availability of the technology to users. The code is out there, the capability has been proven, and somebody with sufficient resources and time could reproduce what OpenAI, Midjourney, or Stable Diffusion have done. And a ban only limits the people who intend to use the technology in a good way; for whoever already has malicious intent, it will probably have no effect.

The other problem is that even though we are mostly focusing on the negative impacts of generative AI, banning it is like throwing the baby out with the bathwater. If we stop the development of the technology across the board, we also kill many of its good applications. To give you one example, there is a Canadian company using very similar voice-conversion technology, the kind that makes somebody sound like somebody else, but applying it to stroke patients whose speech has been affected by their strokes. They use the algorithm to normalize the patients' voices, so that even though the patients cannot articulate themselves very well, their caregivers, their children, and their families can hear them much better, as they sounded before. I think that's a good use of this technology, and there are plenty of such examples. So banning is not a great solution.
The other solution that I don't think will work is detailed content moderation. That runs into a lot of issues, in this country in particular with the First Amendment.

So here is what I think could be a combined solution. At the technical level, we should first of all have a better ability to expose synthetic or manipulated media. We need to develop stronger algorithms and systems that can give us analysis and results about whether what we are seeing or hearing is real. To do that, researchers need to work very hard to keep pace with the development of generative AI. The number of people working on detection versus the number developing generative AI models is hugely imbalanced, and if this keeps up, we are definitely on the losing end of this battle.

The second part: detection is passive, in the sense that we can only apply it after something shows up. We should also invest in active measures, on two sides. One is on the side of individual users: we should protect our data. For instance, when we share our data, we could develop algorithms that add fingerprint signatures into the data we upload, so that later on, if somebody uses that data to train a model of me, I have a way to say this came from my data, and it is a violation because I never signed off the rights for anyone to use my data to create a generative AI version of me. The other side is to treat the tools; I think this is particularly relevant to things like Midjourney, ChatGPT, and Stable Diffusion. Whatever comes out of those tools, generated for whatever purpose, the tools could embed some signature, some fingerprint, that says this image came from this tool. To make an analogy, it's sort of like when we watch a movie: after the title, it may say "based on a true story," to tell people that this is not itself the real thing. It's a good enough reminder that people will not take it at the same value as something real. The same thing could be done here, and there are technologies for doing that.
Lyu: For the companies, on the other hand, especially the platform companies, they should actively make use of these technologies to help users understand the media they are being exposed to. There is no use in tracing our data or tracing the tools if nobody can see the result. So when people share media on social platforms, the platforms should have tools embedded that can read those signatures out, almost like somebody using an ultraviolet pen to scan dollar bills: this one is a counterfeit, this one is real. The platforms should play that role. As I said, I don't think they should do very aggressive content moderation, because that runs into a lot of legal problems, but showing where things come from is totally legitimate and should be done.

Thirdly, I think the government should absolutely put more investment, as I said, into the technical development of all these countermeasures, but it should also invest in educating users. A lot of the time, deepfakes and synthetic media affect us because we are not aware, and "not aware" does not mean we are not skillful or not good at telling them apart. Studies actually show that if we spend enough time, humans have a pretty good ability to tell what is real from what is fake. The problem is that our attention span is so narrow these days, shaped by the use of social media, that we simply do not take the time to think about it. Our first reaction is just to react to the content instead of asking the critical question: is this real or not? The government should help improve users' awareness of this problem, and more importantly, it should come up with regulations, something like liability or accountability rules, for the promoters, providers, and makers of synthetic media. These are a few of the measures in my mind that could be implemented and could form part of an effective and comprehensive solution to the problem.

VERIFY: I'm kind of playing devil's advocate here. Every morning I wake up, look on Twitter, and follow a lot of AI feeds, and it's one new thing after another: "you think last week was crazy, check out this new tool." Do you think it's evolving so fast that it would be very difficult to regulate? It seems like there would have to be new rules in place every day for a new tool that's capable of doing something Midjourney or Stable Diffusion can't.
Lyu: Yes, that is one of the major challenges these days. The technology is growing very fast, so keeping track of every single threat will be very difficult. But what I laid out are general principles. In particular, the spread of this synthetic media can be hindered by the fact that it is somewhat traceable. We are not tracing the individual person who is making it, which may well be a legitimate use of the media, but tracing back which tools were used. That information is enough for users to raise their awareness: this is not real, it comes from a tool. If we have a way to say "what you are seeing came from this particular algorithm," that will help the user make a better decision, and at the same time we keep some distance from the potential legal issues. And no matter what new tools come out, if they can be traced back to the tools that made them, they can be treated under the same category. To be very honest, this cannot eliminate the problem, but it will slow down the wildfire spread of synthetic media.
VERIFY: Now I want to talk a little more about what you were saying about detection. As you mentioned, it could be something as simple as including a watermark, like an icon in the corner, before you download an image. The RNC just released the first AI-generated ad ahead of the 2024 elections, and it had a disclaimer on it saying it was made with AI. Without that information, somebody might have just thought it was a real, if dystopian, ad; they would never have known it was created with AI. For fact-checking images and video, we have a couple of different processes: we can do a reverse image search, we can do a reverse video search, we can look at a video's metadata. But that doesn't work with AI content; we can't do a reverse image search, because the image was just created. What are some helpful tips for people to be aware of, or to be on the lookout for? What do you think is the best way for the regular, everyday social media user to detect AI?
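For background on why reverse image search fails here: such services match images by comparing compact perceptual fingerprints, and a freshly generated image has no earlier copy on file to match against. Below is a minimal sketch of one such fingerprint, an average hash; this is a simplified illustration, and production systems use more robust hashes over large indexes.

```python
import numpy as np

def average_hash(img: np.ndarray, size: int = 8) -> int:
    """64-bit perceptual hash of a grayscale image: downscale to size x size
    block means, then set one bit per block brighter than the mean."""
    h, w = img.shape
    small = img[: h - h % size, : w - w % size].astype(np.float64)
    small = small.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    bits = (small > small.mean()).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming(a: int, b: int) -> int:
    """Bit distance between two hashes; small values mean near-duplicates."""
    return bin(a ^ b).count("1")
```

Two copies of the same photo hash to nearby values, so a search index can find the original; an image that never existed before simply has no neighbor to be found.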
Lyu: There are again several different strategies, but I think the most reliable one, and probably the easiest tool for us to use, is our common sense. Sunshine is the best disinfectant. In the same way, the easiest defense is for every user to be critical, to apply critical thinking to everything interesting we see. First of all, have that level of awareness. Secondly, and this is part of common sense, when we see one video that is very interesting and catches our eyes, we need to check alternative sources to cross-validate whether this is really what happened. If no other sources mention it, if it is single-source media, then we know it may not be as reliable as other things. That's using the context.

Without context, if we focus only on the media itself, there are also ways to pick up artifacts. The generative AI models, even though they are very powerful, miss things. For instance, some of the human faces created by these models look very realistic: the skin tone, the facial hair, down to very small-scale details. But they can miss something big and obvious. One example: for a real person, say someone talking in this current setting and looking into the camera, the two eyes look in almost the same direction, so both eyes see roughly the same scene, and what is reflected from the two eyes should be quite similar. But in some AI-generated images, if you zoom in on the iris regions, you find one eye reflecting one scene and the other eye reflecting another. That is the kind of artifact the generative AI models can have.

Part of the reason is that, unlike humans, who get information about the world both from data, meaning our experience, and from knowledge, since we read about things and understand the basic principles of how the physical world runs, the AI models only get information from piles of data. Simple physical constraints of the physical world cannot always be effectively captured by a model trained on data alone, and the inconsistency between the reflections of the two eyes is one such example. I call these the Achilles' heels of the generative AI models: they are powerful, but they are not omnipotent. They do have limitations and shortcomings. I would say that as time goes by they will fix these artifacts one by one, but there will always be another, because the models do not understand the world as we do. So one approach is looking for those artifacts, and as a researcher I am also developing algorithms to help humans spot and identify them.
Lyu: The other approach is to use detection tools. There will be detection tools available, I think, in the very near future; my group has developed tools of this kind. Instead of taking artifacts that are intuitive to humans, like the previous example, we look at the signal from a different point of view, from a different perspective. To make an analogy again, it is like X-rays: if we want to see what happens inside the human body, just using visible light is not enough, because light cannot penetrate the body, so we use X-rays. We are basically looking at the same thing, but from a different perspective. The same thing happens here: some media may look or sound very real to a human viewer, but if we look at it from a different angle, under algorithmic lenses, we will see signal-level and statistical abnormalities in the synthetic media. Those are the tools we are developing, and I think they will be in the hands of users in the near future. So: use common sense, cross-validate across sources, look for artifacts, and rely on signal-level detection tools. Those are the ways we can combat, and better see through, this synthetic media.
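One example of such an algorithmic lens, reported in the research literature on detecting GAN images, is frequency analysis: the upsampling layers in some generators leave periodic traces in the high-frequency spectrum. Here is a minimal sketch; the cutoff is an arbitrary demo choice, and a real detector would learn a decision rule over many such features rather than eyeball one number.

```python
import numpy as np

def highfreq_energy_ratio(img: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of an image's spectral energy beyond a radial frequency
    cutoff; unusual concentrations or gridlike peaks there can be a
    red flag for generator upsampling artifacts."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img.astype(np.float64)))) ** 2
    h, w = spec.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)  # normalized radius
    total = spec.sum()
    return float(spec[r > cutoff].sum() / total) if total > 0 else 0.0
```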
VERIFY: Are there any tools that exist now that are pretty close? What are the best detection tools available now?

Lyu: I will say we used to have one; we call it the DeepFake-o-meter. We put it online, and we had to take it offline for a couple of reasons. It was our effort to bundle the state-of-the-art third-party open-source deepfake detection tools together, to provide an easy-to-use, web-based interface for all different users, sort of like what ChatGPT and Midjourney did for generative AI; we wanted to do that for media forensics. But unfortunately we saw a lot of misuse and abuse: people trying to figure out which cues are used to detect their synthetic media, and maybe, based on that understanding, developing countermeasures. The other problem is that we got hacked many times once we put it out for free; people ran denial-of-service attacks on our servers and crashed them, things like that. And the last reason is lack of resources. We feel the very real pain of not having enough resources supporting this kind of work. It is only my research lab, my students, my postdoc, and myself, working on this, and maintaining the system got to the point where we could no longer be productive doing anything else. That's an unfortunate situation, but it also shows how difficult this is.

There are a few tools available on the market for doing this. One is called Reality Defender; it is a commercial company doing deepfake detection, among other things. I am a technical advisor to them, so I know their tools are well designed, but the downside is that for serious use, you need to pay. And one of the problems is that detecting synthetic media, unlike making synthetic media, lacks a clear business model. How do you make money? How do you sustain this? The work is almost preventive: we tell you something is fake, but something being fake does not necessarily mean you would have lost money to it, so there is no clear profit model for running a business supporting this kind of service. This again reinforces my belief that this is exactly where the government and the funding agencies can help, by putting more resources and investment into these directions, to help us keep the balance with the generative AI side.

VERIFY: Are there clues, or as you said, artifacts, for audio that would help somebody if they received a phone call?
Lyu: Yes, absolutely, although audio over a phone call can be a little difficult, because the hackers, or whoever the fraudsters are, can also be very smart. One trick they can use is to downgrade the quality of the audio: they can add a lot of noise, even including the kinds of noise introduced by unstable communication channels, so it becomes really hard to tell whether something is fake or real over the phone. The most reliable thing to do is, when you hear something, call that person directly to double-check; that is the cross-validation again. But deepfake audio does carry some artifacts. For instance, it is usually very quiet: if you play a high-quality version, the background is very clean and you don't hear anything, and that is one tell. The other thing, similar to the artifacts we talked about for images, is that there are artifacts in sounds too. Something very simple to understand is breathing. We make sound by taking in air first and then speaking as we breathe out. In some AI-generated voices, when you hear them, there is no sound of taking in air. The sound itself has good quality, but you may get a subconscious, psychological feeling of being pressed, of being out of breath, because there is no sound of breathing in those audio samples. And again, the last resort is using signal-based detection; we develop algorithms for doing that, and I believe some commercial tools are available for doing that too.
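The "too clean background" cue lends itself to a quick measurement. Below is a minimal sketch that estimates a clip's noise floor from short-frame energies; the frame length and percentile are arbitrary demo choices, and a quiet background alone proves nothing, it is one weak cue among several.

```python
import numpy as np

def noise_floor_db(audio: np.ndarray, sr: int, frame_ms: int = 30) -> float:
    """Estimate the background level of a mono clip (samples in [-1, 1])
    as the 5th-percentile frame RMS, in dB. Real recordings carry room
    tone and breaths; an implausibly low floor can suggest synthesis."""
    frame = int(sr * frame_ms / 1000)
    n = len(audio) // frame
    frames = audio[: n * frame].reshape(n, frame).astype(np.float64)
    rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-12
    return float(20 * np.log10(np.percentile(rms, 5)))
```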
VERIFY: Those are really helpful tips. We talked about where the technology was a year ago; where do you think it is going to be a year from now?

Lyu: I think for sure we are going to see more development of generative AI. As more data is put in, the models will get better and better, but there are some fundamental limits to these models. People used to talk about the uncanny valley for generative models: something that is close to perfect, but not really perfect, will actually look weird to viewers, who will then not believe it to be real. For generative AI, the uncanny valley happens in a different way. The outputs are approaching very, very realistic, but just piling up data, even if you put all the data together, there are certain characteristics of the physical world that cannot be effectively captured by the model, limited by the model's own capacity and by the computational power we have. So the generative quality will keep getting better; high-quality images, videos, and audio will come out of these models, but we can still find the little places where they cannot get it right.

For instance, for images created by Midjourney or Stable Diffusion, I have followed that technology for a while, ever since it started as research ideas and demonstrations in technical papers, and I have literally seen an astronomical improvement in the quality of the synthesis. But from the first day all the way to today, I still see artifacts. Hands, for instance, are very hard for these models to synthesize. Even though the faces are so good, if you look at the hands, the quality is not there yet: hands with fewer or more fingers, or in weird configurations. That is because the hand is notoriously difficult to render exactly, and I think it is going to be very difficult to do it just by using data. You have to incorporate some understanding of the anatomical structure of the hand, and of its degrees of freedom, to create all the different shapes a hand can take. So just piling up data will still take us further, but the space is limited; we are going to reach a saturation level, and for the technology to break through that barrier in rendering and synthesis, I think it will take some more years. We are in the middle of this fast-accelerating phase, but my prediction is that in a few years, maybe two or three, this trend will slow down somewhat, because we are going to run into a barrier where just using data is not enough. I cannot say for the longer term; maybe there will be new models that can take in other sorts of information to further improve the quality of the generation, and then we will have another set of problems.
VERIFY: So you're saying we're still a few years away from Skynet?

Lyu: Yes, I think so. Personally, I am less pessimistic about that future. The other thing we haven't taken into consideration is human nature: humans, and the human brain, are amazingly flexible and versatile. When we think about this dystopia, the underlying implicit hypothesis is that humans do not evolve, that our understanding of synthetic media does not grow with time. If we stayed at the current level of understanding and awareness, then yes, for sure, as the fakes improve further, a lot of people would fall for them. But with public awareness campaigns, including what we are doing at this moment, we want more people to know about deepfakes, so that when they see something interesting, that awareness is already built into their minds. That will effectively flatten the curve, to borrow the term from the COVID situation. The situation is very dynamic: one side is growing, but so is the other. We are seeing a tug of war that will eventually reach a certain equilibrium in the next few years. That is why I am not that pessimistic about the future of the technology.

VERIFY: Thank you.
