Why AI art struggles with hands

You're called to create a post-apocalyptic giraffe astronaut. Generated. Genghis Khan playing a guitar solo, pixel art. Generated. A man holding a delicious apple... What's with his hands? Why can't AI art make hands? It doesn't matter what AI art model you use. If you have a man holding a delicious apple, his hands will look weird holding it. Why is this so hard? Seems easy enough, right? We've got this weird situation where AI art instantly makes... Abraham Lincoln dressed like glam David Bowie. But struggles with a woman holding a cell phone. This isn't just a weird glitch. The struggle of AI art with hands can actually teach you something bigger... about how AI art works.

I mean, what is so hard about this? I asked an artist who has taught thousands of people... how to draw hands from imagination.

Before someone becomes or starts training to be an artist, like officially training, it's pattern recognition. You just grow up seeing a whole bunch of hands... and you start knowing what hands look like.

You learn how things look by living in the world and recognizing patterns. An AI is similar but has key differences. Imagine an AI is like you... but trapped in a museum from birth. All the machine has to learn from are the pictures... and the little placards on the side. Apple: A red apple on a brown table. That's like the images it sees from the web and the descriptions that go with them. It's similar to how you learn, but locked in that museum. If you want to understand an apple, you can rotate it in your hand. You can watch it whenever you want. If AI wants to understand an apple, it has to find another picture of an apple in the museum. Pattern recognition has allowed AI and people to draw decent apples... but the processes differ.

You start training to become an artist, and now you're like, okay, now I have to learn the rules. And that's where it becomes very different from how AI is learning. Artists, in order to draw something complicated, we tend to simplify things into basic forms. And so when you look at a hand... you pretty much have the big blocky part of the palm, right? You have the front, you have the back, and then you have the thickness. So you can pretty much just make that into like a square with some thickness to it.

Then an artist can add all the style and texture and detail they want. AI works differently. Look at this hand. The shapes are bizarre, but the AI has done a great job showing the light and texture here. Remember, the AI knows how things look but not how they work. So these patterns in pixels are easy to understand. It never learned, however, that fingers don't really bend like this. It doesn't simplify the forms. Remember, it's trapped in the museum, so it is just trying to guess where hand-like pixels should be. Without knowing how hands work like we do.

But listen, I find this kind of dissatisfying. I mean, I'm basically just saying that AI can't draw hands because it's not a person. But AI also doesn't know anything about construction, and it can still make a beautiful skyscraper in New York City. So to understand this better, I spoke to two people who have worked with generative art models. Yilun Du is a grad student whose heart is in robotics. But, you know, AI art is like a big deal now. So, he got pulled into it.

Because of how popular these models have been in generative art... I've also been working on that.

And I talked to Roy Shilkrot, who has a super varied resume but has been teaching about generative art since 2018.

Good students that come in... that are trying to break those models, take them to the next level.

Talking to them helped me figure out three big reasons. Not every reason, but three big reasons that hands are tough for AI art models. The data size and quality, the way hands act, and the low margin for error.

For the data size, let's go back to the museum idea. The museum the robot hangs out in has a ton of rooms dedicated to faces... but not so many rooms for hands. That means it has less to learn from. Just as an example, an available dataset like Flickr HQ has 70,000 faces. 70,000. And this popular one annotates 200,000 pics of celebrity faces... for lots of details like eyeglasses or pointy noses. There are a ton of great hand datasets that can really understand hands, like this one with 11,000 hands. But these may not have been used to train the AI that makes art. That data scarcity combines with the quality and complexity of the data. Hands data in the art museum isn't yet annotated to show how they work. Like the celebrities' pointy noses.

What they say is... there is an image and there is a person in the image and that person is holding an umbrella. You don't give the machine a lot of clues saying this is a person holding the umbrella. The thumb is going from one side of the handle and the fingers are curled... and then thumb is covering the index finger but not the other one.

All that is made worse because hands do lots of things compared to, say... faces.

So there's a pretty common like portrait photo face. There are a lot of these photos online, and the thing is everything is very well centered, right? Like eyes are always around here. Like there's always this order.

That's not true of hands, which can do this and this and this. I swear I'm sober right now. Stan mentioned this, too.

How many fingers do you see right now? Like two or three. Like it doesn't know there's five. Because sometimes there's two, sometimes there's three, sometimes four, sometimes five.

You can see these problems with AI hands, but the jankiness is all over AI art. Just look at horses.

You can also have like three legs, five legs, six legs. The model does not learn to explain this because there's too much diversity and it doesn't have as much bias as we do.

Okay. Did you hear that last part he said? Good, because it's really important. It doesn't have as much bias as we do. We care a lot about hands and need them to be perfect. There is a low margin for error. But because the model doesn't understand hands, hasn't seen many, and because hands act weird... it makes pictures that are like hands it’s seen in the museum but not an exact hand. That's good enough for a ton of stuff, but not hands.

Here, let me give you some examples. Come over here. So I typed “make me a person with exactly five freckles”. So this one's from DALL-E 2. This one is from Stable Diffusion and this one is from Midjourney. So it's like, you know, great job. You've got a red-haired person. They're more likely to have freckles. But there are not exactly five freckles here. Here that doesn't really matter because we see a freckly face. But hands require higher standards. Look at our apple-holding man again. I made 3 other variations. The hands are all weird, but don't look at them right now. It changed the shirt stripes, the buttons, the apple style... None of that matters because it's stripe-like, button-like, and apple-like. But hand-like isn't good enough.

I came away from this thinking a couple of things. A, AI art is basically bad at art; we're just able to see it with hands... and B, it's never going to get any better. But both of those things are a bit wrong. I will say that the newest AI art generator to come out at the time of this video is Midjourney version 5, and they made some progress with hands for sure... but it's not totally fixed yet. Don't tell the AI to hold an umbrella.

I think they're spending lots of time on some things that you appreciate, which is why you like the images, and a lot of stuff that you don't actually even notice.

I think that for a lot of natural scenery or something like that, I feel like the model might be better at that than people.

And they are working on two things. First, they have the AI look at a ton more pictures, which requires more computing power.

They're trying to solve that on a big scale, because if you want to train on more than a handful of images... if you want to train on more than 100 images, this would take tremendous resources from you to retrain the model itself.

The other solution might be to invite more people... into the museum.

There's an interesting analog. So like, have you heard of like ChatGPT? The big difference was that it basically used human feedback. So like they generated many, many sentences and asked people to rate which ones are good and which ones are not good. They basically fine-tuned the model so that it would generate sentences that are convincing to people. I guess it would require a lot of engineering to get people to label so much data. But I think if we could just get like people to rank... how good the images generated by these models are, then like a lot of these issues will go away, actually.

Because they're just training the models to do what people like. It's not just the hands... teeth and abs. Anything where there's like a pattern... a large amount of something. It doesn't know the rule of “there are this many” because it's trained on different amounts.

Comments

@bananewane1402

Considering how much human artists struggle with hands, I’m not surprised the AI can’t do it

@OKaFee

In the lucid dreaming community - one of the most reliable "reality checks" is inspecting your hand and confirming if you have 5 fingers. For whatever reason, the brain has a difficult time generating a five fingered hand while dreaming. It's kind of a creepy coincidence that AI has the same issue.

@instagramsnapchat

The worst part for me personally is these models have gotten so incredibly good at lighting and realism that seeing these weird messed up hands in completely photorealistic lighting makes them so much more uncanny than in like a painting or drawing.

@gabriel1812

As someone who went to art school and was required to take a course on drawing hands, I can confirm: drawing hands is hard.

@logank444

My grandfather is a semi famous artist and he gives the family art that he messed up. It's usually the hands that he messed up

@MinisDunyasi5

You know it’s hard to draw hands, when even AI struggles with it.

@MrWeebable

It's weird how humans can instantly determine when something looks wrong, but the same humans cannot necessarily correct it or make it right from scratch. As a beginning artist, there's a weird rift between your mind's eye and your skill.

@noahdoss1967

“The AI knows how things look but not how they work.” I’ve gotten into so many frustrating conversations trying to correct friends and colleagues talking about ChatGPT as if it had some internal logic and self-referencing reflective capabilities.

@Mixajlo93

It's interesting that in dreams, we also struggle to see a hand as it is. For lucid dreamers, it's kind of a test to see if they are dreaming or not. In dreams, hands are usually distorted in a similar manner

@floopyboo

Hands are tough for humans too. Ask any artist what they have struggled with the most, and the answer will be hands, followed closely by feet.

@peteskyrunner4845

Today, 9 months later, AI has gotten so much better at hands.

@CedarBronze

I think the way to solve the "AI knows how things look, but not how they work" problem is to train the AI not only on images, but also on rigged models, like Blender models before you hit "Render." I personally find out how things work and what proportions they generally have by spending a few minutes fiddling with the object and studying it from different angles before trying to draw. Edit: Sorry I'm late.

@Selestrielle

If you know a thing or two about sewing, you notice pretty fast that AI is also terrible about clothing. Buttons merging into zippers, fabrics changing textures and weights, folds appearing and disappearing without seams, those are all things you see commonly in AI art but people don't notice as much because your average AI artist isn't a seamstress.

@ProkoTV

Thanks for the talk, Phil! We live in some interesting times for art. Now, back to practicing drawing hands! 😅

@whatfurqanknows

I love how unhurried the explanation in this video is, and how calm the music is. We need more videos like this. Thanks, Vox!

@zircon256ua

You accidentally made the like button highlight at 7:50 when saying "button-like".

@margaretthemagnificent

As an artist, I will confirm, hands have an EXTREMELY low margin for error. There are many different body types, face shapes, limb proportions. Consequently, there's wiggle room. Not so with hands. People will still compliment most artwork that slightly misses the mark, but they will go silent if you mess up hands.

@hulqen

At 7:57 he said something that resonated with me: "AI art is basically bad at art, we're just able to see it with hands". A lot of times, when you look closely at an AI-generated image, you start to notice all kinds of strange things, like shapes that don't make sense, roads leading nowhere, details that are simply wrong. Will this change, and what will it take? Right now it seems that you either have to accept a lot of errors or "peculiarities" with AI-generated images, or you have to do a lot of manual work to get it right.

@riccardoleone4265

This is a reminder for artists: draw from real life as much as you can, not just photos. Our understanding of volumes and structure over simple outlines and textures is what will set us apart from AIs

@Antares_Aurelis

At the moment, I noticed that at least some neural networks draw faces as a separate module, on top of the rest of the picture. The same should be done with your hands. There should also be a setting to “hide your hands” so that they simply end up behind your back, in your pockets, etc.
