And how can it get better?
Hands drawn by robots … often just don’t look right. Why is that, and what will it take to get better?
Producer Phil Edwards is exploring five different aspects of AI that help explain everything from large language models to where unusual training data comes from. In this first video, he digs into why AI art struggles with hands. The challenges range from the same ones that human artists face to those that are a unique result of how AI generative art is created. The road to improving these hands may not be as obvious as you’d think.
Vox is an explanatory newsroom on a mission to help everyone understand our weird, wonderful, complicated world, so that we can all help shape it. Part of that mission is keeping our work free.
You're called to create a post-apocalyptic
giraffe astronaut. Generated. Genghis Khan playing a guitar solo,
pixel art. Generated. A man holding a
delicious apple... What's with his hands? Why can't AI art
make hands? It doesn't matter what
AI art model you use. If you have a man holding
a delicious apple his hands will look weird
holding it. Why is this so hard? Seems easy enough, right? We've got this weird situation
where AI art can instantly make... Abraham Lincoln dressed like
glam David Bowie. But struggles with a woman
holding a cell phone. This isn't just a weird glitch. The struggle of AI art
with hands can actually teach you
something bigger... about how AI art works. I mean, what is so hard
about this? I asked an artist who has taught
thousands of people... how to draw hands
from imagination. Before someone becomes or starts training
to be an artist. Like officially training. It's pattern recognition. You just grow up seeing
a whole bunch of hands... and you start knowing
what hands look like. You learn how things look
by living in the world and recognizing patterns. An AI is similar but has
key differences. Imagine an AI is like you... but trapped in a museum
from birth. All the machine has to learn from
are the pictures... and the little placards
on the side. Apple: A red apple on a
brown table. That's like the images
it sees from the web and the descriptions
that go with them. It's similar to how you learn,
but locked in that museum. If you want to understand
an apple you can rotate it
in your hand. You can watch it
whenever you want. If AI wants to understand
an apple it has to find another picture of
an apple in the museum. Pattern recognition has allowed
AI and people to draw decent apples... but the processes differ. You start training to become an artist,
and now you're like okay, now I have to learn
the rules. And that's where it becomes very different
from how AI is learning. Artists, in order to draw
something complicated, we tend to simplify things
into basic forms. And so when you
look at a hand... you pretty much have
the big blocky part of the palm, right? You have the front,
you have the back and then you have
the thickness. So you can pretty much
just make that into like a square with some
thickness to it. Then an artist can add
all the style and texture and detail
they want. AI works differently. Look at this hand. The shapes are bizarre,
but the AI has done a great job showing the light
and texture here. Remember, the AI knows
how things look but not how they work. So these patterns in pixels
are easy to understand. It never learned,
however that fingers don't really
bend like this. It doesn't simplify the forms. Remember, it's trapped
in the museum so it is just trying to guess
where hand-like pixels should be. Without knowing how hands
work like we do. But listen, I find this
kind of dissatisfying. I mean, I'm basically just saying
that AI can't draw hands because it's not a person. But AI also doesn't know
anything about construction and it can still make
a beautiful skyscraper in New York City. So to understand this better I spoke to two people
who have worked with generative art models. Yilun Du is a grad student whose
heart is in robotics. But, you know, AI art is
like a big deal now. So, he got pulled into it. Because of how popular
these models have been in generative art.... I've also been working
on that. And I talked to Roy Shilkrot who has a super varied resume but has been teaching about
generative art since 2018. Good students
that come in.... that are trying to break
those models take them to the next level. Talking to them helped
me figure out three big reasons. Not every reason, but
three big reasons that hands are tough
for AI art models. The data size and quality the way hands act
and the low margin for error. For the data size, let's go back
to the museum idea. The museum the robot
hangs out in, it has a ton of rooms dedicated
to faces... but not so many rooms
for hands. That means it has less
to learn from. Just as an example,
available datasets like Flickr HQ has
70,000 faces. 70,000. And this popular one annotates
200,000 pics of celebrity faces... for lots of details like eyeglasses
or pointy noses. There are a ton of great
hand datasets that can really understand hands like this one with 11,000 hands. But these may not have been used
to train the AI that makes art. That data scarcity combines
with the quality and complexity of the data. Hands data in the art museum isn't yet annotated to show
how they work. Like the celebrities' pointy noses. What they say is... there is an image and there is
a person in the image and that person is holding
an umbrella. You don't give the machine
a lot of clues saying this is a person holding the umbrella. The thumb is going
from one side of the handle and the fingers
are curled... and then thumb is covering
the index finger but not the other one. All that is made worse because
hands do lots of things compared to,
say... faces. So there's a pretty common like
portrait photo face. There are a lot of these
photos online and the thing is everything is
very well centered, right? Like eyes are always
around here. Like there's always this order. That's not true of hands
which can do this and this and this. I swear I'm sober right now. Stan mentioned this, too. How many fingers do you see right now? Like two or three. Like it doesn't know
there's five. Because sometimes there's two, sometimes there's three, sometimes four, sometimes five. You can see these problems
with AI hands but the jankiness
is all over AI art. Just look at horses. You can also have like three legs,
five legs, six legs. The model does not learn to explain this
because there's too much diversity and it doesn't have as much bias as we do. Okay. Did you hear that
last part he said? Good, because it's
really important. It doesn't have as much bias
as we do. We care a lot about hands and
need them to be perfect. There is a low margin for error. But because the model doesn't
understand hands, hasn't seen many, and
because hands act weird... it makes pictures that are like
hands it’s seen in the museum but not an exact hand. That's good enough for a ton of stuff,
but not hands. Here, let me give you
some examples. Come over here. So I typed “make me a person
with exactly five freckles”. So this one's from Dall-E 2. This one is from Stable Diffusion and this one is from Midjourney. So it's like, you know,
great job. You've got a red haired person. They're more likely to have freckles. But there are not exactly
five freckles here. Here that doesn't really matter
because we see a freckly face. But hands require higher standards. Look at our apple-holding man again. I made 3 other variations. The hands are all weird, but
don't look at them right now. It changed the shirt stripes,
the buttons, the apple style... None of that matters because
it's stripe-like, button-like and apple-like. But hand-like isn't good enough. I came away from this thinking
a couple of things. A, AI art is basically
bad at art. We're just able to see it with hands... and B, it's never going
to get any better. But both of those things
are a bit wrong. I will say that the newest
AI art generator to come out at the time of this video is
Midjourney version 5 and they made some progress
with hands for sure... but it's not totally fixed yet. Don't tell the AI to
hold an umbrella. I think they're spending lots of time on some things that you appreciate,
which is why you like the images and a lot of stuff that you
don't actually even notice. I think that for a lot of natural scenery
or something like that I feel like model might be better
at that than people. And they are working on two things. First, they have the AI look at
a ton more pictures which requires more computing power. They're trying to solve that
on a big scale because if you want to train on more than a handful of images... if you want to train
more than 100 images this would take tremendous resources
from you to retrain the model itself. The other solution might be
to invite more people... into the museum. There's an interesting analog. So like, have you heard of
like ChatGPT? The big difference was that it
basically used human feedback. So like they generated
many, many sentences and asked people to rate which ones are good
and which ones are not good. They basically fine-tune the model so that it would generate sentences that are convincing to people. I guess it would require
a lot of engineering to get people to label so much data. But I think if we could just
get like people to rank... how good the images are
generated by these models then like a lot of these issues
will go away, actually. Because they're just training the models
to do what people like. It's not just the hand...
teeth and abs. Anything where there's like a pattern... a large amount of
something. It doesn't know the rule of
“there are this many” because it's trained on different amounts.
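Editor's sketch: the human-feedback idea described in the interview above (people rank generated images, and the model is then tuned toward what they prefer) can be illustrated with a toy example. This is not how ChatGPT or any particular image generator is actually trained. It assumes a simple Bradley-Terry preference model, which is one common way to turn pairwise choices into scores, and the function name, image labels, and vote data are all hypothetical.

```python
import math


def fit_preference_scores(num_images, comparisons, steps=500, lr=0.1):
    """Learn one score per image from pairwise human choices.

    comparisons is a list of (winner_index, loser_index) pairs, one per
    human judgment. Uses gradient ascent on the Bradley-Terry
    log-likelihood (an assumed, simplified stand-in for a reward model).
    """
    scores = [0.0] * num_images
    for _ in range(steps):
        grads = [0.0] * num_images
        for winner, loser in comparisons:
            # Probability the current scores assign to the human's choice.
            p_win = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            # Push the winner's score up and the loser's score down,
            # more strongly when the current scores disagree with the human.
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        scores = [s + lr * g for s, g in zip(scores, grads)]
    return scores


# Hypothetical data: three generated images of a hand.
images = ["five fingers", "six fingers", "three fingers"]
votes = (
    [(0, 1)] * 3    # raters preferred image 0 over image 1 three times
    + [(0, 2)] * 3  # ... and image 0 over image 2 three times
    + [(1, 2)] * 2  # image 1 beat image 2 twice
    + [(2, 1)]      # image 2 beat image 1 once
)
scores = fit_preference_scores(len(images), votes)
ranking = sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
print([images[i] for i in ranking])
```

In a real system, scores like these would come from a learned model that looks at the image itself, and they would serve as the signal for fine-tuning the generator. Here they only rank three fixed images.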
Comments
Considering how much human artists struggle with hands, I’m not surprised the AI can’t do it
In the lucid dreaming community - one of the most reliable "reality checks" is inspecting your hand and confirming if you have 5 fingers. For whatever reason, the brain has a difficult time generating a five fingered hand while dreaming. It's kind of a creepy coincidence that AI has the same issue.
The worst part for me personally is these models have gotten so incredibly good at lighting and realism that seeing these weird messed up hands in completely photorealistic lighting makes them so much more uncanny than in like a painting or drawing.
as someone who went to art school, and was required to take a course on drawing hands, I can confirm: drawing hands is hard.
My grandfather is a semi famous artist and he gives the family art that he messed up. It's usually the hands that he messed up
You know it’s hard to draw hands, when even AI struggles with it.
It's weird how humans can instantly determine when something looks wrong, but the same humans cannot necessarily correct it or make it right from scratch. As a beginning artist, there's a weird rift between your mind's eye and your skill.
“The AI knows how things look but not how they work” I’ve gotten into so many frustrating conversations trying to correct friends and colleagues talking about chat gpt as if it had some internal logic and self-referencing reflective capabilities
It's interesting that in dreams, we also struggle to see a hand as it is. For lucid dreamers, it's kind of a test to see if they are dreaming or not. In dreams, hands are usually distorted in a similar manner
Hands are tough for humans too. Ask any artist what they have struggled with the most, and the answer will be hands, followed closely by feet.
Today, 9 months later, AI has gotten so much better at hands.
I think the way to solve the "AI knows how things look, but not how they work" problem is to train the AI not only on images, but also on rigged models, like Blender models before you hit "Render." I personally find out how things work and what proportions they generally have by spending a few minutes fiddling with the object and studying it from different angles before trying to draw. Edit: Sorry I'm late.
If you know a thing or two about sewing, you notice pretty fast that AI is also terrible about clothing. Buttons merging into zippers, fabrics changing textures and weights, folds appearing and disappearing without seams, those are all things you see commonly in AI art but people don't notice as much because your average AI artist isn't a seamstress.
Thanks for the talk, Phil! We live in some interesting times for art. Now, back to practicing drawing hands! 😅
I love how unhurried the explanation in this video is, and the calm music too. We need more videos like this. Thanks, Vox!
You accidentally made the like button highlight at 7:50 when saying "button-like".
As an artist, I will confirm, hands have an EXTREMELY low margin for error. There are many different body types, face shapes, limb proportions. Consequently, there's wiggle room. Not so with hands. People will still compliment most artwork that slightly misses the mark, but they will go silent if you mess up hands.
At 7:57 he said something that resonated with me: "AI art is basically bad at art, we're just able to see it with hands". A lot of times, when you look closely at an AI generated image, you start to notice all kinds of strange things, like shapes that don't make sense, roads leading nowhere, details that are simply wrong. Will this change, and what will it take? Right now it seems that you either have to accept a lot of errors or "peculiarities" with AI generated images, or you have to do a lot of manual work to get it right.
This is a reminder for artists: draw from real life as much as you can, not just photos. Our understanding of volumes and structure over simple outlines and textures is what will set us apart from AIs
At the moment, I noticed that at least some neural networks draw faces as a separate module, on top of the rest of the picture. The same should be done with your hands. There should also be a setting to “hide your hands” so that they simply end up behind your back, in your pockets, etc.