
AI Learns to Walk (deep reinforcement learning)

AI Teaches Itself to Walk! In this video, an AI Warehouse agent named Albert learns how to walk to escape 5 rooms I created. The AI was trained using deep reinforcement learning, a machine learning method that rewards the agent for doing something correctly and punishes it for doing something incorrectly. Albert's actions are controlled by a neural network that's updated after each attempt to try to give Albert more rewards and fewer punishments over time. Check the pinned comment for more information on how the AI was trained! Current Subscribers: 135,027

AI Warehouse

10 months ago

This is Albert. He's an AI we taught to crawl to targets, but can he learn to walk? Albert can control each of his limbs, and he's rewarded for getting closer to the target, so he'll learn how to use his legs to walk. Looks like you've learned how to do the worm... That's not how you're supposed to walk... I see how it is, Albert. Let's see the worm work in the next rooms. From now on you'll be punished for hitting the ground, but rewarded when your feet hit the ground. No, Albert, the worm won't work here. You'll have to get a lot better with your legs to walk... You're actually balancing! Albert, that was your first step! It wasn't very graceful, but it's a start. Now we're getting somewhere, you're learning to walk by skipping! You've gotten good at that, Albert! I guess skipping is better than the worm. But you're supposed to walk, not skip. Skipping won't work for long, Albert. Still, you're almost there! Oh... you don't know how to turn... You pressed the button, but there's a lot left to learn. In this room, you'll have to learn to turn. In addition to the other rewards, you'll be rewarded for keeping your chest high. To make sure you're not cheating, if your chest isn't high enough, you can't press the buttons. Yes, there are walls now, Albert. Looks like you're still skipping... but at least you pressed the first button! Again?! *sigh* That's better than jumping off the edge, I suppose. That's two buttons! You're almost there, Albert! You can't walk through walls, Albert. Go around them. Good job, Albert! But that's more of a shuffle than a walk... Regardless, well done! Now it's time to learn to take real steps. For this room, you need to learn how to deal with the cubes. In addition to the other rewards and punishments, you'll also be rewarded for alternating feet. With this new reward, your random shuffling is no good. Yes! You've finally started taking proper steps! But you're still bad... Come on, Albert, you can do this! No, Albert. That's the wrong way... Looks like you've learned to deal with the cubes! Yes! That was good, Albert, but you'll need to be much better to beat this final challenge. Excellent work, Albert! Now that you can walk, there's a whole new world of things to learn :) - Arabic subtitles translated by FoulerTripod

Comments

@aiwarehouse

In every “AI learns to walk” video I’ve seen, the AI either learns to walk in a weird, non-human way, or they use motion capture of a real person walking and simply train the AI to imitate that. I thought it was weird that nobody tried to train it to walk properly from scratch (without any external data), so I wanted to give it a shot! That’s what I said 4 months ago. It’s been really difficult, but I’ve finally managed to do it, so please watch the whole video! The final result ended up being awesome :)
If you're interested in training your own AI like Albert but don't know how, there's now a really easy way to do it! Luda, an AI lab, recently built a web app that allows you to create and train your own AI using deep reinforcement learning (just like Albert) completely for free in your browser! You build your own character (called a Mel) with lego-like building blocks, then watch it train in real time on their website in just a few minutes (really). It's an awesome project, and just like my videos, it makes deep reinforcement learning so much more accessible, which is why I love it so much. This section of the comment is sponsored by Luda, but these words are entirely my own. It's an amazing project that I would have been obsessed with had they released it before I built Albert. I've genuinely been looking for a sandbox/game exactly like this since I was a kid. They're still early, but they're giving my audience first access to their closed, pre-alpha build. Make sure you check out their site and create an AI agent for yourself! :D https://prealpha.mels.ai
Now, back to Albert:
NOTE: You can only see one Albert, but there are actually 200 copies of Albert, each in its own copy of the room, training behind the camera to speed up the training. If you want to learn more about how Albert actually works, you can read the rest of this very long comment I wrote explaining exactly how I trained him!
(and please let the video play in the background while reading so YouTube will show Albert to more people)
THE BASICS
I created everything using Unity and ML-Agents. Albert is controlled entirely by an artificial brain (a neural network) with 5 layers. The first layer consists of the inputs (the information Albert is given before taking action, like his limb positions and velocities), the last layer tells him what actions to take, and the middle 3 layers, called hidden layers, are where the calculations are performed to convert the inputs into actions. His brain was trained using a standard reinforcement learning algorithm: proximal policy optimization (PPO).
For each of Albert’s limbs I’ve given him (as an input) the position, velocity, angular velocity, contacts (whether it’s touching the ground, a wall or an obstacle) and the strength applied to it. I’ve also given him the distance from each foot to the ground, the direction of the closest target, the direction his body’s moving, his body’s velocity, the distance from his chest to his feet and the amount of time one foot has been in front of the other.
As for his actions, we allow Albert to control each body part’s rotation and strength (with some limitations so his arm can’t bend backwards, for example).
Just like the last videos, Albert was trained using reinforcement learning. For each of Albert's attempts, we calculate a score for how 'good' it was and make small, calculated adjustments to his brain to try to encourage the behaviors that led to a higher score and avoid those that led to a lower score. You can think of increasing Albert’s score as rewarding him and decreasing his score as punishing him, or you can think about it like natural selection, where the best performing Alberts are most likely to reproduce.
For this video there are 13 different types of rewards (ways to calculate Albert's score). We start off with only a couple and add more with each new room, always in an attempt to get him to walk.
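To make the five-layer description in THE BASICS concrete, here is a minimal sketch of such a network's forward pass in plain Python. It is not Albert's actual brain: the layer sizes, the `tanh` activations, the random initialization and the names `make_policy` and `act` are all assumptions chosen for illustration, and the PPO training step that would adjust the weights is left out.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Build one fully connected layer: n_out rows of n_in small random weights, plus zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def make_policy(obs_size, hidden_size, action_size, seed=0):
    """Five layers of neurons (input, 3 hidden, output) need four sets of connections.

    All sizes are hypothetical; the comment does not state Albert's real ones.
    """
    rng = random.Random(seed)
    sizes = [obs_size, hidden_size, hidden_size, hidden_size, action_size]
    return [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

def act(policy, observation):
    """Map an observation vector to an action vector, each action squashed into [-1, 1]."""
    x = list(observation)
    for weights, biases in policy:
        # Weighted sum of the previous layer's outputs, then a tanh non-linearity (assumed).
        x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

# Usage: 8 made-up observation values in, 4 made-up joint commands out.
policy = make_policy(obs_size=8, hidden_size=16, action_size=4)
actions = act(policy, [0.1] * 8)
```

In a setup like the one described, the observation vector would hold the limb positions, velocities, contacts and so on, and each output would drive a body part's rotation or strength.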
REWARD FUNCTION
Room 1: We start off very simple in the first room. We reward him based on how far he moved toward the target and we punish him for moving in the wrong direction. This led to Albert doing the worm towards the target, since he figured out that was the easiest way for him to move the quickest/get the highest score.
Room 2: In the second room we start checking if his limbs hit the ground. If the limb that hits the ground is a foot we reward him (but only if it's in front of his other foot, more on that later); if it isn’t, we punish him. I also made it so Albert wasn’t rewarded at all unless his chest was high enough, which forces him to be at least partially standing. As seen in the video, this encourages him to not fall over and encourages him to use his feet to do it. We also introduced a new reward designed to encourage smoother movement: if he approaches the maximum strength allowed on a limb he's punished, and he's rewarded if he uses a strength of almost 0. This encourages him to opt for the more human-like movement of using a bit of strength from many limbs as opposed to a lot of strength from one limb.
Room 3: This is where we start to polish the gait Albert developed in room 2 and teach him to turn. From here on we start using the chest height calculation as another direct reward, where the higher his chest is the more he’s rewarded, in an attempt to get him to stand up as straight as possible. These rewards so far give Albert a decent gait, however he’s still not using both of his feet (which was by far the hardest part of this project), so room 4 is designed to do exactly that.
Room 4: We get Albert to take more steps with a few additional rewards. To start, we introduce a 2 second timer that resets when one foot goes in front of the other. We reward Albert whenever this timer is above 0 (the front foot has been in front for less than 2 seconds), and we punish him whenever the timer goes below 0 (the front foot has been in front for more than 2 seconds).
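The 2 second foot timer just described can be sketched roughly as follows. This is only an illustration of the rule as stated: the class name `FootTimer`, the reward sizes of +1 and -1, the neutral 0 when the timer sits exactly at zero, and the way "in front" is measured (a forward coordinate per foot) are all assumptions, not details from the comment.

```python
class FootTimer:
    """Reward swapping the lead foot within a time limit (hypothetical sketch)."""

    def __init__(self, limit_seconds=2.0):
        self.limit = limit_seconds
        self.remaining = limit_seconds
        self.front_foot = None  # "left" or "right" once the first step is seen

    def step(self, left_forward, right_forward, dt):
        """Advance the timer by dt seconds and return this tick's reward.

        left_forward / right_forward are each foot's position along the
        direction of travel (assumed measurement).
        """
        front = "left" if left_forward > right_forward else "right"
        if front != self.front_foot:
            # A different foot is now in front: reset the countdown.
            self.front_foot = front
            self.remaining = self.limit
        else:
            self.remaining -= dt
        if self.remaining > 0:
            return 1.0   # front foot has led for less than the limit
        if self.remaining < 0:
            return -1.0  # same foot has led for too long
        return 0.0       # exactly at the limit: neutral (assumption)

# Usage: the left foot leads for six half-second ticks, then the feet swap.
timer = FootTimer()
rewards = [timer.step(0.3, 0.1, 0.5) for _ in range(6)]
rewards.append(timer.step(0.1, 0.3, 0.5))
```

With this rule a shuffle that always leads with the same foot starts losing score after two seconds, which is the pressure toward alternating steps that the room 4 description is after.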
We add another reward proportional to the distance of his steps to encourage him to take larger steps. To smooth out the movement, we also add a punishment every frame proportional to the difference in his body’s velocity from the previous frame to the current frame, so if he’s moving at a perfectly consistent velocity he isn’t punished at all, and if he makes very quick erratic movements he’s punished a lot.
If you're still reading this, you're probably really smart and want to learn more about Albert, so make sure to join the discord server I just made where we can talk more about the details of Albert's AI! https://discord.gg/jM2WkNuBnG :)
Room 5: For the final room the only change I made to the reward function was to go back to an earlier version of a reward. Throughout the other rooms I had been tinkering with how I should reward Albert’s feet being grounded. My initial thought was to only reward the front foot for being grounded, to try to get him to put more weight on his front foot when taking steps, but somewhere along the way I changed it to just rewarding Albert for any foot being grounded, and that was the version Albert trained with in rooms 3 and 4. For this final room I switched back to the old front foot grounded reward, which resulted in a much nicer looking walk. Also, the video makes it seem like I never reset Albert’s brain. That isn't entirely true; I had to occasionally reset it because of something called decaying plasticity.
OTHER
For rooms 1 to 4 I only allowed Albert to make a decision every 5 game ticks, but for the final room I removed that constraint and let him make decisions every tick. I found that if Albert makes a decision every game tick it’s too difficult for him to commit to any proper movements; he ends up just making very small movements like slightly pushing his front foot forward when he should be taking a full step.
The 5 game tick decision time forces him to commit to his decision for at least 5 game ticks, so he ends up being more careful when moving a limb. When I recorded him beating the final room I removed this limitation because he’d already learned to commit to his actions, so allowing him to make a decision every tick just results in a smoother motion.
If you’re still reading this, thank you for being so interested in the project! I’d like to upload much more often than once every few months, and to do that I need some help. I have 2 part-time positions open, one for a Unity AI Developer (helping me create the challenges and train Albert with ML-Agents) and one for a Unity Game Developer (assembling the scenes the AI trains in, writing scripts for smooth camera movement, creating any animations needed for the intro/outro, essentially any of the development that isn't AI). It would be part-time work (paid per project). If you think you’d be able to help, please apply here for the AI Developer position: forms.gle/rExRJCKcxNmxnBRu5 and here for the Game Developer position: forms.gle/gnWV2rg76XkyGTwH9
I’ve hidden these job postings in this long pinned comment to make sure anybody who applies is interested enough in the videos to actually read the whole comment, so thank you for reading all the way through! :D
Thank you so much for watching. This video took me 4 months to make, so please, if you enjoyed it or learned something from it, share it with someone you think will also enjoy it! :)
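As a footnote to the OTHER section of the pinned comment, the "one decision every 5 game ticks" idea can be sketched as a loop that asks the policy for a new action only on every fifth tick and repeats the last action in between. The function name `run_ticks` and the stand-in `policy` and `env_step` callables are hypothetical, and repeating the previous action between decisions is an assumption about how the gap is filled.

```python
def run_ticks(policy, env_step, initial_obs, total_ticks, decision_period=5):
    """Step a simulation, letting the policy decide only every `decision_period` ticks.

    policy:   callable mapping an observation to an action (stand-in)
    env_step: callable applying an action and returning the next observation (stand-in)
    Returns the number of decisions the policy made.
    """
    obs = initial_obs
    action = None
    decisions = 0
    for tick in range(total_ticks):
        if tick % decision_period == 0:
            # New decision: the agent commits to it until the next decision tick.
            action = policy(obs)
            decisions += 1
        # Between decisions the most recent action is simply applied again.
        obs = env_step(action)
    return decisions

# Usage with trivial stand-ins: the "observation" is just a tick counter.
state = {"t": 0}

def dummy_env_step(action):
    state["t"] += 1
    return state["t"]

def dummy_policy(obs):
    return obs % 2  # arbitrary placeholder action

slow = run_ticks(dummy_policy, dummy_env_step, 0, total_ticks=20, decision_period=5)
fast = run_ticks(dummy_policy, dummy_env_step, 0, total_ticks=20, decision_period=1)
```

Over 20 ticks the policy makes 4 decisions with a period of 5 and 20 decisions with a period of 1, which matches the difference between rooms 1 to 4 and the final room described above.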

@edhozell

AI will definitely hate us

@DonLeeVA

The little tantrums he throws after falling get me every time

@theunknownanomaly1950

I've had to relearn how to walk a few times due to having ankle surgery, and I must admit that Albert's progression from shuffle to walk is uncannily similar to my own progression.

@izzdrizz3535

It's so cool once you realize that the way Albert learns to walk is the same way a lot of babies do. First they wiggle to move, then they learn to shuffle and slide, then they learn to crawl, then they start taking steps, albeit odd steps, and then they finally learn to walk and turn. It's pretty much the same way a human baby would learn. Super cool. :)

@4m0nym

I love how 6:10 is the most replayed part of the video. It really looks exactly like when a human trips, doing those stupid desperate steps but ultimately you're past the point of saving yourself.

@gauravkumar3146

We all need determination like Albert in our life.

@i-love-anime-idols

Up next: Albert learns to get his degree and start a family

@LeannLeannProduction

Wow this is the most clever way to train an AI to walk that I have ever seen. You masterfully set up video game-like challenges to force it to learn specific behaviors that would overall give a better result. Plus, the editing is fantastic and you have a quirky sense of humor! Awesome work! So fun and fascinating to watch :D I look forward to continuations of Albert's learning journey!

@sidthesloth12

The fact Albert worked on learning to take a hit and stay upright, before continuing to learn how to walk correctly, is a major genius move, on the programming and Albert's part as well. He was able to identify the importance of tasks, and prioritize their value to his job.

@FlyingPoptart101

nicely done! bipedal walking is super hard and the amount of curriculum learning and reward tuning here is super impressive!

@RasmusBerggren-uo6uu

I love that in every single video Albert finds a way to jump off the edge. I love that he did it even before he had learned to turn correctly

@WatchFelineSpine

I love it when he tries to balance for the first time LOL it’s so cute 😭❤️

@Gabonation

The fact that Albert learned how to do the worm before walking means this man is ahead of us

@eaoke3383

I love the detailed explanation. Thank you for taking the time to write it.

@deeeeniiiiss

it's kind of insane how similar this is to a real person learning how to walk, from crawling to shuffling to skipping to slowly using your feet properly, even at 6:08 when he fell I saw myself falling like that 100 times and running until I either catch my balance or become one with the ground

@mythbusterman8541

The feat of balance at the end of the final room whilst being bombarded with boxes was something to behold, well played Albert, well played.

@peterjohnson8570

Love it! But I'm super interested in everything that goes into making it... a tutorial series on how to build these simulation and training environments in Unity (or, even more preferred, Unreal) would be AMAZING!

@glitch4771

I like how the ultimate reward for all of his efforts is just endless falling into the void

@studio_farrago

Imagine your toddler learning how to walk and you just start pelting him with boxes. 😂