
George Hotz | Exploring | we bought a tenstorrent e150! | Grayskull™ e150 | Open source | Jim Keller

Date of the stream: 16 Mar 2024.

From $1250, buy https://comma.ai/shop/comma-3x & best ADAS system in the world https://openpilot.comma.ai

Live-stream chat added as Subtitles/CC - English (Twitch Chat) - at the bottom - Show Transcript

Source:
- https://tenstorrent.com/cards/ (Grayskull™ e150 - Add to cart) - $799
- https://tenstorrent.com/setup/
- https://github.com/orgs/tenstorrent/repositories
- https://github.com/geohot/tt-twitch

Follow for notifications:
- https://twitch.tv/georgehotz

Support George:
- https://twitch.tv/subs/georgehotz

Pre-order tinybox:
- https://buy.stripe.com/5kAaGL6lk9uX9nW144 (https://tinygrad.org/)

Chapters:
00:00:00 intro
00:00:40 tenstorrent e150
00:02:40 AT&T, AMD, Qualcomm
00:04:07 Jim Keller, Tenstorrent, Open source
00:07:13 nvidia orin devkit, TRUFFLE–1
00:10:10 tinybox specs, nvidia 5090 rumor
00:11:05 pushing AMD, embracing open source
00:12:20 unboxing e150
00:13:55 interactions with AMD, lying
00:15:13 tenstorrent mega blower fan
00:23:10 fan loudness check
00:23:45 fanless, thermal protection
00:28:42 tenstorrent.com/setup, NDA
00:29:10 Alex
00:30:22 hardware installation
00:34:10 banjo.canonical.com
00:35:10 firmware installing broken
00:39:55 work life balance, Alex
00:41:15 tt-smi, temperature
00:43:07 tenstorrent bounties
00:47:15 smoke test
00:49:45 buda
00:53:40 cloud is a scam, selling cards
00:55:20 import error
00:57:50 this is why tinygrad is going to win
00:58:55 nix
00:59:25 docker, no dependencies
01:02:20 nvidia value, cuda
01:02:55 reproducible builds, getting rid of complexity, it just works
01:05:30 import error
01:06:40 the complexity
01:06:54 redis great software
01:08:22 missing jemalloc
01:11:00 key error BACKEND_ARCH_NAME
01:12:15 mysql, sqlite
01:14:40 docker encourages
01:15:15 exploring tt-metal
01:17:45 PEP8, bad programmers, details
01:18:25 downloading multiple gigabytes
01:20:35 translating github comments
01:25:00 tt-buda
01:26:04 stop writing code like this
01:32:30 if you thought AMD is bad try tenstorrent card
01:32:55 tt-smi reset options, reset file, reset json
01:35:45 this is the opposite of how you do complexity
01:36:40 ordering food, folding phone
01:42:05 tt-metal
01:46:00 start saying what it is
01:48:30 HSA fail, making everything generic
01:51:20 tt_lib, tt_eager
01:52:00 py-buda, luwen
02:04:00 installing egg
02:10:00 tensor bob
02:13:20 devin
02:15:30 reading docs
02:17:40 pyrsistent
02:19:40 Jim stop, nobody wants 50% PyTorch
02:19:55 be proud of what the chip is and expose the chip for what it is
02:21:19 ttlib docs, chip info
02:23:10 no real company is going to buy this card, dojo, inference price per dollar
02:23:50 tinygrad in for 10 years
02:24:05 PyTorch trigger
02:31:10 what this is?
02:35:45 grayskull e150 number of cores
02:37:20 bad documentation
02:42:44 tenstorrent staff in the stream
03:00:25 not trusting C
03:05:02 Alex closing the door
03:11:50 hate coding in C
03:17:00 bad code trigger, memcpy
03:27:17 SFPU
03:34:15 doing math, 13+7=21
03:43:30 kernel APIs
03:47:50 this better than GPUs
04:02:25 opencl api bad concept
04:06:30 food
04:12:20 life magazine 1970
04:16:10 Alex, Groq $20k card, Groq open source
04:20:05 brain backprop, small work units
04:20:39 nvidia full fabric memory
04:22:26 tenstorrent website bug
04:24:07 keeping up with state of the art ML
04:25:10 AMD price advantage, mi300x
04:28:20 graph compiler should be generic
04:28:35 port tinygrad to tenstorrent
04:33:40 the bitter lesson
04:38:50 break
04:47:12 worry about correctness
04:48:25 thumbs up ai
04:48:50 tinygrad code for metal
05:09:00 host_api
05:11:00 Karen ai, devin, VCs, deep state
05:15:17 unitree humanoid, money printer
05:19:04 1971, gold
05:25:50 clang 14 install, VC investment scam
05:42:50 dispatcher kernel, 1971, zebu ep1
06:03:00 fake queuing, scheduling is the key, big scale
06:11:30 Groq demo, tt hardware, wormhole
06:29:25 software, endorsing tenstorrent, comma body
06:49:50 X9000 G2 Groq demo challenge

Official George Hotz communication channels:
- https://geohot.com
- https://twitter.com/realGeorgeHotz
- https://instagram.com/georgehotz
- https://tinygrad.org
- https://geohot.github.io/blog
- https://github.com/geohot

We archive George Hotz and comma.ai videos for fun.

Follow for notifications:
- https://twitter.com/geohotarchive

Thank you for reading and using the SHOW MORE button. We hope you enjoy watching George's videos as much as we do. See you at the next video.

george hotz archive

13 days ago

Good morning, everybody. Let me just turn this off, make sure we're streaming. Yeah, we're good. Cool. What do I want here? That one. Okay, we have a stream chat. Guess what we bought today? If you read the title you know: we bought a Tenstorrent e150. Good thing you guys subscribe to my stream; that's how we afforded the $800 to buy this. It was $800 and it's not going to deliver us $800 of value, we already know that, but we bought it to support the cause. We're drinking Highball Energy. These are discontinued, and it tastes discontinued, and that's, like, not a good thing, but we're still drinking Highball Energy. We got a Tenstorrent e150. It comes with a blower duct assembly, and we're going to try to put it in the computer and see if that works. I don't know, man, it might work, it might not work, but we're doing it live right here on the George Hotz stream. Look, I know I haven't streamed in a while. Look, a lot of shit's been happening in my life,
man. No, it's not really true. Nothing's been happening except for shipping tinyboxes. I've been putting in a lot of work to deliver tinyboxes to the people. Is the voice low? Oh yeah, the voice is kind of low. Let me crank that. All right, is that good? You can probably go in here and crank in more sound. Oh, we are on the wrong microphone. I don't know... Properties, Yeti X. All right, well, that's what you get, so that's the voice. Yeah, we got lots of tinyboxes. We could just go over here and check out tiny12. So this is tiny12, it's a tinybox. I can show you... look at this, boys. Oh, you can't see any of that. Look at this communication matrix: bidirectional copy, peak bandwidth. I've been talking to AT&T again. Sorry, not AT&T; AT&T genuinely are some of the worst people in the world. I'm talking about AMD, who's actually pretty nice. You know what, I will say this about AMD: they are an alive company. Lisa Su is alive, right? Sundar Pichai, dead. Jensen, alive. Lisa, alive. You can just tell quickly, interacting with a company. Qualcomm, I don't even know who the fucking Qualcomm CEO is. I've emailed him 10 times. Does he reply to me? No. You know you're a dead company, bro, and it's sad. Qualcomm's real sad, but don't get me started on Qualcomm again. I believe AMD can own the future. I'm very bullish if they go through with what they said they're going to go through with, but I've seen a lot of positive steps, and that's why the tinybox has six AMD GPUs in it. These are AMD GPUs. So we're bullish on AMD, medium on Qualcomm (I mean, look, they make a great chip, but they suck), and we are so bearish on AT&T. AT&T is Boeing tier, boys. Boeing tier. But we're not here to talk about
any of those companies today. We're here to talk about Tenstorrent. Tenstorrent has been taken over by one Jim Keller, the most famous living computer architect. He created, like, everything: AMD64, the predecessor to the Apple M chips, the Tesla FSD chip, and now he's at Tenstorrent. So we like to support the cause. He also open sourced a lot of code, so that's who we support, right? If you want to be like Cerebras...
Look, I like Cerebras too, but they haven't open sourced anything. So when Cerebras agrees to be open source and sell me a wafer, a wafer that comes in a box, that'd be cool. No, no, no, they didn't ship a card. Whoa, I don't fuck with that kind of shit, man. If it's like "we have one special card for you," I'm like, no, fuck off. Anyone who wants can buy a Tenstorrent card right here. Just click "add to cart" and they'll ship it to you, right? I would not spend a minute of this stream if this was something special that I had that you couldn't buy. You can spend your own $800 and buy one of these. Not in the M series, but the A4, of course he was. I saw the Truffle. Look, things like that, it's an overpriced appliance, right? You're much better off for the same dollars buying a gaming PC. But if you don't feel comfortable installing your own Linux, maybe Truffle will be good. But it's an appliance. It's as dumb as
like the Rabbit thing. Don't buy these things, right? What you get sold is... oh, you know what it is? There's a great YouTube thing that tears apart MMORPGs. People will buy into MMORPG pre-orders because they want to be part of a community, and they think that the MMORPG is going to somehow improve their life. And I really regret... there's a lot of people who saw my Lex and thought the tinybox was going to be like a personal firewall for your house. It's not, okay? The comma is not a self-driving car, it's a driver assistance system, and the tinybox is not a firewall for your house, it's a budget AI training computer. It's just about the best flops you can buy per dollar, but if you don't understand what flops are, the tinybox is not for you. No, the tinybox... oh, the Truffle. Dude, the Truffle, I'm not even going to go into it. Okay, it's literally the Nvidia Orin in a plastic case. "Is there anything else with similar RAM and bandwidth for that price?" So yeah, okay, you like the high RAM, right? Here, I'll show you, if you guys don't know about it. These are the Orin dev kits. I don't know why you can't type "Orin". How much RAM does it have? Which one is it? It's one of these. Wait, it has 64? All right, fine, we'll look at the Truffle. Fine, fine. Truffle AI computer... Nvidia Orin GPU... huh. Okay, maybe it's actually a better deal than the Nvidia one. I take it back. That's actually what it competes with. How much is this? Okay, never mind, Nvidia's scamming is too high. So yeah, how much RAM bandwidth does it have? 200 gigabytes per second. I mean, this is on par with an AMD EPYC. I don't know how they're getting that price. Okay, it's less of a scam than I thought. I thought it was just a plastic case around something that cost like $7.99, but I didn't realize how much Nvidia was scamming. So whatever. But yeah, you might have a good experience with this. If you're the kind of person who likes Ollama, you might like Truffle. This isn't what tinybox is at all. But I think what happens with a lot of devices like that is you buy one and you think you're going to use it, but you don't. You're much better off at that price point... it's true, you're not going to get 64 gigs of RAM, but you're much better off at that price point buying a gaming GPU. I don't know, but I guess if people want the high RAM for LLMs... LLMs are interesting like that, where the flops matter a lot less than the RAM bandwidth. Of course, you've seen ours: 5.76 terabytes per second. Now, to be fair, that's not full fabric, right? I just added the RAM bandwidth of each GPU up. You can't access anything that fast, so that's just how RAM works. GDDR7 coming in hot, let's go. Wait, so I was talking last night and I heard that there just might not be 5090s. Think about it: you're Nvidia and you have two products, one that makes 30% margins and sells to gamers, who you've got to deal with, versus the other that makes 90% margins and sells to corporate-ass people. Which one are you going to manufacture if you have limited capacity on your line, which TSMC does? Are you going to manufacture the 5090
or are you going to make more H100s? Yeah, we are trying to push AMD. I had another phone call with them yesterday. Look, again, they're a big company, you have to give them some allowance for that, but they're trying. And I don't mean that like "oh, they're trying"; I mean, like, they're trying. They obviously don't want there to be bugs in the driver either, and I think we can get to a better place together. Yeah, if they embrace open source, they'll take the lead completely. The idea that these companies don't embrace open source for some reason, it's like a backwards way of thinking. And again, I don't mean this in a progressive shame way, like "that's a backwards way of thinking, how could you say that there should be slavery?" I don't know, man, you've got to think of the pros and cons, right? That's why we've got to make machine slaves. And that brings us to our stream today, which is the Tenstorrent e150. Let's open the box. All right, so I'm going to nudge this down a little bit and you're going to watch me on the floor trying to plug this into the computer. Okay, it comes in a white box. They list their patents on the back. Weak, weak, weak. What's in the box? Let's see what's in the box. Okay, it comes with this little card. Oh, it comes with some stickers. All
right, little envelope-y thing here, some stickers. All right, let's make this a real unboxing video. It comes with a piece of foam, and then in the box we have a Tenstorrent card. All right, what do we think? Has a hole for a Tenstorrent card. Sorry, that wasn't very ASMR of me. Look at it, just a Tenstorrent card. But yeah, we're bullish on AMD, and there's going to be an announcement. They told me there's going to be an announcement in a week or two, you guys will see. We'll see if they actually come through with the announcement, but for the most part, when they say they're going to do something, they do it, which I can respect. They're not liars. And I'll say this too: in all my interactions with them... sometimes I'll interact with people and, yeah, they lie. I haven't found that at all with them. They've been straightforward about everything. Again, everyone's not going to do exactly what you want all the time, but when someone starts lying to me, that's when I get very upset. I could talk about some interactions with some companies where they're just liars. And let's talk about lying too. Here's the thing about lying: when someone lies to you, they're lying to themselves. You can't just lie to other people. If you're someone who frequently lies, you no longer have a coherent world model. You know the old saying: it's so much easier to keep the facts straight if you just tell the truth. So they lie to themselves. When a salesperson lies to you, the whole organization's just full of lies. And I haven't found that with... here we go, and here's a Tenstorrent mega blower fan. Okay, did it come with screws? How do I stick this on here? This is also not going to fit in my computer. Wait, what the hell, did I misunderstand something? Are there screws? How do I stick this on here? I see what it's supposed to do. Oh, these screws. Hang on, put this camera back down. All right, full screen. Okay, so we have our Tenstorrent card and we have a blower fan duct thing, and in here we have some screws. But wait, is this going to fit in my computer if I put that thing on the back? I see what it's supposed to do. How am I supposed
to screw that in, though? There's no way this is going to fit in my computer. Okay, let's unplug my computer. I have "quiet" over here, it's a normal gaming shit computer. Maybe it'll fit. I'll move back a little so you guys can see. I cleaned my room just for you and for Jordan Peterson. [Music] Is this going to fit? Okay, so that's that long, and this guy is going to go back here, and there's no way in hell that's going to fit inside my computer. God damn it, look at this thing. Did Tenstorrent think about this? What am
I supposed to do? We could go fanless. All right, well, let's take out... I have an old AMD card. Take a screwdriver too. Where's the screwdriver? Maybe there's one in this closet. Oh no, tell me I don't have a screwdriver. We did not prepare, boys. Did you want me to prepare before the stream? Is that what you wanted? I have a little Ikea tool chest here. "Know them Ikeas, I'm about to blow a bag at Ikea." You just want the content, I know. Yeah, you don't want me to prepare before the stream, that would have been lame. What do you prepare for? Of course, I've got to figure out how to push that damn PCI thing. Need a poker. It'll hurt my hand doing this. There we... oh well, we broke it. So we're taking out the Red Devil. This is an RDNA 2 card. Disgusting. Now, we love AMD, guys, we're AMD fans on this channel. Who are we not fans of? AT&T and Boeing and the deep state. Watched a good video about the deep state last night. All right, so is there any way this is
going to fit? Maybe it will, just barely. How do I put this on? It came with screws, but am I an idiot? Let's go back to the tab. Oh no, and it feels like this fan doesn't have any speed control, so it's just going to be loud. So we have a blower, we have a card, and it came with three screws, but how do I put the screws in here? Oh, I probably have to take this apart. Okay. Of course I don't have the screwdriver. We're going to have to find more screwdrivers. I know I have more screwdrivers, I just don't know where. I have one of those kits for mobile phone disassembly somewhere, but I don't know where. Here, I have a vape in here, that's pretty cool. No, go walk. We have some hex things in here, but no, it's not going to be any of these. We have to take these screws out. [Music] Maybe we can use pliers, do it carefully. We really need to get the right screwdriver. Also, is this just going to be loud? Let's first do a loud check, because if it's going to be loud, I don't even want... this computer's name is literally "quiet", so loud is kind of the opposite of what it is. All right, so let's take this weird fan assembly and plug it into some power here. Let's try the computer. How loud's it going to be, boys? Loud as fuck. Oh wow, it's a lot of air, though. All right, we're making an executive decision: we're going fanless. How hot will the card get? I don't know. Let's go fanless, let's find out. Tenstorrent, I hope you have good thermal protection. I can see the press release on their website: "George used the card outside of its normal specification. He thought the fan was loud and didn't put it on." It is loud, okay? I didn't "think" it, that's a fact. No, no, no, we're here to... Tenstorrent, Jim's cool, he does the right thing. We're going to go through the quality of their software in a bit, but so far, I don't know why they didn't think of this fan being used in residential environments, where I'd rather not hear something that is stupidly loud. So the blower duct assembly
is going back from whence it came. All right, we're back to the card. This is for you. Let's put the card in the computer, let's get this thing powered on, let's get this show on the proverbial road. The card's rated for 200 watts, so I wonder what it is at idle. It has a big heat sink. Like, is it going to overheat? Medium. Right in the hole. Okay, the fan assembly was not user friendly. This card was clearly not designed to go in normal people's computers. [unclear] for GPUs. All right, got it in the hole. Let's put some screws in it. I don't know what a Tenstorrent e150 is either. We just bought it for $800 using profits from this stream, so I hope you all appreciate it. You know what, I probably should have plugged this wire in before I stuck that in the wall. We have a six-pin connector and an eight-pin connector, now they're both connected to my power supply. Fits in nicely with my 3080 Ti. Put the side, my piece of tempered glass, back on here. You think it overheats right away? I think it's one of these cards that has no thermal management and it just throws the 200 watts. The blower fan assembly was not well thought out. I think it should have speed control. Actually, I gave this feedback to Tenstorrent while I was visiting. I saw the thing and I'm like, "Oh really, it's an external fan?" And then they shipped it with an external fan, so yeah, it didn't get better. I had this problem once with... I bought one of them Xeon Phi cards that was meant for data centers. Okay, the e150 is inside of the computer. Let's plug the computer back in. We got power, we got network, and we don't need anything else. Let me go put my tools away. Let's do what the card says here: we'll go to tenstorrent.com/setup. I remember early on they wanted to give me early access to a Tenstorrent card, and they were like, "You have to sign an NDA," and I was like, "Nope, not doing that," because if you sign an NDA you can never give your real opinion about things. My opinion so
far is that the fan is too loud and too big. "The fan is always too loud." I'm on stream. Also, the fan is too loud and too big, and I might put the Tenstorrent card in my computer without a fan. Alex. Alex. [Music] Alex. "Domestic violence time"? What kind of household do you live in, man? I'm sorry you come from a broken home. Okay, so, lspci. Where's my Tenstorrent card? Oh, here we go: "Processing accelerators". All right, let's get to work. Setup. It would be nice if I knew if it was overheating or
not. I can touch it; it doesn't feel hot. We're good. Let's see what the hardware installation says about the... oh guys, we didn't use an ESD wrist strap. All right, tt-kmd, this is their kernel driver. Let's check out their desired version of it. We'll add it to DKMS, right? That's pretty easy. Let's install it. All right, modprobe tenstorrent. Great. Let me see if it's cool. We can also do update-pciids; there's a chance that they added it. No. Oh, here: "Processing accelerators: Tenstorrent Inc Grayskull". Great, they added it to lspci. Okay, let's update the firmware. "If there's any issues installing tt-flash..." "Detected Wormhole..." Good, we have a Grayskull. Okay, cool. Firmware. Is the firmware open source? No, the firmware is not open source. It should be. Why is the firmware not open source? Oh wait, no, also tt-flash didn't install. "Cargo, the Rust package manager, is not installed." Great. "George, you didn't use rustup. You're supposed to use rustup to install Rust." I haven't rustup'd in a while. Let's take a look at this. What, banjo.canonical.com? Why isn't that working? Why do I have banjo? Where's banjo coming from? Is my computer a banjo? What is banjo? Oh, creepy. Okay, now we'll have cargo, and look, now it's installing LLVM 17. We've moved up in the world. Oh, I can disable this. Great, good, fuck off, data collection. Don't connect to banjo. Still connecting to banjo. "We don't mind as long as you use Rust sometime in the future." No Rust. All right, we have to wait while cargo installs so we can install tt-flash. While we wait, let's take a look at the firmware. What, 404 not found? Uh oh, we're not off to a good start here. Your firmware install link's broken. Fortunately I know about GitHub, but if I was a grandma who purchased a Tenstorrent card, I would be in for a rough time. Where are the releases? /releases. Okay, there are no releases, so why does it think there's a release? Here we can get 4.0, or here we have new versions. Do we want new versions? Do we want to download 4.0? [Music] One... I don't know, that one seems pretty good. We might not even need to update the firmware; the firmware might be okay. I'll note that's not the latest one, this one's actually newer. "Release patch: patches workaround to address TT-Metal issue." This sucks. Why is this firmware not open source? Jim, open source your firmware. Great, "not in format", because that's not a real link. Is that a real link? Great. That is so rude. You didn't put it in a... no, no, no, you didn't put it in a thing, and now I have to do this. Rude. It's
just rude to not put your thing in a directory, you know? Okay, wait, we don't even have my card. My card's not even in here. I have an [Music] e... does this one have it? Where's my e? "I have to reboot the computer." All right, let's try reinstalling tt-flash now that I've installed cargo. But wait, I don't understand. I have an e150. Where's the e150? "Perhaps the same"? You're telling me I have to use the crappy tiny firmware? All right, let's try again. tt-flash, firmware pack. "Looks like
you're running a very old set of firmware. It's safe to assume that it needs an update, but you should update it using force." That's very sketchy. Also, it's very sketchy that this download didn't work. Okay, let's try this one. Running it. It says an error that the firmware is too old, you should force. All right, this is so sketchy. I think I just broke my card. Let's give it a try. Okay, we're verifying the flash. "Flash complete. Power cycle the board; a host reboot will usually accomplish this." All right, it's my bedroom or my work room, bros. Work-life balance is a scam made up by people who don't want to work hard, and if you don't want to work hard, you should have... you know, whatever, I don't know. This is why I can't stream anymore. I'm sick of saying that kind of shit. I'm sorry. Thank you. "Tell Jim the fan is too loud and too big." No. Is that Trinity? Yes. All right, come on back. If you don't want to work hard, you get your UBI. I heard you can go to Switzerland and kill yourself legally. It's like a
nice experience, I heard. All right, let's install tt-smi. Oh wait, do we have the module installed, or have to modprobe? We have the module, okay. All right, just downloading 17 other things. Great, love it. tt-smi. Ooh, that's pretty nice. All right, oh, this is nice. Good work, good work. Wow, I wish nvidia-smi was this good. Nvidia has got to step up their game. That's pretty nice. That's the nicest SMI I've seen. Good thing we trained our DRAM. Good, good. We only have a little bit of link because we're in a
crappy computer, but that should be okay. Now, unfortunately, it doesn't show me a key thing I care about, which is the temperature of the device. Core temperature, okay. Wow, that's updating very fast. Good, we're only drawing nine watts. Okay, that'll cool passively for a bit. Wow, this is the best SMI I've seen. Good work. Here but V Toops like a third party... no, I put the device in, you didn't watch the beginning of the stream. All right, okay, getting started with TT-Buda and TT-Metalium. All right, let's start with TT-Buda. TT-Buda demos are in a different repo. I think they have bounties too. Should we try to claim one of their bounties? Look, here, there's bounties. "I am working on this." No, no, no, don't. Look, advice for people doing bounties: never trust anyone who says crap like that. Sorry to pick on this guy. All right, he does have a greenish GitHub, maybe he'll do okay, but still, until you see a pull request from people... All right, what's Qwen 1.5? Oh,
I have to include a comprehensive README. Oh, how much do I get paid for this? $500? That's not worth it. I will submit a pull request, though. How comprehensive does this README have to be? But wait, let's see if we can claim enough bounties to pay for our thingy. "Can you link this issue using the aloria source open"... no, no, stop pushing your token shit. All right, let's go. Well, let's clone the demos repo and see if we can get a demo to work. Do they have MNIST, for example? We were joking yesterday that MNIST isn't going to work, but we are going to get it to add 2 plus 2. First five steps: install TT-Buda. Okay, let's go look at that here. Why is this in the demos? Set up huge pages. Either Docker or virtualenv. I hate both of those things. Stop. You know the joke about Docker? "Look, man, it works on my computer." "Oh, well, yeah, but how are you going to package it up to ship to other people's computers?" "I don't know, man, you want to just make a tarball out of my computer and then we can ship it to people?" Docker. All right, wow. No, but I love this SMI, and everyone else needs to step up their SMI game. Wait, it says Buda is installed, even... no wait, that's not the right... I installed different firmware. Why does it think that that's the firmware version? Why is this different? "Oh, I can get ETH for..." but does it have a token? Okay, install PyBuda. Oh, TVM, that's smart. TVM is a tinygrad competitor, it's pretty good. All right, I don't want to do Docker, I hate Docker. All right, huge pages. Huge pages
are fine. "Some devices..." One, editing the grub. I'm sketchy... all right, take grub. What they mounted it in... you can just do that in fstab? Now that's cool. We can reboot later. Oh, should we apt-get upgrade? "No install recommends". We're connecting to banjo again. I'm really sick of banjo. To compile... what is all this crap? I don't really want any of that. torchvision. Smoke test. No... can I build it? "Download the latest wheel files." Just download a wheel. Let's just download some wheels. Python 3.8, that's disgusting. Probably only works with this one, though, so let's try. [Music] "There's not a supported wheel on this platform." Okay, well, wow, that's a large wheel. We have to keep an eye on the temperature: 46. Oh great, TensorFlow. "Block banjo in /etc/hosts." That's not the worst idea. Someday I'll convert to Arch and have to worry about so much more crap. Let's let it install stuff, and while we let it install stuff, let's check out Metalium. "Failed wrapper to convert to PyBuda"... first, where's the Metalium one? Okay,
that's pretty cool. "GitHub project is located here." Why is it in a separate organization? See if this has tons of weird instruction guides. "If you just came from building from source..." Installation instructions. Okay, "a note about rebooting". Oh, the minimum number of reboots will be two. I don't know about that. Good thing we didn't harvest it; this one's unharvested. How do I harvest it? Oh wow, okay, they had a helpful Python script to enable huge pages. Okay, we're going to install some stuff. Let's see if the smoke test works. Great. Okay, which of the instructions did I not follow? I have to install a new Boost. Boost docs. All right, let's let it update all this crap. I regret this already. Should we tell the banjo to fuck off? Yeah, let's tell the banjo to fuck off. Yeah, no banjo for you. It's going to take a long time. I don't want to wait for that. Let's just do this. I add a comment to remember why I did that for later. "Hey, they're actually selling cards. Based." Right? No, the cloud is a scam, guys. The cloud
is everyone dude like someone told you that the cloud exists the cloud doesn't really exist it's it's one computer haven't you seen that that's the cloud right all right let's try the smoke test again oh well we need lib yaml CPP okay well we'll do a few more of these installs where's yaml none of these even are yaml oh here we go that one those seem like inoffensive dependencies some dependencies are offensive got to have the ammo all right well we don't have lib zmq okay well have they conside
red tinygrad which doesn't have dependencies what's a dot file I don't even know is there a way to do this faster could someone could someone else do this faster than me am I an old man sometimes I feel like an old man all right and now we're up to this kind of crap undefined symbol TensorImpl compute strides like channel identity okay this literally has all right well tensor compute strides let's see what this is like PyTorch um like the wheels should have fixed this has the wrong version of PyTorch
probably the Metalium has a better way of checking the hugepages thing let's let's just do this where a all right should we reboot even though that's totally not what's causing that the torch problem the torch problem is is other crap all the second pass of the huge oh well I have to recurse submodules also periodically check LFS this is so complicated Max tenset slots and by Direction this is why tinygrad's going to win you can follow all this stuff this is this is why tinygrad's going to win l
ike well I'm pretty good at computers I I I just tried for forever to get this set up and there's some problem with you know what oh I love how fast this computer reboots oh these these these uh these desktop computers reboot so fast okay so I probably have the wrong version of PyTorch all right it seems like I have some old CUDA version of PyTorch uh should we update PyTorch no Nix is no no I don't want to talk about Nix okay I tried Nix once and it was more confusing than all this no but Nix is r
eally beautiful yes I understand it's really beautiful but once they figure out how to express it beautifully and not just have beautiful ideas um then Nix can take off but until then it won't yes yes the idea of oh it's all version control yes that's beautiful um Docker is absolute trash and still wins because this kind of pain costs millions yeah let me show you a real way to solve problem okay all Docker does is layer tons more crap complexity on top of things the real way you solve this probl
em is you don't have fucking dependencies right those are tinygrad's dependencies like stop writing code that includes the world but no we just need this we need this look at this package it returns yes it returns the word yes I can't write a function to do that like no but you have to use libraries in order to make your code maintainable no like this is this is this is tinygrad's thing like yeah we link to some libraries using this autogen stubs right we have a few we have a few ctypes a
utogen stubs they're really easy to regenerate everybody needs to stop with all the dependencies and you know what it is it's so mediocre software Engineers can have jobs like I'm really convinced that most code in the world exists to give mediocre software Engineers jobs all right is it going to work now see if it works no same problem wait is that a different error now compute channels last 2D dim wasn't it 3D dim before I feel like it was 3D dim before all right there's probably an exact vers
ion of PyTorch I have to use for this maybe I should try to build this from source and not use a wheel do I have to use off I have to use Docker no I'm I refuse to use Docker okay well first let's see if our hugepages actually worked oh sweet we have hugepages great I didn't have to do two reboots uh torchvision install where is torch that's torchvision I don't think that matters um let's try building this from source oh we have to like clone all the garbage uh what is it like a way to do this w
ith like submodules yes let's make the world a submodule yes but no this this is exactly right here is why everybody who has shit like this like I'm buying Nvidia wait actually let's buy Nvidia oh look at Nvidia wow look it worked it even updated the driver let's use Nvidia oh import torch wow look torch works all right let's I don't know let's like like make like a random thing on a CUDA device right wow look that just worked oh let's multiply those two matrices oh look at that and it's on de
vice cuda:0 notice how it didn't give me any errors right and I didn't do anything special I just pip installed torch that is the incredibly valuable thing that Nvidia has and until every company understands this like no one's going to seriously compete with Nvidia I press this like I tell AMD the same shit it's like like you need reproducible builds you need to get rid of most of this crap because that is what gives Nvidia that's what lets Nvidia charge 90% margins for shit the fact that it
just works now there's two ways to make things just work right you can make things just work by managing insane amounts of complexity this is like the Google approach right you you can get really good at managing large amounts of complexity or you can also just not have the complexity to begin with any computer you can go to pip install tinygrad and it will work um oh and then people think AI is going to fix this oh don't worry AI is AI is just going to be another shit layer well now now I don'
t use Docker directly now I just have AI spin up my Docker and my Kubernetes like no you guys are this is the reason all software is shit no ZLUDA is ZLUDA this is exactly not the answer even HIP isn't the answer tinygrad doesn't use HIP anymore we use HSA uh it's one level lower than HIP stop putting crappy APIs on top of other crappy APIs because I'll tell you about crappy APIs right like if you have a crappy API at this layer and then you build another crappy API on top of it now all you did w
as crappy times two right this crappy API didn't didn't fix this underlying crappy API it just added its own crappiness um it's pretty competitive on AMD now yeah just pretty similar to G if you beam well it's not crappy squared maybe it even is somewhat there is a squared component to the crappiness too all right undefined symbol TensorImpl compute channels 2D last okay let's figure out exactly maybe in requirements it'll tell me which version of torch I'm supposed to install Python and requir
ements okay so there's a torch here uh okay here we go here we go it has to be this one uh oops yeah you gin you had great okay let's see does that fix the problem no no now we have a new problem okay let's try installing all the requirements uh install -r python n requirements I don't understand why the wheel didn't do this for me oh no now this is a newer version nope this is going to break torch library very kind optional dispatch key great and that doesn't even work oh well let's try did I t
ry this already what did I get when I did that oh yeah I have to do git submodule [Music] no it's not that like all software is not crap it's just like the complexity this is not the way to deal with complexity like you know you know you want to see some great software Redis this shit's great software how many lines is Redis like this is this is great software here let's go into source look it doesn't have multiple layers of directories it's just this you can just read it simple code and it works incred
ibly well and it like powers the whole internet um like that's a great repo and here just just just to just to play let's build Redis okay so I go to Redis and I type make wow is this going to work is this just going to work because this software is not shit I could have thrown a -j on that bitch but now we can make fun of Redis if this doesn't work actually let's throw a -j on that b because it's taking forever okay so we're missing jemalloc uh look they even tell you about this let's use lib
C's M with a J oh libhdr_histogram okay okay all right never mind Redis has too many why does Redis depend on a histogram why wh why why after a reboot never mind all software's shit oh wait hang on hang on hang on hang on no no no no okay it's a little bit unfair because I [Music] think it was building no I just want the server I don't want the test I don't care about the test oh okay I don't know what I did wrong but it worked [Music] okay that's pretty good actually that that was kind of fi
ne that was kind of my fault I didn't specify what I was trying to build but there we go now we're connected to Redis uh maybe we should also build a client that might have been my fault because I stopped the build or something let's not blame Redis that that's a PEBKAC error maybe it'll just completely there we go beautiful that was a PEBKAC error now I'm connected oh let's do help all I'm saying is Redis is good software all right good we've inited the submodules now let's pip setup instal
l oh KeyError BACKEND_ARCH_NAME of course of course we're going to need to give it a BACKEND_ARCH_NAME okay well let's see what we have to put in for that here we go Buda backend okay ARCH_NAME equals we have a grayskull BACKEND_ARCH_NAME all right all right looks pretty good how many lines in Redis let's take one kind of a lot but I bet a lot of them aren't needed um 100,000 lines in Redis how much of these are like no wait no there actually there's actually 100,000 lines in Redis all right all
right that's not that's not crazy it's also written in C uh what else is good code let's try to build MySQL is MySQL good code yeah that's a fan it's [Music] building we don't have B do yet yeah it's throttling my poor little Ryzen CPU there SQLite well SQLite's just even a library it's not even a SQLite is good software SQLite's phenomenal software I think it's just a library though I don't think it's anything else well this is a little bit large I will say this is a little bit large and w
hy does SQLite not have CI do we think this building is going to work all right let's let's play with tt-metal maybe tt-metal works installing oh good good that's the one we have great um go to tt-metal uh what branch do we have you main okay uh git submodule init recursive it's not even that Docker's bad software Docker itself might be totally fine software it's what Docker encourages and enables yeah yeah okay Docker not even if software stopped depending on everything then good we're building s
omething good good good we have NC risk all right let's get Visual Studio Code up here and let's start exploring uh what do we want to explore do we want tt-metal like what even is this that documentation page looked okay let let's see if this let's see if there's anything interesting that like okay this looks kind of interesting um okay build a project following that great uh H okay I kind of like this let's see if we can import ttnn no module named ttnn yeah I mean this looks kind of interestin
g not with torch but can we just do this why I need to zero fine that worked how do I install this installation guide Canon Ops and tensors okay installation instructions okay we did all that develop our dependen do this great download all the LFS crap um set up the environment note that this setup is required every time you want to use the project oh no what PEP 8 it's disgusting if a standard rejects the god tier 2 spaces then we reject the standard like I'll tell you the problem with the stric
t formatters the problem with strict formatters is you can no longer tell who's a good programmer by looking at their code or not like one of the beauties is I can look at a pull request and when people like have like a a variable and then like a space and then equals and then the thing right there I just know they're not a good programmer okay because they don't pay attention to detail if they had a PEP 8 formatter I would never see that about them and and you know that would just you just
be losing information what are we downloading that's multiple gigabytes oh we have to download the weights for Llama 70B so you can run that example well that's sensible um all right so we don't have that yet oh oh did this do something okay let's try the example in this pipot no module named pybuda but I I I I I installed it oh no never mind I'm sorry third party Buda backend device bin silicon wormhole create ethernet map does not exist or is not a regular file great okay let's just try the
metal thing because Buda didn't work how am I ever going to claim this $500 bounty should I just try harder on this [Music] TT to unknown torch library we have I think torchvision didn't work maybe no I installed the exact wheel they wanted try to figure out where this thing is all right xft uses the 2.0 it does not seem to if you need to compile with other versions of torch you need to recompile I use the exact same version of torch let's uninstall and reinstall torch maybe it'll work bette
r using this exact wheel let's download the wheel no torch works great it's just when people try to link to it it's linking that's the tragedy oh here we have this torchaudio garbage let's uninstall that okay okay okay I have a new theory the older this version of tt-buda might have been built against a different might be a different uh which release did I download we can look at my wheels and take a look okay uh Dev GS this one all right so this happened on January 5th so let's check out thi
s commit okay now let's look requirements this doesn't even have requirements anymore where's torch great this one doesn't even have a torch version that I'm supposed to use but this is probably the wrong torch version um how's this supposed to work if that doesn't work which version of torch should I use PyTorch 0.2 that's the one I have all right let's try one more thing I don't know what this cxx API garbage is let's try it without that that's the Python we have tt-buda seems obsolete this
did this Chinese guy figure it out in addition when packaging the wheel you need to avoid modifying there's not even a setup.py anymore what is this this is not even a version these people need tinygrad support without dependencies yeah it takes me a while too this is bullshit you know like I just I'm kind of like this is like should we all right let's try metal for a bit I think we're giving up on Buda I'm I'm not spending my time figuring out why the torch library symbols undefined like s
top writing code like this oh oh okay it worked don't use the torch they tell you to use all right all right great okay all right it did something phew finally uh we got to keep an eye on the temperature all right we're doing okay all right look at all this stuff I mean this kind of pretty PR creating input queue connector well that wasn't in the docs all right sure we write a little uh little benchmark want a benchmark let's see what this is a tiny [Music] uh wait I have to make a module PyBuda
PyTorchModule this looks so hard e my is [Music] gross all right does this work let's try a smaller matrix first [Music] um h no error while pushing inputs okay so maybe I have to like make it a batch or something okay that seemed marginally better no all right uh let's start by copying their example exactly and then figure out where it breaks oh do I have to tell it it's a torch.nn.Parameter um I don't I fine I'll write out torch.matmul if that matters I'll make it four and that should pret
ty much right oh let's stop doing it three times that might have been too much to ask for this the Tenstorrent card might be a one time thing uh oh it's getting hot no it didn't work okay let's see if the old test still works or if I broke the card we broke the card okay can I reset the card if you thought AMD was bad try a Tenstorrent [Laughter] card um is it hot no it's not hot we're fine I mean that's not too hot that's not that bad uh uh all right does tt-smi support reset oh here we go do I want to
generate a res I can generate a reset file I can generate reset JSON fucking joke oh yes here update the generated file and use it reset reset expects one argument okay okay okay okay that worked reasonably I don't know why that didn't show up in help oh it is there okay I see it just confused me with generate reset JSON all right let's see if the test works after I've reset the card okay good good good good we're we're back to it's working again it's working again all right let's see if my benchmark
works that was actually not that bad not that bad we just had to reset the card once I mean we asked it a lot we tried to multiply a matrix that was okay good good this worked okay now let's make said M can I change that to one or is that going to break all right all right not too bad not too bad let's try let's try looping it three times what do we think will it work will it work can can the T card do three multiplies no I know I know that was not that bad that was not that bad we're not hatin
g we're not hating boys oh well don't do that definitely don't try to run something multiple times that was too much all right am I just not reading the API correctly all right let's play with Metalium did I break it don't do it three times that's too many times you can multiply one matrix per time the software starts can I make the text bigger not but I can let nonsubscribers not talk that's right um there you go it's bigger are you happy one matrix one reset that's the law no no no no no to be
fair you the program multiple times great great great great now see this is the opposite of how you do complexity [Music] guys for I want food is it Uber Eats time oh I know what I want shansi Magic Kitchen oh wait till you guys see shansi Magic Kitchen uh sh Magic Kitchen oh yo by the way folding phone I got a folding [Music] phone I want to reorder from shansi Magic no shansi Magic Kitchen is not open right now nowhere's open right now because it's 9:35 by the way look at the size of this fold
ing phone you can see the crease from here uh hang on what do I want to eat this this all disgusting I'm going to eat disgusting food never mind also was disgusting wow PyBuda is actually pretty decent and practical good point oh cuz I have to recompile it I mean that might work but if that does I I just yeah no I don't want coffee I want shansi Magic Kitchen no no I want Thai food that's what I want I want some noodles can I get some noodles or some fried rice when I was in India I saw a sign a
nd it said fired rice I'm like where'd the rice get fired from no there's no Thai food available but I could get faam colie this is very far away it's almost 10 o'clock I could just wait and be patient what do you think of that or do we want a bagel bagels are disgusting like I eat bagels and then I'm not happy Taco Bell why is only disgusting food open all right should we get an açaí bowl we get an açaí bowl bigger bowl yeah bigger bowl if I add $21 I can hang on so if I order from another one if I bu
ndle if I bundle a kondi from Randy's Donuts does that count as Uber no no you can't do that no no no no that doesn't work I'm trying to save $25 with Uber One but I have to buy more stuff from the place and all the stuff from this place is expensive right let's get Alex a juice here perfect great okay my açaí bowl will be here yes spend more save more right like I just feel scammed you wish you had money to eat well you know don't buy a Tenstorrent card because then you won't have $800 but if you want th
is joy and if you want if you want to experience this this joyful experience buy a Tenstorrent card all right let's see if we can use metal we got it working we got it working it wasn't it wasn't too terrible to be fair like as a dev kit this is pretty decent as far as devkits go on the devkit spectrum um let's install this python3 setup.py install ARCH_NAME is not provided well we know the ARCH_NAME and it's grayskull no we don't have Dev men map great or the imple device do we have wheels are the
re wheels for tt-metal release okay okay is this a wheel uh no it's a zip file all right maybe we should try a release maybe the releases work I think that's what one of the things suggested let's check out that tag wow thank you for gifting subs see if that works no problems with dead map again okay so question if I have PyBuda installed does that mean I have ttnn installed no definitely not okay so that was that was that was a false hope for did most of these um let's try this not try to clone I
don't know you want just try it again follow the instructions exactly why was I not I don't really want an açaí bowl why was I not just patient and waited for shansi Magic Kitchen to open so I could eat chicken and noodles which is what I really want Ethan 1G you should know better it's not for anything it's never for anything think of this four things you know okay I try to explain this with like the marketing for the tinybox like like stop saying what it does and start saying what it is I hate a
ll these like like no one sells food by what it does right people sell food based on what it is like I can respect food you know you can respect respect um no one tries to like tell you like like like this burger will complete you no say what it does people just need to like like focus on if everyone would just focus on describing the world accurately okay we need these two TT_METAL_HOME right PYTHONPATH good we installed wheel let's just reset it resetting it was very enjoyable why is the core
power so high now I remember when it was low it's a little long now it's good we're building tt-metal from this version and then we can get started T man this is kind of cool oh look at this graph oh that looks cool why does everyone try to like make it like work with torch like it's never going to be good maybe AMD will be good because they built literally CUDA clone single core and multicore so I mean this actually the the I hate this crap oh this is why HSA failed like stop trying to make ev
erything generic and just just just just make your shit like explain to me what the Tenstorrent thing is actually very cool like the accelerator is very cool I visited them they explained it to me um it's just a shame that they don't like explain this to you they're like we're building a generic API that's going to drive business value like no stop saying what what is it user friendly intuitive familiar with fucking PyTorch why is your low-level shit PyTorch oh maybe there's something lower level
than ttnn all right well this is all open source so why don't we just start reading the code for tt-buda PyBuda pybuda and then we can see here okay pybuda we have something called PyTorchModule we can like see what this actually does luru great uh what is this calling into at the lower level probably need tons of C code I do like that they've lifted that that that their metal thing is Python like you want your metal thing to be Python but then the question is does it actually use that e
ach one of these files is longer than tinygrad yeah make something that works for your thing exactly what do you want to know from SemiAnalysis I should pay for how much is SemiAnalysis I should pay for it is it expensive oh they have a Substack 700 that's kind of a lot a year a month good we're installing a lot of stuff oh yeah that's not that bad okay dependencies tt_lib is a unified to the tensor argument within tt_eager but I think tt_eager is part of tt-metal the amount of code in this s
tuff is just crazy but like this is what happens this is what happens when you just employ software Engineers you get an insane amount of code that can barely multiply a matrix but no you're just not using it right yeah well who's going to use it right you know like if I can't use it right tell me who's going to use it right let's see let's let's give it a fair shake and see if I can fix my thing and see if there's some docs that explain why I can't put modular run in a loop like you'd obviously
expect run inference inputs output queue get oh maybe is this something I can let's just call run_inference which this shit why when I look in run_inference what it calls is _run_inference why is there an abstraction layer there of course we're going to get devices in this complicated thing and then we're going to run devices inference all right and then if not resume we're going to initialize that how does anyone do it oh yeah we're going to start a loop thread unless we're in sequential
we're going to check the sequential override how does anyone do this oh great thank you for building a environment for me do I have ttnn now uh no of course not wait that has ttnn that's just weird uh no don't install it in a do I have a wheel somewhere where can I just install it globally uh no I want a tt-metal wheel python3 setup.py install ARCH_NAME equals grayskull come on oh wait you already built this why are you building again don't build again please don't build again please don't bui
ld again okay it looks like it's not using a ton of CPU so it's not building again maybe Enterprise companies have code complexity tools that want 5,000 tiny functions like I I hate this this is this is this is just this kind of code is just like please have a million bugs yeah oh look oh good thing we're in sequential override mode and we can run forward inside a thread all right let's look at run forward okay here we go so where does it actually run the fucking thing what does run command do w
here do I get there okay here we go all right now we have a device okay push to command queue okay self command queue push great all right so we're deep in here and we have a device now what actually gets pushed to a command queue can I like look at a command queue send the shit to this this is block oh God what is this what's being pushed to a command queue you put types on all the useless shit like the queues but you don't put types on the oh well of course this doesn't work because it doesn't have weird ass o
ld version of torch missing parentheses in call to print no no uh how do I create a wheel can I make wheel like that wheel okay well somehow it's set up but I had to use this virtualenv and you know how I feel about virtualenvs I hate that python3 doesn't work but if I do python3.10 for some reason I can import ttnn there let's see if that works no well that's the wrong ttnn so it doesn't work at all never mind if I do the one in python3 maybe I have to go here when I import ttnn but that d
oesn't work because there's no PyFrame_GetBack okay why do you have PyBuda and tt-metal but PyBuda doesn't use tt-metal have dot C fucking PyTorch garbage right so where is any of the code that actually does anything in this thing no no no no no no it didn't work for me in 3.10 it never actually worked it just it just imported the import. py file basically I like that there's a script I like that there's a script that counts the lines this is very tinygrad inspired let's see let's see how man
y lines it is okay never mind um all right why don't we look at tt-smi or should we try to claim a bounty do we want to actually try to use their no I'm not I'm not dealing with that that just sounds awful yeah I know I love strace right uh let's look at this thing I have it here don't I no I don't we'll clone it okay uh imports tt-smi backend what is the pyluwen what is that oh Tenstorrent system interface library oh of course there's a custom okay well this is useless um is this all just in like C and
there's no way I'm ever going to like be able to do anything with this how do I like talk to the the device where's like the code that talks to the device does this have code that talks to the device yeah dist dist is what I want dist okay good we got an egg all right let's uh pip3 install this egg you can't just install an egg why not oh let's upgrade pip that sounds nice better pip maybe better pip can install an egg what is an egg why can't I install an egg cannot find a version tha
t satisfies the requirement force install my egg that didn't work I didn't have high hopes for that is that dist always there python install egg file how do I do this how exactly does one install an egg file oh python setup you do this at the setuptools page oh easy_install perfect I love when things are easy okay I spell it right don't have easy_install I don't know pip3 install easy_install all right no module named easy_install you think easy_install would be difficult I knew easy insta
ll was hard oh it's called setuptools setuptools setuptools is already installed no module named easy_install easy_install this is not easy is it a real egg zip archive data oh great we got dot c a whole lot of weird um I don't know like that might just work maybe don't really have high hopes for this maybe it worked um tt. device all right all right let's see what is supposed to be in tt_lib import tt_lib as ttl okay let's see uh get num PCI devices this is if this function works we're no of course n
ot but we don't have a YAML of course how could though how could we possibly expect it to work without a tt-metal YAML shit what the hell do we have that YAML file somewhere oh I probably have to set like the those things I had to export here we go let's try TT export TT_METAL_HOME equals home kofka tenstorrent tt-metal now let's see if we can do this okay good we've detected one PCI device okay okay okay we're making good progress now boys good progress um how do I install the egg let's install the
egg so we don't to do zzy anymore uh pip3 install the egg install the fucking egg easy install okay let's ask Labs oh Claude 3 Haiku how do I install an egg file in Python download the egg file open the terminal pip install the fucking egg oh not compatible 's a virtual environment force install egg no binary no deps I don't know let's try what does no binary do bypassing yeah yeah yeah bypassing all right let's go no binary requires an option all right well you must give at least one require
ment to install I need a package name TT is it can cause more problems than it installs ignore installed okay you know what I have a better idea where is my python whatever this works um let's go to my metal here I know about ttnn uh sys.path.append to that shit import sys we need to also import os os.environ oh make the text bigger for the noobs well you know noobs you get that size text okay so be happy um where did I make that finally do something [Music] here what did we have to do TT_METAL_HOME
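
The workaround described here (set the environment variables from inside Python and append the checkout to `sys.path` rather than installing the egg) can be sketched roughly as below. This is a minimal sketch, not the code typed on stream: the function names `configure_tt_metal` and `try_import` are hypothetical, the checkout path is whatever your own clone uses, and the variable names `TT_METAL_HOME` and `ARCH_NAME` are taken as heard in the stream.

```python
import importlib
import os
import sys
from pathlib import Path


def configure_tt_metal(home, arch_name="grayskull"):
    """Point the current process at a local tt-metal checkout.

    `home` is a hypothetical checkout path; substitute your own clone.
    """
    root = Path(home).expanduser().resolve()
    # Variable names as heard in the stream; set them before any import.
    os.environ["TT_METAL_HOME"] = str(root)
    os.environ["ARCH_NAME"] = arch_name
    # Make the checkout importable without installing a wheel or an egg.
    if str(root) not in sys.path:
        sys.path.append(str(root))
    return root


def try_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None
```

A call such as `configure_tt_metal("~/tt-metal")` followed by `ttl = try_import("tt_lib")` mirrors the order used on stream: environment first, import second. Whether `tt_lib` then imports depends on the local build, so the helper returns `None` instead of raising.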
[Music] this all right let's import tt_lib as ttl and let's print this okay good it'd be nice if that actually worked um I can make that work by oh wait it actually installed what the egg installed okay whatever so I don't even need that great okay good good good good all right we can get the number of devices now great we we're making progress um so all that stuff was actually tt_lib but maybe tt_lib is tt-metal and ttnn is like right here and let's see what this actually is okay this all looks li
ke nothing how much code is just nothing definitely don't care about that oh here I have a .so file I can just import that that's the one that doesn't work all right let's try to like talk to this device and like have it do something eager tensor oh see okay let's create a device I tried to do that before in ttnn did I not whatever yeah those things sound good glad that we're designed to support multiple devices this is a great all right good we have AI clock all right um labels torch to TT tenso
r oh it closes the device for you that's pretty nice let's see what's here tensor all right we can make a tensor let's make a tensor what's a p tensor oh it's like a torch I need a torch [Music] tensor really of course no definition found for tensor what's a TT layout oh tile sounds good tile is where I want to have my tensors TT D type py tensor reshape size I don't know what if I make a tensor Bob is that going to work no don't make your tensor [Laughter] Bob I fine I'll import fucking torch i
mport torch torch.rand 10 10 dtype equals torch.bfloat16 okay that TT no that's a terrible name we'll call it torch_random_tensor yeah who likes my large variable name that's right it's a large variable name okay let's work no no no don't do that index is out of bounds for the rank should be between zero and zero however is fuck you very large um while size is less than four size insert at zero oh we have to put ones oh oh of course of course of course of course I'm I'm very sorry
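
The loop read out of the source here (insert ones at index zero until the shape has four entries) can be sketched in plain Python. This is a sketch under assumptions, not the library's code: the helper names are hypothetical, the tile size of 32 is the value read out of the constants later in this stream, and the rule that only the last two dimensions must be tile multiples is an assumption.

```python
TILE_DIM = 32  # assumption: the tile size read from the constants on stream


def pad_shape_to_rank4(shape):
    """Prepend 1s until the shape has four dimensions."""
    padded = list(shape)
    while len(padded) < 4:
        padded.insert(0, 1)
    return tuple(padded)


def is_tile_aligned(shape, tile=TILE_DIM):
    """Assumed rule: the last two dimensions must be multiples of the tile size."""
    if len(shape) < 2:
        return False
    return shape[-1] % tile == 0 and shape[-2] % tile == 0


if __name__ == "__main__":
    for size in (10, 16, 32, 256):
        shape = pad_shape_to_rank4((size, size))
        print(shape, "tile aligned:", is_tile_aligned(shape))
```

Under these assumptions a 10 by 10 or 16 by 16 tensor is padded to rank 4 but still fails the tile check, while 32 by 32 passes.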
You're torch-y, we need to put some ones in here. All right, now we can put it to a tile layout. Yep... nope, that didn't work. Okay, well, that didn't work. Is that what this does? Object one, insert at zero, and it's putting the shit there. Okay, well, I don't know, maybe that's an offensive size. Let's try 16. No one has ever been offended by 16. 256 is the least offensive size, I'd say. Of all the sizes, I'd say 256 is the least offensive. Okay, well, that has to be like a tile, I don't know. TT constants, tile height, I don't know. Let's see, can we get that? No, it doesn't have constants. Here we go, tile height equals tile dim. Tile dim is 32. Oh, 32, of course. Okay, okay, we have a TT tensor now. Let's take a look at our TT tensor, let's admire it. Can I use IPython? Of course, that does work. Reinstall IPython and I can tab complete. "Please restate current task." God, could you live life like that? We have a tensor. Did you live life always thinking about what you're trying to do? This sounds so terrible. I am just trying to eat an açaí bowl, and hopefully it's here. It's not. Should we ask the phone where it is? Large folding phone, where is my açaí bowl? Oh, look at that, it's using very little power, that's good. What do I think about Devin? I think it's hype. I think that hype fuckers should, you know, just die and stop hyping shit. Like, can I use it? Can I use it, or is it hype? Or did you release a video with blurry shit? Right, you know, all these companies, "we're going to build humanoid robots," it's so laughable. It's like, wow, remember when we were going to solve self-driving cars, because self-driving cars were the easiest robotics problem? Everybody proceeds to fail at self-driving cars, and then they're like, oh, humanoid robots. You just can't with... I don't... you know what, be happy that you don't understand. Okay, if you don't understand what's going on with AI, just be happy. Be happy, because if you did
understand you'd just be infuriated all the [Music] time Ohi I don't even know what jobs people are going to have in 18 months dude like I wish these people would put their money where their mouth is so I could take it all from them not because I want their money but because I want them not to have money right we'll burn it we'll burn it in a big cash fire my EV almost here so that's exciting okay um all right we have a tensor well let's look at some of these let's look at some docs let's read
some docs. Some of these docs actually looked kind of useful, except I don't understand why I have tt_lib and not ttnn, but tt_lib sounds better than ttnn anyway. So, luwen and tt-firmware, and why is there no TT here? tt-metal, why is tt-metal in a completely different doc from the other thing? Not ttnn. Here, using tt_lib: "tt_lib operations are missing some of the features." Print L1 buffers. Can I just... does ttnn work? No module named ttnn. I thought I saw a ttnn. Maybe this will just work. Okay, no, it didn't just work. Why not? Oh, no module named ttnn. ttnn, oh I see, it just didn't do the... here we have an egg. Now that's a tiny egg, a big egg. Okay, this is... okay, we have to copy this to this. Should just kind of work, YOLO. Now we need loguru. Okay, that's not too bad, I can install loguru. No module named... persistent collections. Okay, this just looks like something I can pip install, and then we'll have working ttnn, then we use the ttnn examples. Oh, graphviz, that's a normal-ass library, good. No module named tt_eager. Oh, now we have a problem. Thank you for gifting subs. Okay, well, we have tt_lib, so whatever we can do with tt_lib we can do. I don't know why we don't have ttnn, but ttnn looks crappy anyway, so whatever. Converting to and from a torch tensor. What were the cool things where they were telling me about DRAM and shit like that? I'm interested in... anything that involves torch, I'm not interested in. I want to, like... notal single core, creating DRAM buffers and circular buffers. Oh, this is in C now. Create buffer. Where can I actually use [Music] the... maybe let's look at some of those things that that example prints out. No, oh no, PyBuda only works in Python 3.8, of course, got to remember that. All right, so we have PyBuda, a TT device, create input device queue connector, balancer, whatever. I just want to bypass all of this. I want something... the TT chip's actually very cool, and I wish they were just exposed. They should stop with this. Jim,
stop. I told you this: nobody wants any of this shit. Nobody wants a 50% torch. Be proud of what the chip is and expose the chip for what it is. The chip's really awesome. The chip has all these tiny little RISC-Vs and they can talk to each other, but instead they put seven layers of neural network crap on top of it that nobody wants, and it doesn't work. This was my advice to TT. I'm like, expose your chip for what it is. This is the only way this is ever going to get traction.
You know, explain why you're not... because right now all you are is shitty-ass CUDA. It's like CUDA but it doesn't work. Oh yes, that's what I want. It's like CUDA but it doesn't work. Even ttnn is too... can I get the docs for tt_lib? Does tt_lib have docs, and is this the lowest-level thing? Because, okay, the chip is really cool. The chip is a grid of RISC-V processors and they can all talk to each other, yet I see nothing about this. I see all this crap about... the stupidest thing you can do today is try to build a large language model inference engine. That's the most competed-in space, and you're all just going to lose to Nvidia. Okay, if you try to do that stuff, you're just going to lose to Nvidia. If you say, look, we built something different, because this is different and this enables new capabilities... you're never going to leave. How about this, same thing AMD did with HIP. Maybe HIP actually kind of works, because the Nvidia and the AMD chips are actually very similar, so you might be able to just shove your way through. I think what Apple's doing with MLX is cool, they're taking pride in the things that they can do uniquely in Metal, whereas what this is is just a crappy version. Like, why... here: "TT-Metalium is designed with the needs for non-ML and ML use cases," yet your whole thing is talking about language model crap, language model crap that nobody's going to care about. Look, the only way this company succeeds... this chip, I mean, this card is fine, I'll buy it for $800 because I think it's fun, but no real company's going to buy these, right? No real company should. If I was in a real company and they're like, "we're using Tenstorrent cards," I'm like... this is what happened with Dojo, same thing, right? No researcher wants to use this, and as far as price per dollar on inference, you're not going to compete with other people on this. You have to be in this for 10 years, right? Like, tiny corp's in it for 10 years. tinygrad will still be relevant when large language models no longer are. Here you say it's designed for non-ML and ML use cases, yet this whole thing goes on to talk about ttnn, "designed to be intuitive to a user that's familiar with PyTorch." Bro, Wayne Gretzky: don't skate where the puck already was, skate where the puck's going, and it's not going to fucking PyTorch, and everyone knows this. "ttnn is based on top of Metalium." All right, so where are the docs from
Metalium? This? Can I use this? No, this is only in C. Don't make me code in C. Oh, here, TT-Metalium, here. Oh, okay, I missed this. Okay, I didn't... the whole thing here was ttnn, I didn't realize this was a separate thing. Fucking Python, man. Okay, all right boys, we've got to leave Python, it's time to go to C. No, no, no, we're going to come and see, you know. Okay, tt-metal programming example, loopback. make... make loopback... no. Okay, make programming_examples/loopback. Okay.
Okay, let's try this: make programming_examples/loopback. "Nothing to be done for programming_examples/loopback." Next, export TT_METAL_HOME, this. export arch equals grayskull. Is it an E or an A? That's not an E, that's an A. The dude wants to give me a massage, he's gay, bu... Maybe down here. No such file or directory. No such file or directory. Oh, because this is tt_metal. No dev_mem_map, no... in device. Did I spell gray wrong? I did. Grayskull has an A. Oh, here we go, build programming example loopback. Okay. Oh no, that's not going to work, because we have to... let me just add this to my bash shit. All these people, it's AMD too, they focus on all this high-level stuff, and you don't want to focus on the high-level stuff. You want to make your programming model unique and awesome, such that people want to use it. Builds examples. Does it run? Okay, cool, test passed. All right, let's see what that actually is. Programming examples are in tt_metal, programming_examples, loopback. Okay, well, don't do slow
dispatch mode. Okay, all right, we created a device with device ID zero. I don't know why it's on a new line. Then we create a command queue, and then we create a program. This is very OpenCL shit. Create a kernel, DRAM, create random vector. Smurfy, thank you for gifting subs. Okay, we write the buffer to [Music] DRAM. Runtime args. What are NoC coordinates? Is this stuff documented? Look at data movement... Am I missing something that just kind of explains what this is, that just explains what the Tenstorrent thing actually is? What's a NoC? Does it tell me what a NoC is? Sounds like herban SPF U [Music] SP. Oh, we can make chat VIP-only. A single core. Okay, this stuff is much more interesting than what I was looking at before. Let's read some actual kernels. Let's read the matmul kernel. Wow, this... just port tinygrad. They should expose this API, which is the most sane API I've seen from them, and they should expose this API to Python. NoC might mean network on chip. Do we have anyone from Tenstorrent in
here? Okay, NoC is network on chip. The args that we pass into the [Music] program... physically independent, two NoCs, there's only two. Like, where are my cores? Where does this shit actually execute? This is the font, I just say this is normal Chromium. CreateKernel: "specifies the kernel and core range on which to execute." Core, core. Oh, so this is just running it on one core, but there's lots of cores. So if I read the matmul multicore, it's going to use multiple. If I look at the create kernel, all cores... oh, I see. Where do I get all cores? Split work to cores. What does this return? Core range set, I see. So how many cores are there? 120 cores. [Music] A grid of cores. Each core includes five RISC-V processors, five megs of SRAM. Got jump go right. Okay, I wish that this was all exposed to Python. Wait, is it exposed to Python? Is there some way I can create a kernel and launch it in Python, like PyOpenCL? Okay. A Grayskull's... okay, okay, that makes sense, 10 x 12 grid of cores. Now this is the documentation. Why
isn't this the front page of this? That would be cool. Yes, this explains... this makes me understand things. So where are my NoCs? Something with two NoCs. I don't see two NoCs, I just see 120 cores. Ah yes, wait, this is the best stuff. Why don't you have this on the front page of this? What is a NoC? Oh, here: NoC is a two-directional 2D torus. Oh, the NoC connects all the cores. Okay, I see. We have PCIe, and then what's ARC? Oh, this is the network stuff. Okay, but that's there, cool. So if I'm just using that core, how do things get to that core? This is all... okay, this should be linked right on the front page of Metalium. All right, each core has two data movement kernels and a compute kernel. NoC async read from a buffer. Well, I said it has five processors on the thing. The basic data movement kernel. Oh, reader binary, okay. All right, so yeah, I think understanding the matmul thing is good. I think maybe I'll try to write my own matmul outside
of this stuff, and that'll be today's stream. This doesn't have the Ethernet, so I don't know what ARC is. What's ARC? What is this Excalidraw thing? Oh, cool. What is ARC? I guess I thought it was Ethernet, but these things are Ethernet. All right, I don't care about this stuff. "ARC is just a CPU that is used for system management." Okay. All right, let's write some programs. First off, I hate coding in these sort of environments. Let's create a clean... we'll call it tt-twitch. Call it Tenstorrent Twitch? We'll call it Twitch-torrent? tt-twitch, keeping in the Tenstorrent naming. All right, let's see if we can get a little build environment up here. Okay, now that the Tenstorrent guys are here, I'm more motivated to try to not be a general fuck like I usually am. All right, it is in C++, so we do need [Music] CC. We want to create the device. Don't know why we need to define device ID. Let's just do that. What do we need to include for that? tt_metal host_api.
Well, that doesn't exist, so we'll figure out how to do that. run.sh, love run.sh. Shebang in there. clang main. Where is this host API thing? tt_metal. Did I do tt_metal? Yeah, I did tt_metal, right. So just a -I on there. All right, we don't have common/assert. Where is that? Okay, we also have to do this. All right, we've got to find fmt color, third-party fmt. Okay, well, what is that? I'll use that same environment variable you guys do, TT_METAL_HOME. [Music] Just say TT_METAL_HOME equals that for now. Spend time building a good build environment and it will pay itself back 10 times over. That, TT_METAL_HOME, that. Okay, well, that's great. Lots of shit doesn't work. Let's go to loopback. host_api, that should just work, I think. Let's figure out what's not working. I probably need to say it's C++11 or something. Enable a new C++... what is that, std 11? Sounds good. All right, now we're back to dev_mem_map. I've seen this one before, and this is because you have to tell it
it's Grayskull. Yeah, so that's in here. Whatever, I'm just going to... only for Grayskull. I'm going to put a comment, because only Grayskull cards are shipping, so this will work for everybody. tt arch types. I mean, this is actually interesting, I'm kind of curious what's in this dev_mem_map. Mem maps are always very interesting, you learn very interesting things from mem maps. Cool. UMD, what's UMD? Why do I need a UMD? Host mem address map, we need that too. What's UMD? We got some Rocky... oh, user mode driver. Okay, great, we'll include some stuff for the driver, got to do that. Okay, dev messages. All right, we're getting somewhere, maybe, who knows. tt_metal hardware inc. Okay, straight-up hardware. Don't need Grayskull. Okay. "Insert an explicit cast to silence this issue." I don't know, C++14? No, don't do that. Okay, C++20, it's 20. "Non-constant expression cannot be narrowed from type" shit to shit. See if I can figure out what arguments they're compiling with here. There's got to just be some way to
ignore this. No, no, you don't want to just make another example and reuse the existing build system. You'll never learn anything if you do things like that. Okay, if you just use what's handed to you, you'll be using what's handed to you for the rest of your life. See, this is why I don't stream anymore, because I'm sick of saying things that supposedly sound like things, you know, it just sucks. Okay, I don't know where... module.mk, yeah, that's kind of what I'm looking at, I'm looking at that right now. It is this, right? [Music] God, let's just fix that. "Insert an explicit cast to silence this issue." What? Okay, great. Undefined reference to tt_metal CreateDevice, and that's because we're going to have to link to some shit. Okay, grep build for this, and then we're going to have to link to, I don't know, like device. Start there. Okay, just need a little YAML. YAML... cannot find YAML. Yeah, just orchestrated with Kubernetes, great. How do I link to YAML? yaml-cpp. All right, now we're back to
undefined CreateDevice. Okay, it just looks like these, so let's try -ltensor, -ltt_metal. Just -ltt_metal, let's just try that. Maybe we don't even need device, let's just try that. We don't. Let's get rid of YAML. You want to do the minimum, you always want to do the minimum, you don't want to start linking in the world. Everything needs YAML, we love YAML. What TS to it. Cool, we built something. All right, building. Throw a quick README on here: Tenstorrent example on Twitch. We'll deal with pushing later. But yeah, so that... cannot load shared library lib tt. Where's the library? build/lib. There's some way to do this. Whatever, build/lib. Okay, now we just have a problem with that. Okay, "environment variable TT_METAL_HOME is not set." But I thought I set that. Okay, "process tear down with device still active." Okay, that's a lot better. I think I might even be able to remove that symlink now that I actually did export there. I don't know why it wasn't. Okay, good. "Process tear down with device still active." Okay, we can fix that. After we create the device, we have to close the device. Let's pass... oh, come here, you can't INE, because you've got to include the shit for assert. Okay, great, good progress. Good progress, boys. All right, let's create a command queue on the device and a program on the device, of course. Now we have the core coord of 0, 0, that sounds good. Yeah, that's fine. Let's create a kernel. Let's just not be too specific about what type of kernel it is. Program, we're going to call it kernel.cpp. Need that core expression, and I have no idea what that data movement shit is, but we'll get there. All right, let's make a new file called kernel.[Music]cpp. Let's look at the simple kernel. Let's make a kernel that does nothing. Hopefully it's capable of finding the compiler and shit. No suitable... what? "Too many arguments in function call." Where is this auto-completing from? Why does this not think it has the right... why does this have a different thing than that? I don't understand, why when I mouse over this does it think that the types are int and shit? Oh, maybe there's two of them and it's just stupid. Okay, that's reasonable, it's actually pretty reasonable. Okay, I was take you. All right, cool, cool. That's good. All right, did it compile my kernel? Let's put crap here. Compile my kernel. It does not compile my kernel. Okay, we probably need to compile or something. Create kernel, DRAM config, create buffer, we need buffers. EnqueueProgram, let's go. Finish, that's nice. Why is false not
good? "Too many arguments in function call." I see. I don't understand why all these type hintings are wrong, they look right when I look at them. Now, this blocking operation, yeah, that's good. Why don't I just block and not have to finish? Whatever. Okay, now, no such file or directory. Great, so everything unfortunately is with respect to the Metal thing, so let's just do that. Now can you find the file? TT_METAL_HOME equals dev... no such file or directory. Oh, I see, so even though I put in an absolute path, it still doesn't work. At least the errors are good. All right, good: "crap was not declared in this scope." Good, that's exactly what I wanted it to say. Remove the crap. Does it run an empty kernel? Great, it runs an empty kernel. Why is this so slow? Why does it take so long to turn on? That's slow. Okay, let's take a look at atals and see what we can learn. It's JIT-compiling a kernel, that's cool. Yeah, core range, that looks more useful, I don't like core coord. Oh, that's cool, I don't need to call CreateProgram, CreateProgram is just nothing, I can just do it like that. It looks better, whatever, we'll call it CreateProgram. I don't trust C++ with its, like, vampire assignment shit. All right, this is a lot of work to create a buffer. Why is it tiled? What does this stuff mean? Is that in the docs? Explain to me the tiles. Interleaved buffer config, DRAM config. Why is it tiled in DRAM though? Oh, it's not tiled, I see, it's just called that. Device, size is... why is the page size the DRAM buffer size? That's sketchy. All right, let's make our matrix 1024, times two, for the bfloat16 is two bytes. I assume this is in bytes. Buffer type equals DRAM. Probably need to include some crap for that, that's a later problem. Page size equals one. One page size, that's a good page size, right? One. We need some semicolons after that. Oh, let me throw some e on that bitch. There we go. All right, use of undefined identifier shit. Which one could it be? I don't know. detail, tt_metal, is it in detail? It is not. Where is this defined? impl/buffers/buffer. Do I import from impl anywhere? No. Fine, import from impl, impl/buffers/buffer.hpp, let's go. Doesn't work. I've got to use a namespace or something. using namespace, love namespaces. using namespace tt. Oh, maybe it was fine, I just needed a namespace. All right, great, love it, got to love that C++ garbage. I'm tired, I'm an old man. All right, we have DRAM config now. All right, let's see if we can create
a buffer. Why do these not work? "Hello." "Hi, I got you a green juice, it's in the fridge." You got me a what? "A green juice." Oh, thank you. "Welcome." Why does that work? Bananas. "Page size must be divisible by" this shit. Oh well, I knew my page size was tiny. All right, let's make my page size a very normal... that's a normal page size. All right, good, we have a normal page size. All right, let's call this buffer M1 DRAM. Let's first do an element-wise addition on these two, and then we have M2 DRAM, and hopefully I can reuse that. You know what I want? I want Shansi magic chicken, and we're going to get some Shansi magic chicken. We're working hard today. I really didn't want that açaí bowl and I don't know why I bought it. Shansi magic chicken, have you guys ever seen me get Shansi magic chicken before? It's delicious, a sautéed spicy chicken with hand-pulled noodles. Okay, so we've got two DRAMs, we made some DRAMs. Oh, we can get some addresses. All right, cool. Let's give them better names, like source one and source two, and by one and two I mean zero and one. Okay, well, I call it that, DRAM buffer, it's a little explicit. Okay, we'll be explicit, this is C++, we should have long variable names. Let's print these addresses. Is there a way to get more debug information that'll print about the allocations and stuff? Yo, that's cool. That's cool how they're just like, wow, I can see what the numbers are. That's cool. I don't know what the first 82 of the DS used, but that's pretty cool. Okay, and let's make a dest buffer as well. I do like that new hardware is like this, and they haven't managed to obfuscate pointers like crazy, and they just give you, well, okay, RAM starts at zero and then you count up. Kind of love that. All right, oops, typo. I was like, what, I'm using the same buffer? Free shit, let's go. Kernel print, what do we have? DPRINT. Wait, I can print from kernels? That's kind of cool. Does that work? "I ran." Wow, look at that terrible-looking macro, DPRINT. What do you... this is going to work. It didn't work. Maybe I have to, like, DPRINT ches... oh, "I ran." Cool. Wow, that's actually cool. Good work, Tenstorrent. That shit never works on GPUs. I feel like it's supposed to work, but, you know. Great, so that's being printed by the kernel. All right, we made some buffers, let's try to read from the buffers. Let's see if we can read from the two buffers and then have them add two numbers, and we'll see if we can saturate the bandwidth. Okay, well, we're going to have to somehow tell it that it's a
oh, that looks shitty. What even are those numbers? Why do I need L1? Okay, I see, I have to create an L1 buffer [Music] too. Wow, that's a lot of L1. Okay, it's all the same, except let's just make my L1 that big. L1 buffer address. Wait, is this real? Can I just look at these NoC coordinates, and are they all actually the same? Is the programming model like CUDA? Well, one of the really cool things that you see right away is this core range. CUDA doesn't have anything like this. CUDA will not let you put certain programs on certain cores. I wish they did, but they don't, even though the GPU in theory should be able to. I should be able to use my compute units independently, but their schedulers are not great. Okay, what? Oh, why did I put a B? All right, are these NoC coordinates real? L1... I hate... Tenstorrent, can you make this in Python? Some of us really hate coding in C. You know, the stream has to be done by the time it's, you know, dinner.
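A rough sketch of the core-range idea discussed above, in plain Python rather than the tt-metal C++ API. It models a core range as an inclusive rectangle of (x, y) coordinates on the 10 x 12 grid of 120 cores read from the docs earlier, and shows one simple way work could be split across the cores in a range. The function names, the choice of which axis is 10 and which is 12, and the even-split policy are all assumptions for illustration; this is not how tt-metal's own work-splitting helper is implemented.

```python
GRID_X, GRID_Y = 12, 10  # assumption: which axis is 12 and which is 10


def core_range(start, end):
    """All (x, y) core coordinates in the inclusive rectangle start..end."""
    (x0, y0), (x1, y1) = start, end
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]


def split_work(num_units, cores):
    """Spread num_units over cores as evenly as possible.

    Returns {core: units}; the first (num_units % len(cores)) cores
    get one extra unit.
    """
    base, extra = divmod(num_units, len(cores))
    return {core: base + (1 if i < extra else 0) for i, core in enumerate(cores)}


if __name__ == "__main__":
    single = core_range((0, 0), (0, 0))                      # one core, as in the loopback example
    all_cores = core_range((0, 0), (GRID_X - 1, GRID_Y - 1))
    print(len(single), len(all_cores))                       # 1 120
    plan = split_work(1000, all_cores)
    print(min(plan.values()), max(plan.values()))            # 8 9
```

The point of the sketch is the contrast drawn on stream: the placement of a kernel is an explicit set of core coordinates chosen by the programmer, not something a scheduler decides.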
All right, well, they're all on 1, 0, so I'm not going to waste a lot of resources passing those in. All right, we're going to have to pass in some buffers. Let's put in fewer runtime args. "You have to make it generic, what if someone's running on Wormhole 72 Edition," blah blah blah. This is also boring. You know what I need? I need Copilot. This is actually the kind of shit that it would help with, one of those crappy AI coding assistants. Normally I hate those things, because I don't code in such verbose languages, but this is insane. Who codes in C? "Python coming up." Sweet. Yeah, this should be in Python, that'd be sweet. Right, let's set the runtime args. Fucking API space. DRAM copy kernel ID, we just called it kernel ID. Program, kernel ID, core. Why do I have to pass the core to runtime args and CreateKernel? I guess fine, whatever. Why are those green? Back to the s... Okay, L1 buffer, yeah. All right, kernel buffer address. Who has time to type all this? All right, great, we've got the arguments. Wouldn't
it be cool if they could be passed in as actual arguments? No, that's too much to ask for. All right, not wasting my time with that. We know that it's just 1, 0. "But George, one time it's not going to be 1, 0 and you're going to be upset." Yeah, I'm aware. Great, for both, garbage. All right, NoC. I'm actually kind of curious, can I print that, see what this actually is? "Minecraft tutorial," that's right. All right, well, that'd be cool if it had some spaces. Can it print in hex, or
does it only print in numbers? Numbers are sad. "Direct printing is supported... formatted printing includes macros," hex. All right, let's get some hex. All right, well, once you get the NoC address... oh, I see, it just shifted some shit. All right, great, well, I'm glad we did that. All right, we have to get these ones from the NoC. Let's get the dest DRAM NoC from dest DRAM, and then, I don't know, that it's pretty... we're going to read it into, what? That's the parameter, that's the one that gets it. This is dest and this is source. Who wrote this? Why would you do that? Did I do it backwards? You're reading from this into that. Where is this defined? Source, dest, size. Oh, why would you do that? Why would anyone write the... memcpy is dest, source, size, not source, dest. Dest DRAM NoC, NoC async write there. All right, let's go. This work? Cool. Let's see if it actually does something. I actually want to read these both. I do this, we're going to have to give it double the L1. Let's do pointer arithmetic. Close that.
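A hedged sketch of the "it just shifted some stuff" observation about the NoC address: one plausible way to fold a core's (x, y) coordinates and a local address into a single integer is to shift the coordinates above the address bits. The field widths below (32 address bits, 6 bits per coordinate) and the function names are assumptions chosen for illustration, not the real Grayskull encoding; printing in hex makes the fields easy to see.

```python
ADDR_BITS = 32   # assumption: width of the local address field
COORD_BITS = 6   # assumption: width of each coordinate field


def pack_noc_addr(x, y, local_addr):
    """Combine (x, y, local_addr) into one integer by shifting."""
    if not (0 <= x < (1 << COORD_BITS) and 0 <= y < (1 << COORD_BITS)):
        raise ValueError("coordinate out of range")
    if not (0 <= local_addr < (1 << ADDR_BITS)):
        raise ValueError("local address out of range")
    return (y << (ADDR_BITS + COORD_BITS)) | (x << ADDR_BITS) | local_addr


def unpack_noc_addr(packed):
    """Inverse of pack_noc_addr: returns (x, y, local_addr)."""
    local_addr = packed & ((1 << ADDR_BITS) - 1)
    x = (packed >> ADDR_BITS) & ((1 << COORD_BITS) - 1)
    y = (packed >> (ADDR_BITS + COORD_BITS)) & ((1 << COORD_BITS) - 1)
    return x, y, local_addr


if __name__ == "__main__":
    packed = pack_noc_addr(1, 0, 0x1000)  # the (1, 0) coordinates mentioned on stream
    print(hex(packed))                    # 0x100001000
    print(unpack_noc_addr(packed))        # (1, 0, 4096)
```

Under this layout, adding an offset to the packed value only moves the local address as long as the sum stays inside the address field, which is the kind of pointer arithmetic attempted next on stream.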
What is page size and why does it matter? Pointer arithmetic, oh. Can I put the read barrier at the end of both of them? Oh, that'd be based. Go. No, I plussed it, I did plus, and I made that the size. Wait, no, I just broke something. Wait, how come I can't plus it like that? I made the size that. Now I broke it. tt-smi -tr 0. What's tr? Is R... now it's not allocating anymore, we broke it. But I don't understand why it doesn't work though, I allocated that. I mean, I guess I didn't assert to see if the allocation failed. What do you mean? No, I did that. No, it's always just one and zero, it's never not 1, 0, that's not the problem. Why... where is this hanging now? Boys, I broke the Tenstorrent card. "I should reset the board." I am resetting the board, I broke it. Oh, it's getting hot. What do you mean? I am constructing a NoC address from that, that's what these zeros and ones are. It works fine. At least this computer reboots fast, hopefully. Oh, that's really power cycle... do we really power cycle? I... if reboot's going to work, but I don't understand why my plus 100, plus a thousand didn't work. All right, we're hard power cycling the computer. "It is blasphemy in the ML world." You're right, you only have two NoCs, it's not like it's going to come up with a lot of different things. I'm not passing in all those arguments, that one involves so much typing. "In which cases is a CS degree useful?" It's useful for getting non-subscribers banned from... okay. All right, we're back. Tenstorrent, tt-twitch. Okay, let's not break the card this time. tt-smi. Can I print the L1 address? Yeah, I can. Why isn't this working? What, for ff00? And I'll put the size back to that. Still ff0000. I'll put the page size up. I don't really know what the page size does. Oh, now we got something different. Oh no, fe0. Oh, I see, so is that, like, size and... "size of unit being interleaved," for non... oh, don't interleave your buffers. Oh good, I knew I wasn't supposed to interleave my buffers. Wait, but that should be okay with e0, right? That's fine. It was just my page size that was wrong, because they were supposedly interleaved or something. That should be fine, right? Yeah, YOLO, it's fine. It's got to be fine, because that's FF and that's... yeah, okay, any L1 address should be larger than that. Seems okay, right? Yeah, we're good. No, no, we're good, we're good. I know it wrapped around, but it was just... good. I didn't set my page size correctly, but now that I set my page size correctly, let's just call this progress. I'm going to create a quick GitHub new repo
for this. Name: tt-twitch. "Tenstorrent kernel from Twitch." Public, great. Create repository. Oh, remote denied. There you go, anyone who wants to play... what do you even... make a PR directly, that works fine. Okay, this is all well and good, but now let's say I want to add things. [Music] So that's stupid. Let's look at some demos. So this is fine for data movement and stuff, but it doesn't show me how to actually access data. Let's [Music] here, here we go, element-wise. What's [Music] SFU? "Sundown flash programming utility," oh, never heard of that acronym before. Yeah, yeah, I'm doing an add, that's what I'm doing. All right, let's look at the examples in tt_metal. Not that, not that, that one is binary. Okay, two... wow, this looks complicated. Using circular buffers. What's a circular buffer? A FIFO in between kernels. Oh, I see, so they have, like, a BRISC and NCRISC, something else. I see, so it's got a reader and a writer. All right, let's just not do it like that. Let's start by just doing it in this kernel here. Okay, so can I just read L... can I dereference L1? No, I don't need two kernels, I'm doing one kernel. I just read L... so I can dereference L1. So if I just were to do something like sick [Music]... I don't know, I designed this for... just do these. I don't want to think about dest right now. So yeah, the math engine. I don't use the math engine, that sounds complicated, I'm using the plus. We'll get to the math engine later. Well, "invalid conversion," "invalid types for a subscript." Sorry, what? No, I don't... you're saying... oh, oops, no, sorry, right, I just forgot. Okay, fine. Oh, that's probably that F0. Oh, it stays in hex mode, whatever. Let's copy in some A's. You know what I'd love to do right now? Listen to some copyrighted music, some of my favorite things to be on the stream. Ooh, a play, that's my favorite place to eat crinkly candies. Write buffer. All right, well, let's... buffer, input V, vector. Yeah, you want to see some elite C++, boys? C++
is so fast that you can just do things like this. input V dot, what is it, push_back. God, how does anyone code in this language? Let's see if it says AA. No matching call... have to be that. DRAM's big buff. Page size: "buffer pages must fit within the command queue Data Center 400, buffer.page_size less than..." I knew this page size was a scam. Oh, let's make them small. Is that going to work? Sweet, oh, look at that, AA. Yeah, all right. I don't know, let's do this math in 32 bits. Let's really flex this processor.
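A small sketch of the buffer bookkeeping that keeps tripping the stream up: bfloat16 elements are two bytes each, and the errors read out so far complain about a page size that is not divisible by some required amount and about pages that do not fit in the command queue. The alignment and maximum page size used below are placeholder values, and the extra rule that the buffer size should be a whole number of pages is also an assumption; none of these are the real tt-metal limits.

```python
BFLOAT16_BYTES = 2  # bfloat16 is two bytes per element


def buffer_size_bytes(num_elements, bytes_per_element=BFLOAT16_BYTES):
    """Total buffer size in bytes for a flat array of elements."""
    return num_elements * bytes_per_element


def check_buffer_config(size, page_size, alignment, max_page_size):
    """Return a list of human-readable problems (empty if none).

    `alignment` and `max_page_size` are hypothetical limits supplied by the
    caller, standing in for whatever the runtime actually enforces.
    """
    problems = []
    if page_size <= 0:
        return ["page size must be positive"]
    if page_size % alignment != 0:
        problems.append(f"page size {page_size} is not divisible by {alignment}")
    if size % page_size != 0:
        problems.append(f"buffer size {size} is not a whole number of pages")
    if page_size > max_page_size:
        problems.append(f"page size {page_size} exceeds limit {max_page_size}")
    return problems


if __name__ == "__main__":
    size = buffer_size_bytes(256 * 256)  # a 256 x 256 bfloat16 matrix
    print(size)                          # 131072
    print(check_buffer_config(size, 1, alignment=32, max_page_size=4096))
    print(check_buffer_config(size, 2048, alignment=32, max_page_size=4096))  # []
```

A page size of 1 fails the divisibility check in this sketch, and a page size equal to a very large buffer would fail the upper bound, which matches the two directions the errors on stream pushed in.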
let's flex on him let's flex on this let's go is it big endian or little endian that a a bbcc all right we ready to do math let's do math in CUDA this would be very fast so let's try one + one then after we're done we're going to read the buffer how do I read the buffer does this work enqueue read buffer I got do is that going to be the whole buffer that's disgusting all right let's not add one and one let's add 13 and seven let's give it something hard all [Music] right that crap oh fuck C++
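A minimal sketch of the big endian versus little endian question and the hex-mode printing mentioned above. The 32-bit word 0xAABBCCDD is a made-up test value, and the sketch assumes the common case where both the host and the card's RISC-V cores are little endian; it only uses Python's standard struct module and does not touch the card.

```python
import struct

# Hypothetical 32-bit test word, chosen only for illustration.
word = 0xAABBCCDD

# "<I" packs an unsigned 32-bit int least significant byte first (little endian).
little = struct.pack("<I", word)
# ">I" packs the same value most significant byte first (big endian).
big = struct.pack(">I", word)

print(little.hex())  # byte order as it would sit in little-endian memory
print(big.hex())     # byte order as it would sit in big-endian memory

# The same sum looks different depending on whether the output is in hex mode.
total = 13 + 7
print(total, hex(total))
```

Printing a result in hex without noticing is an easy way to misread a sum, so it helps to print both forms side by side while debugging.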
dst DRAM buffer doesn't fit an output DRAM buffer we don't have that we have a dst DRAM buffer reading there all right I knew it couldn't do I knew it couldn't do a oh no we just forgot to send me call that's fine um 21 what's 13 + 7 guys no no it's [Laughter] 20 I did put in 13 and seven right it's cheating at blackjack no what did I do how did I even do that I put 13 and seven I put them in there and then I got the result and I added them in the kernel um oh oh no no no sorry sorry sorry my ba
d I still don't know why that's 21 oh it's hex all right never mind all right we made multiple mistakes and they almost canceled out the one a checksum 20 all right look at that look at that they're using Tenstorrent to add numbers I knew it could add numbers guys H all right let's figure out how to use the math engine to add numbers fast where's shansi magic chicken [Music] chicken used Devin and he said 21 guys it's coded by it's coded by 10 IMO gold medalists I'm sure they know what what 13 pl
us 7 is for okay now let's read about the math engine my magic chicken I forgot how far away it was but it wasn't here I do know that 12 minutes until chicken time those are Uber minutes so it's actually probably going to be more like all right you sold you too can buy a Tenstorrent and add 13 and 7 right can I make it big but make a page size small interleaved oh no don't do that segmentation fault don't oh it might be the allocation that succeeds actually but then it tries to copy from more of the vec th
an fits that's actually probably right yeah that's probably the problem okay that's fine math engine works on 32x32 tiles so we need to BU float tiles oh God why can't I just use CUDA P would just make this code fast no wait a second don't I have lots of cores what if I use lots of cores to do this does that fix the problem all right let's read about the math engine so first we're going to need circular buffers um this is for communication this looks hard oh cool I can just make up whatever add
ress I want I love that um we can control the math fidelity compute config okay passing in some configur ready oh data movement config all right so let's read this whole thing how does it know that this one's a I see so you have data movement configs and you have why are these on different processors what's a BRISC and what's an NCRISC what do you mean it doesn't matter are they the same do I have to put one on one and one on the other basically yes okay great um I see so this program's creati
ng kernels to I only need one data moving kernel why does this one have two it's optimized what if I have no data movement kernels and what if I just put the compute in the data okay I see what's this let's read this low-level kernels oh okay right these are good wait kernel APIs oh okay okay this is understandable circular buffer a ring why isn't it called a ring buffer compute kernels have access to the math engine but they don't have access to RAM and it just depends on and it knows what kind of Ke
rnel it is based on what like config I pass in here I see okay data movement config compute config ethernet config I see but like I can still like both cores have the same yeah yeah no I get that um but like so each core has [Music] a compute and data transfer kernel I see oh I see so actually on each core I have two RISC processors and I can put the data on I see I understand so like I put the data on uh on one of them or the other one but it's they're all still on core 0 comma 0 and I actually
have 120 of these cores am I understanding this right each core has five all right all right cool what are the other ones stupid thing with the commas to run data kernels three cores run the compute kernel I see place my content loading scene let slow it wait so can I create three compute kernels or is it the same compute kernel that just runs on the three cores automatically three RISC-Vs one compute automatically wait where's my picture I don't get a picture SRAM DM kernel RISC-V circular
buffer goes to compute okay I understand this yeah this is fine and so basically these five I don't exactly understand why the compute is three separate processors uh maybe I'll try incognito it'll work just St missing dou a w at the end oh I see cool where you can find that W oh I see I didn't okay never mind my bad uh okay so this is NOC 0 and NOC 1 um the compute kernels are automatically multiple for some reason seems reasonable I guess I see so these are just yeah they're just ci
rcular buffers inside the thing I mean this is this is yeah this is this is a lot smarter uh you see why this is better than GPUs um GPUs use the same stupid uh instruction stream for the uh like this is it's so annoying in time that the like it uses the same basic ALU to compute all the data addresses yeah this is what you want oh cool very understandable yeah look there's a lot of things on here that are better than GPUs it's just you know underpowered right now uh well actually can I can I us
e can I are both data movement kernels like can I use them both to read because that's what you want to do most of the time yeah okay good like most most things you're doing are basically reading from like weights and that yeah I mean I'm curious how this stuff's implemented so like can I read the implementation of uh like I want to read the implementation of like noc_async_read where is that written where's the code for that well in dataflow API oh here look at noc_async_read_one_packet oh oka
y here we go cool oh great oh I'm glad this stuff's open source good you you'll you you'll do far better if this kind of stuff's open source cuz it really it really bothers me when it just like links to some stupid thing because for some reason they don't want to show off their like their MMIO uh and yeah all right cool so let's just poke some MMIO that's very [Music] reasonable uh yes D read any L is just some wrapper yeah that's cool now how flexible is it uh just just just linear I guess if I have a
whole processor it's not that big of a deal yeah there's some things here that's way ahead of GPUs all right these are the compute APIs so let's find those uh Hardware Inc see that folder okay so it's custom for Grayskull you can just leave data shared in L1 between kernels I don't know if all that um what's this compute stuff all tiles in it LLK what's an LLK I don't know if looks complicated and this much more complicated than the DMA engine Grayskull metal LLK API wow this looks complicated why
did that link me a Wormhole end up in Wormhole the wrong one the wrong one so is there any stuff about how like this is running on three RISC processors what's the actual code that's that's being broken down to the uh no yeah no I I know this this is not obviously but uh no I'm I'm curious how it actually talks to like the three I'm interested in the three compute engines because this is something I don't understand like there's three of them are they running the same code task splitting th
is like a thing I can Google somewhere I never heard of that but like that doesn't really make sense to me um they're running the same like instruction code like so I I see that I can dispatch three kernels to the same core but these three are just like some of them only listen to some are there any docs on this or there's there's three different three different programs B level kernels Bel is a simple example of okay SFPI is a programming interface to the SFPU consist of a C++ wrapper around blah blah
blah blah to generate SFPU instructions okay so it's an EXT it's a it's an extended instruction set which is fine I can see that here it's not done with MMIO which I guess would kind of be unreasonable for compute um TRISC no I I mean I get this think all right you have a different kind of math engine for both of them that's fine you know but can I dump the can I dump the programs I guess this isn't really compiling three programs I guess I need to compile a compute kernel if I want to if I want to
see what's going on like I'm trying to understand this code is identical for the three I don't know uh maybe it's magic chicken time and then we're going to figure out how to compile a compute kernel we're going to figure out how to dump the RISC code it's not magic chicken time uh no this is this is making me quite bullish on Tenstorrent guys like um this file okay oh I'm wearing sweatpants that's the kind of morning it is I woke up early for this stream boys no okay let's write a look at matmul
tiles oh I see okay there's ah unpack pack okay okay no no no I think you're okay I think I know what I'm looking for so it's the UN yeah yeah yeah I get it I get it I get it okay yeah okay good good good totally understand totally understand what's going on yeah so it actually is just compiling three separate binaries so there's five binaries pack uh math so I guess they share registers oh we can get into the levels of this and it'd be fine but three-way two-way dispatch I mean this gets into
CPU stuff I don't know but they pushed my chicken DM kernels are one to one okay um compute kernels one to three okay yeah I understand cool uh so let me just read that bind minity I guess the last thing I'm not really understanding is how they're being dispatched but this should be pretty easy to just understand okay so here we create the kernel we create this is a good example let me just build this example um let me spin up my whole thing why that requires okay so we create three kernels uh a reader
a [Music] writer because we're creating a new program you must redeclare all the circular buffers don't exactly understand oh I guess circular buffers are created for some reason on the program there were good docs somewhere where did those good docs go create a circular buffer in L1 memory for all cores within core ranges inclusive and adds it to the program be a total of two circular buffers per core if a config is specified by the circular buffer address space is shared how does that work
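A rough sketch of the circular buffer idea described above: a fixed number of pages used as a FIFO between a reader kernel, a compute kernel and a writer kernel. This is a host-side Python model only; the class and method names (RingBuffer, push, pop, run) are invented for the sketch and are not the tt-metal calls, and the three "kernels" are simply interleaved in one loop rather than running on separate processors.

```python
class RingBuffer:
    """Fixed-capacity FIFO of pages, the way a circular buffer sits between kernels."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages
        self.head = 0    # index of the next page to read
        self.count = 0   # number of pages currently filled

    def push(self, page):
        if self.count == len(self.pages):
            return False  # full: the producer has to wait
        tail = (self.head + self.count) % len(self.pages)
        self.pages[tail] = page
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None  # empty: the consumer has to wait
        page = self.pages[self.head]
        self.head = (self.head + 1) % len(self.pages)
        self.count -= 1
        return page


def run(a, b, num_pages=2):
    """Element-wise add of a and b, staged through two ring buffers."""
    cb_in, cb_out = RingBuffer(num_pages), RingBuffer(num_pages)
    pending = list(zip(a, b))
    result = []
    while len(result) < len(a):
        # reader kernel: move one input pair into the input buffer if there is room
        if pending and cb_in.push(pending[0]):
            pending.pop(0)
        # compute kernel: consume one pair, produce one sum, if the output has room
        if cb_out.count < num_pages:
            item = cb_in.pop()
            if item is not None:
                cb_out.push(item[0] + item[1])
        # writer kernel: drain one result from the output buffer
        out = cb_out.pop()
        if out is not None:
            result.append(out)
    return result


print(run([13, 1, 2], [7, 1, 3]))  # [20, 2, 5]
```

The point of the ring is that the producer and the consumer only need to agree on the page count, so each side can stall independently when the buffer is full or empty.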
yeah okay so create kernel set your runtime args on the kernel oh oh I see okay so a program can have multiple kernels bound to it based on okay okay okay this is sensible I I now understand sorry for saying that I was making fun of the OpenCL the OpenCL API has a concept of programs and kernels and it's the stupidest concept like I don't know who wrote that in the API but it's nonsensical there uh I thought this was just copying that but no this actually makes a lot of sense okay you create a p
rogram program is multiple kernels create my three kernels bind them all to the program write stuff to the thing all right I think that actually finished see if it works um okay great yeah programs have 120 different kernels I understand that no I mean that's a huge like GPUs basically have the same thing so like if you look at like the 7900 XTX they have these things called compute units but the scheduler is a global scheduler and it's it's annoying Nvidia is pushing away from this now with some
of the H100 stuff I haven't played with it that much but yeah I mean what you really want to do is you want to have kernels that are like dedicated to different things you want to have some kernels handle memory some kernels handle uh handle math um it's it's rare that you want them to be the same but uh so this is yeah this is this is interesting that it actually just sticks all the cores on one and I knew this this was presented to me a while ago but it's cool to actually see it okay unfort
unately this API is hella verbose I could write such a sick Python API for this oh like imagine a sick Python API where it's just like you pass it in three strings and it just gives you the uh the program where's my chicken said it was here said it was close magic chicken magic chicken recently tried to contact me go outside I'll figure it out oh I hear I hear steps check num tiles one what are tiles I guess we'll know if we look at this program tiles so this NOC X and NOC Y is this always one an
d zero like how much can those vary like what are those are there only two NOCs I think there were two NOCs chicken guys are ready to experience Shi magic chicken have eight banks on GS oh I understand yeah can you see the chicken it's chicken even time you can't see the chicken oh the chicken I love this chicken [Music] so it comes with these big flat like hand-pulled you know those like hand cut noodles kind of where the guy does it with like a knife and like cuts them apart all right if eat
ing offends you and this is in Japan so I can stick my chopsticks in like that oo GS SoC or Grayskull I assume oh when you say GS you mean Grayskull okay um worker cores eight DRAM oh I get it that's the one comma zero it's right there one comma zero is right there that's why we got one comma zero oh that makes a lot of sense um cool so I see I have to do the [Music] addresses I understand too it's probably like I mean the RAMs are actually each like it's eight different RAMs like completely oh is
that what interleaving is I mean cuz really what you want to do to like maximize bandwidth is you don't want to put the whole buffer in one thing you want to shard your buffer across uh all eight do the pages okay that's sensible um I would not expect there to be a read barrier there oh I guess you have to do that okay never mind file issue for one of your requests my my beautiful Python API oh yeah yeah um it's just how memcpy is okay pack math unpack well it's so explicit about data movement One C
ore to do math four cores to move data really want to zoom out here get the big chicken picture here are we liking the chicken eating part of the stream we'll even let non Subs talk during the chicken part controver on replacing no one's getting replaced by AI man where's the um Marvin Minsky 1970 article replace AI um a here here we go yes Marvin Minsky can I find the actual can I find the actual Life magazine article no just like an article I saw it I want the full article the full article's h
ilarious to read he was not misquoted I've read the whole article oh here we go can I see that page the internet has too many ads we got a slightly longer quote from the article ah here we go this is from Life magazine in 1970 so we're talking 55 years ago Marvin Minsky of MIT's Project MAC recently told me in from 3 to eight years 1973 to 1978 we will have a machine with the general intelligence of an average human being I mean a machine that will be able to read Shakespeare grease a car play
office politics tell a joke have a fight at that point the machine will begin to educate itself with fantastic speed in a few months it will be at a genius level and a few months after that its powers will be incalculable 1970 okay so for every little fuck out there who tells you but I saw ChatGPT ChatGPT I think that all jobs are going to be replaced in 3 years they're not and Minsky was friends with Epstein I don't know look I'm not going to hate on Marvin Minsky a lot of people have thought th
is kind of stuff but and also in 1970 you were more it was more excusable for you to believe it like if the year was 1970 you didn't know better this year you should know better AI is cool computers are cool computers can do a lot of stuff computers are going to be able to do more stuff all jobs are not going to be replaced in a few years I mean it's like doomer shit it is it is and then then the worst part is like Marvin Minsky at least was not a doomer he was an optimist about this stuff it k
ind of would be cool if that was going to happen but it's not what about driving I mean you've seen the progress there's some progress it's okay I'm working on it ches is working on it when can we replace tech bros I mean do you got if you consider me a tech bro never um like people like me are actually probably the last people to be replaced except maybe for there'll be some guys who want hookers that are genuine humans and there's some people who want like a genuine human to suffer and those job
s will never be replaced people who want the real thing not just a facsimile no no no no no you don't understand it's not about my enjoyment it's about their suffering um you wanted to add a girlfriend you can have one today they're not very good I mean for some guys it's probably better than what they can get not for most yeah Rita Casino Odyssey you trying to open the door would you like to come in do you want some chansi magic chicken on stream oh I hav't tonight I should all the magic chic
k I had as ball this morning too myice what do I think of be's chip I think quantum stuff has a long way to go to be useful there been a lot of quantum startups and what you end up finding usually is that they can solve the problem but the problem isn't exactly what you wanted it to begin with um D-Wave I don't know I mean you know I'm bearish on quantum in general I think that P equals BQP um so I think that anything you could do with a quantum computer fast you can do with a classical computer
fast you just need better software is this hardware similar to uh Groq Groq looks a lot more like a TPU um no this is something entirely different and one cool thing about Tenstorrent is like they're thinking about the future beyond the current training paradigm um I'm not I'm not bullish on Groq um again if Groq had a card that I could buy for $800 and an open source API I might be um but they don't they're trying this stupid like enterprise sale can I buy a Groq card can I buy a Groq card without
contact us and is any other stuff open source get in touch no oh Bon Master okay okay hang on $20,000 yeah I mean they do have one but it's $20,000 also is there code open source what do I get if I buy this thing the official Python oh God no this is this is like OpenAI level shit okay okay here we go GroqFlow is the easiest way to get started automated compiler what's in here do they document their chip anywhere I'll take back my hate for Groq when they document their chip and they char
ge a reasonable price for it but until then and well I mean Tenstorrent is the newer paradigm so we know that the brain doesn't do backprop like the brain um regardless of what scale you want to do it at like you're going to have to break things down to smaller uh work units Nvidia has pushed so hard on having a full fabric memory like Nvidia has 900 gigabyte per second across a data center um this is insane I mean it's incredible engineering but it's not the future of neural networks like Nvidia has go
ne so much out of their way to make it easy for the software guys and this is why they're crushing it right now oh thought you've heard about Tha complex 7 oh we're going to build fall complex 7 um AI Hardware companies are not as much of a scam as other Industries I mean some of them are again you want to figure out if something's a scam or not can you buy anything from them if you can't buy anything from them it's probably a scam or too early might be too early but probably a scam contact us s
cam all they say when they're doing contact us is we know that we're not competitive in the market but let our sales guys try to talk you into it anyway so can buy this no I know you can buy yeah I mean you can buy Grayskull now that's the only reason I'm talking about it wait why is it still that oh that's just a bug we should fix that link no the only reason that I'm here doing this is because Grayskull is openly available on the market I do not care about anything that people other people can't pa
rticipate in I paid my own I paid my own 800 bucks for it um yeah so Wormhole didn't look that much better wait what like it's still 12 nanometer uh it is cool that it has the uh the link I that seems to be the big advantage of Wormhole um I mean I don't know if I were uh if I were TT I'd switch to tinygrad um and make the simplest most beautiful API to expose like what I was playing with today there's so much unnecessary complexity um built on top of like what really a pretty simple compute
paradigm because it's just unsustainable to do it a different way like if you don't have something I mean you don't have to necessarily use tinygrad but if you don't have something that looks like tinygrad you're never going to be able to keep up with the state-of-the-art machine learning because the problem is like Nvidia doesn't have to keep up Nvidia by being the state-of-the-art can just um can just everyone ports to Nvidia anyway everyone designs for Nvidia AMD has a um has an interestin
g follower strategy where their card is similar enough to Nvidia that people will choose it just based on price um the MI300X I saw Microsoft got them for 10 grand like you're paying it's it's half off an H100 basically more than half off um and at that sort of price break people will tolerate some amount of annoyance in the software but not uh not much um but only because the the two GPUs the the MI300X and the H100 are so similar that um that AMD can uh can be competitive um if you're trying
a different paradigm thing you are never going to be able to build something that looks like Buda um because you won't keep up with state-of-the-art machine learning like you need something that transpiles to like a much sort of um better intermediary I don't know what neuromorphic computing is and I don't think it has any meaning I think that Tenstorrent looks a whole lot more like a neocortex I mean like Groq and GPUs have taken a completely this does look kind of like a GPU um it's a GPU with a bet
ter scheduler and better control but like GPUs are going to we're going to kind of see merge the H100 already has a lot of these sort of like tensor looking features um like look up the tensor uh it's a TMA it's a tensor memory accelerator it's like a fancy uh it's a fancy DMA engine but it it kind of can replace the load kernel but you could even have like a straight up load kernel um like like what Tenstorrent's doing with the RISC-V that totally works too uh it might even be better um also having like a grd
grid architecture like this you're going to need a grid architecture at some scale so why not start that scale smaller um you know we're uh we're pretty booked for the rest of the year but after the year is up if you'd like to contract uh the tiny corp to build a uh a cleaner more functional port than yours um we can do it I'm interested in building some of this stuff generically anyway I'm interested in building um like like so so here's kind of what I'm saying the the thing you call the graph
compiler should be generic uh I mean a port that runs tinygrad on uh on on this uh on on Grayskull in normal um and I think I can beat most of your most of your examples I mean like it's not only could I beat the examples like you saw me try to use Buda and you saw me try to do something that like looked obvious and it didn't work uh but it's not even just that the the problem comes down to there'll be new ops added in deep learning people think like attention is this oh we're going to have a
ttention forever we're not in 5 years people are going to be like oh yeah attention that was cool that was like like early like early 20s AI um well now again I'm an overpaid contractor and actually what I would rather you do instead of paying tiny corp what I would rather you do is port it yourself to tinygrad um focus on building a beautiful Python abstraction for as low level as you could possibly get uh I think your kernels like I I can see the Python abstraction already look I'm even too I'm
too lazy right now um where where to plug in you want to plug in at that layer that I'm playing with now um you don't need like you don't have to go to the MMIO layer you don't need to make a RISC-V compiler don't do any of that stuff but um you're going to want to it's not so it's not it's not um yeah have tinygrad emit the kernels you're going to need to make some changes to the linearizer so right now the linearizer emits one UOp stream and this one UOp stream includes loads compute and stores
all in the same stream but it it would be like a 10 line change to emit the loads computes and stores to three different kernels uh and then you're going to want to write a runtime for Tenstorrent and this runtime for Tenstorrent uh should wrap your C++ API so if you have a stable C++ API um you should just wrap that with Python such that it's like a simple like here's my kernel here's my source code um yeah and then like the things should map pretty well like you can definitely deal with like DRAM and L1 li
ke it's that all that all works in like the shapes yeah use tinygrad to emit the C++ kernels directly absolutely um again it's it's not that different from what you already have you'll have to write a renderer you'll have to make a change to linearizer to split the UOp stream for loads in like loads and you use like those circular buffers and the queues yeah I know you don't emit the kernels and I think that this is like this is exactly what I'm talking about will never scale if you're prewriting
the kernels like you'll never be competitive with uh no I understand I I I I read briefly what Buda was but if if you're doing that you're never going to be competitive with Nvidia um because there's always going to be new ops and you're always going to be behind versus like you tinygrad can emit the kernels can emit the kernels for you um again for less money in a contract I can get a functional tinygrad port there's like a functional tinygrad port then there's a fast tinygrad port um you should
be able to have something functional and then the minute you get tinygrad functional all of our examples work including train like we now have we now have ResNet uh we now have ResNet-50 training on a tinybox we'll have all of MLPerf by the end of the year but yeah no no emit the kernels definitely emit the kernels like that layer I'm playing with is the right layer um no I'm not using oh I like for perf what you want to get working no no see that's the joke right so don't optimize your ke
rnel generation figure out how to parameterize your kernel generation and then use search so this is one of my favorite documents The Bitter Lesson like read this document stop trying to hand optimize things and figure out how to parameterize them and use machines to search yeah so tinygrad has a has an option called BEAM and this is how we are we're not beating torch we're better than torch on on M1 and on AMD um but only if you do a big beam search um which will just search you know a thousand pe
rmutations of the kernel that are all correct and just figure out which one's the fastest computer go brr there we go um also don't prematurely optimize I mean this is just classic like it's not like your Wormhole chip is ever going to be your Grayskull chip is ever going to be fast no no no no no I I don't want so here I can show you in tinygrad how the uh yeah if you have like a like a matmul kernel thing the tinygrad uh codegen can deal with that if you have something that looks like a like a WMMA
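A toy sketch of the "parameterize the kernel, then let the machine search" idea above. It is a plain brute-force search over one made-up parameter (a chunk size), which is simpler than the beam search the stream refers to, and the "kernel" is just a chunked Python sum standing in for generated code. The function names and the list of chunk sizes are assumptions for illustration.

```python
import time


def chunked_sum(data, chunk):
    """Sum `data` in blocks of `chunk` elements; every chunk size gives the same answer."""
    total = 0
    for start in range(0, len(data), chunk):
        total += sum(data[start:start + chunk])
    return total


def search(data, chunks, repeats=3):
    """Time every candidate chunk size and return the fastest one plus all timings."""
    expected = sum(data)
    timings = {}
    for chunk in chunks:
        # Only variants that produce the right answer are allowed to compete.
        assert chunked_sum(data, chunk) == expected
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            chunked_sum(data, chunk)
            best = min(best, time.perf_counter() - t0)
        timings[chunk] = best
    return min(timings, key=timings.get), timings


data = list(range(100_000))
best_chunk, timings = search(data, [1, 32, 1024, 32768])
print("fastest chunk size on this machine:", best_chunk)
```

The winner depends on the machine, which is the point: the search, not the programmer, decides which correct variant to keep.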
so here are the tensor cores like the tensor cores are all just specified right here um so like I think you have like 32x32 by 32s whatever you have you can just specify it here yeah it's it's really easy like this we have working the I we have all three of the tensor cores working the AMD ones the Nvidia ones and the Metal ones [Music] um yeah okay so you want to turn the the chip into a systolic array I wouldn't worry about that I I see like in each one of your cores you have something that lo
oks like a that looks like a what you're talking about there is not what you're talking about there is is like what's called uh like locals in a GPU so GPUs have like like warps but I think each one of your cores is an entire warp cuz your cores look more like a compute unit on a GPU so your core is a warp and then you have locals on top of that and your locals can have a grid and yeah sure you can just like don't worry about the idea of like we're making this a systolic array for multiplication
worry about saying if this core is trying to load data that this core already has just fetch it there right you you write that primitive and then that primitive scales to the entire chip instead of trying to like plan the entire chip um everything in tinygrad is 100% static meaning [Music] um meaning you can pre-compute all the uh memory accesses we're done with chicken [Music] now okay I think we're going to put chicken in the fridge we're going to get a little bit of coffee and then [Music] uh
should we write should we start writing the runtime like it'll be beautiful all right well we'll write a little bit of runtime we we'll get some kernels working from Python um and then we'll call it for a stream thank you thank you we'll see if ctypes can generate C++ garbage let me take a let me take a 5 minute break um I'll be back oh hey uh what I didn't really say uh about the systolic array thing was don't worry about speed worry about correctness um what
I found so now tinygrad is speed competitive on uh Apple and AMD compared to torch uh the speed of tinygrad at the beginning was completely non-existent we focused on correctness and I think I think you can always get speed later just by like adding tweaks to make things more correct again there there's nothing in tinygrad that's fundamentally slow it's incredibly flexible and can generate I mean it generates all the kernels right like if you write like a if you lock yourself into like a m m
kernel that's slow sure it'll never be fast but there's nothing fundamental in tinygrad that's slow um and we can always add you know after careful consideration new abstractions if you need them but I actually don't think you do uh so let's look a little bit at the what I think the right layer of abstraction is for the card what you like detecting a thumbs up what the hell is this is AI do you guys see a thumbs up like you're detecting what what is this some like AI detecting a thumbs up that'
s creepy is creepy all right um okay so uh let's take a look at what the it's a Mac camera feature weird uh let's take a look at what the uh what the tinygrad abstraction is so we have a compiler a program and an allocator and a device so we're going to like this is by the way this is the whole abstraction like you can read this this is all the code you need for Metal um in tinygrad so what would it take to transform uh Tenstorrent to be uh to be to be similar so let's write example.py so we're g
oing to have something like uh just write out a compile function so our compiler is going to take in source risc zero um let's let's try to reimplement the element wise thing but in Python and in like a readable way uh let's implement alloc so yeah there's three kernels there's risc zero uh risc one let's also get that magic TT_METAL_HOME uh so we have source risc one by did I put mine on risc one or did I put mine on uh on uh that's my kernel where's my main I put mine on risc zero so I guess it do
esn't really matter uh is the ISA documented sort of I mean all the all the the stuff you maybe the firmware is not open source but all the stuff you basically want to see is open source and the the schedule is explicit so like we have like source risc one actually let's not call them risc let's call them data uh source compute um also does this one allocate any L1 so I found it weird kind of that it could allocate L1 you don't really want to do that you want to um like use the circular buffers
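A minimal sketch of the compiler / program / allocator split described above, with three kernel sources (two data movement, one compute) and copy in / copy out on the allocator. Everything here is hypothetical: the class names are invented, nothing is compiled, and the "device" is a Python dict of bytearrays emulating an element-wise uint32 add on the host. It is only meant to show the shape a thin Python runtime could take, not tinygrad's or tt-metal's actual interfaces.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """Stand-in for the three compiled kernels of one program."""
    reader_src: str
    compute_src: str
    writer_src: str


class SketchCompiler:
    def compile(self, reader_src, compute_src, writer_src):
        # A real backend would invoke the RISC-V toolchain here; this just bundles the sources.
        return Binary(reader_src, compute_src, writer_src)


class SketchAllocator:
    def __init__(self):
        self._mem = {}   # opaque handle -> bytearray standing in for device memory
        self._next = 0

    def alloc(self, size):
        handle = self._next
        self._next += 1
        self._mem[handle] = bytearray(size)
        return handle

    def copyin(self, dest, src):
        assert len(src) <= len(self._mem[dest])
        self._mem[dest][:len(src)] = src

    def copyout(self, src):
        return bytes(self._mem[src])


class SketchProgram:
    def __init__(self, binary, allocator):
        self.binary = binary
        self.allocator = allocator

    def __call__(self, out, a, b):
        # Host emulation of an element-wise uint32 add over whole buffers.
        mem = self.allocator._mem
        assert len(mem[out]) == len(mem[a]) == len(mem[b])
        n = len(mem[out]) // 4
        xs = struct.unpack(f"<{n}I", mem[a])
        ys = struct.unpack(f"<{n}I", mem[b])
        sums = [(x + y) & 0xFFFFFFFF for x, y in zip(xs, ys)]
        mem[out][:] = struct.pack(f"<{n}I", *sums)


alloc = SketchAllocator()
a, b, out = (alloc.alloc(8) for _ in range(3))
alloc.copyin(a, struct.pack("<2I", 13, 1))
alloc.copyin(b, struct.pack("<2I", 7, 1))
prog = SketchProgram(SketchCompiler().compile("reader", "compute", "writer"), alloc)
prog(out, a, b)
result = struct.unpack("<2I", alloc.copyout(out))
print(result)  # (20, 2)
```

Keeping buffer handles opaque means the same calling code could later sit on top of a real device allocator without changing its shape.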
I think I I think that yeah sure maybe you can like use the L1 for longstanding kernel stuff and keep it allocated I guess that's the point of it but I don't know why you want to do that yeah you can keep shards I mean someday you shard your tensor across the whole thing you keep them resident in L1 yeah that's cool um I see uh so yeah you shard it across uh all the cores I guess what I don't understand about that is when I alloc this L1 is this on every core on every core okay so this size is act
ually times 120 well times interleave but where do I specify interleave how is it sharded on core range I create the buffer long before I specify a core range interleaved the DRAM is interleaved too and then also another idea I had was so all of these kernels uh unless there's some reason you can't do this all of these kernels pass in the address the source the NoC X and then call this get NoC address but why don't I just do this in yeah why don't I just do this in why don't I do that in C++ and t
hen pass the real address in here instead of passing I guess do the args have to be uint32s why don't I just pass in the u64 NoC address and then like compute that beforehand and not at uh not at not at compile time okay so circular buffers very notably apply to the program so you'd want to specify your circular buffer stuff here uh at at compile time I don't know processor NoC whatever yeah um compute inputs data flowing out compute outputs all right I mean I could like I'm okay also with bein
g like opinionated in the API and doing something like that and then okay so a an allocator in tinygrad has it also has copy in and copy out notice my use of dest and source so these are opaque pointers remember that one where it was just specifying like addresses for the circular buffers oh it has to be a vector of uint32 uh don't need to specify any address for CBs what's a CB oh I see uh yeah yeah there there were other things that uh there was one other example I saw that that maybe it's this one t
hat no I don't know where it went there were ones that just poke put an address in for the circular buffer but yeah it didn't make sense um create a circular buffer within the core so I guess I'm more confused about that but um this is how kernels can use L1 buffers just via address I don't know about that we're going to have something like compute options you see that being P have a lot here config how's it do okay yeah I mean the this might be be like a reason we're doing this stream um yeah I me
an these abstractions are quite good like if you just this isn't even like the Metal one is pretty complicated I think like the Python one is pretty minimal uh this is like a compiler an allocator and device that a one super tiny just so much crap in C++ start with something simple I can do write buffer let's figure out what library actually contains this okay TT metal where build it so we have it in tinygrad called autogen stubs we'll see how well it's going to work for me every time I do this
I always have to fix bugs in this thing all right let's take a look at this host API well hang on can I access this stuff is this accessible through uh through TT lib it's a pain in the ass to install anyway though there python guys for some of these how many like can I rewrite this example in Python is there anything missing like you really want your Python to be autogenerated from your C and not compiled and then recompile ttnn no way I'm out all right ban anyone who ment
ions Devin okay I don't know how to do that but if we have any mods do that I can't believe you guys fall for that you guys I'm gonna I'm gonna I'm going to make I'm going to make a I'm going to make a AI called Karen okay and you know what Karen does she goes to the DMV and stands in front of you in line and just complains about stuff and you're like Karen we have things to do and then she screams at you and you're going to be like wow that's amazing ai's going to replace so many people's jobs
right like don't you want somebody to like you know when you're like you're hanging out with your friends you're like bro it's fucking retarded and Karen you can't say retarded there's actual people with mental retardation are you're upsetting right like we can make AI do that think about that think about that right Devin is only for VC see thing about VCS and the thing about like you ask yourself like why is the Deep state so bad right now like why is the Deep state so much deep stating and it'
s because they're losing their grasp on power and they know it and so are VCs who respects VCs seen Ilya I love those I love those Ilya milk [Laughter] cartons you know poor guy I don't know like this is what happens when you you know get involved with with utilitarians man utilitarians eventually turn against you that's that's the only way utilitarianism ends eventually you're you think you're friends with the utilitarian but eventually they will decide there is more utility and you ding oh man who t
riggered me you triggered me with Devin oh my God and like you're giving them press like you know he's not even a bad guy like like I don't even like I saw the people they didn't even look like bad people but like it's it's the stupidest hype I've ever seen ask yourself would you be impressed if a human did it right like like like replace Devin with Kumar okay and his name's Kumar and you know not to be stereotypical but he cheated his way through everything and he managed to solve 13.6% of issu
es on GitHub right like like no no just stop like replace AI with people and if you're not impressed when a person does it why should you be impressed when AI does it you want impressive AI AlphaGo like that shit's impressive man beats everybody at Go that's cool a software engineer that solves imagine we have a car that you know only doesn't crash 13.6% of the time right like oh my God no like like guys we just have to exclude VCs from polite society okay what has ever come that was VC funded t
hat was good Uber was good yeah Uber would have existed with or without VC funding and the version without VC funding would have been better Airbnb was good no it wasn't Craigslist was better what did Founders Fund from Thiel make what do any of these things make Founders Fund from Thiel made like Palantir and Anduril things that don't help you though you know it is good to know that Palmer Luckey is on our side and it's not going to be you know it's not going to be it's not going to be the F-35 I just I was thinki
ng last night like F35 cost $100 million which would you rather one F35 or 10,000 DJI drones With Grenades like remember how the birds do you remember the time the birds took down that plane and it landed in the Hudson like imagine the birds are drones DOD GMC I can't believe you're a subscriber who gave you money I know DJI is Chinese look look I'm I'm going to show you guys a video yeah have you seen these things better one now this one this is the Chinese military industrial complex okay e ye
ah okay so I'm just saying all right where were we we were trying to get this shit to compile I'm going to end up rewriting this thing I know I'm going to end up rewriting this thing I'm sorry if you were trying to sleep you all got an ad for the Chinese military industrial complex why are those why is true oh yeah I mean we're yeah we're going to lose of course we're going to lose they're not even trying you guys you don't understand America was a country built on slavery we deserve to lose Chi
nese look I it depends what you mean by America no there's really like I I think it's really like getting to a head where you have to take money away from these people right and it's the same people it's the VCS the Deep State the government the the problem is they're they own the money printer and they're friends with the guy who owns the money printer and you think crypto's better who owns that money printer right like like like this will not be solved until we go back to Gold this is the only
way to fix this like like everybody needs to stop accepting the fake paper money or the fake shitcoins too it's not better C crypto is the same thing oh but there's only 21 million of them there there's not like that that's not a naturally defined thing it's just defined some source code somewhere defined 21 million but no they would never raise the limit they will raise the limit I promise they will raise the limit um you know what they can't raise the limit of G like money needs to be based o
n something that is naturally scarce or you will always end up in the in the shit that we're in today like it it's really it's really getting to a head now well you can try to make more gold good luck like the only way that this stuff gets fixed that all the scams get fixed that everything becomes like like we have been stagnant since then yeah golden asteroids golden asteroid are the most incredible thing because we're going to have to build sophisticated infrastructure to go extract the gold f
rom the asteroids like that's amazing right like gold is the perfect money it scales with us to space like yeah let's go into space and get some gold that's a great reason to go to space you can just mine more gold yeah you know what happens when you mine gold you're buying mining equipment and mining equipment is buying engines and engines are buying Fuel and you're stimulating the economy instead it's it's it's five people with an Epson printer hitting print a bunch of times like like I I don'
t want to sound like one of them libertarian assholes because you know you don't have to be a Libertarian to realize that like you can't let the government print the money it's not just the government who prints the money it's all the VCS too where do you think the VCS get their money the VCS get the money from LPS who are LPS LPS are just Pension funds where do Pension funds get the money they are the government it's all the same shit right there's there's there's two sides of the thing there's
the people who print the money and the people who work for the money and if you're not printing the money you're a clown and everybody who works for the money needs to get some pitchforks together you know what we don't even need pitchforks we don't even need pitchforks we just need to stop accepting their printed money as legitimate money it's not it's completely fake it's a bunch of cronies with a printer sure you can take your fake tax dollars and your fake money that's fine but when it comes
to anything real I mean I'm going to make a coffee shop and we're only going to accept Gold All right but we all need to stand strong and we all need to do this together because that's the only way this stuff gets fixed otherwise this will never be fixed and they will continue the scams will continue the beatings will continue until the money [Music] improves most money printing comes from the private Banks well if the banks were private this would be fine because Banks can sure Banks can fracti
onal Reserve all they want but people need to stop thinking of banks as a secure place to hold their money and you better keep your gold under your mattress defended by a unry with an AK-47 no I don't want Bitcoins okay they're they're they're worse they're dollar level crypto is dollar tier ah but it's it's algorithmic they algorithm says they can't make it they do remember that time ethereum rug pulled everybody and switched their proof of work coin to proof of stake right there is no sure bit
coin's a little bit Lindy right if bitcoin's been around for 200 years and nobody's changed the 21 million i' believe it a little bit more right but again that's younger than our constitution right you want to talk about how Lindy something is it really matters in money but bitcoin's been around so long it's been around less time than Nvidia okay Nvidia shares Apple shares are more hard money than Bitcoin um is to have no owner yeah but like you know what has no owner gold or gold is going to ha
ppen oh gold is going to happen right gold gold happens when everybody loses faith and all the fake money why is it doing so well another question to ask is why is gold doing so so poorly right the the problem with gold and the reason gold doesn't do the same shit that crypto does is you don't often get or they put a lot of work into avoiding uh hype Cycles right no one think gold is going to is going to 2x in the next year you don't get these you don't get these fake hype Cycles which which Dri
ve the crypto Market other minerals have similar properties which one no no no no growing economies don't need the most liquid assets this is completely fake this is you you've fallen for a scam you've fallen for a scam where it's like well we're just going to print more money so now we have more liquidity no no no this is not how it works right what you just did there so you have a map in the territory right what you just did there was you had a map and you're looking at the map and you're like
oh shit we're going to have to walk across this mountain and then like you know the guy next to you is like I got an idea and he pulls out an eraser and he erases the mountain from the map oh yeah oh that's fixed oh good job oh yeah that's going to be so much easier now no all you did was change the map you didn't change the territory he folds it right he folds the map he folds over the little hill um all right if if we're getting to 1971 it might be the end of the stream we did some good Tenstorrent
stuff we struggled with this stupid crap uh no we need like a good python 198 1971 means the end of the stream you guys like it is not going to get better it is not going to get better until here what's what's the quote it's at the end I don't believe we shall ever have good money again before we take the thing out of the hands of the government that is we we can't take it violently out of the hands of government all we can do is by some sly roundabout way introduce something they can't st
op gold it says something when I run clang2py oh do I have to put it in oh do I have to like put it in C++ mode or [Music] something oh oh clang C library are different oh okay okay this this might be um we have Jamie I don't know yeah instead of banning Bitcoin they approve the ETF exactly exactly okay we can have clang 12 are we happy with clang 12 we can have clang 14 that's not that's not allowed I might have to tell it it's C++ this might actually be a real thing okay we're still getting t
hat same bug but now it's that I don't think this matters we can try it no that's not the problem but there might be a flag for uh in clang2py for it's like maybe it's like -x c++ or or something error parsing translation unit all right we we have to go back to subscribers only if we want to do anything but I agree I agree that we are the mistake was literacy right the minute you give people literacy you can brainwash them and if like most of the country was illiterate they would
understand but but the money is uh you just print the paper the money is made of paper you print the oh no no no no no you don't understand you see there's an organization and it's separate from the government and then there's this thing called blockchain technology like like it's no no why is the money not a coin made of gold but where are you going to store your gold old yeah this is not a real problem okay people have been making it work for thousands of years but today we have more money tha
n ever no we don't like we don't no oh for thousands of years everybody understood this everybody understood that you can't have one asshole making the money out of paper or God forbid fake Internet money made out of cryptography wait gold wasn't brainwashing gold could buy you land and people mean you Landing people I've always had value but like those things still have value today like land has value how is this code even supposed to work okay those structures and functions that's looks pretty
good we had that old thing which was get num PCIe devices we had the uh can TT lib do this what can it do all right maybe we should be thinking about this in a different way maybe we should be going even lower level than this like what are these what are these metal things doing why aren't we just calling PL to do the RISC-V building we should be able to do running from TT lib I believe let's see what's in here um okay where is TT lib we want this to be free of all the uh let's move metal yo should we
just be using strace what's this actually doing let's let's let's let's let's let's instead of going higher we should go lower when they go low when they go high you go low all right so these things are actually being built what are these things oh look there oh these are the damn NoCs oh that's cool the minute VCs invest in it you know it's a scam and VCs have invested heavily in blockchain technology notice how they haven't invested in gold like imagine a VC funded like you know it's like a
little bag that you keep on your chest where you keep your gold right you don't see them making that but oh we invested in a crypto wallet see that's how you know it's a scam much simplified elementwise binary at the top here oh yeah but I mean this is just this just has wrappers around it uh oh wait what I don't understand what where are these things do well this is in the kernel acquire eight tile register so this is a compute kernel okay I mean we're not even close to this yet we're still stru
ggling I'm going to try to figure out where the RISC-V compilers are actually being called um to figure out if we can tap into it at that layer right because like we really don't want to use we want to use as little as possible and get to like like uh yeah I mean this stuff's probably fine and there's probably like a whole like a runtime build environment that we can extract from this that includes this stuff it's probably cool um but I'm just talking about all like the outer device layer stuff it ask
s money from the bank well the bank can give them gold right like like like like I just want you to think about fractional Reserve banking with gold for a minute right like take take I bring 10 pound of gold to the bank right see with with paper money and fake databases this all seems kind of okay but when you actually get down to what fractional Reserve banking is it means that they're loaning out nine of your gold pounds to other people and assuming there's not going to be a rush on the bank w
here everybody comes and asks for their gold pounds of gold back at once right suddenly Banks don't seem like the most genius idea anymore right but when the money's all fugazi oh yeah well you see you know the bank has has deposits to no like I want you to think about this in terms of literal pounds of gold right like when I go knock on the Bank of America door if I'm depositing 10 pounds of gold right like where are they putting my 10 pounds of gold think about that right just imagine daycare
that loans out 90% of your child right that doesn't sound like a good idea why do you think fractional reserve banking is a good idea one day everyone will come back after their children back don't worry don't worry don't worry we're lending out your gold but when you come for your gold we'll give it back to you well how are you going to do that if you're lending it out well you see we're going to take some of other people's gold and give it to you we're putting it in a big gold vault but what if
everybody comes and asks for their gold oh well like that almost like never happens also we have like FDIC insurance and the gold printer will bail everybody out oh good thing there's a gold printer what is it calling h so once we go very very deep down through tons of crap we found the the call to [Music] system see this is the problem with all these crap layers of abstraction your thing is forking multiple times to build this [Music] code oh here we go TT_METAL_BACKEND_DUMP_RUN_CMD let's s
ee that seems cool uh all right so where's my that runs a kernel all right let's try that whoa that's a lot of system commands no wonder it's so slow these numbers whoa what's a command queue producer how come I didn't know about command queue producers where does that guy live well I looked at all the system commands it was running and there's a ton of them oh you know what well my kernel is very complicated because I'm doing okay we have a lot more to do before we try to run multiple kernels we
still don't really understand how one kernel runs so let's get rid of dprint does that have less commands tons of commands and I don't think there's a cache oh well no no no that only is happening because I'm calling the dprint I'm passing in the magic dprint cores that did not change the insane number of system commands that are wow good thing this computer's fast look at all this stuff we rebuild we rebuild BRISC common every time oh a dispatcher kernel oh that's cool we need to expose tha
t yeah okay we we have not gotten to nearly the lowest levels yet and I apologize to everybody for trying to jump ahead and write Python cuz we're nowhere near there yet we do not understand how this thing works yet we didn't even know it had a dispatcher kernel okay um let's go lower what's BRISC okay what if you don't believe in gold you're being scammed okay that's all I'm saying how is it for thousands of years everybody used this but somehow in in 1971 we became enlightened and realized we di
dn't need that anymore and then everyone you know just just just just really ponder that question I mean there's only two answers to the question one is yeah bombarding lead with neutrons will produce gold that's incredible and if we could find a cheap way to bombard lead with neutrons that would be great um but we haven't right gold has has stayed this incredibly good I mean gold is still scarce right gold is still scarce it's if you could figure out how to turn lead to gold today you can still
figure out still make a lot of money um what I'm saying to you is one of two things is true and you should very carefully think about which one you believe one we discovered something new about the world in 1971 and suddenly money didn't to be backed by gold anymore and everything's okay or two you're getting massively scammed now I'll note that every single time in history they've tried paper money before and it's you know it's not it's not a brilliant idea you know it's not like it's not like
it took some like next Generation scientists to be like well instead of gold which is hard to find what if we just printed our logo on a piece of paper and told him it was money right TR trust me the Romans had that idea too you know a physical asset shortage yes yes why is there a shortage right like you can't just just again again like yes you know what sucks when you have to walk over a mountain like oh we're going there and we have to go we have to Traverse over that mountain the solution i
s not to fold the map in half and be like look the Mountain's gone right that doesn't fix it that just hides the problem right sure maybe you'll March along for another 10 miles and be like there's no mountains here I have this good quality map um it aggravates me so much it aggravates me so much that like you're you're arguing with me like that you're not like you either believe thing one or thing two you either believe that like they discovered something new in 1970 or you're getting massively
scammed and I know the latter is just like hard for you to believe like I know no one wants to believe that right no one wants to believe everyone wants to believe things are okay but like they're just not and they they won't be until like what we call money is real again no it's not like it's not like a little scam right it's not like a little scam the the entire basis of the entire world economy is fake it's like a little scam it's not like oh you know we're selling some people like some home
opathic medicine right that's like a little scam right that's like that's like a little little small tier scam yeah we're selling them water and we told them about the water memory effect and we diluted rose pedals 10,000 billion times and like and they put it in their ear and it's actually just water but they get better because of the immune system right like that's a little scam the entire Global monetary system is completely fake that's that's the biggest scam of them all like the whole subst
rate is fake the whole substrate for all other scams are fake this is this is the mac daddy of scams all right is there more debugging info I can enable somehow that would be interesting like somehow I'm pushing there has to be something that's like loading these things into memory I guess it could be the loaders but there's some like dma engine that I'm pushing over PCI actually let's take a look at the kernel driver I should have right cuz I built it KMD here we go all right good it's a very s
imple kernel driver the character device should be in /dev I guess actually open that device search for /dev there we go /dev/tenstorrent do I have a /dev/tenstorrent oh I have a /dev/tenstorrent great command queues are in pinned huge pages in system memory dispatch kernel running on device pulls binaries runtime args CQ PCI cool um yeah so let's take a look at how this is actually working something's opening /dev/tenstorrent here in in oh in UMD oh so yeah it's this UMD thing yeah so UMD is the actual um I wish it told me
what UMD was oh oh emulation what's this ZeBu EP1 emulator what is this cool I didn't know about these things uh yay yeah so that's what I'm trying to do I'm saying like if I can get into the okay UMD is user mode driver that makes sense uh so if I get into this maybe there's like prints it's all in C++ goddamn C++ wow so much stuff stuff why are these functions implemented oh here we go TT emulation device it's TT silicon driver uh read from device right here okay so these are the things that are
actually controlling the chip TT_METAL_LOGGER_LEVEL debug let's go oh cool okay so that shows the ah good good good I like this I like this um that's a complicated one that's my simple one it includes a big rant about gold it's cuz I'm getting tired it's almost nap time cool oh yeah so okay it's exactly this stuff so EnqueueWriteBuffer EnqueueProgram I don't know why we call Finish twice oh I guess we call Finish twice because I do the did the other one async but then why isn't that printing I would expec
t um this EnqueueReadBuffer to print something the Finish must just be in there because that's probably the only way the uh sync stuff works I don't know why those are called always always uh right so the TRISCs are the uh are the are the uh kernel [Music] processors um uh wait why is oh is hex actually hex it's actually hex oh no xxd what it's actually in hex format uh yeah guess so that's not a binary for some reason um so I was going to see like exactly which one is which but compute kernel runs o
n TRISC runs on these guys uh so that's these yeah so these are building the uh the compute kernel now that's a BRISC so we have BRISCs and TRISCs where are my BRISCs doesn't have BRISCs only TRISCs oh probably because uh that's not the name of the kernel yeah the kernel is reader binary so this is going to have a BRISC uh that is an NC well okay there's a BRISC but it's actually this is an NCRISC if I get the writer unary it'll have a BRISC yes it has a BRISC okay that makes sense okay so we have a
n NCRISC we have TRISC 0 1 2 and we have a BRISC let's let's take some notes Grayskull has 120 it's not Python stop it's not Python it's text um there's that matmul example you sent me that explains what the three are yeah okay so there's uh unpack math and pack respectively there's BRISC and there's also NCRISC um each core has five RISC-V processors you know it's interesting that you chose to like there's five but you chose to only expose three of them and I wonder why but in theory you could
expose all five um oh so I actually know uh yeah writer unary is BRISC so this is um zero no no no no I understand they're all exposed but what I'm saying is you at that API level you only you you wrote your compute to look like this where these are actually running on three different kernels and the read and write are completely separate compiled kernels from the thing yeah um I kind of like that seems like some arbitrary distinction to me uh I mean there's probably a reason for it but uh read kern
el write kernel we can give some names to these things I mean just as an example I imagine these are pretty veral there was a bit confusing okay yeah so you just just wrap it with things wrap it with these macros um okay so uh the driver has two parts a KMD and UMD user mode driver um with uh yeah not much to be gained by exposing controlling three sure uh each core has 1 MB available and this can be used for either circular buffers or longstanding alloc EnqueueWriteBuffer I don't understand why there's not an NQ r
ead buffer here I think there should be so like in this program here I'm doing uh I'm doing this EnqueueReadBuffer but I don't know why that's not uh yeah why isn't it why isn't it printing with debug I want to see like everything that can go kind of on the command queue uh is this it let me find the command queue manager ethernet crap let's trace down the command queue creation oh well that's interesting so the command queues are pre-initialized thank you for the bits uh so there's hardware command queues
and software command queues I don't why is this like is this like fake like they do like fake queuing I hate this stuff like AMD does this too and like we ripped it all out and we had to find where the real command queue was I mean the key thing you want to be able to do to make neural networks fast with command queues is you want to be able to reuse the same command queue yeah hardware command queue is the real queue okay so that's what I really want to push to here oh here's the dispatcher okay [Music] uh okay
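Editor's sketch: the idea of reusing the same command queue can be shown with a toy in Python. This assumes nothing about tt-metal's real command queue format; `ToyCommandQueue`, `scale`, `add_one` and `build_and_replay` are hypothetical names, and the "commands" are ordinary Python callables acting on a list that stands in for a device buffer. The queue is built once and then replayed, which is the property that matters when the same network runs every step.

```python
class ToyCommandQueue:
    """Records (function, args) commands once so the same list can be replayed."""

    def __init__(self):
        self.commands = []

    def enqueue(self, fn, *args):
        self.commands.append((fn, args))

    def run(self):
        # Replaying does not rebuild anything: it walks the recorded commands.
        for fn, args in self.commands:
            fn(*args)

def scale(buf, factor):
    for i in range(len(buf)):
        buf[i] *= factor

def add_one(buf):
    for i in range(len(buf)):
        buf[i] += 1

def build_and_replay(steps):
    buf = [1, 2, 3]          # stands in for a device buffer updated in place
    q = ToyCommandQueue()
    q.enqueue(scale, buf, 2)  # built once...
    q.enqueue(add_one, buf)
    for _ in range(steps):    # ...replayed many times
        q.run()
    return buf, len(q.commands)
```

Because the commands reference the buffer rather than its contents, new inputs can be written into the buffer between replays without touching the queue.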
cool so let's find where this actually like yeah okay okay okay great so it's it's these it's these device commands that actually I mean I like looking at what the actual format of the command queue is so we have some packetized thing called a device command and I imagine there's some serializer for it um yeah yeah yeah we set some crap command header and then we have whatever it's a lot of stuff yeah okay so we we put these in command queues then I imagine there's something which dumps the software comma
nd Q to the hardware command queue um but that's actually really cool you're telling me that the command queue parser is just a kernel [Music] uh oh is that what that is that what that thing on the end was that I wasn't sure what it was the thing on the end in uh well yeah it's a kernel but is that what that's special was that what that ARC thing is like is is ARC your your your oh no so it's just like a normal kernel that runs on a core oh you take a row of 10 cores okay so in other words I don't actual
ly get I lose 10 you're on dispatch kernel command okay cool well that's cool at this it's not like the stupid MECs on the AMD GPU like this is the stuff I'm trying to get AMD to open source cuz I'm very interested in how all the scheduling works because scheduling is the key to make neural networks great um yeah no that's this is awesome you guys have really uh yeah open sourced everything I care about uh yeah I mean you don't I don't care if the firmware is open source or whatever like secret bring up
that stuff's not important as long as all the schedulers open source um because there's so much that can be done there okay so these are the kernels I see um how stuff works okay so this is like a special this is like shared memory and it's just loading those on let me see if I can find what actually look I mean there's got to be something I wonder if I see it with the dispatch right because it's got to be what is all that stuff LL runtime what is all these that one's reading the secq huge page
I'm trying to figure out where it's actually like where is it dispatching them firmware knit complete it's got to be these then but it's running on all of them not just it's running some LL runtime on everything in fact shouldn't I be able to unless it's precompiled no here it is wait that's like really close yeah right here oh so here you go you're in row 11 and you're you're dispatching it here oh producer knock there's super lean firm we running on every yeah on every Bisk uh oh but look okay
I see your producer and consumer are just I see it's you you took row 11 but your producer and your consumer are just on seven and one whoa that just accepts Colonels from the dispatcher but then there's this extra uh command Q uh thing that okay cool I understand um yeah I think with so many of these things like you don't need all these layers of abstraction what's wrong with just directly dispatching from the uh I guess I kind of understand so you don't want to have to uh manage the don't hav
e to manage that from The Host you don't have to M manage Colonel dispatch from the host so you just do it on two of the cores I mean I imagine the 10 are Reserve but it's just on two of them and you recompile them every time don't recile them every time use a cash the these are like the kind of things you just get for free in the in the tiny grad uh like like cashing around all this build stuff and um it's all free I mean yeah it can be caption to retrace sure uh the dispatch on ethernet cor ye
ah again sound like PR optimization I I wouldn't worry about any of that stuff I would just say like what's the so much complexity here I mean this is the downside of like yeah I I know like like I know it's it's tempting to go to these like big scale things but like again I tried to multiply I mean you saw my demo and I don't know what I did wrong but like I tried to put the Matrix multiply in a loop and it crashed like I I think that there's a problem here like grock is the worst about it righ
t like I'm sure grock's way behind you guys grock showed off that one Mixel demo and they're very happy with it but that doesn't mean you have an accelerator you can sell for anything uh I think that like I mean the hardware is pretty incredible that it's so flexible I mean maybe gpus are this flexible and I just don't know enough about them uh I wonder how much probably not I mean gpus weren't done with a very clean Chic design it's probably just hacks on top of hacks uh like is there a way in
a GPU, if I had firmware control, that I can dispatch different kernels to the different compute units? I'm so interested to read AMD's. They're going to open source it, they'll open source the MES. But I think I even want to go below the MES; I want their command queue parser, and that one they say is harder, so we'll see. It's also all signed. Nothing looks signed on this. I don't think it's trade secrets. Things are going well with AMD. Yeah, I mean, you could work forever on this software. There are almost no limitations to how good you could make it. This chip has so much flexibility. It's a 600-core RISC-V chip and I can run arbitrary things on all the cores. It's not the HDCP problem. So what is this firmware, ncrisc.cc? I imagine that's for the NCRISC; there's probably one called brisc as well. Oh yes. Are there separate ones for each of the... there is just one TRISC. Yeah, they spin-wait for the... makes sense. Okay. I think there's one more thing I want to understand and then we'll call the stream. I want to understand what the math engine looks like. I think I get everything except for what the math actually looks like. So let's make a section called math, and let's look at the teraflop numbers and figure out how they're actually achieved. So somehow we're getting 70600 gigaflops per card. It's 32x32, but say it's a MAC, okay, so 2048, yeah. Okay, so then to do that out: you're running at what? No, you're only running at 300? No, no, I know the megahertz, it's running at 1300 MHz. So you're saying each core... so this should be 120 * 2048 * that many megaflops, right? That would be a 319 teraflop chip. Oh, that's MACs in something small. These are the block-based 8-bit floating points, block FP8. Okay, so that's block FP8, I see. So I don't know why I'm only getting 320 teraflops there. 4 * 32 is 128.
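The teraflop arithmetic worked out on stream can be written down as a short calculation. This is a rough sketch, not a vendor formula: it assumes the "2048" factor is 32 x 32 MACs per cycle times 2 FLOPs per MAC, and that every MAC is busy on every cycle. The constant and function names are made up for illustration.

```python
# Back-of-the-envelope peak throughput from the figures quoted on stream.
# Assumption: "2048" = 32 * 32 MACs per cycle * 2 FLOPs per MAC.

TENSIX_CORES = 120            # usable cores, as quoted on stream
MAC_ROWS, MAC_COLS = 32, 32   # the "32x32 MAC" in each core
FLOPS_PER_MAC = 2             # one multiply plus one add
CLOCK_HZ = 1300e6             # 1300 MHz, as quoted on stream


def peak_tflops(cores: int, macs_per_cycle: int, clock_hz: float) -> float:
    """Peak teraflops, assuming every MAC does useful work every cycle."""
    return cores * macs_per_cycle * FLOPS_PER_MAC * clock_hz / 1e12


macs_per_cycle = MAC_ROWS * MAC_COLS
print(f"{peak_tflops(TENSIX_CORES, macs_per_cycle, CLOCK_HZ):.1f} TFLOPS")
```

With these inputs the result lands at roughly 319 teraflops, matching the figure said out loud. That is a ceiling for the smallest data format mentioned; wider formats and real memory bandwidth would pull the achieved number down.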
So it's a 32x32 MAC. I don't care if it takes 32 cycles, it's pipelined. All right, let's take a look. Yeah, thanks for helping me. I think it's a fun way to get some documentation. Wait, for regular bfloat16 I can do that many? No, this looks like a factor of three. Oh, the docs are wrong, okay. HiFi2 mode consumes all the mantissa but one LSB. Oh, okay. I don't know why it's easier, but this gets into stuff I don't know about making hardware. I don't know how math's actually made fast. I don't know what a lot of these things are in practice, but that gets down to the lowest levels I know. Maybe someday I'll go lower. But yeah, so it's just instructions. And this is for Wormhole; if we want the Grayskull one, common, yes, ckernel, ckernel ops. I mean, there's a whole rabbit hole to go down here about how these dispatches work, how this stuff works, like how unpack works and it's putting them
in... just read the... yeah, I see the custom instructions. What I'm thinking is these must have weird registers that are defined somewhere. GPR is general purpose register, face dimension. Okay, so it's a 32x32 MAC. It's notably not a tensor core; you're not doing a multiply with one instruction. Look, I know the API for the GPU tensor cores, but I haven't actually thought about how a GPU gets built. So it's a 32x32 MAC and the neighbors can combine for bf16. Can it do float32 or just bf16? And it accumulates in bf16 also? Is that okay, or is it accumulating in something bigger? I don't care about the multiply: can I multiply in bf16 and accumulate in float32, or is it bf16 with a bfloat16 accumulator? Does that work for training? "We have debates about this at work." Oh, whatever that means, I'll write it down. Oh, in Grayskull; in Wormhole, okay, I see. But the fp32 accumulate is free, yeah, I see. It's very hard to train that way; you really need to accumulate in float32. You never need a float32 multiply. Float32 multiply is kind of useless and expensive, with that width-squared shit. You know the Nvidia scam? For everyone who doesn't know the Nvidia scam, there's a PDF. Where is it? Here's a 4090. So the RTX 4090 has a peak fp16 with fp16 accumulate of 330 teraflops, but then if you accumulate in fp32 you only get 165 teraflops. This is a total
scam, and it's just done by software: they blew an eFuse in the dispatcher to not let it accumulate in float32 fast, so you can't use the thing for training, which is just offensive. There's no reason. Look, here it is doing fp32 accumulate at 330. But yeah, this is a complete scam. And you know, because the same... actually that's wrong, that's wrong, because if you buy the expensive chip, which is the same die, it doesn't have the eFuse blown.
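Why the accumulator width matters for training can be seen in a small simulation. The sketch below is illustrative only and does not model any specific chip: it emulates bfloat16 by rounding a float32 bit pattern to its top 16 bits (round to nearest even), then sums the same small values once with a bfloat16 accumulator and once with a float32 accumulator. The helper names `to_bf16`, `to_f32`, and `accumulate` are made up for this example.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 is the top 16 bits of the float32 bit pattern.
    NaN and overflow handling are skipped; this is only a sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def accumulate(values, round_acc):
    """Sum values, rounding the running total with round_acc after each add."""
    acc = round_acc(0.0)
    for v in values:
        acc = round_acc(acc + v)
    return acc


# 1000 small terms, each already in bfloat16 (think: bf16 products).
terms = [to_bf16(0.001)] * 1000

bf16_total = accumulate(terms, to_bf16)
fp32_total = accumulate(terms, to_f32)
print("bf16 accumulator:", bf16_total)
print("fp32 accumulator:", fp32_total)
```

The float32 accumulator ends up close to the true sum of about 1.0, while the bfloat16 accumulator stalls well short of it: once the running total is large enough, each new term is smaller than half a unit in the last place and rounds away. Small gradient contributions disappear the same way, which is the usual argument for low-precision multiplies feeding a float32 accumulator.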
I went into the driver. I was really hoping it was just something in the driver and the compiler. I looked into the driver and it's an eFuse. So, you know, thanks Nvidia. Thank you, Nvidia, for giving other people a chance to catch up. Actually, if this really was the number for 4090s, if 4090s were that, we'd have to use them for the tinybox. But fortunately 4090s are that, and that's not actually so far off. AMD doesn't do either of these things. So there's two big features that Nvidia blocks on their 4090: one is this, and the other is peer-to-peer transfers between the cards over PCIe. The high-end cards can do it; these cards can't. I mean, you know, I'm too old for jailbreaking. All I see when I see that is: the way that you do this is you just need to make Nvidia feel the need to compete. You need to put pressure on them. tiny corp will know they're succeeding if Nvidia stops doing that in their future GPUs, because they literally halve the training power of your 4090s. You should be upset and offended. Cool. I think that's today's stream. I think I'm going to take a nap. Alex and I got a nice sushi dinner tonight. Thanks to the Tenstorrent guys for helping me understand this. I'll push my documentation. My general advice is this thing could be amazing, but the software is going to be a huge journey. You know, we're in luck: GPUs have done a lot of this stuff for you. It's not that
they've done a lot of it for you, it's that the CUDA style of programming has been around now for 15 years, so people kind of understand how to do it. Whereas when you're given a chip with this much flexibility, we have no idea what the correct abstractions look like. I would really avoid making things... I'm glad to see that it's using TVM. TVM is a great choice. I think there's a few things that TVM exposes that it shouldn't. You have to make a choice when you're writing something like this whether you want to allow cuDNN-style things. cuDNN-style things help you in the short term but hurt you massively in the long term, because you're not going to be able to do kernel fusions, you're not going to be able to do searches, you're not architecture portable. The only thing in tinygrad that's not architecture portable is that tensor core definition, and it's not architecture portable because there I'm literally describing hardware, right? If you want to do
something that's not architecture portable, it had better completely describe physical metal transistors on your chip. Whereas a lot of these things, even things like command queues, don't, right? Command queues are abstractions that you've put pretty low level in your software that don't have to be. I think what you want to do is write the most generic software you possibly can, software that also works on GPUs. You can imagine the same abstraction being used for GPUs.
You know, my theory with tinygrad is we are not succeeding until we have replaced the entire user spaces of CUDA and ROCm. It is only after we've done that that we can consider making our own hardware, because the software complexity is just insane. And a lot of times, when you don't understand the software you're writing, you end up making the hardware too generic, and that generic nature of the hardware is burning
power, burning transistors that could be used for faster shit. But it's pretty cool if you're really getting those kind of teraflop numbers from a 12 nanometer chip. What's the die size? Oh, it's 600. Oh, okay, I mean that's pretty big. But still, with those numbers you're at a 3090. Of course there's the RAM bandwidth issue. What, it's LPDDR4? I mean, that kills you. I think it's higher than that though, right? I thought it was like 200. Yeah, that makes training basically impossible until we all move to the new paradigm of training, and that gets even further down this crazy rabbit hole of software. Look, the current training methodologies are insane. If you told me 10 years ago, when I was first seeing AlexNet, that...

L Lux, hello, welcome new people. We were just getting off stream, but we'll do a little recap for the raid and then we'll end today's stream. What we did today was we played with the Tenstorrent e150. You can purchase these guys at the Tenstorrent website. Again, if you're planning on purchasing it to do something, yeah, doing things... but if you want to play with a really incredible piece of hardware that's pretty unique. And look, I would not endorse this hardware at all if it was not open source. So not only is this hardware quite unique and cool, it exposes everything to you. When you have a GPU, a GPU has a programming model. AMD is a lot more open source than Nvidia, but when you have an Nvidia GPU you're basically given the CUDA programming model, and if you're not okay with the CUDA programming model, well, Nvidia tells you to get bent. AMD lets you go to lower level stuff, and then Tenstorrent lets you go to an even lower level than that. So in terms of open sourceness, this is more open source than AMD. I'm pushing AMD, if you follow me on Twitter, to open source stuff. Oh, this is another thing that I can mention to them. I'm like, Tenstorrent's going to beat you guys. You know, Tenstorrent open sourced their command processor; you guys didn't do that.

I don't really know how flexible modern GPUs are. In a lot of ways modern GPUs do look similar to the Tenstorrent things. So what the Tenstorrent thing is, if you guys are interested, it's here. This is a pretty good picture, actually. I'll just find that guide.md, here it is. So this is basically what the Tenstorrent chip looks like. Make that a little smaller. This is pretty much what that chip looks like, and each one of these workers is five RISC-V processors in a tiny little chip. This is what happens when you zoom in. They call it a Tensix core. They have two of them for data movement, I think there's something special about them, and three of them for compute. For the compute you can imagine one guy with a shovel over here, one guy typing in the middle on his calculator, and another guy with a shovel over here. That's how the compute goes. You can imagine the data movement guys are operating something that looks like bulldozers, right? One guy's coming over here dumping dirt for the shovel man, driving the bulldozer back and forth. The other guy's coming over here, taking away that dirt, moving it on back to memory. So yeah, that's kind of what that looks like. This is basically what I was saying: bulldozer here, shovel man, shovel man, bulldozer. Cool model, and that is all completely exposed. So if you want to play with new, unique computing hardware, come check out this card.

But there's a long, long journey to building a useful neural net stack on top of this, and I think a lot of the stuff is jumping the shark. Companies have this very unfortunate nature of deciding, well, we need to get started on that problem today, so they'll take a problem and try so hard to subdivide it into multiple tasks so they can put multiple people on it. This is why software looks bad today. So much software looks bad because they're trying to parallelize things that are fundamentally not parallelizable. If you have a 100-person software team, guess what happens? Your software is going to have 100 pieces. It doesn't matter how complex the software is. Your software could literally be "I press a button and a Skittle comes out of a Skittle dispenser," but if you had 100 people, that software is going to have 100 parts. Your software will always reflect your organization. So it's unfortunate that this is kind of what a lot of this stuff looks like. They've built this ecosystem, they have two different-looking docs websites. If you have a software team of 100 people and you have 10 geniuses and 90 mediocre people, your team is much stronger without the 90. And instead of focusing on trying to do everything in parallel, focus on building up slowly, serially, exactly what you need. I think that would end up with
a much better thing at the end of the day. But yeah, so what did I actually do this stream? I wrote up some basic stuff to run a simple kernel that doesn't really do anything, but at least all the pieces are there. A junk Python API I didn't finish, using Tenstorrent's Python API, but I'm not really sure what it can do unless I use ttnn, which I couldn't get compiled. There is this high-level demo here, we'll throw that guy in there too, which uses PyTorch
but don't put this in a loop. Don't put it in a loop. If you put it in a loop you're going to have a bad time. Using PyBuda... this is exactly the kind of stuff that people don't want to use, right? The best thing is compatible. The second best thing is completely incompatible and different. The worst thing is kind of compatible. Kind of compatible is the worst; look at Python 3. So yeah. Oh, and I forgot to let... normally, yeah, I should be better about
that. When people raid I should let non-subscribers talk, so you guys can all talk. We're in the talking part of the stream now. We're going to do about 5 minutes of talking, actually, because I am tired. It's nap time. You know, there's two sleep schedules that have ever worked in human history: monophasic and biphasic. So I've been on a biphasic sleep schedule. I get like six hours of core sleep and then an hour and a half nap. So it's nap time, skidy. "Please stream more often." Well, I had something special for today's stream. Today we learned about the Tenstorrent cards. It was good having them in here to explain stuff. tiny corp has a lot of work right now, but I really think you guys should try to port this to tinygrad, and then it would let you focus on what the real problems are. No he didn't, no he didn't. You can't actually sleep 20 minutes six times a day. That's myth busted.

A common thing you'll see is... the humanoid robots are a perfect example of this. Humanoid robots are a complete cargo cult. For those of you that don't know the story of the cargo cult, let's just bring up a picture. This is from Business Insider. So there was an island in the South Pacific where the US was coming and using it as a base during World War II. The US would bring planes, and these planes would have MREs in them, and the indigenous people there were like, wow, this stuff's amazing. Then the war stopped and the planes stopped coming, and they were upset about this. So they built some planes to try to make them come back. Well, it turns out that doesn't bring planes back, right? It kind of looks like a plane, but it doesn't bring planes back. The humanoid robots are the exact same thing. People have things that have the brain of a bird, but they build a thing that looks like a person and think that somehow that's going to imbue it with personhood. In reality, if you can't make the comma body work, if you can't make the comma body do everything, if you can't make the comma body into the world's best security guard, stop trying to build a humanoid robot.

Feynman, something about cargo cult, yeah: learning how not to fool yourself. I think I've actually read this. We'll read this. Maybe I never read this. This is cool, at least. I still thought it was cool, even though here it's corporate and terrible now. "They follow all the apparent precepts and forms of scientific investigation, but they're missing something essential, because the planes don't land." I think there's a ton of this sort of thinking out there today. To explain to them why building a plane out of sticks isn't going to be a plane, right? Don't fool yourself. People fool themselves all the time.
Look, I'm guilty of this, tinygrad, everyone's guilty of this. There's software that other people use and there's software that you can use, right? The software that a few people can use, the software that a few people are willing to put up with. It is so incredibly hard to get software over this usefulness point where people will put up with anything in order to use it. "The first principle is that you must not fool yourself, and you are the easiest person to fool." This is exactly how we started the stream and this is how we'll end the stream. If you lie to other people you lie to yourself. You start by lying to yourself and then it manifests as lying to other people. "The men in charge of programs at NAL are so anxious for new results, in order to get more money to keep the thing going for public relations purposes, they are destroying, possibly, the value of the experiments themselves." And so the men in charge of VC firms are so anxious for new technology, in order to get more money to keep the thing going for public relations purposes, they are destroying, possibly, the possibility for technology itself. "What's the conclusion on Tenstorrent?" The open sourcing is very smart. This is the most open of any of the accelerators you can buy. I think that's true. You can look at the Google Coral, I did a whole bunch of reverse engineering on that: closed. Apple NPU: locked down. If Tenstorrent manages
to keep this level of open source, and starts being really honest about what parts of the stack they should work on and what parts of the stack they shouldn't work on... Don't give bounties for models. I understand the bounties for models thing is tempting. I understand tinygrad does it. In fact, I think the model bounties are some of the worst ones. But that's not your strength. Your strength is not "we can run a language model at X number of tokens per second." Give that to the Ollama people, give that to the llama.cpp people. You know what, we'll end the stream with this thing. Where's the Truffle? Truffle AI. I thought it was an overpriced scam, but it's actually less of an overpriced scam than I first thought. Oh, I don't know, maybe they don't actually have any software at all. But regardless, if they really have an Orin with... wait, how do you know it has 64 gigs of RAM? Who fooled me into saying it has 64 gigs of RAM? Does it have 64 gigs of RAM? I don't know. 60 gigs of RAM, okay, that's RAM. Regardless, the Truffle-1: how many Truffles in a tinybox? A ton. Their headlining thing that they're talking about is 200 gigabytes per second. The tinybox has 5,700 gigabytes per second, so by that ratio there's about 30 Truffles in a tinybox. And you can buy only 13 Truffles for the price of one
tinybox, so you know, it does work. "How would you use an LLM at high speed without using Groq?" Dude, you can build that demo on Nvidia, just nobody gives a shit. You can build Groq's exact demo. You give me an 8xH100 machine and a month. You know what, how about this: if someone buys me an H100 machine, I will build you that exact same Mixtral demo, if you let me keep the machine when we're done. Okay, that's my challenge to anyone about that stupid Groq demo. Like, "oh, it's 500 tokens per second." You
give me an 8xH100, like an SXM 8xH100 machine, like a $400,000 machine, cheaper than the Groq, and I will build you that exact same demo if you let me keep the machine when we're done. So yeah, that's a challenge. No, you will work, but you won't be able to build the demo. I can actually build the demo. I know exactly how to do it. It'll take me a little more than a Twitch stream; it will probably take me a month. But yeah, I'd work for a month for 400k, sure, why not? And I'd learn a lot of cool things about GPUs. So hey, Nvidia, if you want to take me up on that, send me an 8xH100 machine and I'll build you that Groq demo. I'll build you something faster than that Groq demo, or you can come take the machine back from me. I don't know you, you're not even a subscriber, bro. Yeah, if Nvidia would like, an H100 SXM, give me one of these machines and I will build you that stupid Groq Mixtral demo. Yeah, one of them. That looks nice.
Wow, it's not even that unreasonably priced. "Why am I skeptical about Groq?" Well, if someone wants to give me one of these I'll build you that demo, because you can build that demo on Nvidia just as easily. In fact, I might even be able to build the demo on a single... maybe not a single H100. But I'm not going to spend my own money on this, giving Nvidia money. It's just 90% gross margins. 90%. Well, it's not that Groq's are... I'll tell you what happened. Look, let me tell you why Tenstorrent's going to succeed and Groq's going to fail. What does Tenstorrent do? Tenstorrent figures it out, and it's not easy. It is not easy. You can go on archive.org and you can see how long Tenstorrent has said they were going to ship the cards and ship the open source software. It took them two years, but they did it. It's really hard, but you can't fool yourself to do that, right? Here it actually is, it's open source. You can't take it back. With Groq, they built a single demo
that looks kind of cool, but it's a demo, right? Waymo drove a blind man across Austin a long time ago. "That's a bold statement that I can do better." Buy me the computer. How about this, I'll do one better: you buy me the computer, and if I fail at it, I'll give you 10 grand and the computer back. So I'll put something on the line. Yes, I know you can try Groq on the playground, but if you don't understand what's wrong with that, I can't explain it to you. 300K and the computer back? I mean, I'd do 10K, in like a month. See, the problem with 300K and the computer back is not even that I don't think I can do it. It's just that some absolutely ludicrous opportunity might come up in my life and I might just want to do that instead. Look, Extropic, again, they're early, okay, they're an early startup. But don't believe in hype. Ask yourself: what have they shipped? And even better than what have
they supposedly shipped, right, a YouTube video is one thing, but what have they shipped that I can buy? Because once they've shipped something that you can click Buy It Now on, well, that's it, right? That's where the dollars end up. Fundamentally, I gave $800 to Tenstorrent for this card and I never expect to see any amount of that $800 again. It's not an investment, it's not a scam, it's straight up how consumerism works. It's like if I buy an apple,
I eat the apple. Great. So I'm not saying Groq is a scam. I'm just saying their demo is not impressive and I can replicate their demo on that machine, or 10K and your machine back. Look, Groq is the same tier as, like, Cerebras, right? Actually, Cerebras I think makes more revenue too, which is something else to be said. "The era of one-bit LLMs." Maybe, I don't know. Is it inference in one bit or is it training in one bit? The largest LLMs today are still being trained in bfloat16. "Their LPU thing is bullshit." For me it's not bullshit, it's just not competitive. The Tenstorrent card is not competitive either. If you're trying to do any sort of serious AI work, normal straight-up AI work, you should not buy Tenstorrent cards. You should buy probably H100s, maybe MI300s, and maybe tinyboxes.

But again, I think the Groq and Google TPU architectures don't look like the brain, whereas the Tenstorrent thing looks a lot more like the brain. And not only does it look a lot more like the brain, it's a much more buildable kind of architecture. We can look at Cerebras, right? Cerebras is kind of cool. Again, I had a lot of this stuff wrong. I wrote an old blog post, I wrote an update blog post, and I've learned so much more since then. This is what I spend all my time on now. So this is Cerebras, they're doing wafer scale. I talked to Jim about this and what he said was, look, you don't actually want to do wafer scale. What you want to do is chiplets on a substrate. The problem with this kind of stuff is it's really hard to get yields, and I would guess that a lot of those wafers end up thrown away. You're like, oh, we can route around that, but you don't know how much they can route around it. So what you probably want is a chiplet-style architecture, and something like Tenstorrent lends itself really well to that, because you're going to have to solve the grid communication problem at some scale, right? You're going to have to solve the data shoveling engine at some scale. GPUs don't have five RISC-Vs on each compute core, but they have SDMA engines. They usually have two SDMA engines. I just know that the AMD GPU has two SDMA engines, so you can send data there and pull data from there at the same time. That's why you need two engines.

"What's my current go-to LLM?" Claude. I pay $20 a month for Claude. It's better than what I can get. "Why is it inference only?" I don't know if they're saying it's inference only. I mean, the Grayskull chip is inference only because it can't accumulate in fp32. It also has crappy memory bandwidth. I don't think that's fixed with Wormhole either, but again, these things are all fixable, in theory. No, the open source ones are cool if
you need something offline or you want to, you know, talk about naughty things. Wormhole can do training. It has that fp32, more memory bandwidth. You need a lot of memory bandwidth for training. Yeah, we're holding out for training. I mean, training is the place to compete. Here's another one of these people who are like, "we're an inference-only company." No, you can't be. The future of AI is training, always. The future is not models that you train once and
then don't continue to update their weights, right? Humans aren't like this. The future is going to be some form of always-on training, always-on fine-tuning. It has to be. And we're in some very limited space of models now that don't do this. All right, cool, that's a good note to leave the stream on. Thanks again, guys. Have a good day, enjoy your weekend.
