
#21 Lex-Free Man Podcast | Chris Lattner: Compilers, LLVM, Swift, TPU, and ML Accelerators

Lex-free version of Episode #21

Lex Free Man

4 days ago

This is the Lex-free podcast, where we abridge the Lex podcast, with love, by replacing everything Lex says with a pleasant guitar strum. Enjoy.

My first program, and when was it? I think I started as a kid. My parents got a BASIC programming book, and so when I started, it was typing out programs from a book and seeing how they worked, and then typing them in wrong and trying to figure out why they were not working right, that kind of stuff.

I don't know, I mean, I feel like I've learned a lot along the way, and each of them had a different special thing about them. So I started in BASIC, then went to GW-BASIC, which was the thing back in the DOS days, then upgraded to QBasic, and eventually QuickBasic, which are all slightly more fancy versions of Microsoft BASIC. Made the jump to Pascal, and started doing machine language programming and assembly in Pascal, which was really cool; Turbo Pascal was amazing for its day. Eventually got to C, C++, and then kind of did lots of other weird things. It was straight into the machine. So: started with BASIC and Pascal, then assembly, and wrote a lot of assembly, and eventually did Smalltalk and other things like that, but that was not the starting point.

So what was this journey into C, was that in high school, was that in college? That was in high school, yeah. And that was really about trying to be able to do more powerful things than what Pascal could do, and also to
learn a different world. C was really confusing to me, with the pointers and the syntax and everything, and it took a while. But Pascal is much more principled in various ways; C, I mean, it has its historical roots, but it's not as easy to learn. You have pointers in Pascal as well, but in Pascal they use the caret instead of the star, and there are some small differences like that; but it's not about pointer arithmetic, and in C you end up thinking about how things get laid out in memory a lot more. In Pascal you have allocating and deallocating and owning, but the programs are just simpler. For example, Pascal has a string type, so you can think about a string instead of an array of characters which are consecutive in memory; it's a little bit of a higher-level abstraction.

Sure, so I guess they're different things, so let's start with: what is a compiler? Is that a good place to start? What are the phases of a compiler, what are the parts, what is a compiler even used for? The way I look at this is, you have a two-sided problem: you have humans that need to write code, and you have machines that need to run the program that the human wrote. For lots of reasons, the humans don't want to be writing in binary and don't want to think about every piece of hardware. And at the same time that you have lots of humans, you also have lots of kinds of hardware. So compilers are the art of allowing humans to think at the level of abstraction that they want to think about, and then getting the thing that they wrote to run on a specific piece of hardware.

The interesting and exciting part of all this is that there are now lots of different kinds of hardware: chips like x86 and PowerPC and ARM and things like that, but also high-performance accelerators for machine learning and other things, and also just different kinds of hardware, GPUs; these are new kinds of hardware. At the same time, on the programming side, you have BASIC, you have C, you have JavaScript, you have Python, you have Swift, you have lots of other languages that are all trying to talk to the human in a different way to make them more expressive and capable and powerful. Compilers are the thing that goes from one to the other, end to end, from the very beginning to the very end. You go from what the human wrote, and programming languages end up being about expressing intent, not just for the compiler and the hardware; the programming language's job is really to capture an expression of what the programmer wanted, which can then be maintained and adapted and evolved by other humans, as well as interpreted by the compiler.

So when you look at this problem, you have on the one hand humans, which are complicated, and you have hardware, which is complicated, and so compilers typically work in multiple
phases. The software engineering challenge that you have here is to try to get maximum reuse out of the amount of code that you write, because these compilers are very complicated. The way it typically works out is that you have something called a front end, or a parser, that is language-specific: you'll have a C parser, and that's what Clang is, or C++ or JavaScript or Python or whatever; that's the front end. Then you'll have a middle part, which is often the optimizer, and then you'll have a late part, which is hardware-specific. Compilers end up having many different layers, often, but these three big groups are very common, and what LLVM is trying to do is standardize that middle and last part. One of the cool things about LLVM is that there are a lot of different languages that compile through to it: things like Swift, but also Julia, Rust, Clang for C, C++, Objective-C. These are all very different languages, and they can all use the same optimization infrastructure, which gets better performance, and the same code generation infrastructure for hardware support. So LLVM is really that common layer that all these different specific compilers can use.

Is it a standard, like a specification, or is it literally an implementation? It's an implementation, and I think there are a couple of different ways of looking at it, because it depends on which angle you're looking at it from. LLVM ends up being a bunch of code, okay? A bunch of code that people reuse and build compilers with. We call it a compiler infrastructure because it's kind of the underlying platform that you build a concrete compiler on top of. But it's also a community, and the LLVM community is hundreds of people that all collaborate. One of the most fascinating things about LLVM over the course of time is that we've somehow managed to successfully get harsh competitors in the commercial space to collaborate on shared infrastructure: you have Google and Apple, you have AMD and Intel, you have NVIDIA and AMD on the graphics side, you have Cray and everybody else, and all these companies are collaborating together to make that shared infrastructure really, really great. They do this not out of the goodness of their hearts; they do it because it's in their commercial interest to have really great infrastructure that they can build on top of, and facing the reality that it's so expensive that no one company, even the big companies, really wants to implement it all themselves.

Expensive, or difficult? Both. And that's a great point, because it's also about the skill sets, and the skill sets are very hard to find.

Yes, so LLVM came from a university project. I was at the University of Illinois, and there it was myself, my adviser, and then a team of two or three research students in the research group.
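The three-phase split described above (language-specific front ends feeding one shared middle and back end) can be sketched as a toy pipeline. This is a minimal illustration with invented names, not anything from LLVM itself:

```python
# Toy three-phase compiler: two front ends, one shared optimizer and back end.
# The IR is a nested tuple: ("const", n) or (op, lhs, rhs).

def parse_prefix(tokens):
    """Front end #1: prefix syntax, e.g. "add 1 mul 2 3"."""
    tok = tokens.pop(0)
    if tok.isdigit():
        return ("const", int(tok))
    return (tok, parse_prefix(tokens), parse_prefix(tokens))

def parse_rpn(tokens):
    """Front end #2: postfix syntax, e.g. "1 2 3 mul add"."""
    stack = []
    for tok in tokens:
        if tok.isdigit():
            stack.append(("const", int(tok)))
        else:
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((tok, lhs, rhs))
    return stack.pop()

def fold(node):
    """Shared middle end: constant-fold the IR, whichever front end made it."""
    if node[0] == "const":
        return node
    op, a, b = node[0], fold(node[1]), fold(node[2])
    if a[0] == "const" and b[0] == "const":
        return ("const", {"add": a[1] + b[1], "mul": a[1] * b[1]}[op])
    return (op, a, b)

def emit(node, out=None):
    """Shared back end: flatten the IR into a stack-machine instruction list."""
    out = [] if out is None else out
    if node[0] == "const":
        out.append(f"push {node[1]}")
    else:
        emit(node[1], out)
        emit(node[2], out)
        out.append(node[0])
    return out

# Two different surface syntaxes, one shared middle and back end.
ir1 = fold(parse_prefix("add 1 mul 2 3".split()))
ir2 = fold(parse_rpn("1 2 3 mul add".split()))
assert ir1 == ir2 == ("const", 7)
print(emit(ir1))  # ['push 7']
```

Reusing `fold` and `emit` across both parsers is the point being made: the front end is per-language, and everything after it is shared.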
And we built many of the core pieces initially. I then graduated and went to Apple, and at Apple brought it to the products, first in the OpenGL graphics stack, but eventually to the C compiler realm, and eventually built Clang and eventually built Swift and these things, along the way building a team of people that are really amazing compiler engineers who helped build a lot of that. As it was gaining momentum, and as Apple was using it, being open source and public and encouraging contribution, many others, for example at Google, came in and started contributing. In some cases Google effectively owns Clang now, because it cares so much about C++ and the evolution of that ecosystem, and so it's investing a lot in the C++ world and the tooling and things like that. Likewise, NVIDIA cares a lot about CUDA, and so CUDA uses Clang and uses LLVM for graphics and GPGPU.

No, it was nothing like that. I mean, my goal when I went to the University of Illinois was to get in and out with a non-thesis master's in a year and get back to work. I was not planning to stay for five years and build this massive infrastructure. I got nerd-sniped into staying, and a lot of it was because LLVM was fun, and I was building cool stuff and learning really interesting things, facing both software engineering challenges and also learning how to work in a team and things like that. I had worked at many companies as an intern before that, but it was really a different thing to have a team of people that are working together and trying to collaborate in version control. It was just a little bit different.

Yeah, that's one of the major things it does. I got into that because of a person, actually. When I was in my undergraduate, I had an adviser, a professor named Steve Vegdahl, and I went to this little tiny private school; there were, I think, seven or nine people in my computer science department, or
students in my class. So it was a very tiny, very small school; it was kind of a wart on the side of the math department, kind of a thing, at the time. I think it's evolved a lot in the many years since then. But Steve Vegdahl was a compiler guy, and he was super passionate, and his passion rubbed off on me.

One of the things I like about compilers is that they're large, complicated pieces of software. One of the culminating classes that many computer science departments, at least at the time, had was this: you would take algorithms and data structures and all these core classes, but the compilers class was one of the last classes you take, because it pulls everything together. And then you work on one piece of code over the entire semester, so you keep building on your own work, which is really interesting, and it's also very challenging, because in many classes, if you don't get a project done, you just forget about it and move on to the next one and get your B or whatever it is; but here you have to live with the decisions you make and continue to reinvest in it, and I really liked that. So I did an extra study project with him the following semester, and he was just really great, and he was also a great mentor in a lot of ways. From him and from his advice, he encouraged me to go to graduate school. I wasn't super excited about going to grad school; I wanted the master's degree, but I didn't want to be an academic. But like I said, I kind of got tricked into staying, and was having a lot of fun, and I definitely do not regret it.

For me it was more... I'm not really a math person. I can do math, I understand some bits of it when I get into it, but math was never the thing that attracted me. A lot of the parser part of the compiler has a lot of good formal theory that Don, for example, knows quite well; still waiting for his book on that. But I just liked building a thing, seeing what it could do, exploring, getting to do more things, and then setting new goals and reaching for them. In the case of LLVM, when I started working on that, the research adviser I was working for was a compiler guy, and he and I specifically found each other because we were both interested in compilers, so I started working with him and taking his class. And a lot of LLVM initially was just that it's fun
implementing all the standard algorithms, all the things that people had been talking about, that were well known and in the curricula for advanced study in compilers. Just being able to build that was really fun, and I was learning a lot by, instead of reading about it, just building. I enjoyed that.

Hard? So I'll give you examples of the hard parts along the way. C++ is a very complicated programming language; it's something like 1,400 pages in the spec. So C++ by itself is crazy complicated.

Can we just pause: what makes the language complicated? In terms of what's syntactic, so the actual way the characters are arranged, yes, but it's also semantics, how it behaves. And it's also, in the case of C++, a huge amount of history: C++ built on top of C, you play that forward, and then a bunch of decisions that were in some cases suboptimal were made, and they compound, and then more and more and more things keep getting added to C++, and it will probably never stop. The language is very complicated from that perspective, and the interactions between subsystems are very complicated; there's just a lot there.

When you talk about the front end, one of the major challenges that Clang, the C/C++ compiler that I and many people built, took on was this: we looked at GCC. GCC at the time was a really good, industry-standardized compiler that had really consolidated a lot of the other compilers in the world, and it was a standard, but it wasn't really great for research. The design was very difficult to work with, and it was full of global variables and other things that made it very difficult to reuse in ways it wasn't originally designed for. With Clang, one of the things we wanted to do was push forward on better user interface, so make error messages that are just better than GCC's, and that's actually hard, because you have to do a lot of bookkeeping in an efficient way to be able to do that. We wanted to make compile time better, and compile time is about making it efficient, which is also really hard when you're keeping track of extra information. We wanted to make new tools available, so refactoring tools and other analysis tools that GCC never supported, also leveraging the extra information we kept, and enabling those new classes of tools that then get built into IDEs. That's been one of the areas where Clang has really helped push the world forward: the tooling for C and C++ and things like that.

But C++ and the front-end piece are complicated; you have to build syntax trees, and you have to check every rule in the spec, and you have to turn that back into an error message that the human can understand when they do something wrong. But then you start doing what's called lowering, so going from C++, in the way that it
represents code, down to the machine. When you do that, there are many different phases you go through. I think LLVM has something like 150 of what are called passes in the compiler that the code passes through, and these get organized in very complicated ways, which affect the generated code, the performance, the compile time, and many other things.

In the parser, it's usually a tree, called an abstract syntax tree. The idea is that you have a node for the plus that the human wrote in their code, or for the function call, you'll have a node for "call" with the function that they call and the arguments they pass, things like that. This then gets lowered into what's called an intermediate representation, and intermediate representations, like the one LLVM has, there it's what's called a control flow graph. You represent each operation in the program as something very simple: this is going to add two numbers, this is going to multiply two things, maybe we'll do a call. But then they get put into what are called blocks. You get blocks of these straight-line operations, where instead of being nested like in a tree, it's straight-line operations, and there's a sequence and an ordering to those operations within the block. And then you have branches, like conditional branches, between blocks.
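As a concrete sketch of those blocks and branches, here is a hypothetical, hand-written mini-CFG for the C loop `for (i = 0; i < 3; i++) total += i;`, plus a tiny interpreter that walks it. The representation is invented for illustration, not LLVM's:

```python
# A hypothetical control-flow graph for:  for (i = 0; i < 3; i++) total += i;
# Each block is a list of straight-line "dst = expr" ops plus one terminator.

cfg = {
    "entry": (["i = 0", "total = 0"], ("jump", "cond")),
    "cond":  ([], ("branch", "i < 3", "body", "exit")),   # conditional branch
    "body":  (["total = total + i", "i = i + 1"], ("jump", "cond")),
    "exit":  ([], ("ret", "total")),
}

def run(cfg):
    env, block = {}, "entry"
    while True:
        ops, term = cfg[block]
        for op in ops:                      # straight-line code within a block
            dst, expr = op.split(" = ", 1)
            env[dst] = eval(expr, {}, env)
        if term[0] == "jump":
            block = term[1]
        elif term[0] == "branch":
            block = term[2] if eval(term[1], {}, env) else term[3]
        else:                               # "ret"
            return eval(term[1], {}, env)

print(run(cfg))  # 0 + 1 + 2 = 3
```

Each block is straight-line code; all the control flow lives in the terminators at the ends of blocks.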
And so when you write a loop, for example: in a syntax tree, you would have a "for" node, for a for statement in a C-like language. You have a for node, then you have a pointer to the expression for the initializer, a pointer to the expression for the increment, a pointer to the expression for the comparison, and a pointer to the body, and these are all nested underneath it. In a control flow graph, you get a block for the code that runs before the loop, so the initializer code; then you have a block for the body of the loop, and the body-of-the-loop code goes in there, but also the increment and other things like that; and then you have a branch that goes back to the top, and a comparison, and a branch that goes out. So it's more of an assembly-level kind of representation. But the nice thing about this level of representation is that it's much more language-independent. There are lots of different kinds of languages with different kinds of, you know... JavaScript has a lot of
different ideas of what is false, for example, and all of that can stay in the front end, but then that middle part can be shared across all of them.

They're quite different in details, but they're very similar in idea. One of the things that neural networks do is learn representations for data at different levels of abstraction, and then transform those through layers. The compiler does very similar things, but one difference is that the compiler has relatively few different representations, where a neural network, as you get deeper, you get many different representations, and each layer or set of ops is transforming between those representations. In a compiler, often you get one representation and then do many transformations to it, and these transformations are often applied iteratively. For programmers, they're familiar kinds of things: for example, finding expressions inside of a loop and pulling them out of the loop so they execute fewer times, or finding redundant computation, or constant folding, or other simplifications, turning 2 * x into x shift-left-by-one, and things like this. These are all examples of things that happen, but compilers also end up doing a lot of theorem proving and other kinds of algorithms that try to find higher-level properties of the program that can then be used by the optimizer.

Cool, so what's the biggest bang for the buck with optimization, today... well, not even today, at the very beginning, the '80s? Yeah, so in the '80s a lot of it was things like register allocation. The idea is that in a modern microprocessor, what you end up having is memory, which is relatively slow, and then registers, which are relatively fast, but you don't have very many of them. So when you're writing a bunch of code, you're just saying: compute this, put it in a temporary variable; compute this, compute this, compute this, put it in a temporary variable; I have a loop, I have some other stuff going on. Well, now you're running on an x86, a desktop PC or something; in some cases, in some modes, it only has eight registers. So the compiler has to choose what values get put in what registers at what points in the program, and this is actually a really big deal. Think about an inner loop that executes millions of times: if you're doing loads and stores inside that loop, it's going to be really slow, but if you can somehow fit all the values inside that loop in registers, now it's really fast. Getting that right requires a lot of work, because there are many different ways to do it, and often what the compiler ends up doing is thinking about things in a different representation than what the human wrote. You wrote "x"; well, the compiler thinks about that as four different values, each of which has a different lifetime across the function it's in, and each of those could be put in a register, or in memory, or in different memory, or maybe, in some parts of the code, recomputed instead of stored and reloaded. There are many of these different kinds of techniques that can be used.

Absolutely. And so the RISC era made things... so RISC chips, as opposed to CISC chips, the RISC chips made things more complicated for the compiler
because what they ended up doing is adding pipelines to the processor, where the processor can do more than one thing at a time. But this means that the order of operations matters a lot, and one of the classical compiler techniques you use is called scheduling: moving the instructions around so that the processor can keep its pipelines full instead of stalling and getting blocked. There are a lot of things like that that are kind of bread-and-butter compiler techniques that have been studied a lot over the course of decades now, but the engineering side of making them real is also still quite hard. And you talk about machine learning: this is a huge opportunity for machine learning, because many of these algorithms are full of hokey, hand-rolled heuristics which work well on specific benchmarks but don't generalize, and are full of magic numbers. And, you know, I hear there are some techniques that are good at handling that.

You can pick your metric: there's running time, there's memory use, there are lots of different things that you can optimize for; code size is another one that some people care about, in the embedded space. This is something that is, I would say, research right now. There are a lot of research systems that have been applying search in various forms; reinforcement learning is one form, but brute-force search has also been tried for quite a while. Usually these are in small problem spaces: find the optimal way to generate code for a matrix multiply for a GPU, something like that, where there's a lot of design space of, do you unroll loops a lot, do you execute multiple things in parallel, and there are many confounding factors here, because graphics cards have different numbers of threads and registers and execution ports and memory bandwidth, and many different constraints that interact in nonlinear ways. So search is very powerful for
that, and it gets used in certain ways, but it's not very structured. This is something that we, as an industry, need to fix.

Yeah, so it's largely been driven by hardware... well, hardware and software. In the mid-'90s, Java totally changed the world, and I'm still amazed by how much change it introduced, in a good way. Reflecting back, Java introduced things all at once: JIT compilation, garbage collection, portable code, safe code, like memory-safe code, a very dynamic-dispatch execution model. None of these were novel, but it pulled them together, made them mainstream, and made people invest in them. Many of these things had been done in research systems and in small ways in various places, but they really came to the forefront and really changed how things worked, and therefore changed the way people thought about the problem. JavaScript was another major world change, based on the way it works.

But also, on the hardware side of things, multicore and vector instructions really changed the problem space. They don't remove any of the problems that compilers faced in the past, but they add new kinds of problems: how do you find enough work to keep a four-wide vector busy? Or, if you're doing a matrix multiplication, how do you do different columns out of that matrix at the same time, and how do you maximally utilize the
arithmetic compute that one core has, and then how do you take it to multiple cores?

How did the whole virtual machine thing change the compilation pipeline? Yeah, so what the Java virtual machine does is it splits, just like I was talking about before, where you have a front end that parses the code and then an intermediate representation that gets transformed. What Java did was say: we will parse the code and then compile to what's known as Java bytecode, and that bytecode is now a portable code representation that is industry-standard and locked down and can't change; and then the back part of the compiler, which does optimization and code generation, can be built by different vendors. And Java bytecode can be shipped around across the wire; it's memory-safe and relatively trusted, and because of that it can run in the browser, and that's why it runs in the browser. So back in the day, you would write a Java applet; as a web developer you'd build this mini-app that would run in a web page. Well, a user of that is running a web browser on their computer, and you download that Java bytecode, which can be trusted, and then you do all the compiler stuff on your machine, so that you know you can trust it.

Was that a good idea or a bad idea? It's a great idea. I mean, it's a great idea for certain problems, and I'm very much a believer that technology is itself neither good nor bad; it's how you apply it. This would be a very bad thing for very low levels of the software stack, but in terms of solving some of these software portability and transparency problems, I think it's been really good. Now, Java ultimately didn't win out on the desktop, and there are good reasons for that, but it's been very successful on servers, and in many places it's been a very successful thing over decades.

Yeah, I think that the interesting thing about LLVM is not the innovations in compiler research. It has very good implementations of very important algorithms, no doubt, and a lot of really smart people have worked on it, but I think the thing that's most profound about LLVM is that, through standardization, it made things possible that otherwise wouldn't have happened. So, interesting things have happened with LLVM: for example, Sony has picked up LLVM and used it to do all the graphics compilation in their
movie production pipeline, and so now they're able to have better special effects because of LLVM. That's kind of cool. That's not what it was designed for, but that's the sign of good infrastructure: when it can be used in ways it was never designed for, because it has good layering and software engineering and it's composable and things like that. Which is, as you said, where it differs from GCC. Yes, GCC is also great in various ways, but it's not as good as an infrastructure technology.
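That composability (independent pieces you can swap without touching the rest) can be sketched as a pipeline of IR-to-IR passes. The toy IR and every name here are invented for illustration, and the first pass happens to be the 2 * x into x-shift-left-by-one rewrite mentioned earlier:

```python
# A toy pass pipeline. The IR is a list of (dst, op, args) triples; each pass
# maps IR -> IR and knows nothing about the others, so any one can be
# replaced without touching the rest. All names here are invented.

def strength_reduce(ir):
    """Rewrite 'dst = mul a 2' into 'dst = shl a 1' (2*x -> x << 1)."""
    out = []
    for dst, op, args in ir:
        if op == "mul" and args[1] == 2:
            out.append((dst, "shl", [args[0], 1]))
        else:
            out.append((dst, op, args))
    return out

def dead_code_eliminate(ir):
    """Drop instructions whose result is never read (assuming no side effects),
    always keeping the final instruction as the result."""
    used = {a for _, _, args in ir for a in args if isinstance(a, str)}
    return [inst for inst in ir if inst[0] in used or inst is ir[-1]]

def run_pipeline(ir, passes):
    for p in passes:          # swap, reorder, or replace passes freely
        ir = p(ir)
    return ir

ir = [("t0", "mul", ["x", 2]),
      ("t1", "add", ["x", 1]),   # t1 is never used: dead
      ("r",  "add", ["t0", "x"])]
optimized = run_pipeline(ir, [strength_reduce, dead_code_eliminate])
# optimized == [("t0", "shl", ["x", 1]), ("r", "add", ["t0", "x"])]
```

Because each pass consumes and produces the same IR shape, any one of them can be swapped out for a better implementation without touching the others.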
It's, you know, really a C compiler, or it's a Fortran compiler; it's not infrastructure in the same way.

Entirely possible. Well, you've used code it's generated, probably. Clang and LLVM are used to compile all the apps on the iPhone, effectively, and the OSes. It compiles Google's production server applications. It's used to build GameCube games and PlayStation 4 and things like that.

As a user I have, but everything I've experienced through Linux has been, I believe, always GCC. Yeah, I think Linux still defaults to GCC. Is there a reason for that? It's a combination of technical and social reasons. Many Linux developers do use Clang, but the distributions, for lots of reasons, used GCC historically, and they've not switched.

Yeah, the way I would say it is that they're so close it doesn't matter. Exactly: they're slightly better in some ways, slightly worse in other ways, but it doesn't actually really matter anymore, at that level.

So in terms of optimization breakthroughs, it's just been solid, incremental work? Yeah, which describes a lot of compilers. The hard thing about compilers, in my experience, is the engineering, the software engineering: making it so that you can have hundreds of people collaborating on really detailed, low-level work, and scaling that. That's really hard, and that's one of the things I think LLVM has done well, and that kind of goes back to the original design goals: to be modular and things like that. And incidentally, I don't want to take all the credit for this, right? I mean, some of the best parts about LLVM are that it was designed to be modular, and when I started, I would write, for example, a register allocator, and then somebody much smarter than me would come in and pull it out and replace it with something else that they came up with, and because it's modular, they were able to do that. That's one of the challenges with GCC, for example: replacing subsystems is incredibly difficult. It can be done, but it wasn't designed for that, and that's one of the reasons LLVM has been very successful in the research world as well.

Oh yeah, I mean, I still have something like an order of magnitude more patches in LLVM than anybody else, and many of those I wrote myself. Yeah, I still write code, not as much as I
was able to in grad school, but it's an important part of my identity. The way LLVM has worked over time is that when I was a grad student, I could do all the work and steer everything and review every patch and make sure everything was done exactly the way my opinionated sense felt it should be done, and that was fine. But as you scale, you can't do that, and so what ends up happening is that LLVM has a hierarchical system of what are called code owners. These code owners are given the responsibility not to do all the work, not necessarily to review all the patches, but to make sure that the patches do get reviewed, and to make sure that the right things are happening architecturally in their area. So what you'll see is, for example, hardware manufacturers end up owning the hardware-specific parts for their hardware; that's very common. Leaders in the community that have done really good work naturally become the de facto owner of something, and then usually somebody else says, "How about we make them the official code owner, and then we'll have somebody to make sure all patches get reviewed in a timely manner?" And everybody says, "Yes, that's obvious," and then it happens. Usually this is a very organic thing, which is great. I'm nominally the top of that stack still, but I don't spend a lot of time reviewing patches. What I do is help negotiate a lot of the technical disagreements that end up happening, and make sure that the community as a whole makes progress and is moving in the right direction.

We also started a nonprofit six years ago, seven years ago... time's gotten away from me. The nonprofit, the LLVM Foundation, helps oversee all the business side of things and makes sure that the events the LLVM community has are funded and set up and run correctly, and stuff like that. But the Foundation very much stays out of the technical side of where the project is going.

Right, so it sounds like a lot of it is just organic. Yeah, well, LLVM is almost 20 years old, which is hard to believe. Somebody pointed out to me recently that LLVM is now older than GCC was when LLVM started, so time has a way of getting away from you. But the good thing about that is it has a really robust, really amazing community of people that are, in their professional lives, spread across lots of different companies, but it's
It's a community of people who are interested in similar kinds of problems, who have been working together effectively for years, and who have a lot of trust and respect for each other — and even if they don't always agree, we're able to find a path forward. These are different questions — I'll stay on the technical side, then we can talk about the big-team pieces. Sure. So, to really oversimplify many years of hard work: LLVM started, joined Apple, became a thing, became successful, and got deployed. But then there was a question of how we actually parse the source code, because LLVM is the back part — the optimizer and the code generator. LLVM was really good for Apple as it went through a couple of hardware transitions: I joined right at the time of the Intel transition, for example, then the 64-bit transitions, and then the transition to ARM with the iPhone, and LLVM was very useful for those kinds of things. But at the same time there were a lot of questions around developer experience. If you were a programmer pounding out Objective-C code at the time, the error messages you got, the compile times, the turnaround cycle, the tooling in the IDE — they were not as good as they could be. So, as I occasionally do, I thought: well, okay, how hard is it to write a C compiler? I wasn't going to commit to anybody, I wasn't going to tell anybody — I was just going to do it on nights and weekends. In C there's this thing called the preprocessor, which people don't like, but which is actually really hard and complicated and includes a bunch of really weird things — like trigraphs — that are really nasty, and it's the crux of a bunch of the performance issues in the compiler. I started working on the parser and got to the point where I thought: you know what, we could actually do this. Everybody was saying it was impossible, but it's actually just hard, not impossible. Eventually I told my manager about it, and he said, oh wow, this is great, we do need to solve this problem — we can get you one other person to work with you on this. Slowly a team formed, and it started taking off. And C++, for example, is a huge, complicated language that people always
assume is impossible to implement — and it's very nearly impossible, but it's just really, really hard. The way to get there is to build it one piece at a time, incrementally, and that was only possible because we were lucky to hire some really exceptional engineers who knew various parts of it very well and could do great things. Swift was kind of a similar thing. Swift came about as we were just finishing off the first version of C++ support in Clang. C++ is a very formidable and very important language, but it's also ugly in lots of ways, and you can't implement C++ without thinking there has to be a better thing. So I started working on Swift — again with no hope or ambition that it would go anywhere, just, let's see what could be done, let's play around with this thing. It was me in my spare time, not telling anybody about it. It made some good progress, and I thought it would actually make sense to do this. At the same time I started talking with the senior VP of software at the time, a guy named Bertrand Serlet. Bertrand was very encouraging; he said, well, let's have fun, let's talk about this. He was a little bit of a language guy himself, so he helped guide some of the early work and encouraged me and got things off the ground, and eventually I told my manager and other people, and it started making progress. The complicating thing with Swift was that the idea of doing a new language was not obvious to anybody, including myself, and the tone at the time was that the iPhone was successful because of Objective-C. Oh, interesting — not despite it, but because of it. And you have to understand that at the time Apple was hiring software people who loved Objective-C. They didn't come despite Objective-C; they loved Objective-C, and that's why they got hired. So you had a software team whose leadership in many cases went all the way back to NeXT, where Objective-C really became real. They quote-unquote grew up writing Objective-C, and many of the individual engineers were hired because they loved Objective-C, so this notion of "let's do a new language" was kind of heretical in many ways. Meanwhile, my sense was that the outside community wasn't really in love with Objective-C. Some people were — some of the most outspoken people were — but other people were
hitting challenges, because it has very sharp corners and it's difficult to learn. So one of the challenges of making Swift happen that was totally non-technical was the social part: what do we do? At Apple many things happen that don't ship — so if we ship this, what are the metrics of success? Why would we do this? Why wouldn't we make Objective-C better instead — if Objective-C has problems, let's file off those rough corners and edges. And one of the major things that became the reason to do this was the notion of safety — memory safety. The way Objective-C works, a lot of the object system and everything else is built on top of pointers in C; Objective-C is an extension on top of C, and so the pointers are unsafe — and if you get rid of the pointers, it's not Objective-C anymore. So fundamentally that was an issue you could not fix: you couldn't get safety, or memory safety, without fundamentally changing the language. Once we got through that part of the mental process, it became a design process: okay, if we're going to do something new, what is good? How do we think about this, what do we like, and what are we looking for? And that was a very different phase. Yeah — so some of those answers were obvious given the context. A typed language, for example: Objective-C is a typed language, and going with an untyped language wasn't really seriously considered. We wanted the performance, and we wanted refactoring tools and other things like that that go with typed languages. Yes — that's not a dumb question. In the late '90s Apple had seriously considered moving its development experience to Java, but Swift started in 2010, which was several years after the iPhone, when the iPhone was definitely on an upward trajectory. The iPhone was — and still is — a bit memory-constrained, so being able to compile the code, ship it, and have standalone code that is not JIT-compiled is a very big deal, and it's very much part of the Apple value system. Now, JavaScript is also a thing — it's not that this is exclusive, and technologies are good depending on how they're applied — but in the design of Swift, asking how we could make Objective-C better: Objective-C was statically compiled, and that was the natural, continuous thing to do — the right thing.
Still the right thing? Yeah. The funny thing, after working on compilers for a really long time — and this is one of the things LLVM has helped with — is that I don't look at compilation as being static or dynamic or interpreted or not; it's a spectrum. And one of the cool things about Swift is that Swift is not just statically compiled — it's actually dynamically compiled as well, and it can also be interpreted, though nobody's actually done that. So what ends up happening when you use Swift in a notebook, for example in Colab or Jupyter, is that it's actually dynamically compiling the statements as you execute them. And this gets back to the software engineering problems: if you layer the stack properly, you can completely change how and when things get compiled, because you have the right abstractions there. The way a Colab notebook works with Swift is that when you start typing into it, it creates a process — a Unix process — and then, for each line of code you type, it compiles it through the Swift compiler (the front-end part), sends it through the optimizer, JIT-compiles machine code, and injects it into that process. So as you type new stuff, it's squirting in new code — overwriting, replacing, and updating code in place. The fact that it can do this is not an accident — Swift was designed for this — but it's an important part of how
the language was set up and how it's layered, and this is a non-obvious piece. One of the things about Swift that was, for me, a very strong design point was to make it so you can learn it very quickly. From a language-design perspective, the thing I always come back to is the UI principle of progressive disclosure of complexity. In Swift you can start by writing print("Hello world") — and there's no semicolon, just like Python. One line of code: no main, no header files, no public static class void blah-blah-blah String like Java has. It's one line of code, and you can teach that, and it works great. Then you can introduce variables: you can declare a variable with var, so var x = 4 — this is what a variable is, you can use x, you can write x + 1, this is what it means. Then you can say, well, how about control flow: this is what an if statement is, this is what a for statement is, this is what a while statement is.
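The progression he's describing — one runnable line first, then variables, then control flow — can be sketched in Swift like this (a minimal illustration; the variable names are just for the example):

```swift
// Step 1: a complete Swift program is a single line — no main, no semicolons.
print("Hello world")

// Step 2: introduce variables.
var x = 4
x = x + 1

// Step 3: introduce control flow.
if x > 4 {
    print("x is now \(x)")
}
for i in 1...3 {
    print(i)
}
```

Each step compiles and runs on its own, which is exactly what makes the language teachable one concept at a time.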
then you can  say let's introduce functions right and and many languages like python have had this this kind  of notion of let's introduce small things and then you can add complexity then you can introduce  classes and then you can add generics in the case of Swift and then you can in modules and build out  in terms of the things that you're expressing but um this is not very typical for compil languages  and so this was a very strong design point and one of the reasons that um Swift in genera
l is  designed with this factoring of complexity in mind so that the language can express powerful  things you can write firmware and Swift if you want to um but uh it has a very high level feel  which is really this Perfect Blend because often you have very Advanced Library writers that  want to be able to use the the nitty-gritty details but then other people just want to use the  libraries and work at a higher abstraction level yeah it's as easy as it look that's not that's  not a stage magic
hack or anything like that. No, no — I don't mean from the user perspective, I mean from the implementation perspective: it's easy to make happen once all the pieces are in place. The way it works: if you think about a dynamically typed language like Python, you can think about it in two different ways. You can say it has no types — which is what most people would say — or you can say it has one type: the Python object. The Python object gets passed around, and because there's only one type, it's implicit. So what happens with Swift and Python talking to each other? Swift has lots of types — arrays, strings, classes, all that kind of stuff — but it now also has a Python-object type. There is one Python-object type, so when you say import numpy, what you get is a Python object which is the NumPy module. Then you say np.array, and Swift says: hey, Python object, I have no idea what you are — give me your array member. It just uses dynamic machinery: it talks to the Python interpreter and says, hey Python, what's the .array member in that Python object? It gives you back another Python object. Then you add the parentheses for the call and the arguments you're going to pass, and it says: hey, Python object that is the result of np.array, get called with these arguments — again calling into the Python interpreter to do that work. Right now this is all really simple, and if you dive into the code, what you'll see is that the Python module in Swift is something like 1,200 lines of code. It's written in pure Swift, it's super simple, and it's built on top of the C interoperability, because it just talks to the Python interpreter. But making that possible required us to add two major language features to Swift, to express these dynamic calls and dynamic member lookups. So what we've done over the last year is propose, implement, standardize, and contribute new language features to the Swift language to make this really trivial. And this is one of the things about Swift that is critical to the Swift for TensorFlow work: we can actually add new language features. The bar for adding them is high, but it's what makes this possible.
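The two features he's referring to landed in Swift as @dynamicMemberLookup and @dynamicCallable. Here's a toy illustration of the mechanism with a hypothetical dictionary-backed stand-in for the real Python-object type (the real implementation forwards these lookups to the Python interpreter through C interop):

```swift
// A toy stand-in for the Python-object type: member lookups and calls
// are resolved at run time instead of compile time.
@dynamicMemberLookup
@dynamicCallable
struct DynamicValue {
    var members: [String: String] = [:]

    // `obj.foo` is rewritten by the compiler to obj[dynamicMember: "foo"].
    subscript(dynamicMember name: String) -> String {
        members[name] ?? "<missing \(name)>"
    }

    // `obj(a, b)` is rewritten to obj.dynamicallyCall(withArguments: [a, b]).
    func dynamicallyCall(withArguments args: [String]) -> String {
        "called with \(args)"
    }
}

let np = DynamicValue(members: ["array": "numpy.array"])
print(np.array)       // resolved dynamically — prints "numpy.array"
print(np("1", "2"))   // resolved dynamically through dynamicallyCall
```

Because the rewriting happens in the compiler, any Swift type can opt in, which is how a ~1,200-line pure-Swift module can wrap the entire Python object model.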
Yeah, so I'm only tangentially involved in this, but the way it works with AutoGraph is that you mark your function with a decorator, and when Python calls it, that decorator is invoked, and it says: before I call this function, you can transform it. The way AutoGraph works, as far as I understand, is that it actually uses the Python parser to parse the code, turn it into a syntax tree, and then apply compiler techniques to transform it down into TensorFlow graphs. You can think of it as saying: hey, I have an if statement — I'm going to create an if node in the graph, like tf.cond; you have a multiply — I'll turn that into a multiply node in the graph. It becomes this tree transformation. So the TensorFlow world has a couple of different — what I'd call — front-end technologies. Swift and Python and Go and Rust and Julia and all these things share the TensorFlow graphs and all the runtime and everything below that. And Swift for TensorFlow is merely another front end for TensorFlow, just like any of these other systems. But there's a major difference between what I'd say are three camps of technologies here. There's Python, which is a special case, because the vast majority of the community's effort goes into the Python interface, and Python has its own approaches for automatic differentiation, its own APIs, and all that kind of stuff. There's Swift, which I'll talk about in a second. And then there's kind of everything else. The everything-else camp is effectively language bindings: they call into the TensorFlow runtime, but they usually don't have automatic differentiation, and they usually don't provide anything other than APIs that call the C API in TensorFlow — so they're kind of wrappers. Swift is really kind of special, and it's a very different approach. Swift for TensorFlow is a very different approach, because there we're saying: let's look at all the problems that need to be solved in the full stack of the
TensorFlow compilation process — if you think about it that way — because TensorFlow is fundamentally a compiler. It takes models and makes them go fast on hardware; that's what a compiler does. It has a front end, it has an optimizer, and it has many back ends. So if you look at it in a particular way, it is a compiler. Swift is merely another front end, but the design principle is: let's look at all the problems we face as machine-learning practitioners and ask what the best possible way to solve them is, given that we can change literally anything in this entire stack. Python, for example — where the vast majority of the engineering effort has gone — is constrained by being the best possible thing you can do with a Python library. There are no Python language features that were added because of machine learning that I'm aware of; they added a matrix-multiplication operator with @, but that's as close as you get. With Swift, it's hard, but you can add language features to the language, and there's a community process for that. So we look at these things and ask: what is the right division of labor between the human programmer and the compiler? And Swift has a number of things that shift that balance. Because it has a type system, for example, certain kinds of analysis of the code become possible, and the compiler can automatically build graphs for you without you thinking about them. That's a big deal for a programmer: you just get free performance — clustering and fusion and optimization and things like that — without having to do it manually, because the compiler can do it for you. Automatic differentiation is another big deal, and I think it's one of the key contributions of the Swift for TensorFlow project.
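As an aside, the core idea of automatic differentiation can be sketched with forward-mode dual numbers — a deliberately simplified stand-in for what language-integrated differentiation does, not the actual Swift for TensorFlow API (the Dual type here is hypothetical):

```swift
// Forward-mode AD sketch: carry a value and its derivative together,
// and let arithmetic propagate both via the chain rule.
struct Dual {
    var value: Double
    var derivative: Double
}

func * (a: Dual, b: Dual) -> Dual {
    // Product rule: (ab)' = a'b + ab'
    Dual(value: a.value * b.value,
         derivative: a.derivative * b.value + a.value * b.derivative)
}

func + (a: Dual, b: Dual) -> Dual {
    Dual(value: a.value + b.value,
         derivative: a.derivative + b.derivative)
}

// f(x) = x*x + x, so f'(x) = 2x + 1; at x = 3, f(3) = 12 and f'(3) = 7.
let x = Dual(value: 3, derivative: 1)   // seed dx/dx = 1
let y = x * x + x
print(y.value, y.derivative)            // prints "12.0 7.0"
```

A compiler-integrated system does the same bookkeeping (usually in reverse mode) across whole functions, which is why it can apply the classic whole-function optimizations he mentions next.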
There's this entire body of work on automatic differentiation that dates back to the Fortran days. People doing tremendous amounts of numerical computing in Fortran used to write what they called source-to-source translators: you take a bunch of code, shove it into a mini compiler, and it pushes out more Fortran code — but it generates the backward passes for your functions for you, the derivatives. In that work in the '70s, a tremendous number of optimizations and a tremendous number of techniques for fixing numerical instability and other kinds of problems were developed. But they're very difficult to port into a world where, in eager execution, you get one op at a time: you need to be able to look at an entire function and reason about what's going on. When you have language-integrated automatic differentiation — which is one of the things the Swift project is focusing on — you can unlock all those techniques and reuse them, in familiar ways. But the language-integration piece has a bunch of design room in it, and it's also complicated. There's an incredible amount going on. So — we're on our third generation of TPUs, which are now 100 petaflops in a very large liquid-cooled box, a virtual box with no cover, and as you might imagine, we're not out of ideas yet. The great thing about TPUs is that they're a perfect example of hardware-software co-design. It's about asking: what hardware do we build to solve certain classes of machine-learning problems? Well, the algorithms are changing! The hardware takes, in some cases, years to produce, so you have to make bets and decide what is going to happen — and what the best way is to spend the transistors to get the maximum performance per watt, or area per cost, or whatever it is you're optimizing for. And one of the amazing things about TPUs is this numeric
format called bfloat16. bfloat16 is a compressed 16-bit floating-point format, but it puts the bits in different places: in numeric terms, it has a smaller mantissa and a larger exponent. That means it's less precise, but it can represent larger ranges of values, which in the machine-learning context is really important and useful, because sometimes you have very small gradients you want to accumulate, and very, very small numbers that are important to move things as you're learning — but sometimes you have very large-magnitude numbers as well. bfloat16 is not as precise — the mantissa is small — but it turns out machine-learning algorithms actually want to generalize, and there are theories that this actually increases the ability of the network to generalize across data sets. And regardless of whether that's good or bad, it's much cheaper at the hardware level to implement, because the area and time of a multiplier is n² in the number of bits in the mantissa, but linear in the size of the exponent. And you were connected to both efforts here, the hardware and the software side? Yeah. So that was a breakthrough that came from the research side — people originally working on optimizing network transport of weights across a network, trying to find ways to compress that — but then it got burned into silicon, and it's a key part of what makes TPU performance so amazing and great.
Now, TPUs have many different aspects that are important, but the co-design between the low-level compiler bits, the software bits, and the algorithms is all super important, and it's this amazing trifecta that only Google can do. Yeah — so MLIR is a project we announced at a compiler conference three weeks ago or so, at the Compilers for Machine Learning conference. Basically, if you again look at TensorFlow as a compiler stack, it has a number of compiler algorithms within it. It also has a number of compilers that get embedded into it, made by different vendors: for example, Google has XLA, which is a great compiler system; NVIDIA has TensorRT; Intel has nGraph. There are a number of these different compiler systems, and they're very hardware-specific and trying to solve different parts of the problem, but they're all kind of similar in the sense that they want to integrate with TensorFlow. And TensorFlow has an optimizer, and it has these different code-generation technologies built in. The idea of MLIR is to build a common infrastructure to support all these different subsystems. Initially, it's to make it so they all plug in together, can share a lot more code, and can be reusable. But over time, we hope the industry will start collaborating and sharing code, and that instead of reinventing the same things over and over again, we can actually foster some of that working-together-to-solve-common-problems energy that has been useful in the compiler field before. Beyond that, MLIR is — some people have joked that it's kind of LLVM 2. It learns a lot from what LLVM has done well and what LLVM has done wrong, and it's a chance to fix that. And there are also challenges in the LLVM ecosystem, where LLVM is very good at the thing it was designed to do, but 20 years later the world has changed, people are trying to solve higher-level problems, and we need some new technology.

Between the two, I prefer the Google approach, if that's what you're asking. The Apple approach makes sense given the historical context Apple came from, but that was 35 years ago, and I think Apple is definitely adapting. The way I look at it is that there are different kinds of concerns in the space. It is very rational for a business to care about making money — that, fundamentally, is what a business is about. But I think it's also incredibly realistic to say: it's not your string library that's going to make you money. It's going to be the amazing UI, the product-differentiating features, and the other things you build on top of your string library. So keeping your string library proprietary and secret is maybe not the important thing anymore. Before, platforms were different, and even 15 years ago things were a little bit different, but the world is changing. Google strikes a very good balance, I think, and TensorFlow being open source really changed the entire machine-learning field — it caused a revolution in its own right. I think it's amazingly forward-looking, because I could imagine — I wasn't at Google at the time, but I could imagine — a different context, a different world, where a company says: machine learning is critical to what we're doing; we're not going to give it to other people. That decision is a profoundly brilliant insight that I think has really led to the world being better — and better for Google as well. And again, I can understand the concern: if we release our machine-learning software, our competitors could go faster. But on the other hand, I think open-sourcing TensorFlow has been fantastic for Google. I'm sure that decision was very non-obvious at the time, but I think it's worked out very well.

Well, I don't think Tesla has a culture of taking things slow and seeing how it goes. One of the things that attracted me to Tesla is that it's very much a gung-ho, let's-change-the-world, let's-figure-it-out kind of place, and I have a huge amount of respect for that. Tesla has done very smart things with Hardware 1 in particular. The Hardware 1 design was originally meant for very simple automation features in the car, like traffic-aware cruise control, and the fact that they were able to effectively feature-creep it into lane holding and a very useful driver-assistance feature is pretty astounding, particularly given the details of the hardware. Hardware 2 built on that in a lot of ways, and the challenge there was that they were transitioning from a third-party-provided vision stack to an in-house-built vision stack. The first step, which I mostly helped with, was getting onto that new vision stack.
That was very challenging — it was time-critical for various reasons, and it was a big leap — but it was fortunate that it built on a lot of the knowledge, expertise, and team that had built Hardware 1's driver-assistance features. Yeah, so I guess I would say that when I was at Tesla, I experienced and saw the highest degree of turnover I'd ever seen in a company, which was a bit of a shock. But one of the things I learned, and came to respect, is that Elon is able to attract amazing talent because he has a very clear vision of the future, and you can get people to buy into it because they want that future to happen. The power of vision is something I have a tremendous amount of respect for, and I think Elon is fairly singular in the world in terms of the things he's able to get people to believe in. There may be many people who stand on a street corner and say, ah, we're going to go to Mars — but there are only a few people who can get others to buy into it, believe it, build the path, and make it happen. So I respect that. I don't respect all of his methods, but I have a huge amount of respect for that. Yeah, good question. So, working hard can be defined a lot of different ways — a lot of hours, and that is true. The thing that's hardest for me is being short-term focused on delivering and executing and making a thing happen, while also thinking about the longer term, and trying to balance the two. Because if you are myopically focused on solving a task, and only think about that incremental next step, you will miss the next big hill you should jump over to. I've been really fortunate that I've been able to oscillate between the two. Historically at Apple, for example, that was made possible because I was able to work with some really amazing people, build up teams and leadership structures, and allow them to grow in their careers and take on responsibility — thereby freeing me up to be a little bit crazy and think about the next thing. So it's a lot of that, but it's also that, with experience, you make connections that other people don't necessarily make. I think that's a big part as well. But the bedrock is just a lot of hours, and that's okay with me. There are different theories on work-life balance, and my
theory for myself — which I do not project onto the team, but my theory for myself — is that I want to love what I'm doing and work really hard. My purpose, I feel, my goal, is to change the world and make it a better place, and that's what really motivates me. Those are all very kind ways of explaining it — do you want to know the real reason it's a dragon? Well, yeah. So there is a seminal book on compiler design called the Dragon Book. It's a really old book on compilers, and the dragon logo for LLVM came about because at Apple we kept talking about LLVM-related technologies and there was no logo to put on a slide. We wondered what to do, and somebody asked, well, what kind of logo should a compiler technology have? And I said, I don't know — a dragon is the best thing we've got. Apple somehow magically came up with the logo, and it was a great thing, the whole community rallied around it, and then it got better as other graphic designers got involved. But that's originally where it came from. Lord of the Rings is great, and I also like role-playing games — computer role-playing games — and dragons often show up in there, but really it comes back to the book: oh no, we need a thing! And hilariously, one of the funny things about LLVM is that my wife, who's amazing, runs the LLVM Foundation. She goes to Grace Hopper and tries to get more women involved — she's also a compiler engineer, so she's trying to get other women interested in compilers and things like this — and she hands out the stickers, and people like the LLVM sticker because of Game of Thrones. So sometimes culture has this helpful effect of getting the next generation of compiler engineers engaged with the cause. Okay, awesome. Chris, thanks so much for talking. It's been great talking with you. [Music] This is the Lex-free podcast.
