
Are GFlowNets the future of AI?

Should you care about GFlowNets? What are they anyway? Learn about how GFlowNets are aiding drug discovery and reasoning in large language models! **Like, subscribe, and share if you find this video valuable!**

Tutorial: https://milayb.notion.site/The-GFlowNet-Tutorial-95434ef0e2d94c24aab90e69b30be9b3

0:00 - Why care about GFlowNets?
0:54 - The problems GFlowNets solve
1:39 - A concrete example: drug discovery
3:53 - What GFlowNet really is
4:46 - Applications: GFlowNet-EM
5:58 - Applications: Better LLM reasoning
6:55 - Conclusion

Papers mentioned:
- GFlowNet for drug discovery (the first GFlowNet paper): https://arxiv.org/abs/2106.04399
- Jointly training a GFlowNet and an energy-based model: https://arxiv.org/abs/2202.01361
- GFlowNet-EM: https://arxiv.org/abs/2302.06576
- GFlowNet for better reasoning in LLMs: https://arxiv.org/pdf/2310.04363.pdf

Follow me on Twitter: https://twitter.com/edwardjhu

🙏 This video would not be possible without my wonderful labmates at Mila and, of course, Yoshua.

Edward Hu

4 days ago

Why should you care about GFlowNets? Is it the next Transformer? Is it Yoshua Bengio's pet project? Or is it one of those ideas that are so 2010s, when all the cool kids are training large language models in the 2020s? Today, I'll talk about why you should care about GFlowNets and why it is the future.

My name is Edward Hu. I'm not exactly impartial here, because Yoshua Bengio is my PhD advisor and I worked directly on GFlowNets, but I also love what works! I invented low-rank adaptation, or LoRA, when I was a researcher at Microsoft working with GPT-3 -- and by the way, here's a video on LoRA. Today, I'm a research scientist at OpenAI. So, I want to talk about what makes GFlowNets so exciting and how they're going to shape the future of AI.

First of all, GFlowNet sounds like a type of neural network, like a Transformer or a ResNet. However, it is not. GFlowNet stands for generative flow network, and it is a learning algorithm. Before I tell you more about the algorithm itself, I'm going to start with the problem it solves.
So, if you ask an AI practitioner what their worst nightmare is, most people will tell you it's either overfitting or hyperparameter tuning. By the way, I have another video on muTransfer, a technique that lets you tune hyperparameters for a large model much more easily, which I'll link here, but today we're going to talk about overfitting.

Overfitting usually happens when we ask the model to maximize something. For example, we might be maximizing the likelihood of a dataset; the best way -- the perfect way -- to maximize the likelihood of a dataset is to memorize it completely, without "understanding" it. And if you just memorize the answers to certain questions, that's not going to help you generalize to questions you've never seen before. Something similar happens in reinforcement learning as well: if you maximize the reward, very often what you find is hacks.
I'm going to give you a really concrete scenario where this shows up. Say you're inventing a new drug molecule through trial and error. What you're doing is basically reinforcement learning, albeit with a really expensive reward function, because the real reward function is: you take this drug, run a clinical trial, and get a reward at the end. That's extremely slow and expensive. What people do in practice is collect some clinical data and then train a neural network to simulate the real reward function.

Now, you can imagine what happens if you maximize reward under this proxy model, which is far from perfect. You're going to find a molecule that obtains an extremely high reward under the model, but the chance of that molecule actually being the drug you want is low, because there are so many ways to trick a neural network into thinking a molecule has a high reward. However, this reward function still captures some information about drug-worthiness, even though we don't trust it 100%.

Practically speaking, we don't just want the single best molecule under this reward -- we want many good ones. Even better, these good molecules should be as different as possible from one another. What we do then is try them all in the real world, and hopefully some of them are actually good. If we take just the best one, it's almost guaranteed that this one molecule is exploiting some imperfection in our reward model.
In this example, I'm highlighting the importance of having diversity, as opposed to finding just the max, as in maximum likelihood estimation or reward maximization.

In fact, here's an idea: say we have a reward function for molecules. What if, instead of just getting a single molecule, we had a generator of molecules? Here's what the generator does: it generates a molecule with probability proportional to the reward. If a molecule has a really high reward, it's more likely to be generated, and if we have a bunch of molecules with equally high rewards, they are equally likely to be generated. Now, as we generate molecules using this generator, we get candidates, and most of them will have high rewards, because low-reward ones have a low probability of being generated.

Imagine an objective function that allows you to train a generator like that. The objective function and algorithm that let you do that could be a generative flow network, or GFlowNet, and this drug discovery example is actually the motivating use case in the first GFlowNet paper.
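To make "sampling proportional to the reward" concrete, here is a minimal sketch (mine, not shown in the video) for a toy space small enough to enumerate exactly; the reward table is a made-up stand-in for a learned reward model:

```python
import random

# Hypothetical reward model: maps each candidate "molecule" to a
# non-negative score. In reality this would be a trained neural network.
reward = {"mol_A": 8.0, "mol_B": 4.0, "mol_C": 4.0, "mol_D": 0.5}

# In a tiny enumerable space we can sample p(x) proportional to R(x)
# exactly: normalize the rewards into a distribution and draw from it.
molecules = list(reward)
total = sum(reward.values())
probs = [reward[m] / total for m in molecules]

samples = random.choices(molecules, weights=probs, k=10_000)

# mol_A (reward 8.0) shows up about twice as often as mol_B (reward 4.0),
# and mol_B and mol_C, with equal rewards, show up about equally often.
for m in molecules:
    print(m, samples.count(m) / len(samples))
```

The point of a GFlowNet is to get this same behavior when the space of objects is far too large to enumerate, by training a neural network that builds objects step by step.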
However, GFlowNets are about much more than drug discovery. The high-level takeaway is that GFlowNet is a novel training algorithm. Instead of looking at a dataset or a reward function and asking "how can I find the function that maximizes this?", GFlowNet asks "how can I find a sampler -- a neural network sampler -- that samples proportionally to a given reward function?" If we're given a dataset instead of a reward function, there's another paper from our lab which says: given a dataset, first learn an energy-based model, then train a sampler that samples proportionally to that energy-based model. So GFlowNet really shifts the question we ask: instead of maximizing something, we're matching a distribution.
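One common GFlowNet training objective (not covered in detail here) is trajectory balance, which pushes the probability of constructing an object x to match R(x) up to a learned normalizing constant Z. Below is a minimal, self-contained sketch on a toy problem -- building a 3-bit string one bit at a time, with a made-up reward -- under my own assumptions, not the setup of any of the papers above:

```python
import torch
import torch.nn as nn

HORIZON = 3  # build a binary string of length 3, one bit at a time

def reward(bits):
    # Made-up reward: strings with more ones score higher (must be > 0).
    return 0.1 + sum(bits)

def encode(bits):
    # Pad with -1 so the network can tell filled positions from empty ones.
    return torch.tensor(bits + [-1] * (HORIZON - len(bits)), dtype=torch.float)

# Forward policy: maps a partial string to logits over the next bit {0, 1}.
policy = nn.Sequential(nn.Linear(HORIZON, 32), nn.ReLU(), nn.Linear(32, 2))
log_Z = nn.Parameter(torch.zeros(()))  # learned log normalizing constant
opt = torch.optim.Adam([*policy.parameters(), log_Z], lr=1e-2)

for step in range(2000):
    bits, log_pf = [], torch.zeros(())
    for _ in range(HORIZON):  # roll out one trajectory
        dist = torch.distributions.Categorical(logits=policy(encode(bits)))
        a = dist.sample()
        log_pf = log_pf + dist.log_prob(a)
        bits.append(int(a))
    # Trajectory balance loss: (log Z + sum log P_F - log R(x) - sum log P_B)^2.
    # Each state here has exactly one parent, so sum log P_B = 0 and drops out.
    loss = (log_Z + log_pf - torch.log(torch.tensor(reward(bits)))) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, sampling from the policy produces strings with frequencies roughly proportional to their rewards, and exp(log_Z) approaches the sum of all rewards over the space.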
Finally, I'm going to give you a quick example of how this can be useful in the real world -- actually, two examples, from two papers that I led. I'm happy to dive into these papers in future videos, but today I'll just give you a quick taste of what they look like. The first paper is called GFlowNet-EM, and it tackles a problem fundamental to machine learning. The second one is more empirical: it has to do with large language models.

Many of us have heard of the expectation-maximization algorithm, which is used to find maximum likelihood estimates in latent-variable models. The big idea is that in the expectation step, we want to sample from a posterior distribution over latent variables, and for non-trivial models this posterior is usually intractable. So people do things like Markov chain Monte Carlo, or they make simplifying assumptions so that the posterior becomes easy to model.

So now I have this intractable posterior distribution, which can be described by a reward function. Reinforcement learning can help us find the maximum of this posterior, but we want to match the distribution: we want to draw samples from it for learning, which is a hard inference problem. GFlowNet converts this hard inference problem, usually solved with simulation, into something we can solve by training a neural network -- and we love training big neural networks these days, because we're good at it.
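In symbols (my paraphrase, not notation from the video): the E-step needs samples of the latent z given data x, where the normalizer is the intractable part but the unnormalized value is easy to evaluate:

```latex
p_\theta(z \mid x) \;=\; \frac{p_\theta(x, z)}{\sum_{z'} p_\theta(x, z')}
\;\propto\; p_\theta(x, z)
```

Setting the GFlowNet reward to R(z) = p_theta(x, z) therefore trains a sampler that draws from the posterior without ever computing the sum in the denominator.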
That makes GFlowNet a bridge between classical problems in machine learning and scaling neural networks, which is the future of AI.

In the second example, we have a large language model, and we want to use it to solve, say, a certain kind of reasoning task. However, we don't have a lot of data points -- maybe 10, 20, or 50. Instead of fine-tuning on them, which easily leads to overfitting, we're going to sample from the posterior over potential reasoning chains under the model. Usually, people either find the most likely reasoning chain using reinforcement learning, or use few-shot prompting, basically hoping the model will come up with a good reasoning chain if we ask it really nicely. Here, we use a GFlowNet objective to train the model to directly sample reasoning chains that could have led to the correct answer, in proportion to how likely each chain is to lead to the correct answer.
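In symbols (my notation, not necessarily the paper's): with question x, latent reasoning chain z, and correct answer y, the posterior over chains is proportional to a product of two quantities the language model can already score:

```latex
p(z \mid x, y) \;\propto\; p_{\mathrm{LM}}(z \mid x)\, p_{\mathrm{LM}}(y \mid x, z)
```

Training the model as a GFlowNet sampler with reward R(z) = p_LM(z | x) * p_LM(y | x, z) makes it generate chains in proportion to how likely they lead to the correct answer, without enumerating the exponentially many possible chains.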
The result is that we're able to boost data efficiency in many cases. This paper will actually be an oral presentation at ICLR this year, so maybe I'll see many of you in person.

So, long story short: GFlowNet is not a new neural network architecture. It's a new learning algorithm that lets you train a sampler that samples proportionally to a reward function, and it has many applications going forward, especially as we focus on improving the generalization and data efficiency of our neural networks. On the theoretical side, it has connections to maximum-entropy reinforcement learning with path consistency objectives, which I'm happy to dive deeper into in a future video.

If you find this video helpful, please like, subscribe, and share it with somebody else who might be interested. I'll see you in the next video!

Comments

@lobiqpidol818

I clicked the video thinking.... 😑 Oh look another "AI Expert". I invented LoRA.. Ok sir you have my attention. Hard to find real experts out here. Glad I clicked 😀.

@alfellati

Finally a real AI expert with a great track record, inventor and researcher, author of multiple research papers on AI. Great having you brother.

@tylerk3130

"Huh, what's this guy's deal? Well, GFlowNets sound cool, let's check him out" "I invented LoRA, Prof. Bengio was my PhD Advisor and I'm currently an OpenAI Research Scientist" "Ah, so 'the real' is this guy's deal...gotcha". Easy subscribe.

@pollomarzo

Great content! Might want to get a mic for your next one, it'll make all the difference

@diga4696

Thank you for taking the time to produce great content!

@arjandhaliwal4962

Fantastic video! No hype just the intuition behind current research. I love this type of content because it sets the foundational thinking required to make sense of research papers (which are normally opaque to someone who isn't deep in the field of study)

@jarno_r

Such a great video, massively underrated channel! Liked & subbed, keep it up

@christopherd.winnan8701

Bill Mollison would be proud - Optimisation rather than maximisation is the best way to go.

@Drone256

That was excellent. Thank you for taking the time to share things like this.

@definty

Oh wow. Awesome dude! Thanks for your work with LoRA and creating this video! Subbed. You sir are awesome!

@MaxenceFrenette

Love the style of video. Hope you can explain some of these more advanced things in the future, it's really interesting.

@JL-zl6ot

super interesting and clear presentation as always. Thank you so much for making this content available!

@RamphyRojas

Thank you for making it open to everyone. ❤❤❤ thank you from Venezuela!

@alexander73848

People like you made YouTube great! Thank you for contributing

@Shaunmcdonogh-shaunsurfing

Nice delivery. Thanks for bringing this to us.

@zacharykosove9048

Awesome video, it really sparks my interest in neural nets. Thanks for the broad overview of what works and what doesn't when training a model.

@shaunakgalvankar4502

Awesome!! I am about to start my PhD in CS and stuff like this is way motivating and inspiring… keep sharing your research through videos… I literally read the LoRA paper… then watched your video, then went back and read the paper again, and then came back and watched the video again

@user-wr4yl7tx3w

this is really informative. thanks for taking time to make such videos.

@miikalewandowski7765

Brilliant topic and explanation! Looking forward to your deep dives, and, if you like, some observations or insights you have made on your saga as a researcher 😊

@AK-ox3mv

Exciting. Waiting for more videos✌️