Should you care about GFlowNets? What are they anyway? Learn about how GFlowNets are aiding drug discovery and reasoning in large language models!
**Like, subscribe, and share if you find this video valuable!**
Tutorial: https://milayb.notion.site/The-GFlowNet-Tutorial-95434ef0e2d94c24aab90e69b30be9b3
0:00 - Why care about GFlowNets?
0:54 - The problems GFlowNets solve
1:39 - A concrete example: drug discovery
3:53 - What GFlowNet really is
4:46 - Applications: GFlowNet-EM
5:58 - Applications: Better LLM reasoning
6:55 - Conclusion
Papers mentioned:
- GFlowNet for drug discovery (first GFlowNet paper)
https://arxiv.org/abs/2106.04399
- Jointly training a GFlowNet and an energy-based model
https://arxiv.org/abs/2202.01361
- GFlowNet-EM
https://arxiv.org/abs/2302.06576
- GFlowNet for better reasoning in LLMs
https://arxiv.org/pdf/2310.04363.pdf
Follow me on Twitter:
https://twitter.com/edwardjhu
🙏 This video would not be possible without my wonderful labmates at Mila and, of course, Yoshua.
Why should you care about GFlowNets? Is it the next Transformer? Is it Yoshua Bengio's pet project? Or is it one of those ideas that are so 2010s, when all the cool kids are training large language models in the 2020s? Today, I'll talk about why you should care about GFlowNets and why it is the future. My name is Edward Hu. I'm not exactly impartial here, because Yoshua Bengio is my PhD advisor and I worked directly on GFlowNets, but I also love what works! I invented low-rank adaptation, or LoRA, when I was a researcher at Microsoft working with GPT-3 -- and by the way, here's a video on LoRA. Today, I'm a research scientist at OpenAI. So, I want to talk about what makes GFlowNets so exciting and how they're going to shape the future of AI. First of all, GFlowNet sounds like a type of neural network, like a Transformer or a ResNet. However, it is not. GFlowNet stands for generative flow network, and it is a learning algorithm. Before I tell you more about the algorithm itself, I'm going to start with the problem it solves. So, if you ask an AI practitioner
what their worst nightmare is, most people will tell you it's either overfitting or hyperparameter tuning. By the way, I have another video on muTransfer, a technique that lets you tune hyperparameters for a large model much more easily, which I'll link here, but today we're going to talk about overfitting. Overfitting usually happens when we ask the model to maximize something. For example, we might be maximizing the likelihood of a dataset; the best way -- the perfect way -- to maximize the likelihood of a dataset is to memorize it completely without "understanding" it. And if you just memorize the answers to certain questions, it's not going to help you generalize to questions you've never seen before. Something similar happens in reinforcement learning as well: if you maximize the reward, very often what you find is hacks. I'm going to give you a really
concrete scenario where this shows up. Say you're inventing a new drug molecule through trial and error. What you're doing is basically reinforcement learning, albeit with a really expensive reward function, because the real reward function is: you take this drug, run a clinical trial, and you get a reward in the end. That is extremely slow and expensive, so what people do in practice is collect some clinical data and then train a neural network to simulate the real reward function. Now, you can imagine what happens if you maximize reward under this proxy model, which is far from perfect. You're going to find a molecule that obtains an extremely high reward under the model, but the chance of that molecule actually being the drug you want is low, because there are so many ways to trick a neural network into thinking a molecule has a high reward. However, this reward function is still capturing some information about drug worthiness, even though we don't trust it 100%. Practically speaking, we don't just want the single best molecule under this reward -- we want many good ones. Even better, these good molecules should be as different as possible from one another. What we do then is try them all in the real world, and hopefully some of them are actually good. If we take just the best one, it is almost guaranteed that this one molecule is exploiting some imperfections in our reward model. In this example, I'm highlighting the importance of having diversity, as opposed to finding just the max, as in maximum
likelihood estimation or reward maximization. In fact, here's an idea: say we have a reward function for molecules. What if, instead of just getting a single molecule, we have a generator of molecules? Here's what the generator does: it generates a molecule with probability proportional to the reward, meaning that if a molecule has a really high reward, then it's more likely to be generated, and if we have a bunch of molecules that are equally good under the reward function, then they are equally likely to be generated. Now, as we generate molecules using this generator, we're going to get candidates, and most of them are going to have high rewards, because low-reward ones have a low probability of being generated.
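To make the contrast concrete, here's a minimal Python sketch of reward maximization versus reward-proportional sampling. The molecule names and proxy rewards are made up for illustration, and the sampler here just draws from an explicit table; a real GFlowNet trains a neural network to approximate this behavior over an exponentially large space:

```python
import random

# Hypothetical proxy rewards for a handful of candidate molecules
# (names and values are invented for this example).
proxy_reward = {"mol_A": 8.0, "mol_B": 7.5, "mol_C": 7.0, "mol_D": 0.5}

# Reward maximization: always returns the single top candidate,
# even if its high score is just the proxy model being fooled.
best = max(proxy_reward, key=proxy_reward.get)

# Reward-proportional sampling (what a trained GFlowNet approximates):
# high-reward molecules come up often, but diversity is preserved.
mols, rewards = zip(*proxy_reward.items())
samples = random.choices(mols, weights=rewards, k=1000)

print("argmax picks only:", best)
print("sampler keeps diversity:", {m: samples.count(m) for m in mols})
```

Note how the low-reward `mol_D` is rarely drawn, while the three comparable high-reward candidates each appear frequently, which is exactly the diverse candidate pool you want to carry into real-world testing.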
Imagine an objective function that allows you to train a generator like that. The objective function and algorithm that let you do exactly that are the generative flow network, or GFlowNet, and this drug discovery example is actually the motivating use case in the first GFlowNet paper. However, GFlowNet is much more than just drug discovery. The high-level takeaway is that GFlowNet is a novel training algorithm. Instead of looking at a dataset or a reward function and asking "how can I find the function that maximizes this?", GFlowNet asks "okay, how can I find a sampler -- a neural network sampler -- that samples proportionally to a given reward function?" If we're given a dataset instead of a reward function, there's another paper from our lab which says "okay, given a dataset, I'm going to first learn an energy-based model, and then I'll train a sampler that samples proportionally to my energy-based model." So, GFlowNet is really shifting the question we ask: instead of maximizing something, we're matching a distribution. And finally, I'm going to give you a quick example
of how this can be useful in the real world. Actually, I'm going to give you two examples -- two papers that I led. I'm happy to dive into these two papers in future videos, but today I'm just going to give you a quick taste of what they look like. The first paper is called GFlowNet-EM, and it tackles a problem fundamental to machine learning. The second one is more empirical: it has something to do with large language models. Many of us have heard of the expectation-maximization algorithm, which is used to find maximum-likelihood estimates in latent-variable models. The big idea is that in the expectation step, we want to sample from a posterior distribution over latent variables, but for non-trivial models this posterior is usually intractable, so people fall back on methods like Markov chain Monte Carlo, or they make simplifying assumptions so the posterior becomes easy to model. Now, I have this intractable posterior distribution, which can be described by a reward function. Reinforcement learning can help us
find the maximum of this posterior, but we want to match the distribution. We want to draw samples from the distribution for learning, which is a hard inference problem. Here, GFlowNet converts this hard inference problem, usually solved with simulation, into something we can solve with a neural network, and we love training big neural networks these days, because we're good at it. That makes GFlowNet a bridge between classical problems in machine learning and scaling neural networks, which is the future of AI. In the second example, we
have a large language model, and we want to use it to solve, say, a certain kind of reasoning task. However, we don't have a lot of data points. What we have is maybe 10, 20, or 50 data points. What we're going to do, instead of fine-tuning, which easily leads to overfitting, is search the posterior over potential reasoning chains under this model. Usually, people either find the most likely reasoning chain using reinforcement learning, or they use few-shot prompting, basically hoping the model will come up with a good reasoning chain if we ask it really nicely. Here, we're using GFlowNet to train the model to directly sample reasoning chains that could have led to the correct answer, in proportion to how likely each chain is to lead to the correct answer under the model. The result is that we're able to boost data efficiency in many cases. This paper will actually be an oral
presentation at ICLR this year, so maybe I'll see many of you in person. So, long story short: GFlowNet is not a new neural network architecture. It's a new learning algorithm that allows you to train a sampler that samples proportionally to a reward function, and it has many applications going forward, especially as we focus on improving the generalization and data efficiency of our neural networks. On the theoretical side, it has connections to maximum-entropy reinforcement learning with path-consistency objectives, which I'm happy to dive deeper into in a future video. If you find this video helpful, please like, subscribe, and share it with somebody else who might be interested. I'll see you in the next video!
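For readers who want one level more detail on "an objective function that trains such a sampler": one popular objective from the GFlowNet literature is trajectory balance. The toy below is a sketch under simplifying assumptions, not the paper's method: pure Python, made-up rewards, binary strings built one bit at a time (a tree, so the backward policy is trivial), and an exact hand-computed policy instead of a neural network. It checks that the trajectory balance loss is zero precisely when each object is sampled with probability proportional to its reward:

```python
import math

# Hypothetical rewards for the four terminal strings (made up).
R = {"00": 1.0, "01": 3.0, "10": 2.0, "11": 6.0}
Z = sum(R.values())  # total reward mass

def subtree_reward(prefix):
    """Sum of rewards of all terminal strings extending this prefix."""
    return sum(r for x, r in R.items() if x.startswith(prefix))

def ideal_policy(prefix, bit):
    """Forward policy that samples x with probability R(x)/Z:
    P(child | prefix) is proportional to reward mass under the child."""
    return subtree_reward(prefix + bit) / subtree_reward(prefix)

def uniform(prefix, bit):
    """A naive policy for comparison: pick each bit with prob 1/2."""
    return 0.5

def tb_loss(x, policy):
    """Trajectory balance loss for the trajectory that builds x:
    (log Z + sum of log P_F(step) - log R(x))^2, with P_B = 1 on a tree."""
    log_pf = sum(math.log(policy(x[:t], x[t])) for t in range(len(x)))
    return (math.log(Z) + log_pf - math.log(R[x])) ** 2

for x in R:
    print(x, tb_loss(x, ideal_policy), tb_loss(x, uniform))
```

In actual GFlowNet training, the policy and the scalar log Z are parameterized by a neural network and this squared loss is minimized by gradient descent over sampled trajectories; the toy only illustrates what the objective measures.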