3 Comments

I can't wait to see how these advancements will shape the future of AI. Keep up the fantastic work!


Great article. Some of these intuitions, such as why MoE training can fail early on and how an MoE can be equivalent to a dense model with as many params, are super useful.

Any insight into methods that could solve the routing problem? Additionally, as OAI is not really a consumer company, I doubt their decision was based on lower inference cost (they just want the best models to find AGI-like things). Thoughts on what that means?

author

> Any insight into methods that could solve the routing problem?

Not particularly, other than the Soft MoE paper, which I'm going to write about next as a follow-up. I'm also quite keen on this, though.
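For readers unfamiliar with the Soft MoE idea mentioned above: rather than hard-routing each token to a discrete expert, every "slot" takes a softmax-weighted mix of all tokens, the experts process the slots, and each token recombines the slot outputs with a second softmax. A minimal NumPy sketch of that routing scheme (the names `soft_moe` and `Phi` are my own, not from the paper):

```python
import numpy as np

def soft_moe(X, Phi, experts):
    """Rough sketch of Soft MoE routing.

    X:       (n_tokens, d) input tokens
    Phi:     (d, n_slots)  learned slot parameters (hypothetical name)
    experts: list of callables; slots are split evenly among them.
    """
    logits = X @ Phi                          # (n_tokens, n_slots)

    # Dispatch: each slot is a convex combination over *tokens*
    # (softmax down each column).
    D = np.exp(logits - logits.max(axis=0, keepdims=True))
    D = D / D.sum(axis=0, keepdims=True)
    slots = D.T @ X                           # (n_slots, d)

    # Each expert processes its share of slots.
    per = slots.shape[0] // len(experts)
    Y = np.concatenate(
        [f(slots[i * per:(i + 1) * per]) for i, f in enumerate(experts)]
    )                                         # (n_slots, d)

    # Combine: each token is a convex combination over *slot outputs*
    # (softmax along each row).
    C = np.exp(logits - logits.max(axis=1, keepdims=True))
    C = C / C.sum(axis=1, keepdims=True)
    return C @ Y                              # (n_tokens, d)
```

Because dispatch and combine are both dense softmaxes, every token influences every expert and gradients flow everywhere, which is exactly what sidesteps the hard-routing failure modes discussed in the article.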

> OAI is not really a consumer company, I doubt their decision was based on lower inference cost (they just want the best models to find AGI-like things)

I disagree about this, tbh. I think they have to pursue consumer applications simply because that's where the money is (ChatGPT is such a success). For their commercial applications, they need to care about inference cost, if only for GPU availability reasons.

I agree that for their flagship research models they don't care about inference cost though.
