Before I studied machine learning, I was an econ grad student banging out OLS problem sets (I see the OLS equation, (X'X)^{-1}X'y, whenever I close my eyes; I derived it that many times).
Love the personal background. Would love to see more commentary on how scaling laws and capital costs predictions fit into this too
that's a great idea tbh. will try to incorporate
Yeah, it's fun to watch which will win first: new abilities or crazy costs.
Gavin Uberti was on Invest like the Best talking about this exact issue. He made really interesting points about the possibility of small scale runs of model-on-chip designs. It sounds a lot like your steel example.
Ah, that's cool! Glad to hear he agrees. Etched is a really cool company.
I got excited when I thought you were going to audit some major use cases and specify which were out of reach today, which can be done by gpt4, and which can be done by open weight models. Ofc the possibilities are infinite but some tangible examples of the “low end steel” market would be v interesting
Alas. That would be cool. I don't have a strong sense of the actual use cases, to be honest. I know what I use it for, and what I've seen it be used for professionally, but I'm really familiar with the research side of things, so the pure modelling perspective. If you come across anything that shows this, I'd be quite interested.
Cool, no worries, maybe I'll take a stab for a few domains I'm familiar with. Nice article!
Regarding the statement that current architectures have much lower memory bandwidth than compute: I'm curious how Cerebras performs for these use cases. 40GB of on-chip SRAM at 20 petabytes/s of memory bandwidth seems like it might upset the status quo.
I'm also very curious- I don't know anyone who's run Cerebras chips that's not affiliated with them. If you find anything please let me know.
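For intuition on why that bandwidth number matters, here's a back-of-the-envelope sketch of the memory-bandwidth ceiling on LLM decoding: generating each token requires streaming all model weights once, so tokens/s is capped at bandwidth divided by weight bytes. The model size and the HBM figure are illustrative assumptions (roughly an H100-class GPU); the 20 PB/s is the Cerebras figure from the comment above.

```python
# Back-of-the-envelope roofline: memory-bandwidth ceiling on single-stream
# LLM decoding. Each generated token streams all weights once, so
# tokens/s <= bandwidth / bytes_of_weights. All numbers are illustrative.

def max_tokens_per_sec(bandwidth_bytes_per_s: float,
                       n_params: float,
                       bytes_per_param: int = 2) -> float:
    """Upper bound on decode throughput if weight streaming is the bottleneck."""
    weight_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_s / weight_bytes

# A hypothetical 7B-parameter model in fp16 (~14 GB of weights):
hbm_bound = max_tokens_per_sec(3.35e12, 7e9)   # ~3.35 TB/s, H100-class HBM
sram_bound = max_tokens_per_sec(20e15, 7e9)    # 20 PB/s, the Cerebras figure

print(f"HBM ceiling:  ~{hbm_bound:,.0f} tokens/s")   # a few hundred
print(f"SRAM ceiling: ~{sram_bound:,.0f} tokens/s")  # over a million
```

This ignores compute, KV-cache traffic, and batching (which is how GPUs actually recover throughput), but it shows why on-chip SRAM at that bandwidth could change the single-stream latency picture.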
It's not an either-or question. If you want to analyze a million docs, you might use an inexpensive low-precision model. If you want to work on a high-profile lawsuit, you want the best model there is. I actually don't think the LLM API market is overvalued. It's surely ahead of its time, but the number of calls is growing exponentially as far as I can tell. We might see a crash akin to the year 2000, when Internet companies went under. They were ahead of their time but right about what the future economics would look like.
Why are you calling "(X’X)^-1X’y" an equation? Where is the equals sign?
It's the formula for the OLS estimator (google "OLS equation"); the left-hand side, `\hat{\beta} = `, was just omitted. Many people are familiar with it because it crops up a lot in ML.
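For anyone who wants to see it in action, here's a minimal sketch with made-up toy data: the closed-form estimator \hat{\beta} = (X'X)^{-1}X'y recovers the coefficients, and matches numpy's least-squares solver.

```python
import numpy as np

# Toy regression data (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=100)

# The OLS estimator from the comment: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# In practice you'd solve the least-squares problem directly rather
# than forming an explicit inverse (better numerical stability):
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(beta_hat, beta_lstsq)
```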