13 Comments

Love the personal background. Would love to see more commentary on how scaling laws and capital cost predictions fit into this too.

author

that's a great idea tbh. will try to incorporate

Yeah, it's fun to ask which will win first: new abilities or crazy costs?

Gavin Uberti was on Invest Like the Best talking about this exact issue. He made really interesting points about the possibility of small-scale runs of model-on-chip designs. It sounds a lot like your steel example.

author

Ah, that's cool! Glad to hear he agrees. Etched is a really cool company.

I got excited when I thought you were going to audit some major use cases and specify which are out of reach today, which can be done by GPT-4, and which can be done by open-weight models. Ofc the possibilities are infinite, but some tangible examples of the “low end steel” market would be v interesting.

author

Alas. That would be cool. I don't have a strong sense of the actual use cases, to be honest. I know what I use it for, and what I've seen it be used for professionally, but what I'm really familiar with is the research side of things, i.e. the pure modelling perspective. If you come across anything that shows this, I'd be quite interested.

Cool, no worries, maybe I'll take a stab for a few domains I'm familiar with. Nice article!

Regarding the statement that architectures have much lower memory bandwidth than compute: I'm curious how Cerebras performs for these use cases. 40GB of on-chip SRAM at 20 PETABytes/s of memory bandwidth seems like it might upset the status quo.

author

I'm also very curious. I don't know anyone who's run Cerebras chips who's not affiliated with them. If you find anything, please let me know.

It’s not an either-or question. If you want to analyze a million docs, you might use an inexpensive low-precision model. If you want to work on a high-profile lawsuit, you want the best model there is. I actually don’t think the LLM API market is overvalued. It’s surely ahead of its time, but the number of calls is growing exponentially as far as I can tell. We might see a crash akin to the year 2000, when Internet companies went under. They were ahead of their time but right about what future economics would look like.

Why are you calling "(X’X)^-1X’y" an equation? Where is the equals sign?

author

It's the equation for the OLS estimator (google "OLS equation"). The left-hand side is just `\hat{\beta} = `. Many people are familiar with it as it crops up a lot in ML.
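
For anyone who wants to see the formula do something, here is a minimal sketch of `\hat{\beta} = (X'X)^-1X'y` for the one-feature case, assuming a design matrix X made of a column of ones plus a single x column. The function name `ols_fit` and the toy data are made up for illustration and are not from the article; the 2x2 inverse is written out by hand so it runs without any libraries.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) from beta_hat = (X'X)^-1 X'y with X = [1, x]."""
    n = len(xs)
    # Entries of the 2x2 matrix X'X = [[n, sx], [sx, sxx]]
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # Entries of the 2-vector X'y = [sy, sxy]
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Determinant of X'X; zero means the x values are all identical
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X'X is singular: x values must not all be equal")
    # (X'X)^-1 = (1/det) * [[sxx, -sx], [-sx, n]], multiplied by X'y
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Hypothetical toy data lying exactly on y = 1 + 2x
intercept, slope = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

With more than one feature you would normally hand this to a linear algebra routine rather than inverting X'X by hand.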
