Before I studied machine learning, I was an Econ grad student banging out OLS problem sets (I see the OLS equation, (X'X)^-1X'y, whenever I close my eyes; I derived it that many times). My research area was antitrust theory, and in particular vertical integration. That gives me a unique perspective on the question at hand: how will the LLM API market evolve as more companies enter the space?
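For readers who haven't banged out those problem sets, here is a minimal sketch of the closed-form estimator in NumPy. The toy data and coefficient values are made up purely for illustration; note that in practice you solve the normal equations rather than forming the explicit inverse.

```python
import numpy as np

# Toy regression data (invented for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta + 0.01 * rng.normal(size=100)

# Closed-form OLS: beta_hat = (X'X)^{-1} X'y.
# Solving the linear system X'X b = X'y is numerically preferable
# to computing np.linalg.inv(X.T @ X) explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

With this little noise, `beta_hat` recovers the true coefficients to a couple of decimal places.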
I got excited when I thought you were going to audit some major use cases and specify which are out of reach today, which can be done by GPT-4, and which can be done by open-weight models. Of course the possibilities are infinite, but some tangible examples of the “low-end steel” market would be very interesting.
Regarding the statement that current architectures have much lower memory bandwidth than compute: I'm curious how Cerebras performs for these use cases. 40 GB of on-chip SRAM at 20 petabytes/s of memory bandwidth seems like it might upset the status quo.
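To make the bandwidth-vs-compute point concrete, here is a back-of-envelope roofline sketch. The A100 figures (~312 TFLOPS FP16, ~2 TB/s HBM) are rough public numbers I'm supplying, and the Cerebras peak-FLOPS value is a placeholder assumption; only the 20 PB/s SRAM bandwidth comes from the comment above.

```python
def ops_per_byte(peak_flops: float, bandwidth_bytes_per_s: float) -> float:
    """Arithmetic intensity (ops/byte) above which a chip is compute-bound
    rather than memory-bound."""
    return peak_flops / bandwidth_bytes_per_s

# A100-class GPU: rough public figures (my assumption, not from the post).
a100 = ops_per_byte(312e12, 2.0e12)

# Cerebras: 20 PB/s SRAM bandwidth from the comment; the peak-FLOPS number
# here is a placeholder guess purely for illustration.
cerebras = ops_per_byte(7.5e15, 20e15)

# Batch-1 LLM decoding reads every weight roughly once per generated token,
# so its intensity is only a few ops/byte -- far below the GPU break-even
# point, which is why decoding is memory-bandwidth-bound on GPUs.
print(f"A100 break-even: ~{a100:.0f} ops/byte; Cerebras: ~{cerebras:.2f} ops/byte")
```

The orders of magnitude, not the exact placeholder numbers, are the point: on-chip SRAM bandwidth can push the break-even intensity below what batch-1 decoding actually delivers.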
It’s not an either/or question. If you want to analyze a million docs, you might use an inexpensive low-precision model. If you're working on a high-profile lawsuit, you want the best model there is. I actually don’t think the LLM API market is overvalued. It’s surely ahead of its time, but the number of calls is growing exponentially as far as I can tell. We might see a crash akin to the year 2000, when Internet companies went under. They were ahead of their time but right about what the future economics would look like.
Love the personal background. Would love to see more commentary on how scaling laws and capital-cost predictions fit into this too.
Why are you calling "(X’X)^-1X’y" an equation? Where is the equals sign?
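For what it's worth, the full equality behind that expression, written as an actual equation in standard textbook notation (the \(\hat{\beta}\) on the left is my addition, not from the post), is:

```latex
\hat{\beta} = (X'X)^{-1}X'y
```

i.e., the expression in the post is the right-hand side of the OLS estimator's closed form.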