"Google LLMs use SentencePiece, which does basically the same thing with a slight tweak" - SentencePiece is a library with different tokenizers (including BPE variants), not an algorithm itself. I was confused about this as well until I read the ambiguity in https://arxiv.org/abs/2112.10508
Very nice overview and paper choice! Regarding the MoD, i was somewhat surprised to see that the routing mechanism acts on tokens independently. For a MoE, i can understand this, as you act based on the token's semantics. But when choosing if you need to do more computation on a token or not, i was expecting this decision to somehow require additional information. The fact that this is achieved just through a projection layer (so basically an inner product, ie similarity) could be possible due to the layers specialising in different abstractions: at lower levels you can focus on the grammar and sentence structure, and while you go up you focus more on the "abstract" tokens. And also the fact that you have to implement this for a layer yes and the one after no, seems a bit weird.
It does seem weird. I’m curious if anyone else has used it. It seemed to not have been picked up by any other labs, and I haven’t seen any OS models use it.
"Google LLMs use SentencePiece, which does basically the same thing with a slight tweak" - SentencePiece is a library with different tokenizers (including BPE variants), not an algorithm itself. I was confused about this as well until I read the ambiguity in https://arxiv.org/abs/2112.10508
ahhhh ty!
Very nice overview and paper choice! Regarding MoD, I was somewhat surprised to see that the routing mechanism acts on tokens independently. For an MoE this makes sense, since you route based on the token's semantics. But when deciding whether a token needs more computation, I expected that decision to require additional information. The fact that it is achieved with just a projection layer (so basically an inner product, i.e. a similarity score) may work because the layers specialise in different abstractions: lower layers can focus on grammar and sentence structure, while higher layers focus more on the "abstract" tokens. Also, the fact that you apply this in alternating layers (one layer yes, the next no) seems a bit odd.
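For reference, here is a rough sketch of what that per-token routing looks like, assuming a generic transformer block; the class and variable names are illustrative, and the sigmoid gating is one plausible way to keep the router differentiable rather than the paper's exact code.

```python
# Minimal sketch of MoD-style routing: a learned projection gives each token a
# scalar score, only the top-k tokens per sequence go through the block, and the
# rest pass through unchanged on the residual path.
import torch
import torch.nn as nn

class MoDRouter(nn.Module):
    def __init__(self, d_model: int, capacity: float = 0.125):
        super().__init__()
        self.proj = nn.Linear(d_model, 1)   # inner product with a learned vector
        self.capacity = capacity            # fraction of tokens that get computed

    def forward(self, x: torch.Tensor, block: nn.Module) -> torch.Tensor:
        # x: (batch, seq_len, d_model); block maps (batch, k, d_model) -> same shape
        b, t, d = x.shape
        k = max(1, int(t * self.capacity))
        scores = self.proj(x).squeeze(-1)            # (b, t): one scalar per token
        topk = scores.topk(k, dim=-1).indices        # tokens selected for compute
        idx = topk.unsqueeze(-1).expand(-1, -1, d)   # (b, k, d) gather indices
        selected = x.gather(1, idx)
        # Gate the block output by the router score so the router receives gradient.
        gate = torch.sigmoid(scores.gather(1, topk)).unsqueeze(-1)
        out = x.clone()
        out.scatter_(1, idx, selected + gate * block(selected))
        return out
```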
It does seem weird. I'm curious whether anyone else has used it; it doesn't seem to have been picked up by other labs, and I haven't seen any open-source models use it.