Discussion about this post

Mohamed Yousef

Nice post.. thanks!

But I am missing a very important piece of info: benchmarks :)

What net effect did this have? For example, a side-by-side comparison between two models, with and without the encoders. Do we need more data? More training? How large is the gap in terms of performance?

Thanks!

siyu

"The removal of the encoders, which are typically in charge of making sense of the multimodal inputs, places the burden of making sense of all outputs on the LLM."

What kind of sloppy writing is this??

No one is making "sense" of anything! Model weights are just being updated to reduce the output error. The author should know better.
