9 Comments
Feb 23 · Liked by Maarten Grootendorst

Thank you for taking the time to break down the complexity into visually appealing, bite sized chunks. This is very interesting!

Thank you for sharing. Could you please tell me which tool you use to create these beautiful diagrams?

I had been reading the Mamba paper, and it went completely over my head. I read this after the paper, and it is so good, so easy to understand, and very well structured. Thank you!!

Thank you Maarten, great post! I have a question regarding the parallel scan: are the B_'s supposed to be different?

This post is brilliant, Maarten. I loved how you sucked us into this idea from ax to bread (explaining the idea down to its delicious CUDA)! Sorry if you answered this before, but how do you make the matrix visualizations? They are great.

I am very grateful to you Maarten for writing such an incredible piece. I am looking forward to your book.

Feb 26·edited Feb 26

Thank you so much for the clear explanation of the process and for making the visual figures.

I am still a little confused about some of the definitions:

1. Does k in x_k mean the timestep t?

2. Do x_0, x_1, x_2, ... represent the tokens in the input sequence? If so, is the number of timesteps (k) equal to the input length (the number of tokens)?

3. For Bx_k, is the matrix multiplication (1xD)*(DxN)=1xN, or (LxD)*(DxN)=LxN?

Thank you. This was very helpful. The visualization is great.

Can you explain how the matrices B and C are parameterized and learned?

Thank you!
