I really enjoyed reading your post. It's intuitive and easy to follow. Thank you for sharing your work!
Thanks for the kind words! If there's any content you'd like to see, please let me know :-)
Is it not viable to let the LLM do the entire task of topic modelling instead of just the topic representation at the end? I wonder if this would give better results.
Yes and no.
If the intent is to pass all documents to the LLM, then that might be computationally difficult, especially with large datasets. Passing a million documents is, at the moment, not an easy feat even with longer context lengths.
Passing a subset of documents is currently a more viable approach: you pass a small subset that your LLM can handle, based on your hardware (VRAM/GPU) and context length, and let the LLM infer the topics. It does, however, often require an iterative approach to get the topics right, and you tend to miss out on uncommon topics.
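To make the subset idea concrete, here is a minimal sketch of packing documents into batches that fit an assumed context budget. The function name, the token budget, and the words-to-tokens ratio are all illustrative assumptions, not part of any real API; a proper implementation would use the model's actual tokenizer.

```python
def batch_documents(docs, max_tokens=4096, tokens_per_word=1.3):
    """Greedily group documents into batches that fit a context window.

    Each batch could then be sent to the LLM in a single prompt asking
    it to infer the topics covered by those documents.
    """
    batches, current, used = [], [], 0
    for doc in docs:
        # Rough token estimate; a real tokenizer would be more accurate.
        cost = int(len(doc.split()) * tokens_per_word) + 1
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(doc)
        used += cost
    if current:
        batches.append(current)
    return batches
```

You would then prompt the LLM once per batch and merge the inferred topics across batches, which is where the iterative refinement mentioned above comes in.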
It is interesting that you mention this since I am currently exploring some possibilities for integrating something similar into BERTopic!
Looks like the llama2 model needs a strictly positive temperature instead of 0.0.
I see, this was updated a few days ago! I'll make sure to change it. Thanks for spotting it!
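Since the model rejects a temperature of exactly 0.0, one minimal workaround is to clamp the value to a small positive floor before passing it to the generation pipeline. The helper name and the floor value here are illustrative, not part of BERTopic or llama2's API:

```python
def safe_temperature(temperature: float, floor: float = 0.01) -> float:
    """Return a strictly positive temperature accepted by llama2.

    A temperature of 0.0 (greedy decoding) is rejected by the model,
    so we substitute a small positive value that behaves near-greedily.
    """
    return max(temperature, floor)
```

A near-zero temperature like 0.01 keeps generation close to deterministic while satisfying the strictly-positive requirement.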
I was just trying it and got that error. Thanks for the great post! I'm also exploring different representation models like GPT-3.5 and GPT-4.