Watch the video:
Timestamps:
0:00 - What's new (It's CRAZY!)
0:44 - Open Oobabooga install directory
1:02 - Update Oobabooga WebUI
1:18 - VRAM usage & speed before update (4.3 tokens/s)
1:56 - Fix missing option or update errors
2:33 - Choosing new ExLlama model loader
2:52 - Downloading new model types (8k models)
4:25 - New VRAM & Speed (20 tokens/s! INSANE!)
5:25 - Raise token limit from 2,000 to 8,000+!
7:17 - How many tokens is your text?
7:50 - How long is 8k tokens?
8:45 - EVEN LESS VRAM with ExLlama_HF
Oobabooga WebUI had a HUGE update adding the ExLlama and ExLlama_HF model loaders, which use LESS VRAM and deliver HUGE speed increases, plus an 8K token context to play with compared to the previous 2K limit! This is insanely powerful and will be a huge timesaver for creators, and may even let users with less powerful graphics cards run LLMs!
OpenAI Tokenizer: https://platform.openai.com/tokenizer
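The linked tokenizer gives exact counts; for a quick programmatic check of whether your text fits the new 8K window, here's a minimal sketch using OpenAI's documented rule of thumb of roughly 4 characters per token for English text (the function names and the 4-chars/token heuristic are assumptions for illustration, not an exact tokenizer):

```python
# Rough token-count estimate using the ~4 characters-per-token rule of
# thumb for English text; use the OpenAI Tokenizer page for exact counts.
def estimate_tokens(text: str) -> int:
    """Return a rough token estimate (at least 1) for the given text."""
    return max(1, round(len(text) / 4))

def fits_in_context(text: str, context_limit: int = 8000) -> bool:
    """Check whether the estimated token count fits the model's context window."""
    return estimate_tokens(text) <= context_limit

sample = "Oobabooga WebUI now supports 8k-token models via ExLlama."
print(estimate_tokens(sample))   # rough estimate only
print(fits_in_context(sample))
```

This is only a ballpark: real tokenizers split on subwords, so code, non-English text, or unusual punctuation can tokenize much more densely than 4 characters per token.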