
LESS VRAM, 8K+ Tokens & HUGE SPEED INCREASE | ExLlama for Oobabooga


Published: Jun 29, 2023
Last Edit: Jun 29, 2023
154 Words, 1 Minute.

Watch the video:


Timestamps:
0:00 - What's new (It's CRAZY!)
0:44 - Open Oobabooga install directory
1:02 - Update Oobabooga WebUI
1:18 - VRAM usage & speed before update (4.3 tokens/s)
1:56 - Fix missing option or update errors
2:33 - Choosing new ExLlama model loader
2:52 - Downloading new model types (8k models)
4:25 - New VRAM & Speed (20 tokens/s! INSANE!)
5:25 - Raise token limit from 2,000 to 8,000+!
7:17 - How many tokens is your text?
7:50 - How long is 8k tokens?
8:45 - EVEN LESS VRAM with ExLlama_HF

Oobabooga WebUI had a HUGE update adding the ExLlama and ExLlama_HF model loaders, which use LESS VRAM, deliver HUGE speed increases, and even support 8K tokens of context compared to the previous 2K limit! This is insanely powerful, will be a huge timesaver for creators, and may even let users with less powerful graphics cards run LLMs!

OpenAI Tokenizer: https://platform.openai.com/tokenizer
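For a quick sense of how much text fits in a context window without opening the tokenizer page, you can use the common rule of thumb that one token is roughly 4 characters (or about 3/4 of a word) of typical English text. The `estimate_tokens` helper below is a hypothetical sketch based on that heuristic, not an exact tokenizer; use the OpenAI Tokenizer linked above for precise counts.

```python
# Rough token estimates for LLM prompts. These are heuristics only:
# ~4 characters per token, or ~3/4 of a word per token, for typical
# English text. For exact counts, use a real tokenizer.

def estimate_tokens(text: str) -> int:
    """Estimate token count from character and word counts (heuristic)."""
    by_chars = len(text) / 4            # ~4 characters per token
    by_words = len(text.split()) * 4 / 3  # ~3/4 of a word per token
    # Average the two heuristics for a slightly more stable estimate.
    return round((by_chars + by_words) / 2)

sample = "ExLlama raises the context limit from 2,000 to 8,000 tokens."
print(estimate_tokens(sample))  # roughly a dozen tokens for this sentence

# By the same rule of thumb, an 8K-token context holds roughly
# 32,000 characters, or about 6,000 words of English text.
print(round(8000 * 3 / 4))
```

So the jump from 2K to 8K tokens takes you from roughly 1,500 words of working context to around 6,000 - a big difference for long chats and documents.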

TroubleChute © Wesley Pyburn (TroubleChute)