Watch the video:
Timestamps:
0:00 - What's new (It's CRAZY!)
0:44 - Open Oobabooga install directory
1:02 - Update Oobabooga WebUI
1:18 - VRAM usage & speed before update (4.3 tokens/s)
1:56 - Fix missing option or update errors
2:33 - Choosing new ExLlama model loader
2:52 - Downloading new model types (8k models)
4:25 - New VRAM & Speed (20 tokens/s! INSANE!)
5:25 - Raise token limit from 2,000 to 8,000+!
7:17 - How many tokens is your text?
7:50 - How long is 8k tokens?
8:45 - EVEN LESS VRAM with ExLlama_HF
Oobabooga WebUI had a HUGE update adding the ExLlama and ExLlama_HF model loaders, which use LESS VRAM and deliver HUGE speed increases, plus an 8K token context to play with compared to the previous 2K limit! This is insanely powerful and will be a huge timesaver for creators, and may even let users with less powerful graphics cards run LLMs!
OpenAI Tokenizer: https://platform.openai.com/tokenizer
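The linked tokenizer gives exact counts; for a quick programmatic check of whether your text fits the new 8K window, here's a minimal sketch using OpenAI's documented rule of thumb of roughly 4 characters per token for English text (the function names and the 4-chars/token heuristic are assumptions for illustration, not an exact tokenizer):

```python
# Rough token-count estimate using the ~4 characters-per-token rule of
# thumb for English text; use the OpenAI Tokenizer page for exact counts.
def estimate_tokens(text: str) -> int:
    """Return a rough token estimate (at least 1) for the given text."""
    return max(1, round(len(text) / 4))

def fits_in_context(text: str, context_limit: int = 8000) -> bool:
    """Check whether the estimated token count fits the model's context window."""
    return estimate_tokens(text) <= context_limit

sample = "Oobabooga WebUI now supports 8k-token models via ExLlama."
print(estimate_tokens(sample))   # rough estimate only
print(fits_in_context(sample))
```

This is only a ballpark: real tokenizers split on subwords, so code, non-English text, or unusual punctuation can tokenize much more densely than 4 characters per token.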