How to load SafeTensors checkpoints across different frameworks
As the AI ecosystem continues to evolve, there are more and more ways to define machine learning models, and even more ways to save the model weights that result from training and fine-tuning. In this growing set of options, KerasHub lets you mix and match popular model architectures and their weights across different ML frameworks.
For example, a popular place to load checkpoints from is the Hugging Face Hub. Many of those model checkpoints were created with the Hugging Face transformers
library in the SafeTensors format. No matter which ML framework was used to create the model checkpoint, those weights can be loaded into a KerasHub model, which lets you use your choice of framework (JAX, PyTorch, or TensorFlow) to run the model.
Yes, that means you can run a checkpoint from Mistral or Llama on JAX, or even load Gemma with PyTorch. It doesn't get any more flexible than that.
Let's look at some of these terms in more detail, and talk about how this works in practice.
Model architecture vs. model weights
When loading models, there are two distinct parts that we need: the model architecture and the model weights (often referred to as "checkpoints"). Let's define each of these in more detail.
When we say "model architecture", we're referring to how the layers of the model are arranged, and the operations that happen within them. Another way to describe this would be to call it the "structure" of the model. We use Python frameworks like PyTorch, JAX, or Keras to express model architectures.
When we talk about "model weights", we're referring to the "parameters" of a model, the numbers in a model that are adjusted over the course of training. The specific values of these weights are what give a trained model its characteristics.
"Checkpoints" are a snapshot of the values of the model weights at a particular point in training. The typical checkpoint files that are shared and widely used are those where the model has reached a particularly good training result. As the same model architecture is further refined with fine-tuning and other techniques, additional new checkpoint files are created. For example, many developers have taken Google's gemma-2-2b-it model and fine-tuned it with their own datasets, and you can see over 600 examples. All of these fine-tuned models use the same architecture as the original gemma-2-2b-it model, but their checkpoints have different weights.
So there we have it: the model architecture is described with code, while model weights are trained parameters, saved as checkpoint files. When we pair a model architecture with a set of model weights (in the form of a checkpoint file), we get a functioning model that produces useful outputs.
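This split between code and saved parameters can be sketched in a few lines of framework-agnostic Python. This is a toy illustration only, not the KerasHub API; the function and file names here are made up for the example:

```python
import numpy as np

# The "architecture" is code: it defines the structure and operations.
def tiny_model(x, weights):
    w, b = weights
    return x @ w + b  # a single dense layer

# The "weights" are parameter values; saving them produces a checkpoint file.
rng = np.random.default_rng(0)
w, b = rng.normal(size=(4, 2)), np.zeros(2)
np.savez("checkpoint.npz", w=w, b=b)

# Loading the checkpoint back into the same architecture reproduces
# the functioning model.
ckpt = np.load("checkpoint.npz")
x = np.ones((1, 4))
original = tiny_model(x, (w, b))
restored = tiny_model(x, (ckpt["w"], ckpt["b"]))
print(np.allclose(original, restored))
```

Any set of weights with matching shapes could be loaded into `tiny_model`, which is exactly the property that lets many fine-tuned checkpoints share one architecture.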
Different model weights can be loaded into the same model architecture. These different sets of weights are saved as checkpoints.
Tools like Hugging Face's transformers library and Google's KerasHub library provide model architectures and the APIs you need to experiment with them. Examples of checkpoint repositories include Hugging Face Hub and Kaggle Models.
You can mix and match model architecture libraries with your choice of checkpoint repositories. For example, you can load a checkpoint from Hugging Face Hub into a JAX model architecture and fine-tune it with KerasHub. For a different task, you might find a checkpoint on Kaggle Models that is suitable for your needs. This flexibility and separation means you aren't boxed into one ecosystem.
What’s KerasHub?
So we've mentioned KerasHub a few times; let's go into it in more detail.
KerasHub is a Python library that helps make defining model architectures easier. It contains many of the most popular and commonly used machine learning models today, and more are being added all the time. Because it is built on Keras, KerasHub supports all three major Python machine learning libraries used today: PyTorch, JAX, and TensorFlow. This means you can have model architectures defined in whichever library you prefer.
Additionally, since KerasHub supports the most common checkpoint formats, you can easily load checkpoints from many checkpoint repositories. For example, you can find hundreds of thousands of checkpoints on Hugging Face and Kaggle to load into these model architectures.
Comparisons to the Hugging Face transformers library
A common workflow for developers is to use the Hugging Face transformers
library to fine-tune a model and upload it to the Hugging Face Hub. And if you're a user of transformers
, you'll also find many familiar API patterns in KerasHub. Check out the KerasHub API documentation to learn more. An interesting aspect of KerasHub is that many of the checkpoints found on Hugging Face Hub are compatible with not only the transformers
library, but also KerasHub. Let's look at how that works.
KerasHub is compatible with Hugging Face Hub
Hugging Face has a model checkpoint repository called Hugging Face Hub. It is one of the many places where the machine learning community uploads model checkpoints to share with the world. Especially popular on Hugging Face is the SafeTensors format, which is compatible with KerasHub.
You can load these checkpoints from Hugging Face Hub directly into your KerasHub model, as long as the model architecture is available. Wondering if your favorite model is available? You can check https://keras.io/keras_hub/presets/ for a list of supported model architectures. And don't forget, all of the community-created fine-tuned checkpoints of these model architectures are compatible as well! We recently created a new guide to help explain the process in more detail.
How does this all work? KerasHub has built-in converters that simplify the use of Hugging Face transformers
models. These converters automatically handle translating Hugging Face model checkpoints into a format that is compatible with KerasHub. This means you can seamlessly load a wide variety of pretrained Hugging Face transformer models from the Hugging Face Hub directly into KerasHub with just a few lines of code.
If you notice a missing model architecture, you can add it by submitting a pull request on GitHub.
How to load a Hugging Face Hub checkpoint into KerasHub
So how do we get checkpoints from Hugging Face Hub loaded into KerasHub? Let's look at some concrete examples.
We'll start by choosing our machine learning library as our Keras "backend". We'll use JAX in the examples shown, but you can choose from JAX, PyTorch, or TensorFlow for any of them. All of the examples below work regardless of which one you pick. Then we'll proceed by importing keras
, keras_hub
, and huggingface_hub
, and then log in with our Hugging Face user access token so we can access the model checkpoints.
import os
os.environ["KERAS_BACKEND"] = "jax" # or "torch" or "tensorflow"
import keras
from keras_hub import models
from huggingface_hub import login
login('HUGGINGFACE_TOKEN')
Putting a Mistral model on JAX
First up, perhaps we want to run a checkpoint from Mistral on JAX? There are a handful of Mistral models on KerasHub's list of available model architectures; let's look at mistral_0.2_instruct_7b_en
. Clicking into it, we see that we should use the MistralCausalLM
class to call from_preset
. On the Hugging Face Hub side of things, we see that the corresponding model checkpoint is stored here, with over 900 fine-tuned versions. Browsing that list, there is a popular cybersecurity-focused fine-tuned model called Lily, with the path segolilylabs/Lily-Cybersecurity-7B-v0.2
. We'll also need to add "hf://
" before that path to specify that KerasHub should look at Hugging Face Hub.
Putting it all together, we get the following code:
# Model checkpoint from Hugging Face Hub
mistral_lm = models.MistralCausalLM.from_preset("hf://segolilylabs/Lily-Cybersecurity-7B-v0.2")
mistral_lm.generate("Lily, how do evil twin wi-fi attacks work?", max_length=30)
Running Llama 3.1 on JAX
Llama 3.1-8B-Instruct is a popular model, with over 5 million downloads last month. Let's get a fine-tuned version running on JAX. With over 1,400 fine-tuned checkpoints, there is no shortage of choice. The xVerify fine-tuned checkpoint looks interesting; let's load that into JAX with KerasHub.
We'll use the Llama3CausalLM class to match the model architecture we are using. As before, we'll need the right path from Hugging Face Hub, prefixed with "hf://
". It is pretty amazing that we can load and call a model with just two lines of code, right?
# Model checkpoint from Hugging Face Hub
llama_lm = models.Llama3CausalLM.from_preset("hf://IAAR-Shanghai/xVerify-8B-I")
llama_lm.generate("What is the tallest building in NYC?", max_length=100)
Load Gemma on JAX
In the end, let’s load a fine-tuned Gemma-3-4b-it checkpoint into JAX. We’re going to use the Gemma3CausalLM magnificence, and make a choice one of the most fine-tuned checkpoints. How about EraX, a multilingual translator? As earlier than, we’re going to use the pathname with the Hugging Face Hub prefix to create the overall trail of “hf://erax-ai/EraX-Translator-V1.0
“.
# Model checkpoint from Hugging Face Hub
gemma_lm = models.Gemma3CausalLM.from_preset("hf://erax-ai/EraX-Translator-V1.0")
gemma_lm.generate("Translate to German: ", max_length=30)
Flexibility at your fingertips
As we have explored, a model's architecture does not need to be tied to its weights, which means you can mix architectures and weights from different libraries.
KerasHub bridges the gap between different frameworks and checkpoint repositories. You can take a model checkpoint from Hugging Face Hub, even one created using the PyTorch-based transformers library, and seamlessly load it into a Keras model running on your choice of backend: JAX, TensorFlow, or PyTorch. This lets you leverage a vast collection of community fine-tuned models, while still having full choice over which backend framework to run on.
By simplifying the process of mixing and matching architectures, weights, and frameworks, KerasHub empowers you to experiment and innovate with simple yet powerful flexibility.