A Coding Tutorial of Model Context Protocol Focusing on Semantic Chunking, Dynamic Token Management, and Context Relevance Scoring for Efficient LLM Interactions


Managing context effectively is a critical challenge when working with large language models, especially in environments like Google Colab, where resource constraints and long documents can quickly exceed the available token window. In this tutorial, we guide you through a practical implementation of the Model Context Protocol (MCP) by building a ModelContextManager that automatically chunks incoming text, generates semantic embeddings using Sentence-Transformers, and scores each chunk based on recency, importance, and relevance. You'll learn how to integrate this manager with a Hugging Face sequence-to-sequence model, demonstrated here with FLAN-T5, to add, optimize, and retrieve only the most pertinent pieces of context. Along the way, we'll cover token counting with a GPT-2 tokenizer, context-window optimization strategies, and interactive sessions that let you query and visualize your dynamic context in real time.

import torch
import numpy as np
from typing import List, Dict, Any, Optional, Union, Tuple
from dataclasses import dataclass
import time
import gc
from tqdm.notebook import tqdm

We import the essential libraries for building a dynamic context manager: torch and numpy handle tensor and numerical operations, while typing and dataclasses provide structured type annotations and data containers. Utility modules such as time and gc support timestamping and memory cleanup, and tqdm.notebook offers interactive progress bars for chunk processing in Colab.
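Before diving into the manager itself, here is a minimal standalone sketch (not part of the tutorial code) of the token accounting it relies on: the same GPT-2 tokenizer that the ModelContextManager loads later is used to count tokens per chunk. The sample string is purely illustrative.

# Minimal sketch: counting tokens with the GPT-2 tokenizer, as the manager does internally.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
sample = "The Model Context Protocol helps keep prompts within the token budget."
token_ids = tokenizer.encode(sample)

# len(token_ids) is the count the manager adds to current_token_count for each chunk.
print(f"{len(token_ids)} tokens for a {len(sample)}-character string")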

@dataclass
class ContextChunk:
    """A chunk of text with metadata for the Model Context Protocol."""
    text: str
    embedding: Optional[torch.Tensor] = None
    importance: float = 1.0
    timestamp: float = 0.0
    metadata: Dict[str, Any] = None
   
    def __post_init__(self):
        if self.metadata is None:
            self.metadata = {}
        if self.timestamp == 0.0:
            self.timestamp = time.time()

The ContextChunk dataclass encapsulates a single segment of text along with its embedding, a user-assigned importance score, a timestamp, and arbitrary metadata. Its __post_init__ method ensures that every chunk is stamped with the current time upon creation and that metadata defaults to an empty dictionary if none is provided.
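As a quick sanity check, here is a minimal, hypothetical usage sketch of the dataclass on its own; the embedding is left as None because it is normally filled in by the manager, and the example text is illustrative.

# Minimal sketch: creating a ContextChunk directly; timestamp and metadata default via __post_init__.
chunk = ContextChunk(
    text="Sentence-Transformers turn text into dense vectors.",
    importance=0.8,              # user-assigned weight in [0, 1]
)

print(chunk.timestamp > 0)   # True: stamped with time.time() at creation
print(chunk.metadata)        # {}: defaults to an empty dict
print(chunk.embedding)       # None until the manager encodes the text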

class ModelContextManager:
    """
    Manager for implementing Model Context Protocol in LLMs on Google Colab.
    Handles context window optimization, token management, and relevance scoring.
    """
   
    def __init__(
        self,
        max_context_length: int = 8192,
        embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2",
        relevance_threshold: float = 0.7,
        recency_weight: float = 0.3,
        importance_weight: float = 0.3,
        semantic_weight: float = 0.4,
        device: str = "cuda" if torch.cuda.is_available() else "cpu"
    ):
        """
        Initialize the Model Context Manager.
       
        Args:
            max_context_length: Maximum number of tokens in the context window
            embedding_model: Model to use for text embeddings
            relevance_threshold: Threshold a chunk's relevance must meet to be included
            recency_weight: Weight for recency in relevance calculation
            importance_weight: Weight for importance in relevance calculation
            semantic_weight: Weight for semantic similarity in relevance calculation
            device: Device to run computations on
        """
        self.max_context_length = max_context_length
        self.device = device
        self.chunks = []
        self.current_token_count = 0
        self.relevance_threshold = relevance_threshold
       
        self.recency_weight = recency_weight
        self.importance_weight = importance_weight
        self.semantic_weight = semantic_weight
       
        try:
            from sentence_transformers import SentenceTransformer
            print(f"Loading embedding model {embedding_model}...")
            self.embedding_model = SentenceTransformer(embedding_model).to(self.device)
            print(f"Embedding model loaded successfully on {self.device}")
        except ImportError:
            print("Installing sentence-transformers...")
            import subprocess
            subprocess.run(["pip", "install", "sentence-transformers"])
            from sentence_transformers import SentenceTransformer
            self.embedding_model = SentenceTransformer(embedding_model).to(self.device)
            print(f"Embedding model loaded successfully on {self.device}")
           
        try:
            from transformers import GPT2Tokenizer
            self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
        except ImportError:
            print("Installing transformers...")
            import subprocess
            subprocess.run(["pip", "install", "transformers"])
            from transformers import GPT2Tokenizer
            self.tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
   
    def add_chunk(self, text: str, importance: float = 1.0, metadata: Dict[str, Any] = None) -> None:
        """
        Add a new chunk of text to the context manager.
       
        Args:
            text: The text content to add
            importance: Importance score (0-1)
            metadata: Additional metadata for the chunk
        """
        with torch.no_grad():
            embedding = self.embedding_model.encode(text, convert_to_tensor=True)
       
        chunk = ContextChunk(
            text=text,
            embedding=embedding,
            importance=importance,
            timestamp=time.time(),
            metadata=metadata or {}
        )
       
        self.chunks.append(chunk)
        self.current_token_count += len(self.tokenizer.encode(text))
       
        if self.current_token_count > self.max_context_length:
            self.optimize_context()
   
    def optimize_context(self) -> None:
        """Optimize the context by removing less relevant chunks to fit within the token limit."""
        if not self.chunks:
            return
           
        print("Optimizing context window...")
       
        scores = self.score_chunks()
       
        sorted_indices = np.argsort(scores)[::-1]
       
        new_chunks = []
        new_token_count = 0
       
        for idx in sorted_indices:
            chunk = self.chunks[idx]
            chunk_tokens = len(self.tokenizer.encode(chunk.text))
           
            if new_token_count + chunk_tokens <= self.max_context_length:
                new_chunks.append(chunk)
                new_token_count += chunk_tokens
            else:
                if scores[idx] > self.relevance_threshold * 1.5:
                    for i, included_chunk in enumerate(new_chunks):
                        included_idx = sorted_indices[i]
                        if scores[included_idx] < self.relevance_threshold:
                            included_tokens = len(self.tokenizer.encode(included_chunk.text))
                            if new_token_count - included_tokens + chunk_tokens <= self.max_context_length:
                                new_chunks.remove(included_chunk)
                                new_token_count -= included_tokens
                                new_chunks.append(chunk)
                                new_token_count += chunk_tokens
                                break
       
        removed_count = len(self.chunks) - len(new_chunks)
        self.chunks = new_chunks
        self.current_token_count = new_token_count
       
        print(f"Context optimized: Removed {removed_count} chunks, {len(new_chunks)} remaining, using {new_token_count}/{self.max_context_length} tokens")
       
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
   
    def score_chunks(self, query: str = None) -> np.ndarray:
        """
        Score chunks based on recency, importance, and semantic relevance.
       
        Args:
            query: Optional query to calculate semantic relevance against
           
        Returns:
            Array of scores, one per chunk
        """
        if not self.chunks:
            return np.array([])
           
        current_time = time.time()
        max_age = max(current_time - chunk.timestamp for chunk in self.chunks) or 1.0
        recency_scores = np.array([
            1.0 - ((current_time - chunk.timestamp) / max_age)
            for chunk in self.chunks
        ])
       
        importance_scores = np.array([chunk.importance for chunk in self.chunks])
       
        if query is not None:
            query_embedding = self.embedding_model.encode(query, convert_to_tensor=True)
            similarity_scores = np.array([
                torch.cosine_similarity(chunk.embedding, query_embedding, dim=0).item()
                for chunk in self.chunks
            ])
           
            similarity_scores = (similarity_scores - similarity_scores.min()) / (similarity_scores.max() - similarity_scores.min() + 1e-8)
        else:
            similarity_scores = np.ones(len(self.chunks))
       
        final_scores = (
            self.recency_weight * recency_scores +
            self.importance_weight * importance_scores +
            self.semantic_weight * similarity_scores
        )
       
        return final_scores
   
    def retrieve_context(self, query: str = None, k: int = None) -> str:
        """
        Retrieve the most relevant context for a given query.
       
        Args:
            query: The query to retrieve context for
            k: The maximum number of chunks to return (None = all relevant chunks)
           
        Returns:
            String containing the combined relevant context
        """
        if not self.chunks:
            return ""
           
        scores = self.score_chunks(query)
       
        relevant_indices = np.where(scores >= self.relevance_threshold)[0]
       
        relevant_indices = relevant_indices[np.argsort(scores[relevant_indices])[::-1]]
       
        if k is not None:
            relevant_indices = relevant_indices[:k]
           
        relevant_texts = [self.chunks[i].text for i in relevant_indices]
        return "\n\n".join(relevant_texts)
   
    def get_stats(self) -> Dict[str, Any]:
        """Get statistics about the current context state."""
        return {
            "chunk_count": len(self.chunks),
            "token_count": self.current_token_count,
            "max_tokens": self.max_context_length,
            "usage_percentage": self.current_token_count / self.max_context_length * 100 if self.max_context_length else 0,
            "avg_chunk_size": self.current_token_count / len(self.chunks) if self.chunks else 0,
            "oldest_chunk_age": time.time() - min(chunk.timestamp for chunk in self.chunks) if self.chunks else 0,
        }


    def visualize_context(self):
        """Visualize the current context window distribution."""
        try:
            import matplotlib.pyplot as plt
            import pandas as pd
           
            if not self.chunks:
                print("No chunks to visualize")
                return
           
            scores = self.score_chunks()
            chunk_sizes = [len(self.tokenizer.encode(chunk.text)) for chunk in self.chunks]
            timestamps = [chunk.timestamp for chunk in self.chunks]
            relative_times = [time.time() - ts for ts in timestamps]
            importance = [chunk.importance for chunk in self.chunks]
           
            df = pd.DataFrame({
                'Size (tokens)': chunk_sizes,
                'Age (seconds)': relative_times,
                'Importance': importance,
                'Score': scores
            })
           
            fig, axs = plt.subplots(2, 2, figsize=(14, 10))
           
            axs[0, 0].bar(range(len(chunk_sizes)), chunk_sizes)
            axs[0, 0].set_title('Token Distribution by Chunk')
            axs[0, 0].set_ylabel('Tokens')
            axs[0, 0].set_xlabel('Chunk Index')
           
            axs[0, 1].scatter(chunk_sizes, scores)
            axs[0, 1].set_title('Score vs Chunk Size')
            axs[0, 1].set_xlabel('Tokens')
            axs[0, 1].set_ylabel('Score')
           
            axs[1, 0].scatter(relative_times, scores)
            axs[1, 0].set_title('Score vs Chunk Age')
            axs[1, 0].set_xlabel('Age (seconds)')
            axs[1, 0].set_ylabel('Score')
           
            axs[1, 1].scatter(importance, scores)
            axs[1, 1].set_title('Score vs Importance')
            axs[1, 1].set_xlabel('Importance')
            axs[1, 1].set_ylabel('Score')
           
            plt.tight_layout()
            plt.show()
           
        except ImportError:
            print("Please install matplotlib and pandas for visualization")
            print('!pip install matplotlib pandas')

The ModelContextManager class orchestrates the end-to-end handling of context for LLMs by chunking input text, generating embeddings, and tracking token usage against a configurable limit. It implements relevance scoring (combining recency, importance, and semantic similarity), automatic context pruning, retrieval of the most pertinent chunks, and convenient utilities for monitoring and visualizing context statistics.
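To make the scoring behavior concrete: with the default weights, a chunk's final score is 0.3 * recency + 0.3 * importance + 0.4 * similarity, so a maximally recent chunk (recency 1.0) with importance 0.8 and normalized similarity 0.5 scores 0.3*1.0 + 0.3*0.8 + 0.4*0.5 = 0.74, just above the default 0.7 threshold. The following minimal, hypothetical usage sketch shows the manager on its own; the chunk texts are illustrative, not from the tutorial.

# Minimal sketch: standalone use of ModelContextManager (example texts are illustrative).
manager = ModelContextManager(max_context_length=1024, relevance_threshold=0.7)

manager.add_chunk("MCP scores each chunk by recency, importance, and similarity.", importance=0.9)
manager.add_chunk("Unrelated note about lunch plans.", importance=0.2)

# Only chunks scoring at or above relevance_threshold are concatenated into the prompt.
context = manager.retrieve_context(query="How are chunks scored?", k=3)
print(context)
print(manager.get_stats()["token_count"], "tokens tracked")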

class MCPColabDemo:
    """Demonstration of Model Context Protocol in Google Colab with a Language Model."""
   
    def __init__(
        self,
        model_name: str = "google/flan-t5-base",
        max_context_length: int = 2048,
        device: str = "cuda" if torch.cuda.is_available() else "cpu"
    ):
        """
        Initialize the MCP Colab demo with a specified model.
       
        Args:
            model_name: Hugging Face model name
            max_context_length: Maximum context length for the MCP manager
            device: Device to run the model on
        """
        self.device = device
        self.context_manager = ModelContextManager(
            max_context_length=max_context_length,
            device=device
        )
       
        try:
            from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
            print(f"Loading model {model_name}...")
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            print(f"Model loaded successfully on {device}")
        except ImportError:
            print("Installing transformers...")
            import subprocess
            subprocess.run(["pip", "install", "transformers"])
            from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
            self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)
            self.tokenizer = AutoTokenizer.from_pretrained(model_name)
            print(f"Model loaded successfully on {device}")
   
    def add_document(self, text: str, chunk_size: int = 512, overlap: int = 50) -> None:
        """
        Add a document to the context by chunking it appropriately.
       
        Args:
            text: Document text
            chunk_size: Size of each chunk in characters
            overlap: Overlap between chunks in characters
        """
        chunks = []
        for i in range(0, len(text), chunk_size - overlap):
            chunk = text[i:i + chunk_size]
            if len(chunk) > 20:
                chunks.append(chunk)
       
        print(f"Adding {len(chunks)} chunks to context...")
        for i, chunk in enumerate(tqdm(chunks)):
            pos = i / len(chunks)
            importance = 1.0 - 0.5 * min(pos, 1 - pos)
           
            self.context_manager.add_chunk(
                text=chunk,
                importance=importance,
                metadata={"source": "document", "position": i, "total_chunks": len(chunks)}
            )
   
    def process_query(self, query: str, max_new_tokens: int = 256) -> str:
        """
        Process a query using the context manager and model.
       
        Args:
            query: The query to process
            max_new_tokens: Maximum number of tokens in the response
           
        Returns:
            Model response
        """
        self.context_manager.add_chunk(query, importance=1.0, metadata={"type": "query"})
       
        relevant_context = self.context_manager.retrieve_context(query=query)
       
        prompt = f"Context: {relevant_context}\n\nQuestion: {query}\n\nAnswer:"
       
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
       
        print("Generating response...")
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                do_sample=True,
                temperature=0.7,
                top_p=0.9,
            )
       
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
       
        self.context_manager.add_chunk(
            response,
            importance=0.9,
            metadata={"type": "response", "query": query}
        )
       
        return response
   
    def interactive_session(self):
        """Run an interactive session in the notebook."""
        from IPython.display import clear_output
       
        print("Starting interactive MCP session. Type 'exit' to end.")
        conversation_history = []
       
        while True:
            query = input("\nYour query: ")
           
            if query.lower() == 'exit':
                break
               
            if query.lower() == 'stats':
                print("\nContext Statistics:")
                stats = self.context_manager.get_stats()
                for key, value in stats.items():
                    print(f"{key}: {value}")
                self.context_manager.visualize_context()
                continue
               
            if query.lower() == 'clear':
                self.context_manager.chunks = []
                self.context_manager.current_token_count = 0
                conversation_history = []
                clear_output(wait=True)
                print("Context cleared!")
                continue
           
            response = self.process_query(query)
            conversation_history.append((query, response))
           
            print("\nResponse:")
            print(response)
            print("\n" + "-"*50)
           
            stats = self.context_manager.get_stats()
            print(f"Context usage: {stats['token_count']}/{stats['max_tokens']} tokens ({stats['usage_percentage']:.1f}%)")

The MCPColabDemo class ties the context manager to a seq2seq LLM, loading FLAN-T5 (or any specified Hugging Face model) on the chosen device. It provides utility methods for chunking and ingesting whole documents, processing user queries by prepending only the most relevant context, and running an interactive Colab session complete with real-time stats, visualizations, and commands for clearing or inspecting the evolving context window.
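Here is a minimal, hypothetical usage sketch of the demo class; the document string and query below are placeholders rather than content from the tutorial.

# Minimal sketch: feeding a document to MCPColabDemo and asking a question about it.
demo = MCPColabDemo(model_name="google/flan-t5-base", max_context_length=2048)

long_document = "MCP chunks documents, embeds each chunk, and scores relevance. " * 40
demo.add_document(long_document, chunk_size=512, overlap=50)

answer = demo.process_query("What does MCP do with long documents?")
print(answer)

# demo.interactive_session()  # uncomment in a notebook for the interactive loop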

def run_mcp_demo():
    """Run a simple demo of the Model Context Protocol."""
    print("Running Model Context Protocol Demo...")
   
    context_manager = ModelContextManager(max_context_length=4096)
   
    print("Adding sample chunks...")
   
    context_manager.add_chunk(
        "The Model Context Protocol (MCP) is a framework for managing context "
        "windows in large language models. It helps optimize token usage and improve relevance.",
        importance=1.0
    )
   
    context_manager.add_chunk(
        "Context management involves techniques like sliding windows, chunking, "
        "and relevance filtering to handle large documents efficiently.",
        importance=0.8
    )
   
    for i in range(10):
        context_manager.add_chunk(
            f"This is test chunk {i} with some filler content to simulate a larger context "
            f"window that needs optimization. This helps demonstrate the MCP functionality "
            f"for context window management in language models on Google Colab.",
            importance=0.5 - (i * 0.02)
        )
   
    stats = context_manager.get_stats()
    print("\nInitial Statistics:")
    for key, value in stats.items():
        print(f"{key}: {value}")
       
    query = "How does the Model Context Protocol work?"
    print(f"\nRetrieving context for: '{query}'")
    context = context_manager.retrieve_context(query)
    print(f"\nRelevant context:\n{context}")
   
    print("\nVisualizing context:")
    context_manager.visualize_context()
   
    print("\nDemo complete!")

The run_mcp_demo function ties everything together in a single script: it instantiates the ModelContextManager, adds a series of sample chunks with varying importance, prints initial statistics, retrieves and displays the most relevant context for a test query, and finally visualizes the context window, providing a complete, end-to-end demonstration of the Model Context Protocol in action.

if __name__ == "__main__":
    run_mcp_demo()

Finally, this standard Python entry-point guard ensures that run_mcp_demo() executes only when the script is run directly (rather than imported as a module), triggering the end-to-end demonstration of the Model Context Protocol workflow.

In conclusion, we now have a fully functional MCP system that not only curbs runaway token usage but also prioritizes the context fragments that truly matter for your queries. The ModelContextManager equips you with tools to balance semantic relevance, temporal freshness, and user-assigned importance, while the accompanying MCPColabDemo class provides an accessible framework for real-time experimentation and visualization. Armed with these patterns, you can extend the core principles by adjusting relevance thresholds, experimenting with different embedding models, or integrating with alternative LLM backends to tailor your domain-specific workflows. Ultimately, this approach lets you create concise yet highly relevant prompts, resulting in more accurate and efficient responses from your language models.
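As one example of the customization described above, here is a hedged sketch; the embedding model name, weights, and FLAN-T5 variant below are illustrative choices, not recommendations from the tutorial.

# Minimal sketch: customizing the manager with a different embedding model and scoring weights.
custom_manager = ModelContextManager(
    max_context_length=4096,
    embedding_model="sentence-transformers/all-mpnet-base-v2",  # assumed alternative encoder
    relevance_threshold=0.6,   # admit more chunks into the prompt
    recency_weight=0.2,
    importance_weight=0.2,
    semantic_weight=0.6,       # lean harder on query similarity
)

custom_demo = MCPColabDemo(model_name="google/flan-t5-large", max_context_length=4096)
custom_demo.context_manager = custom_manager  # swap in the tuned manager before adding documents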

