AI21 Labs · General LLM
Jamba
A novel hybrid architecture model combining transformer and Mamba state-space model layers for efficient long-context processing.
Overview
Jamba by AI21 Labs introduces a hybrid architecture that interleaves transformer attention layers with Mamba structured state-space model (SSM) layers. Because only the attention layers maintain a key/value cache, Jamba reaches a 256K-token effective context window with a far smaller memory footprint than a pure transformer of comparable size. Jamba demonstrates that hybrid architectures can offer the best of both worlds: the strong in-context learning of transformers and the linear-time sequence processing of SSMs, enabling efficient handling of very long documents.
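The interleaving pattern can be illustrated with a small sketch. Per the Jamba technical report, layers are grouped into blocks of eight with a 1:7 attention-to-Mamba ratio, and a mixture-of-experts MLP replaces the dense MLP on every other layer; the exact position of the attention layer within a block and all names below are illustrative, not AI21's code.

```python
# Sketch of a Jamba-style layer schedule (illustrative only).
def jamba_layer_schedule(num_blocks=4, layers_per_block=8,
                         attn_every=8, moe_every=2):
    """Return (mixer, mlp) labels for each layer in the stack."""
    schedule = []
    for i in range(num_blocks * layers_per_block):
        # One attention layer per 8-layer block; the rest are Mamba layers.
        # (Placement within the block is illustrative.)
        mixer = "attention" if i % attn_every == attn_every // 2 else "mamba"
        # Mixture-of-experts MLP on every other layer, dense MLP otherwise.
        mlp = "moe" if i % moe_every == 1 else "dense"
        schedule.append((mixer, mlp))
    return schedule

for idx, (mixer, mlp) in enumerate(jamba_layer_schedule(num_blocks=1)):
    print(f"layer {idx}: {mixer:9s} + {mlp}")
```

With four blocks (32 layers), this yields 4 attention layers, 28 Mamba layers, and 16 MoE layers.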
Parameters
52B total, 12B active (MoE)
Context Window
256K tokens
Architecture
Hybrid Transformer-Mamba SSM + MoE
Memory Efficiency
~8x smaller KV cache than a comparable pure transformer at 256K context
License
Apache 2.0 (Jamba base)
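The memory saving follows directly from the layer mix: only attention layers keep a key/value cache that grows with sequence length, while Mamba layers carry a fixed-size recurrent state. A back-of-the-envelope sketch, assuming illustrative values (32 layers, 8 KV heads via grouped-query attention, head dimension 128, 16-bit weights) rather than AI21's exact configuration:

```python
def kv_cache_bytes(seq_len, attn_layers, kv_heads=8, head_dim=128,
                   dtype_bytes=2):
    """Bytes of K+V cache held across the given attention layers."""
    # Factor of 2 covers the separate K and V tensors per layer.
    return 2 * attn_layers * kv_heads * head_dim * seq_len * dtype_bytes

seq = 256_000  # 256K-token context
full = kv_cache_bytes(seq, attn_layers=32)   # pure-transformer baseline
hybrid = kv_cache_bytes(seq, attn_layers=4)  # 1-in-8 layers use attention
print(f"pure transformer:     {full / 2**30:.1f} GiB")
print(f"hybrid (Jamba-style): {hybrid / 2**30:.1f} GiB")
print(f"reduction: {full / hybrid:.0f}x")
```

Replacing 7 of every 8 attention layers with Mamba layers shrinks the cache by the same 8x factor, which is what makes 256K contexts fit on a single node.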
Capabilities
Ultra-long context processing up to 256K tokens
Efficient memory usage through hybrid SSM-transformer architecture
General-purpose text generation and reasoning
Document analysis across extremely long inputs
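The long-context efficiency rests on the SSM layers processing tokens in a single linear-time pass over a fixed-size state, rather than attending over all previous tokens. A toy scalar recurrence shows the shape of that computation; real Mamba layers use vector states and input-dependent (selective) parameters, so the constants below are purely illustrative.

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Linear SSM recurrence: h[t] = a*h[t-1] + b*x[t], y[t] = c*h[t]."""
    h, outputs = 0.0, []
    for x in inputs:       # one step per token: O(n) total work
        h = a * h + b * x  # fixed-size state, regardless of history length
        outputs.append(c * h)
    return outputs

ys = ssm_scan([1.0, 0.0, 0.0, 0.0])
# Impulse response decays geometrically: 1.0, 0.9, 0.81, ...
```

Because the state size is constant, doubling the sequence length only doubles the work, whereas full attention quadruples it.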
Use Cases
Processing and analyzing extremely long documents and codebases
Building AI applications requiring efficient long-context handling
Research into hybrid transformer-SSM architectures
Enterprise document analysis with extensive context requirements
Pros
- Novel hybrid architecture achieves efficient long-context processing
- 256K context window with manageable memory requirements
- Open-source base model enables research and customization
- MoE design keeps active parameters efficient for inference
Cons
- Newer architecture with less community tooling and optimization
- Hybrid approach adds complexity for deployment and fine-tuning
- Benchmark performance may trail pure transformer models on some tasks
- Limited ecosystem support compared to standard transformer models
Pricing
The Jamba base model is free under the Apache 2.0 license. The AI21 API offers Jamba Instruct with usage-based pricing, and enterprise plans are available for high-volume use.