Advancing AI for humanity

Building Frontier Efficiency Model / Coming Apr 1, 2025

A New Paradigm of AI / Coming Jan 1, 2025

The Next Recipe / Dec 15, 2024

The Second Curve of Scaling Law / Jan 15, 2024

Scaling Factors / Jun 15, 2024

Introducing RPT: Reinforcement Pre-Training

Jun 10, 2025

Rectified Sparse Attention

June 5, 2025
On-Policy RL with Optimal Reward Baseline

Advancing the Foundation of RL in LLMs

May 30, 2025

The Frontier of Reward Models

May 21, 2025
Think Only When You Need with Large Hybrid-Reasoning Models

Adaptive Thinking Models

May 21, 2025
BitNet

Introducing BitNet b1.58 2B4T - Scaling Native 1-bit LLMs

Apr 15, 2025
BitNet

BitNet v2: Native 4-bit Activations for 1-bit LLMs

Apr 17, 2025

LatentLM: A Grand Unification of Multimodality

Dec 12, 2024

Differential Transformer

Oct 7, 2024
BitNet

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Feb 28, 2024

Scaling Laws of Synthetic Data for Language Models

Mar 25, 2025
YOCO

YOCO: Decoder-Decoder Architectures for Large Language Models

Gated RetNet (RetNet-3)

May 9, 2024
BitNet

BitNet a4.8: 4-bit Activations for 1-bit LLMs

Nov 8, 2024

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought

Jan 13, 2025

Bootstrap Your Own Context Length

Dec 25, 2024
MoE

MH-MoE (v2): Multi-Head Mixture-of-Experts

Nov 26, 2024
BitNet

1-bit AI Infra / bitnet.cpp: Running LLMs on CPUs

Oct 17, 2024
Fully Sparsely-Activated LLMs

Q-Sparse / Block Q-Sparse: Fully Sparsely-Activated LLMs

Jul 15, 2024
BitNet

The Era of 1-bit LLMs: Training Tips, Code and FAQ

Mar 20, 2024

MELLE: Autoregressive Speech Synthesis without Vector Quantization

Jul 11, 2024
VALL-E

VALL-E 2: Human Parity Zero-Shot Text to Speech Synthesis

Jun 8, 2024
Learning Law

The Learning Law: Towards Optimal Learning of Language Models

Feb 28, 2024

The Mind's Eye of (M)LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

Apr 4, 2024

Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

Feb 20, 2024

Multilingual E5 Text Embeddings

Feb 8, 2024

BitNet: 1-bit Transformers and LLMs

Oct 18, 2023
RetNet

Retentive Network: Revolutionizing Transformers for Large Language Models

Jul 18, 2023
LongViT

LongViT (LongNet for Vision): When an Image is Worth 1,024 × 1,024 Words

Dec 7, 2023
Kosmos

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Oct 4, 2023
Kosmos-2.5

Kosmos-2.5: A Multimodal Literate Model

Sep 20, 2023
LLM4Science

Large Language Model for Science: A Study on P vs. NP

Sep 13, 2023
LongNet

LongNet: Scaling Transformers to 1,000,000,000 Tokens

Jul 6, 2023
Kosmos-2

Kosmos-2: Grounding Multimodal Large Language Models (MLLMs) to the World

Jun 26, 2023
Kosmos-1

Kosmos-1: A Multimodal Large Language Model (MLLM)

Feb 28, 2023
VALL-E

WavMark: Watermarking for Audio Generation

Aug 24, 2023
VALL-E

VALL-E (X): Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Jan 6, 2023

PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

Sep 19, 2023

AdaptLLM: Adapting Large Language Models via Reading Comprehension

Sep 18, 2023
MiniLLM

MiniLLM: Knowledge Distillation of Large Language Models

Jun 14, 2023
LongMem

Large Language Models with Long-Term Memory

Jun 12, 2023
LLMA

LLM Accelerator: Lossless Acceleration of Large Language Models

Apr 11, 2023
XPos

A Length-Extrapolatable Transformer

Dec 20, 2022
ICL

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers

Dec 20, 2022
Promptist

Promptist: Optimizing Prompts for Text-to-Image Generation

Dec 19, 2022
Structured Prompting

Structured Prompting: Scaling In-Context Learning to 1,000 Examples

Dec 12, 2022
TorchScale

TorchScale: Transformers at (Any) Scale

Nov 24, 2022

Magneto: A Foundation Transformer

Oct 13, 2022
BEiT-3

BEiT-3: A General-Purpose Multimodal Foundation Model

Aug 30, 2022

Language Models are General-Purpose Interfaces

Jun 13, 2022

DeepNet: Scaling Transformers to 1,000 Layers

Mar 1, 2022

BEiT: BERT Pre-Training of Image Transformers

Jun 15, 2021

MiniLM: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers

Feb 25, 2021

XLM-E: Efficient Multilingual Language Model Pre-training

Jun 30, 2021

UniLM: Unified Language Model Pre-training

May 8, 2019