Advancing AI for humanity
The Next Recipe
coming soon
The Second Curve of Scaling Law
Jan 15, 2024
Scaling Factors
Jun 15, 2024
Differential Transformer
Oct 7, 2024
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Feb 28, 2024
1-bit AI Infra / bitnet.cpp: Running LLMs on CPUs
Oct 17, 2024
Q-Sparse / Block Q-Sparse: Fully Sparsely-Activated LLMs
Jul 15, 2024
You Only Cache Once: Decoder-Decoder Architectures for Large Language Models // Gated RetNet (RetNet-3)
May 9, 2024
The Era of 1-bit LLMs: Training Tips, Code and FAQ
Mar 20, 2024
MELLE: Autoregressive Speech Synthesis without Vector Quantization
Jul 11, 2024
VALL-E 2: Human Parity Zero-Shot Text to Speech Synthesis
Jun 8, 2024
Multi-Head Mixture-of-Experts
Apr 23, 2024
The Learning Law: Towards Optimal Learning of Language Models
Feb 28, 2024
The Mind's Eye of (M)LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Apr 4, 2024
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Feb 20, 2024
Multilingual E5 Text Embeddings
Feb 8, 2024
BitNet: 1-bit Transformers and LLMs
Oct 18, 2023
Retentive Network: A Successor to Transformer for Large Language Models
Jul 18, 2023
LongViT (LongNet for Vision): When an Image is Worth 1,024 × 1,024 Words
Dec 7, 2023
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Oct 4, 2023
Kosmos-2.5: A Multimodal Literate Model
Sep 20, 2023
Large Language Model for Science: A Study on P vs. NP
Sep 13, 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jul 6, 2023
Kosmos-2: Grounding Multimodal Large Language Models (MLLMs) to the World
Jun 26, 2023
Kosmos-1: A Multimodal Large Language Model (MLLM)
Feb 28, 2023
WavMark: Watermarking for Audio Generation
Aug 24, 2023
VALL-E (X): Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Jan 6, 2023
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Sep 19, 2023
AdaptLLM: Adapting Large Language Models via Reading Comprehension
Sep 18, 2023
MiniLLM: Knowledge Distillation of Large Language Models
Jun 14, 2023
Large Language Models with Long-Term Memory
Jun 12, 2023
LLM Accelerator: Lossless Acceleration of Large Language Models
Apr 11, 2023
A Length-Extrapolatable Transformer
Dec 20, 2022
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta Optimizers
Dec 20, 2022
Promptist: Optimizing Prompts for Text-to-Image Generation
Dec 19, 2022
Structured Prompting: Scaling In-Context Learning to 1,000 Examples
Dec 12, 2022
TorchScale: Transformers at (Any) Scale
Nov 24, 2022
Magneto: A Foundation Transformer
Oct 13, 2022
BEiT-3: A General-Purpose Multimodal Foundation Model
Aug 30, 2022
Language Models are General-Purpose Interfaces
Jun 13, 2022
DeepNet: Scaling Transformers to 1,000 Layers
Mar 1, 2022
BEiT: BERT Pre-Training of Image Transformers
Jun 15, 2021
MiniLM: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
Feb 25, 2021
XLM-E: Efficient Multilingual Language Model Pre-training
Jun 30, 2021
UniLM: Unified Language Model Pre-training
May 8, 2019