Our mission is to advance artificial general intelligence (AGI), with a focus on the generality, generalizability, and adaptability of AI. Among other efforts, we are committed to researching and developing a general-purpose foundation model that can be systematically adapted and generalized to a broad set of tasks, with arbitrary modalities as input, aiming to realize the grand vision of AGI-as-a-Service.


As part of this mission-focused research, our work on foundation models has been driving large-scale AI and "The Big Convergence" of large-scale pre-training across tasks, languages, and modalities, including:

- UniLM(-2) for language model pre-training
- InfoXLM and XLM-E for multilingual pre-training
- BEiT(-2) for vision pre-training
- WavLM, SpeechLM, and VALL-E for speech pre-training
- BEiT-3 for multimodal pre-training
- Layout(X)LM(-2/3), the first multimodal document foundation model
- MetaLM as a general-purpose foundation model
- Kosmos-1 as a multimodal large language model (MLLM)
- Multiway Transformers for multimodal modeling
- Magneto (Foundation Transformers) for true general-purpose modeling

Our research also pushes fundamental AI. The TorchScale initiative focuses on fundamental research to improve the modeling generality and capability, as well as the training stability and efficiency, of Transformers at any scale; it includes DeepNet, Magneto (Foundation Transformers), and X-MoE.

We also work on fundamental research and technology for building AI products with foundation models. For example, we develop effective and efficient approaches to deploying large AI models in practice, including MiniLM(-2), xTune, EdgeFormer, and Aggressive Decoding. The LMOps initiative focuses on general technology for enabling AI capabilities with (M)LLMs and generative AI models, including Extensible Prompts, Promptist, and Structured Prompting.

Beyond these research achievements, the models above form a significant part of Microsoft's own family of large AI (foundation) models, powering language and multimodal tasks and scenarios across Microsoft products. Our research also tops public benchmarks and leaderboards across language (GLUE, XTREME), vision (ADE20K, COCO), speech (SUPERB), and multimodal (NLVR2, VQAv2) tasks, and contributes substantially to the open-source community through GitHub and Hugging Face. See our Research and Highlights for more information.
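
Many of these checkpoints are released on the Hugging Face Hub. As a minimal sketch (not an official snippet from this page), the following loads the public BEiT image-classification checkpoint with the `transformers` library; the model ID and class names reflect the public release and should be checked against the current `transformers` documentation.

```python
# Minimal sketch: loading a publicly released BEiT checkpoint from the
# Hugging Face Hub. Model ID and class names are assumptions based on the
# public release; verify against the current transformers documentation.
from PIL import Image
import requests
from transformers import BeitImageProcessor, BeitForImageClassification

# Example image (a COCO validation image commonly used in docs).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = BeitImageProcessor.from_pretrained("microsoft/beit-base-patch16-224")
model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")

# Preprocess the image and predict an ImageNet class label.
inputs = processor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```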


microsoft/unilm: Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

microsoft/torchscale: Transformers at (Any) Scale / Next-generation General-purpose AI Architecture (see the usage sketch below)

microsoft/lmops: General technology for enabling AI capabilities w/ (M)LLMs
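
As a quick taste of TorchScale, here is a minimal sketch that instantiates a Transformer encoder. It follows the usage pattern shown in the microsoft/torchscale repository; config fields such as the DeepNet flag are assumptions based on the repository README and should be verified against the installed version.

```python
# Minimal sketch following the usage pattern in microsoft/torchscale
# (pip install torchscale). The `deepnorm` flag is taken from the repository
# README and may differ across versions -- verify before relying on it.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

# Build a Transformer encoder; deepnorm=True enables DeepNet-style
# normalization for stable training at extreme depth.
config = EncoderConfig(vocab_size=64000, deepnorm=True)
encoder = Encoder(config)
print(encoder)
```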


We are hiring at all levels (including FTE researchers and interns)! If you are interested in working with us on foundation models (a.k.a. large-scale pre-trained models), AGI, NLP, machine translation, speech, Document AI, or multimodal AI, please send your resume to fuwei@microsoft.com.