MosaicML is introducing a new model series called MPT (MosaicML Pretrained Transformer), designed to provide an open-source, commercially usable language model that rivals LLaMA-7B. MPT-7B is trained on 1T tokens of text and code and can handle very long inputs thanks to ALiBi (Attention with Linear Biases). The model series is also optimized for fast training and inference, with MPT-7B achieving quality comparable to LLaMA-7B across a range of benchmarks.
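ALiBi drops learned positional embeddings and instead adds a fixed, per-head linear penalty to attention scores based on query-key distance, which is what lets the model extrapolate to contexts far longer than those seen during training. The following is a minimal illustrative sketch of how that bias can be constructed in PyTorch; the function name `alibi_bias` and the geometric slope schedule follow the ALiBi paper, not MPT's exact implementation.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Build the static ALiBi bias added to attention scores.

    Each head h gets a slope m_h from a geometric sequence; the bias for a
    query at position i attending to a key at position j is m_h * (j - i),
    i.e. a penalty that grows linearly with distance for past tokens.
    """
    # Geometric slopes as in the ALiBi paper: 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])

    # distance[i, j] = j - i  (<= 0 for the causal, past-facing positions)
    pos = torch.arange(seq_len)
    distance = pos[None, :] - pos[:, None]            # (seq_len, seq_len)

    # Broadcast to (n_heads, seq_len, seq_len); added to q @ k^T / sqrt(d)
    # before the causal mask and softmax.
    return slopes[:, None, None] * distance[None]
```

Because the penalty depends only on relative distance and involves no learned position parameters, the same bias formula applies unchanged at inference time to sequences longer than the training context, which is how the StoryWriter variant reaches 65k+ tokens.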
To showcase what MPT-7B can do, MosaicML released the base model along with three finetuned variants: MPT-7B-StoryWriter-65k+, which can handle extremely long context lengths; MPT-7B-Instruct, for short-form instruction following; and MPT-7B-Chat, a chatbot-like dialogue model. These models were built on the MosaicML platform, leveraging tools such as Composer, PyTorch FullyShardedDataParallel (FSDP), and the MosaicML LLM Foundry.
MPT-7B was trained with no human intervention, at a cost of around $200k, over 9.5 days on 440 GPUs. Developed for ease of use and commercial integration, MPT models are designed to be fast and easy to deploy for inference, compatible with the HuggingFace ecosystem as well as FasterTransformer and ONNX. The release of MPT-7B is the first in a series of more advanced foundation models planned by MosaicML.
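As a rough sketch of the HuggingFace-compatible workflow, the snippet below loads the base checkpoint with `transformers` and generates a short continuation. The model id `mosaicml/mpt-7b`, the dtype, and the generation settings are illustrative assumptions rather than instructions from the announcement; `trust_remote_code=True` is needed because MPT ships custom modeling code on the Hub.

```python
import torch
import transformers

# Assumed Hub id for the base model; the finetuned variants follow the same pattern.
model_name = "mosaicml/mpt-7b"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,   # illustrative choice to reduce memory use
    trust_remote_code=True,       # MPT uses custom model code hosted on the Hub
)

inputs = tokenizer("MosaicML's MPT-7B is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```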
