Mistral AI
Type: Private
Industry: Artificial intelligence
Founded: 28 April 2023
Founders: Arthur Mensch, Guillaume Lample, Timothée Lacroix
Headquarters: Paris, France
Products:
  • Mistral 7B
  • Mixtral 8x7B
Website: mistral.ai

Mistral AI is a French artificial intelligence company. It was founded in April 2023 by researchers previously employed by Meta and Google: Arthur Mensch, Timothée Lacroix and Guillaume Lample.[1] In October 2023 it raised 385 million euros (about $415 million),[2] and in December 2023 it attained a valuation of more than $2 billion.[3][4][5]

It produces open large language models,[6] citing the foundational importance of open-source software and positioning them as a response to proprietary models.[7]

As of December 2023, two models have been published and are available as open weights.[8] A further prototype model is available only via API.[9]

History

Mistral AI was co-founded in April 2023 by Arthur Mensch, Guillaume Lample and Timothée Lacroix. Prior to co-founding Mistral AI, Arthur Mensch worked at DeepMind, Google's artificial intelligence laboratory, while Guillaume Lample and Timothée Lacroix worked at Meta.[10]

In June 2023, the start-up carried out a first fundraising of 105 million euros ($117 million), with investors including the American fund Lightspeed Venture Partners, Eric Schmidt, Xavier Niel and JCDecaux. The Financial Times then estimated its valuation at 240 million euros ($267 million).

On September 27, 2023, the company made its language processing model “Mistral 7B” available under the free Apache 2.0 license. This model has 7 billion parameters, a small size compared to its competitors.

On December 10, 2023, Mistral AI announced that it had raised 385 million euros ($428 million) in its second fundraising round. The financing notably involved the Californian fund Andreessen Horowitz, BNP Paribas and the software publisher Salesforce.[11]

On December 11, 2023, the company released the "Mixtral 8x7B" model, which has 46.7 billion parameters but uses only 12.9 billion per token thanks to its mixture-of-experts architecture. The model handles five languages (French, Spanish, Italian, English and German) and, according to its developers' tests, outperforms Meta's "LLaMA 2 70B" model. An instruction-tuned version, called "Mixtral 8x7B Instruct", is also offered.[12]

Models

Mistral 7B

Mistral 7B is a 7.3B-parameter language model using the transformer architecture. It was officially released on September 27, 2023 via a BitTorrent magnet link[13] and on Hugging Face,[14] under the Apache 2.0 license. The release blog post claimed the model outperforms LLaMA 2 13B on all benchmarks tested and is on par with LLaMA 34B on many of them.[15]

Mistral 7B uses an architecture similar to LLaMA, but with some changes to the attention mechanism. In particular, it uses grouped-query attention (GQA), intended for faster inference, and sliding window attention (SWA), intended to handle longer sequences.

Sliding window attention (SWA) reduces the computational cost and memory requirements of long sequences. Each token can attend only to a fixed number of tokens from the previous layer, a "sliding window" of 4096 tokens, while the total context length is 32768 tokens. At inference time, caching keys and values for long sequences would otherwise lead to higher latency and lower throughput, so Mistral 7B uses a rolling buffer cache of fixed size.
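The idea can be sketched concisely. The Python fragment below is only an illustration of the mechanism described above; the names sliding_window_mask and RollingKVCache are invented for this example and do not reflect Mistral's implementation.

```python
import numpy as np

WINDOW = 4096  # sliding-window size used by Mistral 7B

def sliding_window_mask(seq_len: int, window: int = WINDOW) -> np.ndarray:
    """Boolean mask where position i may attend only to positions
    max(0, i - window + 1) .. i of the previous layer."""
    idx = np.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]            # never attend to future tokens
    in_window = idx[:, None] - idx[None, :] < window # stay inside the sliding window
    return causal & in_window

class RollingKVCache:
    """Fixed-size key/value cache: the slot for position i is i % window,
    so memory stays bounded at `window` entries per layer regardless of
    how long the sequence grows."""
    def __init__(self, window: int, head_dim: int):
        self.window = window
        self.keys = np.zeros((window, head_dim))
        self.values = np.zeros((window, head_dim))

    def update(self, pos: int, k: np.ndarray, v: np.ndarray) -> None:
        slot = pos % self.window  # overwrite the oldest entry
        self.keys[slot] = k
        self.values[slot] = v
```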

Mistral 7B also uses grouped-query attention (GQA), a variant of standard multi-head attention. Instead of giving every query head its own key and value projections, the query heads are divided into groups that each share a single key/value head, which shrinks the key-value cache and speeds up inference.[16]
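A minimal sketch of grouped-query attention follows, assuming the head counts reported for Mistral 7B (32 query heads sharing 8 key/value heads). This is an illustration of the technique, not the model's actual code.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Illustrative grouped-query attention.

    q: (n_q_heads, seq, d)   -- one query projection per query head
    k, v: (n_kv_heads, seq, d) -- fewer key/value heads; each is shared by a
          group of n_q_heads // n_kv_heads query heads
    Returns: (n_q_heads, seq, d)
    """
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads            # query heads per shared KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                        # which KV head this query head uses
        scores = q[h] @ k[kv].T / np.sqrt(d)
        # causal mask: each position attends only to itself and earlier positions
        scores += np.triu(np.full((seq, seq), -np.inf), k=1)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[kv]
    return out

# Mistral 7B pairs 32 query heads with 8 key/value heads (groups of 4).
q = np.random.randn(32, 16, 128)
k = np.random.randn(8, 16, 128)
v = np.random.randn(8, 16, 128)
print(grouped_query_attention(q, k, v).shape)  # (32, 16, 128)
```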

Both a base model and an "instruct" model were released, with the latter receiving additional tuning to follow chat-style prompts. The fine-tuned model is intended for demonstration purposes only and does not have guardrails or moderation built in.[15]
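For illustration, the weights published on Hugging Face can be queried with the transformers library roughly as follows. The repository id and generation settings here are assumptions made for this sketch; running it also requires the torch and accelerate packages and enough memory for a 7B-parameter model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repository id for the instruct model.
model_id = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The instruct model expects chat-style prompts; apply_chat_template wraps
# the messages in the instruction format the model was tuned on.
messages = [{"role": "user",
             "content": "Explain sliding window attention in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```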

Mixtral 8x7B

Much like Mistral's first model, Mixtral 8x7B was released via a BitTorrent link on December 9, 2023,[6] with a Hugging Face release and a blog post following two days later.[12]

Unlike the previous Mistral model, Mixtral 8x7B uses a sparse mixture-of-experts architecture. The model has 8 distinct groups of "experts", giving it a total of 46.7B parameters.[17][18] Only 12.9B parameters are used for any single token, so the model runs at roughly the speed and cost of a 12.9B-parameter model.[12]
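A toy sketch of sparse mixture-of-experts routing may make the arithmetic clearer: a router scores every expert for each token, only the top-scoring experts (two of eight per layer, per Mixtral's release notes) are actually evaluated, and their outputs are combined with the router's weights, so only a fraction of the total parameters is touched per token. The code below uses made-up dimensions and a plain ReLU feed-forward in place of the real expert networks; it illustrates the routing idea only and is not Mixtral's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SparseMoELayer:
    """Toy sparse mixture-of-experts feed-forward layer (illustration only).

    A router picks the top_k highest-scoring experts for each token, and the
    layer evaluates only those experts, so the parameters touched per token
    are a fraction of the layer's total parameters."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.router = rng.standard_normal((d_model, n_experts)) * 0.02
        self.experts = [
            (rng.standard_normal((d_model, d_ff)) * 0.02,
             rng.standard_normal((d_ff, d_model)) * 0.02)
            for _ in range(n_experts)
        ]
        self.top_k = top_k

    def __call__(self, x):                        # x: (seq, d_model)
        logits = x @ self.router                  # (seq, n_experts)
        out = np.zeros_like(x)
        for t in range(x.shape[0]):               # route each token independently
            top = np.argsort(logits[t])[-self.top_k:]   # indices of chosen experts
            gate = softmax(logits[t][top])              # renormalised router weights
            for w, e in zip(gate, top):
                w_in, w_out = self.experts[e]
                out[t] += w * (np.maximum(x[t] @ w_in, 0) @ w_out)  # ReLU expert
        return out

layer = SparseMoELayer()
print(layer(np.random.default_rng(1).standard_normal((4, 64))).shape)  # (4, 64)
```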

Mistral AI's testing shows the model beating both LLaMA 2 70B and GPT-3.5 in most benchmarks.[19]

References

  1. "France's unicorn start-up Mistral AI embodies its artificial intelligence hopes". Le Monde.fr. 2023-12-12. Retrieved 2023-12-16.
  2. "Mistral, French A.I. Start-Up, Is Valued at $2 Billion in Funding Round". TNYT.
  3. Fink, Charlie. "This Week In XR: Epic Triumphs Over Google, Mistral AI Raises $415 Million, $56.5 Million For Essential AI". Forbes. Retrieved 2023-12-16.
  4. "A French AI start-up may have commenced an AI revolution, silently". Hindustan Times. December 12, 2023.
  5. "French AI start-up Mistral secures €2bn valuation". www.ft.com.
  6. 1 2 "Buzzy Startup Just Dumps AI Model That Beats GPT-3.5 Into a Torrent Link". Gizmodo. 2023-12-12. Retrieved 2023-12-16.
  7. "Bringing open AI models to the frontier". mistral.ai. Mistral AI. 27 September 2023. Retrieved 4 January 2024.
  8. "Open-weight models | Mistral AI Large Language Models". docs.mistral.ai. Retrieved 2024-01-04.
  9. "Endpoints | Mistral AI Large Language Models". docs.mistral.ai.
  10. "France's unicorn start-up Mistral AI embodies its artificial intelligence hopes". Le Monde. 2023-12-12. https://www.lemonde.fr/en/economy/article/2023/12/12/french-unicorn-start-up-mistral-ai-embodies-its-artificial-intelligence-hopes_6337125_19.html
  11. Le Monde Informatique. https://www.lemondeinformatique.fr/actualites/lire-mistral-leve-385-meteuro-et-devient-une-licorne-francaise-92392.html
  12. 1 2 3 "Mixtral of experts". mistral.ai. 2023-12-11. Retrieved 2024-01-04.
  13. Goldman, Sharon (2023-12-08). "Mistral AI bucks release trend by dropping torrent link to new open source LLM". VentureBeat. Retrieved 2024-01-04.
  14. Coldewey, Devin (27 September 2023). "Mistral AI makes its first large language model free for everyone". TechCrunch. Retrieved 4 January 2024.
  15. 1 2 "Mistral 7B". mistral.ai. Mistral AI. 27 September 2023. Retrieved 4 January 2024.
  16. Jiang, Albert Q.; Sablayrolles, Alexandre; Mensch, Arthur; Bamford, Chris; Chaplot, Devendra Singh; Casas, Diego de las; Bressand, Florian; Lengyel, Gianna; Lample, Guillaume (2023-10-10). "Mistral 7B". arXiv.org. Retrieved 2024-01-04.
  17. "Mixture of Experts Explained". huggingface.co. Retrieved 2024-01-04.
  18. Marie, Benjamin (2023-12-15). "Mixtral-8x7B: Understanding and Running the Sparse Mixture of Experts". Medium. Retrieved 2024-01-04.
  19. Franzen, Carl (2023-12-11). "Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance". VentureBeat. Retrieved 2024-01-04.