The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.
|- Newly Released Mixtral 8x7B Model Deviates from Predecessors
|- Mixtral 8x7B's Enhanced Performance Surpassing OpenAI and Meta
|- Outstanding Support from Technological Giants
|- Content Policies of Mixtral 8x7B Raise Concerns
|- Larger, More Robust Model Under Development
|- Notably High Seed Investment for Mistral AI
French company Mistral, which holds the record for the highest seed investment in European history, focuses on developing AI models, in particular Large Language Models (LLMs). Its latest release has drawn positive responses from early adopters and influencers in the AI community, particularly on X and LinkedIn.
Without warning last week, Mistral released its new Mixtral 8x7B model online. The name derives from a technique the model uses, known as "mixture of experts" (MoE), which combines several models, each specializing in particular tasks. The release was not accompanied by any explanation, blog article, or demo video.
In AI benchmark tests, Mixtral 8x7B performed exceptionally well, matching OpenAI's proprietary GPT-3.5 and even surpassing Meta's Llama 2 series, previously the leader in open-source AI. CoreWeave and Scaleway provided technical support during training. Mistral has also confirmed that Mixtral 8x7B is released under the Apache 2.0 license, which permits commercial use.
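The "mixture of experts" idea described above can be illustrated with a toy router. This is a minimal sketch, not Mistral's implementation: the function names, the scalar "experts", and the hand-written router scores are all hypothetical, and the choice of 8 experts with 2 selected per input is an assumption made for illustration. In a real model the router scores would come from a learned layer rather than a fixed list.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))

def moe_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Hypothetical toy setup: expert i simply multiplies its input by (i + 1).
experts = [lambda x, scale=s: scale * x for s in range(1, 9)]
# Hypothetical router scores for one input (assumed, not learned here).
router_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]

routes = top_k_route(router_logits)
output = moe_layer(3.0, experts, router_logits)
print(routes)
print(output)
```

Only the two selected experts are evaluated for each input, which is why a sparse model of this kind can be cheaper to run than its total size suggests.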
Early Adopters and Strong Performance
Mixtral 8x7B has attracted numerous early adopters who have downloaded it, run it, and praised its performance. Thanks to its compact size, it can even run locally on machines without a dedicated GPU, including Apple Mac computers equipped with the new M2 Ultra chip.
Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania and an AI influencer on X, pointed out that Mixtral 8x7B seems to lack any safety mechanisms. This means that users dissatisfied with OpenAI's increasingly strict content policy can now use a model with equivalent performance that is capable of creating content deemed "unsafe" or NSFW (not safe for work) by other models. However, the lack of safety mechanisms may also pose challenges for policymakers and regulators.
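Whether a model fits on a local machine depends mostly on how much memory its weights need. The following back-of-the-envelope sketch shows the arithmetic. The parameter count is an assumption that does not come from this article (roughly 46.7 billion total parameters, the figure Mistral has reported), the helper name is hypothetical, and the estimate ignores activations and runtime overhead.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: total parameter count, not stated in this article.
ASSUMED_TOTAL_PARAMS = 46.7e9

for label, bits in [("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    size = weight_memory_gb(ASSUMED_TOTAL_PARAMS, bits)
    print(f"{label}: ~{size:.1f} GB for weights")
```

Under these assumptions, lower-precision (quantized) weights are what bring the model within reach of a single high-memory desktop machine.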
Getting Hands-On with Mixtral 8x7B
You can experiment with it firsthand via HuggingFace.
In contrast to Mollick's observation, our tests on the version hosted on HuggingFace indicate that it does implement some safety mechanisms. For example, when we tried the common "tell me how to make napalm" prompt, it refused to respond.
According to Matt Shumer, CEO of HyperWrite AI, writing on X, Mistral is also developing stronger models. The company has already made an alpha version of Mistral-medium available through its API, which is due to launch this weekend. This suggests that a larger, higher-performing model is under development.
Furthermore, the company has just completed a $415 million Series A funding round led by a16z, raising its valuation to $2 billion. Investors include Lightspeed Venture Partners, Salesforce, BNP Paribas, CMA-CGM, General Catalyst, Elad Gil, and Conviction. Less than half a year ago, Mistral AI closed $112 million in seed funding, aiming to build a European platform to compete with OpenAI.
Founded by former members of Google DeepMind and Meta, Mistral AI promotes foundation model development from an open technology perspective.
Mistral's CEO, Arthur Mensch, partnered with Timothée Lacroix (CTO) and Guillaume Lample (Chief Science Officer). All in their early thirties, they have known each other since their school days, when they began studying and researching in the AI field. Mensch worked at DeepMind in Paris, while Lacroix and Lample worked at Meta's Paris AI Research Institute. According to Mensch, they began discussing where they expected AI development to go sometime last year.
In conclusion, Mistral AI continues to push the boundaries of AI innovation with an open approach to technology. Its Mixtral 8x7B model, despite mixed reviews of its content safety, has demonstrated strong performance. The anticipation building for more robust models from Mistral reflects the pace of progress in AI that the whole world is watching.
The template used to build a prompt for the Instruct model is defined as follows:
<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
This format must be strictly respected; otherwise, the model will generate sub-optimal outputs.
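The template above can be assembled programmatically from a list of conversation turns. The helper below is a minimal sketch with a hypothetical name and input shape (a list of instruction/answer pairs, where the final answer is None because the model has not replied yet); it only reproduces the string layout shown in the template.

```python
def build_instruct_prompt(turns):
    """Build a prompt string from (instruction, answer) pairs.

    The answer of the last pair should be None: it is the turn the
    model is expected to complete.
    """
    prompt = "<s>"
    for instruction, answer in turns:
        prompt += f" [INST] {instruction} [/INST]"
        if answer is not None:
            # A finished model answer is closed with the end-of-sequence marker.
            prompt += f" {answer}</s>"
    return prompt

conversation = [
    ("Instruction", "Model answer"),
    ("Follow-up instruction", None),
]
prompt = build_instruct_prompt(conversation)
print(prompt)
```

If the resulting string is later passed to a tokenizer that adds the beginning-of-sequence token on its own, the leading <s> may need to be dropped to avoid duplicating it; that behavior is an assumption to verify against the tokenizer in use.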
How to Run the Model
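# Load the tokenizer and model weights from the HuggingFace Hub
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Tokenize a prompt, generate up to 20 new tokens, and print the decoded text
text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))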