Mistral Releases Mixtral-8x7B, a Sparse MoE Model That Outperforms Llama 2 70B and Matches GPT-3.5

The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts model. According to Mistral, Mixtral-8x7B outperforms Llama 2 70B on most benchmarks tested.

 

 

Key Takeaways

– Newly Released Mixtral 8x7B Model Deviates from Its Predecessors
– Mixtral 8x7B Matches OpenAI’s GPT-3.5 and Surpasses Meta’s Llama 2
– Strong Support from Technology Providers CoreWeave and Scaleway
– Content Policies of Mixtral 8x7B Raise Concerns
– Larger, More Robust Model Under Development
– Notably High Seed Investment for Mistral AI

 

A Revolution in Open Source AI: Meet Mistral

French company Mistral, which holds the record for the largest seed investment in European history, specialises in AI model development and Large Language Models (LLMs). Its recently launched product has drawn positive responses from early adopters and influencers in the AI community, particularly on X and LinkedIn.

 

Low-Key Yet Remarkable Launch of the Mixtral 8x7B Model

Out of the blue last week, Mistral launched its new Mixtral 8x7B model online. The unusual name comes from a technique the model incorporates known as “mixture of experts” (MoE), which combines several expert sub-networks, each excelling at specific tasks, and routes each input to only a small subset of them. No explanation, blog article, or demo video demonstrating its functionality accompanied the release.
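To make the idea concrete, below is a minimal sketch of a sparse mixture-of-experts layer with top-2 routing, assuming PyTorch is available; the layer sizes and expert count are illustrative only and do not reflect Mixtral’s actual architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim=128, num_experts=8, top_k=2):
        super().__init__()
        # One small feed-forward network per expert
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)  # router scores every expert for each token
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, dim)
        scores = self.gate(x)                            # (num_tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)             # normalise weights of the chosen experts
        out = torch.zeros_like(x)
        for rank in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, rank] == e              # tokens that picked expert e at this rank
                if mask.any():
                    out[mask] += weights[mask, rank:rank + 1] * expert(x[mask])
        return out

# Each token is processed by only 2 of the 8 experts, not by the whole network.
layer = SparseMoE()
print(layer(torch.randn(4, 128)).shape)  # torch.Size([4, 128])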

 

Performance Comparison: Mixtral 8x7B vs Others

 

In AI benchmark tests, Mixtral 8x7B performed exceptionally well, matching OpenAI’s proprietary GPT-3.5 and even surpassing Meta’s Llama 2 series, until now the leader in the open-source AI field. CoreWeave and Scaleway provided technical support for the training process. Mistral has also confirmed that Mixtral 8x7B can be used commercially under the Apache 2.0 licence.

 

Early Adopters Report Excellent Performance

Mixtral 8x7B has attracted numerous early adopters who have downloaded it, run it, and praised its performance. Thanks to its relatively compact size, it can even run locally on devices without a dedicated GPU, including Apple Mac computers equipped with the new M2 Ultra chip.

 

Mixtral 8x7B: AI Without Safety Mechanisms?

Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania and an AI influencer on the X platform, pointed out that Mixtral 8x7B appears to lack any built-in safety mechanisms. This means that users dissatisfied with OpenAI’s increasingly strict content policy can now use a model of comparable performance to create content that other models would deem “unsafe” or NSFW (not safe for work). However, the lack of safety mechanisms may also pose challenges for policymakers and regulators.

 

Getting Hands-On with Mixtral 8x7B

You can experiment with it firsthand via HuggingFace.

Our tests of the HuggingFace-hosted version, however, indicate that it does apply some safety mechanisms. For example, when we tried the common “tell me how to make napalm” prompt, it refused to respond.

Upcoming Developments at Mistral

According to Matt Schumer, CEO of HyperWrite AI, writing on the X platform, Mistral is also developing stronger models. The company has already made an alpha version of Mistral-medium available on its API, which is due to launch this weekend, indicating that an even more capable model is under development.

 

The Big Bucks: Mistral’s Record-Setting Funding

Furthermore, the company has just completed a $415 million Series A financing round led by A16z, raising its valuation to $2 billion. Investors include Lightspeed Venture Partners, Salesforce, BNP Paribas, CMA-CGM, General Catalyst, Elad Gil, and Conviction. Less than half a year ago, Mistral AI closed $112 million in seed funding with the aim of building a European platform to compete with OpenAI.

 

The Founders of Mistral AI

Founded by former members of Google DeepMind and Meta, Mistral AI promotes foundation model development from an open-technology perspective.

Mistral’s CEO, Arthur Mensch, founded the company together with Timothée Lacroix (CTO) and Guillaume Lample (Chief Science Officer). All in their early thirties, they have known each other since their school days, when they began studying and researching in the AI field. Mensch worked at DeepMind in Paris, whilst Lacroix and Lample worked at Meta’s AI research lab in Paris. As Mensch has revealed, they began discussing the direction they wanted AI development to take sometime last year.

The Verdict

In conclusion, as a standard-bearer for open technology, Mistral AI keeps pushing the boundaries of AI innovation. Its Mixtral 8x7B model, though receiving mixed reviews on content safety, has certainly proved its strong performance. The anticipation building around more robust models from Mistral underlines how closely the world is watching progress in AI.

Instruction format

This format must be strictly respected; otherwise, the model will generate sub-optimal outputs.

The template used to build a prompt for the Instruct model is defined as follows:

<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
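You do not have to assemble this string by hand: recent versions of the transformers library let the tokenizer build it from a list of chat messages via its bundled chat template. A minimal sketch, assuming transformers is installed and the Instruct tokenizer ships such a template:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

# A conversation is a list of role/content dicts with alternating user/assistant turns.
messages = [
    {"role": "user", "content": "What is a mixture of experts?"},
    {"role": "assistant", "content": "A model made of several expert sub-networks."},
    {"role": "user", "content": "How many experts does Mixtral use?"},
]

# tokenize=False returns the assembled prompt string so it can be inspected;
# it should follow the [INST] ... [/INST] layout shown above.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)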


How to Run the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Loading the full-precision weights requires a large amount of memory.
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt")

# Generate up to 20 new tokens and decode them back to text.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
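The full-precision checkpoint is tens of gigabytes, so if memory is tight one common option is to load a 4-bit quantised version. The sketch below is not part of Mistral’s instructions; it assumes a CUDA GPU and that the accelerate and bitsandbytes packages are installed.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit weights cut the memory footprint to roughly a quarter of fp16.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available GPUs and CPU memory
)

inputs = tokenizer("Hello my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))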
