New Leader on OpenLLM Leaderboard 🥇
PLUS: Tesla's Next-Gen Humanoid Bot, Highest MMLU Score with Prompting Technique
Today’s top AI Highlights:
DeciLM-7B: Fastest and Most Accurate 7B LLM to Date
Prompting Technique Achieves Highest-Ever Score on MMLU
Microsoft’s Phi-2: Competes with Models 25x its Size
Tesla Releases Faster Yet Lighter Humanoid Robot
& so much more!
Read time: 3 mins
Latest Developments 🌍
Fastest and Most Accurate 7B Model 📈
Deci has released DeciLM-7B, the fastest and most accurate 7B model yet. With a score of 61.55, it tops the Open LLM Leaderboard, surpassing the previous top performer, Mistral 7B. Beyond raw performance, it also brings a significant cost reduction, offering nearly 70% cost savings over its closest competitor.
Key Highlights:
DeciLM-7B uses Variable Grouped Query Attention, an enhancement over traditional Multi-Query Attention that balances speed and accuracy. The architecture was produced with Deci's Neural Architecture Search engine, AutoNAC.
It achieves 1.83x higher throughput than Mistral 7B and 2.39x higher than Llama 2 7B, notably on sequences of 2048 tokens. Combined with Infery-LLM, an inference SDK, its throughput reaches 4.4x that of Mistral 7B and 5.8x that of Llama 2 7B.
Deci has also introduced an instruction-tuned variant, DeciLM-7B-instruct, which achieves an even higher average score of 63.19 on the Open LLM Leaderboard using simple LoRA fine-tuning, making it one of the best-performing models in its class.
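To make the Variable Grouped Query Attention idea concrete, here is a minimal Python sketch. It is not Deci's implementation. It assumes that "variable" means each layer may use a different number of key/value (KV) heads, and it compares the resulting KV-cache size with multi-head, multi-query, and uniform grouped-query layouts. The layer count, head counts, head dimension, and the per-layer list are all hypothetical values chosen for illustration.

```python
def kv_cache_bytes(kv_heads_per_layer, seq_len, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    Each layer stores one key vector and one value vector (hence the 2)
    per KV head per token. bytes_per_value=2 assumes 16-bit floats.
    """
    return sum(
        2 * seq_len * kv_heads * head_dim * bytes_per_value
        for kv_heads in kv_heads_per_layer
    )


def kv_head_for_query_head(query_head, num_query_heads, num_kv_heads):
    """Index of the KV head shared by the group this query head belongs to."""
    group_size = num_query_heads // num_kv_heads
    return query_head // group_size


# Hypothetical model shape, for illustration only.
NUM_LAYERS = 32
NUM_QUERY_HEADS = 32
HEAD_DIM = 128
SEQ_LEN = 2048

layouts = {
    "multi-head (32 KV heads everywhere)": [32] * NUM_LAYERS,
    "multi-query (1 KV head everywhere)": [1] * NUM_LAYERS,
    "uniform grouped-query (8 KV heads)": [8] * NUM_LAYERS,
    # Assumed per-layer choice; a real model would pick these by search.
    "variable grouped-query (hypothetical)": [4] * 8 + [2] * 16 + [4] * 8,
}

for name, kv_heads_per_layer in layouts.items():
    size_mib = kv_cache_bytes(kv_heads_per_layer, SEQ_LEN, HEAD_DIM) / 2**20
    print(f"{name}: {size_mib:.0f} MiB")

# With 32 query heads and 8 KV heads, query heads 4..7 share KV head 1.
print(kv_head_for_query_head(5, NUM_QUERY_HEADS, 8))
```

Fewer KV heads mean a smaller cache and less memory traffic per generated token, which is where the speed gain comes from. Letting the count vary by layer gives an architecture search more room to trade speed against accuracy than a single global setting would.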
Prompting Technique Propels GPT-4 to Highest-ever Score on MMLU 🚀
Researchers at Microsoft introduced Medprompt in November to steer generalist foundation models like GPT-4 towards the specialist capabilities of fine-tuned models. Built on medical challenge benchmarks, this prompting technique allowed GPT-4 to surpass even Med-PaLM 2 and reduce the error rate by 27%.
The Medprompt strategy was later found to have more general-purpose applications. Steering GPT-4 with a modified version of Medprompt achieves the highest score ever achieved on the complete MMLU.
Key Highlights:
Medprompt Strategy: Medprompt integrates dynamic few-shot selection, self-generated chain of thought (CoT), and majority vote ensembling. It improves the AI's domain-specific adaptation and complex reasoning by tailoring examples based on task similarity and prompting step-by-step reasoning, supplemented by combining multiple outputs for robust responses.
Medprompt+ Strategy: An extension of Medprompt, Medprompt+ adds a simpler prompting method to the original complex strategy. It integrates outputs from both strategies, guided by GPT-4's control strategy and using inferred confidence scores to refine responses.
Performance on MMLU Benchmark: Implementing these strategies has significantly enhanced performance. GPT-4 initially scored 89.1% using Medprompt on the MMLU benchmark, with improvements leading to a record score of 90.10% after incorporating Medprompt+.
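The three Medprompt ingredients described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Microsoft's code: the embedding vectors are hand-made, the few-shot reasoning strings are hard-coded rather than self-generated by the model, and `toy_model` is a hypothetical stand-in for a real GPT-4 call. The ensemble shuffles the answer choices between runs before taking the majority vote, which is one common way to make the vote less sensitive to option order.

```python
import math
import random
from collections import Counter

LABELS = ("A", "B", "C", "D")


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_few_shot(question_vec, pool, k):
    """Dynamic few-shot selection: the k pool examples most similar to the question."""
    ranked = sorted(pool, key=lambda ex: cosine(question_vec, ex["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(examples, question, choices):
    """Few-shot examples with their reasoning, then the new question with a CoT cue."""
    parts = [
        f"Q: {ex['question']}\nReasoning: {ex['reasoning']}\nAnswer: {ex['answer']}"
        for ex in examples
    ]
    options = "\n".join(f"{label}. {text}" for label, text in zip(LABELS, choices))
    parts.append(f"Q: {question}\n{options}\nLet's think step by step.")
    return "\n\n".join(parts)


def ensemble_answer(ask_model, examples, question, choices, runs=5, seed=0):
    """Majority vote over several runs, shuffling the choice order each time."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(runs):
        shuffled = choices[:]
        rng.shuffle(shuffled)
        label = ask_model(build_prompt(examples, question, shuffled))
        votes[shuffled[LABELS.index(label)]] += 1  # vote on the text, not the letter
    return votes.most_common(1)[0][0]


def toy_model(prompt):
    """Hypothetical stand-in for an LLM: returns the label of the option '4'."""
    for line in prompt.splitlines():
        label, _, text = line.partition(". ")
        if label in LABELS and text == "4":
            return label
    return "A"


# Toy example pool with hand-made 2-D "embeddings" (assumed, not real).
pool = [
    {"question": "What is 1 + 1?", "reasoning": "One plus one is two",
     "answer": "2", "vec": [1.0, 0.0]},
    {"question": "What is the capital of France?", "reasoning": "Paris is the capital",
     "answer": "Paris", "vec": [0.0, 1.0]},
    {"question": "What is 3 + 3?", "reasoning": "Three plus three is six",
     "answer": "6", "vec": [0.9, 0.1]},
]

shots = select_few_shot([1.0, 0.1], pool, k=2)
answer = ensemble_answer(toy_model, shots, "What is 2 + 2?", ["3", "4", "5", "22"])
print([ex["question"] for ex in shots], answer)
```

In a real setup, `ask_model` would call the model and parse the final answer letter from its step-by-step output, and the example vectors would come from an embedding model.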
Surprising Power of Small Language Models 💪
Microsoft has again challenged the notion that larger models are always better. Continuing its “Phi” series of models, the team has introduced Phi-2, a 2.7 billion-parameter model that exhibits exceptional reasoning and language understanding, competing with models up to 25x its size. Its compact size makes it ideal for research in areas like mechanistic interpretability and safety improvements.
Key Highlights:
Phi-2 is trained on a mix of synthetic datasets and web data, emphasizing textbook-quality content to enhance common sense reasoning and general knowledge. This approach deviates from the trend of simply increasing model size and demonstrates that data quality is crucial for model performance.
Despite not undergoing alignment through reinforcement learning from human feedback (RLHF), Phi-2 exhibits better behavior in terms of toxicity and bias compared to similar models that have undergone such processes.
Phi-2 outperforms Llama 2 13B in common sense reasoning, language, math, and coding, and closely follows Llama 2 70B. It surpasses even Mistral 7B in reasoning tasks, and it outperforms Gemini Nano 3.2B across all benchmarks.
Tesla’s Faster Yet Lighter Humanoid Robot 🦿
Tesla has unveiled Optimus Gen-2, the next generation of its humanoid robot (with a very engaging video demo). This advanced humanoid robot features Tesla's own design of actuators and sensors, enabling refined movement and efficiency. Notably, it boasts a 2 Degrees of Freedom (DoF) actuated neck, which allows for more flexible neck movements.
A significant enhancement is its 30% increase in walking speed. Despite these advancements, Tesla has managed to reduce the robot's weight by 10 kg without compromising its capabilities. The Optimus Gen-2 also excels in balance and full-body control, attributes that are further enhanced by its new, faster hands and tactile sensing on all fingers, allowing the robot to manipulate delicate objects with a high degree of precision.
It’ll be exciting to see how soon Tesla’s robot will hit the workforce.
Tools of the Trade ⚒️
Motion Effects in Modyfi: Quickly transform your designs into scroll-stopping loops, create insane motion effects in seconds, and edit your motion design in real-time, during playback.
BricksAI Cloud: A SaaS solution that addresses the lack of granular control and monitoring of LLMs. It manages LLM spending and usage on a detailed level, such as per user, project, environment, or feature, and offers direct insights into LLM usage with fine-grained metrics for each API key.
Voqal: AI-powered voice assistant for coding that transforms the traditional GUI of IDEs into a vocal UI. It understands not just the words spoken but also their meaning in the context of software development, allowing for more intuitive and efficient vocal commands.
Lightning AI Studio: A comprehensive cloud-based environment for AI development and deployment. The platform eliminates the need for setting up a local environment, allowing users to code, prototype, train models, serve, and prepare data all in one place.
Hot Takes 🔥
Predictions for 2024
- 5 GPT-4 class models by the end of Q1 - 1 open-source (Llama?) and at least 4 closed-source
- GPT 4.5 will be released in Q1, 5.0 released in Q2
- LLM hype and doomerism start to die down
- AGI is NOT achieved
Life goes on... ~ Bindu Reddy
An LLM is a snapshot of a civilization. There will be a LOT more localized LLMs that represent different cultures, political spectrum, religious beliefs, and region-specific regulations. Winners won't take it all. ~ Jim Fan
That’s all for today!
See you tomorrow with more such AI-filled content. Don’t forget to subscribe and give your feedback below 👇
Real-time AI Updates 🚨
⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!!
PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!