Apple Enters the AI Open-Source Arena 🍎

PLUS: 13B LLM beats GPT-4 in zero-shot, Automotive Foundation Model, ChatGPT Plus for FREE in Bing

Today’s top AI Highlights:

  1. NexusRaven-V2 outperforms GPT-4 in advanced function calling with unique instruction tuning.

  2. Scale's AFM-1 offers versatile vision tasks for autonomous vehicles, requiring minimal retraining.

  3. Apple's MLX brings efficient machine learning to Mac with Python/C++ APIs and a unified memory model.

  4. Microsoft Copilot integrates GPT-4 Turbo and DALL-E 3, adding advanced features to Bing and Edge.

  5. New text-to-image model preferred 2.5x over SDXL.

& so much more!

Read time: 3 mins

Latest Developments 🌍

Surpassing GPT-4 for Zero-shot Function Calling 🚀

NexusFlow has released NexusRaven-V2, an innovative 13B parameter LLM that demonstrates superior performance to GPT-4 in zero-shot function calling. This enables the transformation of natural language instructions into executable code, a vital feature for copilots and agents utilizing software tools.

Key Highlights:

  1. NexusRaven-V2 excels in function calling, outperforming GPT-4 by up to 7% in success rates. This is particularly evident in complex scenarios involving nested and composite functions, even though NexusRaven-V2 was not trained on these specific functions.

  2. The model is instruction-tuned on Meta's CodeLlama-13B-instruct using data from open-code corpora. It includes open-source utility artifacts for easy replacement of proprietary function calling APIs, along with online demos and Colab notebooks for integration.

  3. NexusRaven-V2 features a diverse range of real-life, human-curated function-calling examples across 9 tasks, with 8 benchmarks open-sourced. The 'nexusraven' Python package further aids developers in integrating the model into their software, allowing easy ingestion of API function descriptions and conversion of function calling code to JSON format.
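To make "function calling" concrete, here is a minimal sketch of what a host application might do with a model's output. It assumes the model returns a Python-style call string, possibly nested. The two tools (get_capital, get_weather) and the helper run_call are hypothetical names invented for this illustration; they are not part of the 'nexusraven' package or any Nexusflow API.

```python
import ast

# Hypothetical tools a copilot might expose to the model.
def get_capital(country: str) -> str:
    capitals = {"France": "Paris", "Japan": "Tokyo"}
    return capitals[country]

def get_weather(city: str, unit: str = "celsius") -> str:
    # Canned response so the example runs without any network access.
    return f"Weather in {city}: 20 degrees {unit}"

TOOLS = {"get_capital": get_capital, "get_weather": get_weather}

def evaluate(node):
    """Recursively evaluate literals and calls to whitelisted tools only."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        name = node.func.id
        if name not in TOOLS:
            raise ValueError(f"unknown tool: {name}")
        args = [evaluate(arg) for arg in node.args]
        kwargs = {}
        for kw in node.keywords:
            if kw.arg is None:  # reject **kwargs unpacking
                raise ValueError("unsupported keyword unpacking")
            kwargs[kw.arg] = evaluate(kw.value)
        return TOOLS[name](*args, **kwargs)
    raise ValueError("unsupported expression in model output")

def run_call(call_text: str):
    """Parse a call string emitted by a model and execute it safely."""
    tree = ast.parse(call_text.strip(), mode="eval")
    return evaluate(tree.body)

# Assumed model output for: "What's the weather in the capital of France?"
model_output = 'get_weather(city=get_capital(country="France"))'
print(run_call(model_output))  # Weather in Paris: 20 degrees celsius
```

Parsing with ast and dispatching through a whitelist, rather than calling eval on the raw string, keeps the model from running anything other than the tools you registered.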

Scale’s Automotive Foundation Model 🚘

Traditional data engines in autonomous vehicles are restricted to specific tasks and a fixed set of objects and scenarios, and they often require retraining whenever data requirements or taxonomies change. They also struggle with rare-event detection. Scale has launched AFM-1, a model trained on diverse, large-scale street scene data that can handle multiple vision tasks with new taxonomies without requiring fine-tuning.

Key Highlights:

  1. AFM-1, trained on millions of densely labeled images, excels in five key computer vision tasks: object detection, instance segmentation, semantic segmentation, panoptic segmentation, and classification. This broad spectrum of capabilities showcases its versatility in handling various forms of visual data.

  2. The model integrates text and image features through a transformer-based neural network. It achieves significant breakthroughs in concept segmentation and detection with reduced training data. For instance, similar concepts like “traffic light” and “traffic signal” yield almost identical results, eliminating the need to train new models for slight taxonomy changes.

  3. Demonstrating exceptional performance, AFM-1 has shown state-of-the-art results in both zero-shot and fine-tuned regimes on significant benchmarks like Berkeley Deep Drive and Cityscapes. This level of performance equals four years of progress by the open source community, marking a substantial leap in segmentation capabilities.

Apple's Open-Source ML Framework for Mac

Apple has open-sourced MLX, a specialized array framework designed to harness the capabilities of Apple silicon for machine learning applications. This development offers an array of features for researchers and developers, enhancing the efficiency of machine-learning processes on Apple devices.

Key Highlights:

  1. MLX features Python and C++ APIs that closely follow the design of NumPy, ensuring familiarity for users. Additionally, it includes higher-level packages mirroring PyTorch APIs to facilitate the building of complex models.

  2. The framework supports lazy computation, meaning arrays are only materialized when necessary. It also offers dynamic graph construction, allowing for changes in function argument shapes without slow compilations. Moreover, MLX includes composable function transformations for automatic differentiation, automatic vectorization, and computation graph optimization.

  3. Unique to MLX is its unified memory model, where arrays exist in shared memory, enabling operations across supported devices (CPU and GPU) without the need to move data. This approach streamlines the process, making it more efficient and user-friendly for machine learning research and development.
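If "lazy computation" sounds abstract, the toy sketch below shows the general idea in plain Python: arithmetic only records a graph, and values are computed the first time someone asks for them. This is not MLX code and does not reflect how MLX is implemented; the Lazy class and its methods are made up for this illustration.

```python
class Lazy:
    """A node in a tiny computation graph; nothing runs until eval()."""

    evaluations = 0  # how many nodes have actually been computed

    def __init__(self, fn, *parents):
        self.fn = fn
        self.parents = parents
        self._value = None
        self._done = False

    @staticmethod
    def constant(values):
        # Leaf node: copies the input list when it is first evaluated.
        return Lazy(lambda: list(values))

    def __add__(self, other):
        return Lazy(lambda a, b: [x + y for x, y in zip(a, b)], self, other)

    def __mul__(self, other):
        return Lazy(lambda a, b: [x * y for x, y in zip(a, b)], self, other)

    def eval(self):
        # Materialize this node (and its inputs) once, then reuse the result.
        if not self._done:
            inputs = [parent.eval() for parent in self.parents]
            self._value = self.fn(*inputs)
            self._done = True
            Lazy.evaluations += 1
        return self._value


a = Lazy.constant([1.0, 2.0, 3.0])
b = Lazy.constant([4.0, 5.0, 6.0])
c = a * b + a            # builds the graph only; no arithmetic yet
print(Lazy.evaluations)  # 0
result = c.eval()        # now the whole graph is computed
print(result)            # [5.0, 12.0, 21.0]
```

Deferring the work this way gives a framework the chance to skip results that are never used and to optimize the graph before running it.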

GPT-4 Turbo, DALL-E 3 and Code Interpreter in Copilot 😲

As Copilot marks its first anniversary, Microsoft is testing new features that extend the capabilities of Copilot and Bing and change how we interact with technology in our daily tasks.

Key Highlights:

  1. Copilot is set to integrate OpenAI's latest GPT-4 Turbo, enhancing its ability to tackle complex tasks. Alongside this, the new DALL-E 3 model promises higher-quality and more accurate image creation, directly accessible through Bing or Copilot.

  2. Microsoft Edge users will soon benefit from the Inline Compose feature with a rewrite menu, allowing for easy text modification on most websites. Additionally, the Multi-Modal with Search Grounding feature combines GPT-4's power with Bing image and web search data, offering improved image understanding for user queries.

  3. Code Interpreter will assist users in performing complex tasks involving accurate calculations, coding, data analysis, visualization, and more, enhancing Copilot's utility in technical domains.

  4. The upcoming Deep Search feature in Bing will utilize the power of GPT-4 to optimize search results for complex topics. It expands search queries into more comprehensive descriptions, ensuring the delivery of more relevant and detailed search results.

Tools of the Trade ⚒️

  • Playground V2: New text-to-image generative model by Playground AI that is preferred 2.5x over Stable Diffusion XL. It also introduces a new benchmark that measures FID against a curated high-quality dataset from Midjourney, emphasizing aesthetics and image-text alignment.

  • Pitch: A presentation software that accelerates the creation of presentations through AI-driven drafts and templates. It offers AI-powered features for quick slide creation, real-time collaboration, and efficient design customization.

  • GPT-4 Vision in PDF.ai: PDF.ai now integrates GPT-4 Vision. Just take a screenshot of an image in a PDF and ask ChatGPT directly about it.

  • Zuga: Integrates DALL·E 3 with tldraw (a mind mapping canvas), letting you generate AI images in a chain, that is, a series of images that are conceptually connected to each other.

😍 Enjoying so far, TWEET NOW to share with your friends!

Hot Takes 🔥

  1. Google seems like a pretty simple website with just a single HTML input box that returns a bunch of links. I don't understand why it needs more than a couple of engineers to run. ~ Bojan Tunguz

  2. Seems like openAI wants to save insane amounts of money on inference by having the bot be more laconic, which works most of the time for prose. But has had this critical unintended side effect on code. ~ shako

Meme of the Day 🤡

r/ProgrammerHumor - ChatGPT

That’s all for today!

See you tomorrow with more such AI-filled content. Don’t forget to subscribe and give your feedback below 👇

Real-time AI Updates 🚨

⚡️ Follow me on Twitter @Saboo_Shubham for lightning-fast AI updates and never miss what’s trending!!

PS: I curate this AI newsletter every day for FREE, your support is what keeps me going. If you find value in what you read, share it with your friends by clicking the share button below!