Hilbert-AI, DLF Forum, Cybercity, Phase III, Gurugram (2026)

03/08/2024

Google’s Gemini 1.5 Pro dethrones GPT-4o

Google’s experimental Gemini 1.5 Pro model has surpassed OpenAI’s GPT-4o in generative AI benchmarks.

Until now, OpenAI’s GPT-4o and Anthropic’s Claude-3 have dominated the landscape. However, the latest version of Gemini 1.5 Pro appears to have taken the lead.

One of the most widely recognised benchmarks in the AI community is the LMSYS Chatbot Arena, which ranks models from crowdsourced pairwise human votes and assigns each an overall Elo-style score. On this leaderboard, GPT-4o achieved a score of 1,286, while Claude-3 secured 1,271. A previous iteration of Gemini 1.5 Pro had scored 1,261.

The experimental version of Gemini 1.5 Pro (designated as Gemini 1.5 Pro 0801) surpassed its closest rivals with an impressive score of 1,300. This significant improvement suggests that Google’s latest model may possess greater overall capabilities than its competitors.
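
To put the score gaps in perspective, here is a minimal sketch that converts two arena scores into an expected head-to-head preference rate. It assumes the scores sit on a standard Elo-style scale (logistic curve, base 10, 400-point scale); the function name `expected_win_rate` is illustrative and not part of any LMSYS tooling.

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B under the Elo logistic model.

    Assumption: standard Elo parameters (base 10, 400-point scale).
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


# Scores quoted in the article above.
scores = {
    "Gemini 1.5 Pro 0801": 1300,
    "GPT-4o": 1286,
    "Claude-3": 1271,
    "Gemini 1.5 Pro (previous)": 1261,
}

leader = "Gemini 1.5 Pro 0801"
for name, score in scores.items():
    if name != leader:
        p = expected_win_rate(scores[leader], score)
        print(f"{leader} vs {name}: {p:.1%}")
```

Under these assumptions, a 14-point lead over GPT-4o corresponds to being preferred in roughly 52% of head-to-head votes, so the lead is real but narrow.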

https://deepmind.google/technologies/gemini/pro/

02/08/2024

Meta releases Segment Anything 2: the most powerful video and image segmentation model.

Following up on the success of the Meta Segment Anything Model (SAM) for images, Meta released SAM 2, a unified model for real-time promptable object segmentation in images and videos that achieves state-of-the-art performance.

The code and model weights are released under a permissive Apache 2.0 license.

Meta also shared the SA-V dataset, which includes approximately 51,000 real-world videos and more than 600,000 masklets (spatio-temporal masks).

SAM 2 can segment any object in any video or image, including objects and visual domains it has not seen previously, enabling a diverse range of use cases without custom adaptation.

SAM 2 has many potential real-world applications. For example, the outputs of SAM 2 can be used with a generative video model to create new video effects and unlock new creative applications. SAM 2 could also aid in faster annotation tools for visual data to build better computer vision systems.

https://ai.meta.com/blog/segment-anything-2/

24/03/2024

Blackwell B200 GPU: The World's Most Powerful Chip

NVIDIA's GPU Technology Conference (GTC) is a key conference for the latest advancements in GPU technology, AI, and deep learning. This annual event gathers experts, scientists, and developers to discuss the applications and implications of advanced computing across various industries.

Blackwell B200 GPU
NVIDIA has unveiled the Blackwell B200 GPU, termed the 'world's most powerful chip' for AI, designed to democratize access to trillion-parameter AI models. This launch is set to expand NVIDIA's lead in the AI market.

Performance: The B200 offers 20 petaflops of FP4 performance with its 208 billion transistors.

Efficiency: When paired with a Grace CPU as a GB200, it delivers up to 30x the LLM inference performance of the H100 while reducing cost and energy consumption by up to 25x.

Capacity: Can train a 1.8 trillion parameter model with just 2,000 GPUs drawing four megawatts, compared with the 8,000 Hopper GPUs and 15 megawatts previously required.

Speed: Exhibits a 7x performance boost on GPT-3 LLM (175 billion parameters) and 4x faster training speed compared to the H100.

21/03/2024

Grok-1 is Now Open-Source: The Largest Open LLM

What's New?
xAI has open-sourced Grok-1, making it the largest open LLM released to date.

With 314 billion parameters, the Mixture of Experts (MoE) model uses about 86 billion active parameters for any given token, which keeps per-token compute well below that of a dense model of the same size.

Rather than fixed absolute positional embeddings, Grok-1 employs rotary positional embeddings (RoPE), which encode token positions as rotations applied to the attention queries and keys.

Key Specifications:
Parameters: 314 billion, with 25% of weights active per token.

Architecture: Mixture of 8 Experts, using 2 per token.

Layers: 64 transformer layers, integrating multihead attention and dense blocks.

Tokenization: Utilizes a SentencePiece tokenizer, vocab size of 131,072.

Embedding and Positional Encoding: 6,144 embedding size, matching rotary positional embeddings.

Attention: 48 heads for queries, 8 for keys/values, each with a size of 128.

Context Length: Capable of processing 8,192 tokens with bf16 precision.
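
The specifications above fit together arithmetically, and a short sketch can make that concrete. The constants below are taken from the list; the helper names (`softmax`, `route_top_k`) are hypothetical, and renormalizing the router weights over only the selected experts is an assumption for illustration, not a statement about xAI's implementation.

```python
import math

# Figures from the specification list above.
TOTAL_PARAMS = 314e9
NUM_EXPERTS = 8
EXPERTS_PER_TOKEN = 2
QUERY_HEADS = 48
KV_HEADS = 8
HEAD_SIZE = 128
EMBED_SIZE = 6144

# Query heads times head size reproduces the embedding width.
assert QUERY_HEADS * HEAD_SIZE == EMBED_SIZE

# With fewer key/value heads than query heads, each key/value head
# is shared by a group of query heads (grouped-query attention).
queries_per_kv_head = QUERY_HEADS // KV_HEADS

# Naive estimate of active weights: 2 of 8 experts is 25%.
# Non-expert weights (attention, embeddings) are always active,
# so the true active count is somewhat higher than this figure.
naive_active_params = TOTAL_PARAMS * EXPERTS_PER_TOKEN / NUM_EXPERTS


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=EXPERTS_PER_TOKEN):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Assumption: weights are renormalized over the selected experts only.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


# Hypothetical router scores for one token across the 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, 0.2, -0.3]
print(queries_per_kv_head, naive_active_params / 1e9, route_top_k(logits))
```

Only the two selected experts run for a given token, which is why the per-token cost tracks the active parameter count rather than the full 314 billion.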

Performance Metrics:

Outperforms LLaMa 2 70B and Mixtral 8x7B with an MMLU score of 73%, showcasing its efficiency and accuracy in various tests.

Implementation Details:

Requires significant GPU resources due to its size.

The reference MoE layer implementation is deliberately not efficient: it was chosen to avoid the need for custom kernels while validating the model's correctness.

The model supports activation sharding and 8-bit quantization to optimize performance.

Open-Source Availability:

Released under the Apache 2.0 license, Grok-1’s weights and architecture are accessible for community use and contribution.

Check:
https://github.com/xai-org/grok-1

08/03/2024

Microsoft Introduces 1-bit LLMs

Microsoft released BitNet b1.58, a new Large Language Model (LLM) using 1.58 bits per parameter, reducing computational demands significantly while maintaining performance.

Unlike traditional 16-bit models, BitNet employs ternary values (-1, 0, 1), slashing GPU memory use and energy consumption by up to 3.5 times and 71 times respectively, without sacrificing model accuracy.

BitNet b1.58 achieves comparable or superior results to FP16 models like LLaMA 3B in perplexity and various language tasks starting from a 3 billion parameter size.

Because every weight is -1, 0, or 1, matrix multiplication needs almost no energy-heavy floating-point multiplications, only additions and subtractions, which speeds up computation and lowers energy costs.
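
As a rough illustration of the idea, the sketch below quantizes a small weight matrix to ternary values using absmean scaling (divide by the mean absolute weight, round, clip to [-1, 1]), then multiplies it by a vector using only additions and subtractions. The function names and the toy weights are hypothetical, and rescaling the output by the single factor `gamma` is a simplification rather than the full BitNet pipeline.

```python
import math


def absmean_ternary(weights, eps=1e-8):
    """Quantize a weight matrix to {-1, 0, 1} using absmean scaling."""
    flat = [w for row in weights for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)  # mean absolute weight
    quantized = [
        [max(-1, min(1, round(w / (gamma + eps)))) for w in row]
        for row in weights
    ]
    return quantized, gamma


def ternary_matvec(quantized, x):
    """Multiply a ternary matrix by a vector without any multiplications."""
    out = []
    for row in quantized:
        acc = 0.0
        for q, xi in zip(row, x):
            if q == 1:
                acc += xi
            elif q == -1:
                acc -= xi
            # q == 0 contributes nothing, so the input is skipped.
        out.append(acc)
    return out


# Toy weights, chosen only for illustration.
weights = [[0.9, -0.05, -1.2], [0.02, 0.6, -0.4]]
q, gamma = absmean_ternary(weights)
y = [gamma * v for v in ternary_matvec(q, [1.0, 2.0, 3.0])]

print(q)             # ternary weights
print(y)             # rescaled approximation of weights @ x
print(math.log2(3))  # bits needed to encode three states
```

The last line shows where the "1.58" in the name comes from: a weight with three possible states carries log2(3), or about 1.58, bits of information.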

This model's efficiency improves as it scales, offering significant performance enhancements, especially in larger models. For instance, at 70 billion parameters, BitNet is over four times faster than traditional models, increasing throughput and reducing latency.
https://arxiv.org/html/2402.17764v1

06/03/2024

Anthropic Releases Claude 3 Beating GPT-4 on Every Benchmark

Anthropic recently unveiled its Claude 3 series, a new family of benchmark-crushing models: Opus, Sonnet, and Haiku.
Opus and Sonnet are now available on claude.ai and through the Claude API, which is now generally available in 159 countries. Haiku will be available soon.

Model Specifications:

Opus: Targets advanced intelligence for in-depth processing.

Sonnet: Balances speed with efficiency for scalable tasks.

Haiku: Engineered for the fastest response in live interactions.

Performance and Capabilities:

Benchmark: Opus leads over competitors like GPT-4 and Gemini Ultra in GSM-8k and MMLU benchmarks, indicating superior mathematical reasoning and expert knowledge capabilities.

Multimodal Functionality: A standout feature of the Claude 3 series is that the models accept both text and image inputs, which is vital for parsing complex, unstructured data across various formats.

Enhanced Context and Recall:

Extended Context Window: Initially offering a 200K token context window, with capabilities to handle inputs exceeding 1 million tokens.

Near-Perfect Recall: Demonstrates robust information recall from extensive datasets.

Pricing: Opus is priced at $15 per million input tokens, a nod to its advanced capabilities over competitors like GPT-4 Turbo, which costs $10 per million input tokens.

Bias Mitigation and Safety: In response to concerns over AI biases, Anthropic has developed the "Constitutional AI" framework, aiming for model neutrality. However, completely eliminating bias remains a challenging task.

TRY:
https://www.anthropic.com/claude

02/03/2024

Mistral Large: An LLM That Outperforms Every Model Apart From GPT-4

After several weeks of speculation about a new model, Mistral AI has officially released Mistral Large, its latest and most capable model, along with Le Chat, a beta of its chat UI.

Large is the world’s second best language model available through an API after GPT-4.

Mistral Large scores 81.2% on MMLU (Massive Multitask Language Understanding), beating Claude 2, Gemini Pro and Llama-2-70B. It is particularly strong at common sense and reasoning, with 94.2% accuracy on the ARC Challenge (5-shot).

Mistral Small was also updated on the API to a faster and more performant model than Mixtral 8x7B.

Some of Mistral’s features include:

Training on English, French, Spanish, German and Italian datasets for native multilingual capabilities.

A 32k token context window, well below Gemini’s 1M tokens or Claude’s 200k tokens.

Precise instruction-following, which was used to set up moderation of the Le Chat interface.

Native function calling for agentic use cases, similar to ChatGPT Plus.

https://auth.mistral.ai/ui/login?flow=41a68ceb-db53-4476-b86e-ef6145963dd7

25/02/2024

Berkeley AI Research Releases Large World Model (LWM)

What's New?
A new open-source bomb dropped when researchers from the Berkeley AI Research lab, led by Prof Pieter Abbeel, released Large World Model (LWM), a family of general-purpose large-context multimodal autoregressive models.
These models were trained on several multimodal datasets (text, images, videos). Using next token prediction, they can generate data across all these modalities, over a context of up to 1M tokens.

Ring Attention
The Ring Attention technique was used to gradually and cost-efficiently scale the training context from 4k to 1M tokens. The solution to the challenges associated with training on both video and language was to effectively train on different sequence lengths, with a weighted contribution of language and vision.

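Ring Attention is an attention mechanism that improves how language models handle large context sizes. It splits the input sequence into blocks distributed across multiple devices arranged in a ring. Each device computes attention for its own query block while key-value blocks are passed around the ring, so the full attention matrix is never materialized on a single device.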
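
The following is a minimal single-process sketch of that idea, not the LWM training code. It simulates the devices with plain lists, keeps each query block fixed, rotates the key-value blocks one hop per step, and merges partial results with a running (online) softmax. It covers one attention head without causal masking, and all function names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_attention(q, k, v):
    """Reference: softmax(q k^T / sqrt(d)) v with every score in memory at once."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [dot(qi, kj) / math.sqrt(d) for kj in k]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append([sum(e * vj[c] for e, vj in zip(exps, v)) / total
                    for c in range(len(v[0]))])
    return out


def ring_attention(q_blocks, k_blocks, v_blocks):
    """Each simulated device keeps its query block; key/value blocks rotate."""
    n = len(q_blocks)
    d = len(q_blocks[0][0])
    dv = len(v_blocks[0][0])
    # Per-query running statistics for the online softmax.
    run_max = [[-math.inf] * len(qb) for qb in q_blocks]
    run_sum = [[0.0] * len(qb) for qb in q_blocks]
    run_out = [[[0.0] * dv for _ in qb] for qb in q_blocks]
    for step in range(n):
        for dev in range(n):
            src = (dev + step) % n  # key/value block held by this device now
            kb, vb = k_blocks[src], v_blocks[src]
            for i, qi in enumerate(q_blocks[dev]):
                scores = [dot(qi, kj) / math.sqrt(d) for kj in kb]
                new_max = max(run_max[dev][i], max(scores))
                # Rescale earlier partial sums to the new running maximum.
                scale = math.exp(run_max[dev][i] - new_max)
                exps = [math.exp(s - new_max) for s in scores]
                run_sum[dev][i] = run_sum[dev][i] * scale + sum(exps)
                run_out[dev][i] = [
                    o * scale + sum(e * vj[c] for e, vj in zip(exps, vb))
                    for c, o in enumerate(run_out[dev][i])
                ]
                run_max[dev][i] = new_max
    return [[[o / run_sum[dev][i] for o in row]
             for i, row in enumerate(run_out[dev])] for dev in range(n)]


def split(rows, n):
    size = len(rows) // n
    return [rows[i * size:(i + 1) * size] for i in range(n)]


random.seed(0)
seq_len, dim, devices = 8, 4, 4
q = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
k = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
v = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]

blocks = ring_attention(split(q, devices), split(k, devices), split(v, devices))
ring = [row for block in blocks for row in block]
ref = full_attention(q, k, v)
max_err = max(abs(a - b) for r, s in zip(ring, ref) for a, b in zip(r, s))
print(max_err)  # difference from the reference, expected to be near zero
```

At any step a device only holds scores for one query block against one key-value block, which is what allows the context length to grow with the number of devices.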

Performance:
LWM beats Gemini Pro on single needle retrieval, and ties with GPT-4. Other tests like multi needle retrieval and long video understanding are also performed.

Like its closed-source counterparts, LWM can also generate high-quality videos (as Sora does) and answer questions over a one-hour video (as Gemini 1.5 Pro does).

Learn more (code): https://largeworldmodel.github.io/

23/02/2024

Google Releases Gemma: The Best Open LLM Yet

Google just unveiled Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.

Key Features of Gemma:

Model Variants: Gemma comes in 2B and 7B parameter versions, each available in base and instruction-tuned formats (text only, not multimodal).
Commercial Use: Fully authorized for commercial applications, opening up new possibilities for businesses.
Context Window: Both models support an 8,192-token context window.
Performance: The 7B model outperforms competitors such as Mistral 7B and LLaMa 2 on HumanEval and MMLU, scoring 64.56 on MMLU.
Learn More:
https://ai.google.dev/gemma/?utm_source=keyword&utm_medium=referral&utm_campaign=gemma_cta&utm_content

20/02/2024

Karpathy Leaves OpenAI and Releases BPE Algorithm Implementation

Andrej Karpathy, known for his significant contributions to OpenAI and as the former head of Tesla's autopilot team, recently announced his departure from OpenAI to dedicate time to personal projects, and he's already making waves.

Karpathy clarified that his decision to leave was amicable, with no underlying issues prompting the move. After rejoining OpenAI about a year ago—having first left in 2017—his projects and responsibilities have now been transferred to another senior researcher at the organization.

Check out minbpe:
https://github.com/karpathy/minbpe
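
For readers unfamiliar with byte pair encoding (BPE), here is a minimal byte-level sketch of the algorithm. It is written from scratch for illustration and is not copied from the minbpe repository; the function names are hypothetical. Training repeatedly finds the most frequent adjacent pair of token ids and replaces it with a new id, starting from the 256 possible byte values.

```python
from collections import Counter


def most_common_pair(ids):
    """Return the most frequent adjacent pair of ids, or None if there is none."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of pair with new_id."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to num_merges merges over the UTF-8 bytes of text."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_common_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # ids 0..255 are reserved for raw bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Expand token ids back into text by replaying merges in learned order."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dicts preserve insertion order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


text = "aaabdaaabac"
ids, merges = train_bpe(text, 3)
print(len(text), "bytes ->", len(ids), "tokens")
print(decode(ids, merges) == text)
```

Each merge shortens the sequence while keeping it fully reversible, which is the trade-off a tokenizer makes between vocabulary size and sequence length.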

19/02/2024

Amazon trains 980M parameter LLM with ’emergent abilities’

Researchers at Amazon have trained a new large language model (LLM) for text-to-speech that they claim exhibits “emergent” abilities.
The 980 million parameter model, called BASE TTS, is the largest text-to-speech model yet created. The researchers trained models of various sizes on up to 100,000 hours of public domain speech data to see if they would observe the same performance leaps that occur in natural language processing models once they grow past a certain scale.

They found that their medium-sized 400 million parameter model – trained on 10,000 hours of audio – showed a marked improvement in versatility and robustness on tricky test sentences.
https://www.amazon.science/publications/base-tts-lessons-from-building-a-billion-parameter-text-to-speech-model-on-100k-hours-of-data

18/02/2024

NVIDIA Launches Chat with RTX - A Personalized LLM Chatbot

Nvidia just released "Chat with RTX", a local app that allows you to create a personal AI chatbot (LLM) based on your own content.

Rather than searching through notes or saved content, users can simply type queries. For example, one could ask, “What was the restaurant my partner recommended while in Las Vegas?” and Chat with RTX will scan local files the user points it to and provide the answer with context.
https://www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generative-ai/
