Friendli AI

Friendli AI: Supercharge Generative AI Inference
Efficient, fast, and reliable generative AI inference solution for production

05/20/2025

We’re excited to announce that our CEO, Byung-Gon Chun, will be taking the keynote stage at SuperAI Singapore, the region’s largest AI conference, on June 18 at 2:15pm (SGT)!

Join us at Marina Bay Sands as he shares insights on the future of AI alongside 150+ global leaders and innovators. SuperAI 2025 brings together 7,500+ attendees from 90+ countries to explore the next wave of transformative AI technologies, from robotics and LLMs to finance and healthcare.

Don’t miss the chance to connect with us at our booth and be part of Asia’s premier AI gathering!

🔗 Learn more about SuperAI: https://www.superai.com/

05/19/2025

Did you know that Friendli TCache—the caching layer of the Friendli infrastructure—supports prefix caching not only for text, but also for images, videos, and more?
That means faster and more cost-efficient inference for any multimodal AI application.

From healthcare chatbots to video analysis and e-commerce, Friendli TCache lets you reuse encoded data, reduce redundant computation, and deliver lightning-fast results.

Curious how it works and how it can supercharge your multimodal AI applications? Check out our latest blog post for a deep dive!

🔗 Learn more in our blog: https://friendli.ai/blog/friendli-tcache-flexible-multimodal-prefix-caching

Friendli TCache expands prefix caching beyond text, enabling support for multimodal inputs like image and video embeddings. This unlocks faster inference and improved efficiency for modern AI workloads.
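
To make the idea of multimodal prefix caching concrete, here is a minimal sketch. It is not Friendli TCache's actual implementation: the `PrefixCache` class, its `_encode` stand-in, and the byte-string segments are all hypothetical names chosen for illustration. It only shows the general technique of keying cached encodings by a hash of everything in the prompt so far, so a shared text-plus-image prefix is encoded once and reused.

```python
import hashlib


class PrefixCache:
    """Toy prefix cache (hypothetical): reuse encodings for shared prompt prefixes."""

    def __init__(self):
        self._store = {}        # running-prefix digest -> encoded segment
        self.encode_calls = 0   # counts how often the "expensive" step runs

    def _encode(self, segment: bytes) -> str:
        # Stand-in for an expensive step such as a vision encoder or prefill pass.
        self.encode_calls += 1
        return f"encoded:{len(segment)}"

    def encode_prompt(self, segments):
        hasher = hashlib.sha256()
        encoded = []
        for seg in segments:
            # Length-prefix each segment so different splits never collide.
            hasher.update(len(seg).to_bytes(8, "big"))
            hasher.update(seg)
            key = hasher.hexdigest()  # identifies the whole prefix up to this segment
            if key not in self._store:
                self._store[key] = self._encode(seg)
            encoded.append(self._store[key])
        return encoded


cache = PrefixCache()
system = b"system: you are a radiology assistant"
image = b"\x89PNG placeholder image bytes"  # any modality is just bytes to the cache

cache.encode_prompt([system, image, b"user: describe the scan"])
cache.encode_prompt([system, image, b"user: are there any fractures?"])
print(cache.encode_calls)  # 4: three encodes for the first prompt, one for the second
```

The second request shares its system prompt and image with the first, so only the new question is encoded. A production cache would also need eviction and would store model state rather than strings.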

05/15/2025

Big News: 370,000+ AI Models Now on FriendliAI!

We’ve just expanded our Models Page to feature over 370K AI models covering language, audio, image, and video.

Instantly deploy any of the 370K+ models from Hugging Face with one click and experience FriendliAI’s industry-leading inference speed.

Explore our models page: https://friendli.ai/models
📖 Read more: https://friendli.ai/blog/models-page

05/15/2025

Hack, Network, Win: AI Observability & Agentic RAG Night in NYC!

We're thrilled to sponsor this exciting evening filled with LLMs, AI Observability, Vector DBs, Agentic RAG, and more. Enjoy lightning talks, hands-on hacking sessions, and great networking opportunities with awesome prizes up for grabs!

Catch Soomin Chun’s talk on deploying and scaling AI inference effortlessly with FriendliAI.

Don’t miss this chance to connect and innovate with fellow AI enthusiasts. See you there!

🔗 Event: https://lu.ma/hacknight-at-msft-05-20-2025

Hack Night NYC is back with LLMs, AI Observability, Vector Databases, Agentic RAG and more! This evening is brought to you by Weaviate and Comet. 🏆 Evening of…

05/12/2025

Excited to announce that our CTO, Gyeong-In Yu, will take the stage at Weights & Biases Fully Connected 2025 in San Francisco, sharing hard-won lessons and practical strategies for scaling GenAI inference!

He’ll reveal real-world techniques and optimizations drawn from the front lines of generative AI.

Hosted by Weights & Biases, this two-day event brings together AI pioneers and practitioners for hands-on workshops and inspiring talks on the future of AI agent development, from prototype to production.

Catch Gyeong-In on Day 2 at 1:15pm during the AI Pioneer Speaker Series.

Let’s meet up and shape the next wave of AI Agent innovation together!

🔗 Event: https://wandb.ai/site/resources/events/fully-connected/

05/06/2025

Why manage separate models for every task?
With Multi-LoRA adapters on FriendliAI, you can deploy multiple Hugging Face LoRA adapters on a single base model, saving memory and boosting efficiency.

Key benefits:
1️⃣ Load lightweight adapters instead of full models
2️⃣ Instantly switch tasks by selecting the right adapter at inference time, with no redeployment needed
3️⃣ Scale easily by serving multiple tasks from one GPU

FriendliAI is the only platform offering real-time, per-request LoRA adapter switching. Streamline your AI deployments and build adaptable, production-ready systems with ease!

🔗 Learn more in our blog: https://friendli.ai/blog/how-to-use-hugging-face-multi-lora-adapters

Learn how to use multiple LoRA adapters from Hugging Face.
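
As a rough illustration of per-request adapter switching, the sketch below keeps one shared base weight matrix and several small low-rank adapters, and picks an adapter on each call. The `MultiLoRALayer` class, the adapter names, and the tiny matrices are hypothetical, and this is not FriendliAI's serving code or API; it only shows why switching tasks needs no redeployment when the base weights are shared.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class MultiLoRALayer:
    """Toy linear layer: one shared base weight, many low-rank adapters (hypothetical)."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # shape d_out x d_in, loaded once
        self.adapters = {}              # name -> (A, B, scale)

    def add_adapter(self, name, A, B, scale=1.0):
        # A has shape r x d_in and B has shape d_out x r, with r much smaller than d.
        self.adapters[name] = (A, B, scale)

    def forward(self, x, adapter=None):
        y = matvec(self.base_weight, x)
        if adapter is None:
            return y  # plain base model
        A, B, scale = self.adapters[adapter]
        delta = matvec(B, matvec(A, x))  # low-rank update B(Ax)
        return [yi + scale * di for yi, di in zip(y, delta)]


layer = MultiLoRALayer(base_weight=[[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("summarize", A=[[1.0, 1.0]], B=[[0.5], [0.0]])
layer.add_adapter("translate", A=[[1.0, -1.0]], B=[[0.0], [2.0]])

x = [2.0, 1.0]
print(layer.forward(x))                       # [2.0, 1.0]
print(layer.forward(x, adapter="summarize"))  # [3.5, 1.0]
print(layer.forward(x, adapter="translate"))  # [2.0, 3.0]
```

Each request names the adapter it wants, and the base weights are never copied or reloaded. Real systems apply this per layer and batch requests that use different adapters together.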

05/05/2025

Ever wondered how those magical, Studio Ghibli-style AI artworks are created?

Discover how Low-Rank Adaptation (LoRA) makes it possible, enabling massive models to learn new styles or tasks with just a few images and minimal computation.

Our latest blog breaks down how LoRA unlocks creative, scalable AI, and how FriendliAI’s Multi-LoRA support brings production-ready flexibility for any use case.

Curious how adapter-based fine-tuning is changing the AI game?

🔗 Learn more in our blog: https://friendli.ai/blog/how-lora-brings-ghibli-style-ai-art-to-life

Read the full story and stay tuned for Part 2!

A deep dive into Low-Rank Adaptation (LoRA): the concepts, benefits, variants, and use cases – plus how FriendliAI enables scalable AI deployment in production with Multi-LoRA support.
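
The "minimal computation" claim follows from simple arithmetic: LoRA freezes a d_out x d_in weight matrix and trains two thin matrices of rank r instead. The sketch below counts parameters for one such layer. The layer size (4096 x 4096) and rank (8) are illustrative assumptions, not figures from the post.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Return (full fine-tuning params, LoRA params) for a single weight matrix."""
    full = d_out * d_in           # every entry of W is trainable
    lora = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return full, lora


# Assumed, illustrative sizes for one projection layer.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(full)         # 16777216
print(lora)         # 65536
print(lora / full)  # 0.00390625, i.e. about 0.4% of the full matrix
```

Because the adapter is this small relative to the base weights, many adapters can be stored and served alongside a single copy of the model.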

05/02/2025

Introducing Qwen 3: Now Available on FriendliAI!
Developed by Alibaba Cloud, Qwen 3 represents the cutting edge of AI innovation, designed to meet the needs of businesses worldwide.

What makes Qwen 3 a game-changer?
• Hybrid Reasoning: Combines speed and depth for smarter, more flexible decisions.
• Advanced Capabilities: Excels in coding, tool use, and browsing with ease.
• Global Reach: Communicates seamlessly in 119 languages.
• Efficiency Redefined: Delivers top performance with ultra-efficient Mixture-of-Experts models.
• Open Source: Driving innovation as a fully open-source solution under Apache 2.0.

Qwen 3 is now available on Friendli Dedicated Endpoints to take your AI capabilities to the next level.

Ready to lead the way in AI innovation?

👉 Get started today!
Qwen 3 235B: https://friendli.ai/deploy-model/Qwen/Qwen3-235B-A22B
Qwen 3 32B: https://friendli.ai/deploy-model/Qwen/Qwen3-32B
Qwen 3 8B: https://friendli.ai/deploy-model/Qwen/Qwen3-8B

04/21/2025

Fresh Update: Meta’s Llama 4 Running Smoothly on 4×H100s with FriendliAI

Meta’s latest, Llama 4 Maverick, is now live and fully deployed on just 4×H100 GPUs using FriendliAI’s optimized infrastructure.

No setup headaches. No scaling delays. Just fast, efficient, expert-trusted inference.

Better yet? You can try it instantly via Hugging Face, powered by FriendliAI—whether you’re benchmarking, prototyping, or running real workloads.

💡 Built for teams that want speed without compromise.
🧠 Trusted by experts.
⚙️ Production-ready from day one.

🔗 Try it on Hugging Face: https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct

04/10/2025

🦙 Llama 4 is now live on Friendli — deploy instantly from Hugging Face!
Build with confidence: blazing-fast performance, massive context, dramatically lower cost.

🔍 Overview of Llama 4
Llama 4 is Meta’s newest family of large language models, consisting of Maverick and Scout.
Both models support multimodal input (text + image) and are available under the Llama 4 Community License.

🧠 Llama 4 Maverick
・ 17B active params / 128 experts MoE / 400B total parameters
・ 1M-token context window
・ Optimized for multilingual and visual tasks, and suitable for large-scale applications across enterprise settings.
👉 https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct

🧠 Llama 4 Scout
・ 17B active params / 16 experts MoE / 109B total parameters
・ 10M-token context window
・ Efficient for multi-document analysis, long-context reasoning, and building customized AI tools.
👉 https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct
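
The "active params" versus "total parameters" figures above are what make a Mixture-of-Experts model cheap to run relative to its size: only a subset of experts is used per token. As a quick check using just the numbers quoted in this post (the helper name `active_fraction` is made up for illustration):

```python
def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of a model's parameters that are used for any single token."""
    return active_billion / total_billion


# (active, total) in billions of parameters, as quoted in the post.
models = {
    "Llama 4 Maverick": (17, 400),
    "Llama 4 Scout": (17, 109),
}

fractions = {name: active_fraction(a, t) for name, (a, t) in models.items()}
for name, frac in fractions.items():
    print(f"{name}: {frac * 100:.2f}% of parameters active per token")
```

By this arithmetic, Maverick uses roughly 4% of its parameters per token and Scout roughly 16%. All parameters still have to be held in memory, so the saving is in compute per token rather than in model storage.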

⚡ Why Use Llama 4 on FriendliAI?
✅ 1-click dedicated deployment from Hugging Face
✅ Blazing-fast, low-cost inference

🔗 Read the blog: https://lnkd.in/eyknVcgU

💡 Sample Use Cases
・ Build assistants that understand both images and multilingual text
・ Analyze full-length documents or large codebases
・ Deploy internal GenAI tools with long-context and cost efficiency

03/25/2025

🚀 Introducing Our New Side-by-Side Comparison Playground for Multimodal AI Models!
Choosing the right AI model just got easier! Our latest update allows you to compare outputs from multiple multimodal AI models side-by-side, streamlining your decision-making process.

🔍 Why Compare?
Different models excel in various tasks, and quick comparisons let you:
• Identify the best fit for your specific use case
• Evaluate performance and uncover potential issues
• Benchmark against state-of-the-art models
• Gain valuable insights into model behavior

💡 Key Benefits
• Faster Evaluations: Get real-time results in one view, with no more toggling between tools.
• Informed Choices: Assess model strengths and weaknesses to find your perfect fit.
• Cost-Effective: Reduce GPU costs by over 50% while maximizing ROI.
• Seamless Deployment from Hugging Face: Easily deploy and compare your models with just one click.

Ready to elevate your AI projects? Check out our blog for a detailed walkthrough of this powerful feature!

🔗 Read the blog: http://friendli.ai/blog/compare-multimodal-ai-models

Build and serve custom generative AI models with Friendli Endpoints, saving GPU costs and accelerating AI inference. FriendliAI offers the best inference solutions to optimize LLMs.

03/25/2025

💪 Discover 130K+ AI Models on FriendliAI's New Model List Page!

We’re excited to launch our new Model List Page, your go-to hub for exploring and deploying over 130,000 AI models.

From language models to image generation and multimodal tasks, find the perfect model for your needs and deploy it with just one click!

✨ What’s New?
✔️ Diverse AI models for text, audio, video, and more
✔️ Seamless integration from Hugging Face
✔️ Optimized models for faster, cost-effective performance
✔️ Easy access to Friendli Dedicated and Serverless Endpoints

Start exploring today and take your AI projects to the next level!
📖 Read more: https://friendli.ai/blog/models-page

How to explore and deploy AI models effortlessly on FriendliAI’s models page.

Address

303 Twin Dolphin Drive, Suite 600 PMB 6009
Redwood City, CA
94065
