Our Pricing

See what Lamini can do
See the full model lifecycle: choose a model, run RAG, try tuning, see evals, and use it for inference
Compare top open source models (Llama 3, Mistral 2, Phi 3) in our playground
200 inference requests per month
Limited tuning requests
Runs on Lamini's optimized compute platform, using LoRA, PEFT, Model/Data Parallelism, Sharding, and more
Tune models and deploy them anywhere
Run on your own GPUs or reserve dedicated GPUs from Lamini
Unlimited tuning and inference
Unmatched inference throughput
Full evaluation suite
Access to world-class ML experts
Enterprise white glove support
Up to 10 projects
Customizable dashboard
Up to 50 tasks
Up to 1 GB storage
Unlimited proofings
Unlimited custom fields
Unlimited milestones
Unlimited timeline
Frequently asked questions (FAQs)

How is this different from just using a single provider’s (e.g. OpenAI’s) APIs off the shelf?
Our customers cite three major reasons:

1. Data privacy: Use private data in your own secure environment. Use all of your data, rather than what fits into a prompt.

2. Ownership & Flexibility: Own the LLM you train with your own engineering team, and flexibly swap out models as new ones appear each day. Build up AI know-how and an AI moat internally at your company, while getting a big infrastructure lift.

3. Control (cost, latency, throughput): With ownership, you also have more control over the (much lower) cost and (much lower) latency of the model. We expose these controls in an easy interface for your engineering team to customize.
What does the LLM platform do?
Our platform runs and optimizes your LLM.

It brings to bear several of the latest technologies and research, the same ones that turned GPT-3 into ChatGPT and Codex into GitHub Copilot.

These include, among others, fine-tuning, RLHF, retrieval-augmented training, data augmentation, and GPU optimization.
How expensive is using your platform to build and use my model?
We have a free tier, where training a small LLM is free. You also get $20 in free inference credits with each new account.

Please see our contact page for our Enterprise tier, where you can download the model weights, are not limited in model size or type, and can control throughput and latency.
Do you build my own large model?
Yes, the resulting model is very large! Its exact size depends on the base model you select, or that we select automatically for you based on your data and use case.

However, what’s exciting is that it builds on the great work before it, such as GPT-3 or ChatGPT. These general purpose models know English and can answer in the general vicinity of your tasks.

We take it to the next level to teach it to understand your company’s language and specific use cases, by using your data.

This means it will do better than a general purpose model on tasks that matter to you, but it won’t be as good as a general purpose model on generic tasks without any data, e.g. putting together a grocery list (unless that’s what your company does).
What LLMs is the LLM platform using under the hood?
We build on the latest generation of models to make your LLM perform best, including any LLM on HuggingFace and OpenAI's models. We work with customers to determine which models are appropriate, depending on your specific use cases and data constraints.

For example, OpenAI’s well-known GPT-3 and ChatGPT models are a great starting point for some customers, but not for others. We believe in following your needs and using the best models for you.
If I want to export the model and run it myself, can I do that?
Yes, we deploy to any cloud service (e.g. your VPC) and on premises.

This includes setup for running your LLM in your own environment for scaled inference.

If you want to, you can export the weights from our platform and host the LLM yourself.
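
As a rough illustration of what self-hosting exported weights might look like, here is a minimal sketch. It assumes the weights were exported in the Hugging Face Transformers directory layout (a `config.json` alongside the weight and tokenizer files); that layout, the directory name, and the helper functions `load_exported_model` and `generate` are assumptions for illustration, not something this page specifies.

```python
from pathlib import Path


def load_exported_model(weights_dir: str):
    """Load a tokenizer and causal LM from a local directory.

    Assumption: the directory uses the Hugging Face Transformers
    layout (config.json plus weight and tokenizer files).
    """
    path = Path(weights_dir)
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"No config.json found in {path}")

    # Imported lazily so the path check above works without the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForCausalLM.from_pretrained(path)
    return tokenizer, model


def generate(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Run one generation request against the locally hosted model."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the weights in a hypothetical `./my-exported-llm` directory, usage would be `tokenizer, model = load_exported_model("./my-exported-llm")` followed by `generate(tokenizer, model, "Hello")`. For scaled inference you would typically put a dedicated serving layer in front of the model rather than call it directly like this.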