Lamini Raises $25M For Enterprises To Develop Top LLMs In-House

Sharon Zhou, CEO

TL;DR

  • Lamini, the Enterprise AI platform, enables software teams within enterprises to develop new LLM capabilities that reduce hallucinations on proprietary data, to run their LLMs securely anywhere from cloud VPCs to on-premise environments, and to scale their infrastructure with model evaluations that prioritize ROI and business outcomes over hype.
  • We call this Expert AI, designed for proprietary data in a proprietary ecosystem, which requires a fundamentally different approach than General AI on public data for public use.
  • Lamini has raised $25 million from notable investors in AI, technology, and enterprise, including Amplify Partners (led Series A), First Round Capital (led Seed), Andrew Ng, Andrej Karpathy, Bernard Arnault, Pierre Lamond, Sarah Guo, Dylan Field, Lip-Bu Tan, Drew Houston, Anthony Schiller, AMD Ventures, and others.

Lamini raises $25M for enterprises to turn proprietary expertise into the next generation of LLM capabilities.

The future of generative AI is in the hands of enterprises, which can unlock the vast proprietary data that foundation models haven't been trained on and turn it into ever more powerful capabilities through LLMs. This untapped data enables such new capabilities because models come to understand new domains deeply, like a team of human experts within an organization, reaching far beyond the simple reasoning, coding, and question answering that public general models are capable of.

Lamini is the Enterprise AI platform designed for enterprises to take full advantage of their proprietary data: turning that data into new LLM capabilities, deploying securely to vendor-agnostic compute options, and empowering their in-house software teams to uplevel into OpenAI-caliber AI teams. We call this Expert AI, which differs from General AI both in the depth of its capabilities and in how it must be built to succeed.

Today, we’re thrilled to announce that Lamini has raised $25 million from notable investors in AI, technology, and enterprise, including Amplify Partners (led Series A), First Round Capital (led Seed), Andrew Ng, Andrej Karpathy, Bernard Arnault, Pierre Lamond, Sarah Guo, Dylan Field, Lip-Bu Tan, Drew Houston, AMD Ventures, and others.

"We believe there's a massive opportunity for generative AI in enterprises. While there are a number of AI infrastructure companies, Lamini is the first one I've seen that is taking the problems of the enterprise seriously and creating a solution that helps enterprises unlock the tremendous value of their private data while satisfying even the most stringent compliance and security requirements. Lamini has been growing exponentially, working with major global enterprises. We're excited to invest in Lamini and partner with the team to bring Expert AI to every enterprise."
— Mike Dauber, General Partner, Amplify Partners

Since launching in April 2023, Lamini has been growing fast. Highlights include:

  • LLM Photographic Memory Evaluation Suite quantifies the high hallucination rates that even complex RAG pipelines and GPT-4 still produce. These hallucinations are native to general LLMs but are a dealbreaker for enterprises.
  • Multi-node LLM Training dives into our deeper technical approach to training on thousands of GPUs, cutting a 1,000-hour process down to one hour and enabling developer teams to run and tune models concurrently, which makes efficient collaboration on GenAI within a team possible.
  • Guaranteed JSON Output on Lamini's inference engine is the only way for developers today to guarantee that every LLM inference call returns output in exactly the format you need, the true holy grail of function calling. This removes the need for parsers and makes LLM output machine-readable (see the first sketch after this list).
  • Parameter-Efficient Fine-Tuning (PEFT) showcases our production LoRA stack, which can tune millions of adapters, making fine-tuning as efficient and cost-effective as RAG (see the second sketch after this list).
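
To make the JSON guarantee concrete, here is a minimal sketch of what schema-constrained generation looks like from the developer's side. The client and `output_type` shape follow Lamini's public Python SDK, but exact names and signatures may differ across versions, so treat this as illustrative rather than definitive:

```python
# A minimal sketch of guaranteed JSON output, assuming Lamini's Python
# client. Names and signatures follow Lamini's public SDK docs as of this
# writing and may differ in your version; check the current docs.
from lamini import Lamini

llm = Lamini(model_name="meta-llama/Meta-Llama-3-8B-Instruct")

# output_type pins the response to a typed schema: decoding is constrained
# so the result is always valid JSON with exactly these fields, with no
# post-hoc parsing or retry loop needed.
invoice = llm.generate(
    "Extract the vendor and total from: 'ACME Corp billed $1,200 on 3/4.'",
    output_type={"vendor": "str", "total_usd": "int"},
)
print(invoice)  # e.g. {"vendor": "ACME Corp", "total_usd": 1200}
```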
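
Lamini's production LoRA stack itself is proprietary, but the underlying PEFT idea is standard. As a rough illustration (using Hugging Face's open-source peft library, not Lamini's stack), attaching a small trainable adapter to a frozen base model looks like this:

```python
# Illustrative LoRA setup with Hugging Face's peft library (not Lamini's
# internal stack): only the small adapter matrices are trained.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

lora = LoraConfig(
    r=16,                                  # adapter rank: size/quality trade-off
    lora_alpha=32,                         # scaling applied to adapter updates
    target_modules=["q_proj", "v_proj"],   # attach to attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of base weights
```

Because only the small adapter matrices are trained while the base model stays frozen, storing and hot-swapping millions of per-use-case adapters becomes tractable.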

We also partnered with AMD, Snowflake, Databricks, Nutanix, Meta, Mistral, and more. 

The new funding will accelerate development of deeper technical optimizations and expand our team to offer enterprises the strategic support they need at scale. We are also proactively expanding our GPU cloud infrastructure globally, with new GPUs coming online (including more AMD!) to meet accelerating demand from customers across new regions.

"Many founders these days are newcomers to generative AI and LLMs — but not Sharon and Greg. They've been working on this technology for over a decade and are world-class experts in AI and high-performance machine learning systems. That knowledge combined with a deeply customer-centric approach to solving problems for big enterprises creates a massive advantage for their customers. We're excited to partner with Lamini and believe their vision will transform the way enterprises use and adopt AI."
— Todd Jackson, Partner, First Round Capital

Enterprises can't follow the standard AI playbook.

Among the 100+ enterprise executives we've talked to, nearly every CEO, CIO, and CTO names the same top priority: taking advantage of generative AI within their organization with maximal return on investment (ROI). Although it's easy to get a working demo running on an individual developer's laptop, the path to production is strewn with failures. Proofs of concept, demos, and prototypes abound internally, with a rollercoaster of optimism followed by disappointment when there's still no clear path to a killer app with strong ROI in production. The common blockers?

  1. Model quality: The model hallucinates, fabricating incorrect information in its responses. For example, even if you tell the large language model (LLM) in the prompt that your revenue is $100B, it sometimes still confidently states that it's $10M, because it learned and committed to that figure in the past.
  2. Infrastructure to support engineering teams: While there are many great open-source libraries, most are designed and built for research, not production. Scaling beyond a single test workstation is a different orchestration and optimization problem, one that any team of more than one engineer faces when shipping an LLM to production.
  3. Security: Compliance and security requirements block enterprise deployments because the importance of data has dramatically increased. LLMs are the most valuable derivative of that data today, so it's crucial to get governance right and prevent sensitive data leaks. Where data is safely stored today is where LLMs can be safely deployed.
  4. ROI: The ROI on these developments has to make sense. Unlike AI companies, enterprises will not invest $1B upfront without seeing a return. They want to invest a little, see ROI, then invest more.

Compounding these issues is the rapidly evolving AI ecosystem: enterprises often lack the right expertise and struggle to keep pace, risking wasted time and talent and falling behind competitors.

The solution to these problems is not General AI but Expert AI, which prioritizes depth in an expert subject area over breadth. Expert AI is tailored to grok key facts and reason within specific subdomains, akin to a human subject matter expert, while still communicating effectively across relevant common-sense areas. Expert AI has a photographic memory for the critical facts and figures in an enterprise's proprietary data. We will share more technical details about dramatic hallucination reduction in an upcoming blog post.

And, because it is such a deep derivative of proprietary expertise, Expert AI must be deployed in a way that respects existing security and governance protocols. Building Expert AI starts with small initial investments across an enterprise's developer teams, then scales up reliably once there is clear ROI. That is to say, Expert AI is fundamentally different to build, but we believe it's the only way for enterprises to meaningfully win and find high, repeatable ROI with generative AI going forward.

General-purpose LLMs are designed to hallucinate.

As Andrej Karpathy previously hinted at in this tweet, general-purpose LLMs are designed to hallucinate: they will attempt to answer every question humans can think of. These models are trained by rewarding "good behavior" in a general context, but that same "good behavior" is detrimental in many expert contexts, where the data, value system, or truth has changed.

This is something that prompt engineering and retrieval-augmented generation (RAG) cannot solve. These techniques can nudge, but not enforce, what the model's weights treat as the new truth, the new value system, or what is fundamentally "correct" or "good." Even something as simple as a changed clause in a contract, while it can be retrieved correctly, is still hallucinated into the wrong clause by the model.

Ultimately, it all comes down to accuracy. It's great if your models are fast and cost-efficient, but if responses aren't specific, useful, and grounded in real data, your project is going to fail! Enterprises have a major asset: tons of proprietary data gathered over decades of hard work. Successful, impactful models will need to integrate that data fundamentally, not just at a surface level. So, how do you build a model that prioritizes this kind of truth- and data-driven approach?

Lamini is founded on the belief that the future of LLMs lies in unlocking the vast amount of data in enterprises.

From the developer-facing interfaces down to the hardware level where bits interact with atoms on GPUs, Lamini has systematically optimized every piece of the LLM stack for enterprises to take advantage of their proprietary data and build Expert AI from it. This includes LLM orchestration, optimization of LLM inference and training engines, and a deep hardware-software co-design approach built on close collaboration with hardware partners. It ensures the full stack, from cloud to silicon, is meticulously architected to meet the performance and accuracy requirements of enterprise Expert AI deployments. Finally, this can all be done flexibly across infrastructure environments with varying levels of security: on different cloud platforms, on-premise in air-gapped environments, and on different compute vendors, derisking reliance on any single one.

Optimizing Accuracy and Reducing Hallucinations

Memory tuning and safe inference: "Memory Tuning" means training an LLM on specific domain data to the point that it can recall exact matches. It can significantly reduce hallucinations and create a path to deploying an LLM application that "easy" methods like RAG cannot. For example, one of our customers achieved 88% accuracy on product ID matching after Memory Tuning, compared to only 1% accuracy using RAG over the original model (a rough sketch of the workflow follows). Read more about Memory Tuning.
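
As a sketch of what this workflow can look like from the developer's side, assuming Lamini's Python client: the method name and argument shape below are illustrative assumptions rather than the definitive API, so check the current Lamini docs.

```python
# A minimal sketch of a Memory Tuning workflow, assuming Lamini's Python
# client. The method name and argument shape here are illustrative
# assumptions, not the definitive API; check Lamini's current docs.
from lamini import Lamini

llm = Lamini(model_name="meta-llama/Meta-Llama-3-8B-Instruct")

# Exact facts from proprietary records, e.g. a product catalog. The goal
# is verbatim recall of these pairs, not a merely plausible paraphrase.
facts = [
    {"input": "What is the product ID for the 2TB Pro SSD?", "output": "SKU-88412"},
    {"input": "What is the product ID for the 4TB Pro SSD?", "output": "SKU-88413"},
]

llm.tune(data=facts)  # hypothetical signature: tune until recall is exact
```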

Faster Time to Market and Easy to Scale

Lamini helps enterprises with sizable engineering teams accelerate time to market. Engineering teams from 1 to 10,000 developers can scale elastically on the most efficient amount of compute to serve and tune LLMs, maximizing ROI. Our largest Fortune 500 deployment so far is being scaled from hundreds of GPUs to over 1,000 GPUs, with 10,000 developers, in an air-gapped on-premise environment.

Secure and Flexible Deployments

Flexible deployment targets mean your developers can build in a way that automatically respects security requirements. From the customer's perspective, Lamini runs LLMs identically in the cloud or on-premise, with or without internet access, on NVIDIA or AMD GPUs; it moves seamlessly between them and de-risks your stack against an ever-shifting compute landscape.
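
At the framework level, here is what vendor-agnostic code can look like. The snippet below is general PyTorch, not Lamini-specific: ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda API, so the same selection logic runs unchanged on NVIDIA and AMD hosts.

```python
# General PyTorch, not Lamini-specific: ROCm builds of PyTorch expose AMD
# GPUs through the same torch.cuda API, so this selection logic runs
# unchanged on NVIDIA (CUDA) and AMD (ROCm) hosts.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(1, 4096, device=device)
print(model(x).shape, "on", device)
```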

Higher ROI Out of the Gate 

Counterintuitively, building Expert AI is more capital-efficient: you start with a small investment, even for techniques like Memory Tuning, and ramp it up as ROI is proven out.

Join Lamini!

We’re hiring! Come build the next generation of LLMs, from the full-stack interface to the algorithms driving higher accuracy to high-performance computing:

– Sharon Zhou, Co-founder and CEO, Lamini