What makes LLMs expensive?
Enterprises must recognize the full cost of integrating generative AI, particularly large language models (LLMs), into their operations. While subscribing to a consumer-level chatbot like ChatGPT for personal use may seem inexpensive, the dynamics shift significantly at enterprise scale.
Consider the scene at a bustling coffee shop, where Amy, a college student on the verge of an important presentation, finds herself grappling with writer’s block. With deadlines looming and inspiration in short supply, she surreptitiously taps away on her phone, employing a chatbot to generate a clever opening line for her speech. The ensuing applause from her classmates serves as a testament to the effectiveness of consumer-grade chatbots in a pinch, all at a price that won’t break the bank. However, for enterprises dealing with sensitive data, the calculus changes drastically.
For enterprises, the imperative is not just about subscribing to a chatbot; it’s about evaluating what goes into production when handling confidential data. Therefore, it’s crucial for businesses to partner with platforms or vendors tailored to enterprise needs.
Seven key cost factors make large language models expensive and shape how generative AI scales across enterprises.
1. Firstly, “use case” sets the stage. Every enterprise has unique requirements, akin to selecting a vehicle based on specific needs like terrain, capacity, and features. A blanket cost estimate for generative AI won’t suffice; instead, enterprises should engage in pilots to identify pain points, assess efficacy, and customize solutions.
2. Secondly, “model size” matters. The complexity and size of LLMs significantly impact pricing. Vendors offer different pricing tiers based on model parameters, necessitating careful consideration of which model suits the enterprise’s needs best.
3. “Pre-training”, the third factor, involves building and training a foundation model from scratch. While offering control over the training data, it incurs substantial costs, exemplified by the multi-million dollar investment required for models like GPT-3. This cost barrier has limited the number of players in the market capable of undertaking such endeavours.
4. “Inferencing”, the fourth factor, encompasses the process of generating responses using LLMs. Cost is tied to the number of tokens processed, underscoring the importance of efficient prompt engineering to elicit desired responses without extensive model alteration.
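To make the token-based pricing concrete, here is a minimal cost sketch. The per-token rates and request volumes below are illustrative assumptions, not any vendor's actual price list; the point is only that spend scales with both prompt length and completion length.

```python
# Hypothetical per-token inference pricing; these rates are assumptions
# for illustration, not a real vendor's price sheet.
PRICE_PER_1K_INPUT = 0.0015   # USD per 1,000 prompt tokens
PRICE_PER_1K_OUTPUT = 0.002   # USD per 1,000 completion tokens

def monthly_inference_cost(requests_per_day, prompt_tokens,
                           completion_tokens, days=30):
    """Estimate monthly spend for a steady request volume."""
    per_request = (prompt_tokens / 1000 * PRICE_PER_1K_INPUT
                   + completion_tokens / 1000 * PRICE_PER_1K_OUTPUT)
    return requests_per_day * per_request * days

# e.g. 10,000 requests/day, 500-token prompts, 200-token completions
cost = monthly_inference_cost(10_000, 500, 200)  # $345/month under these assumptions
```

This is also why prompt engineering matters for cost: trimming a prompt from 500 to 300 tokens cuts the input portion of every single request, which compounds quickly at enterprise volumes.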
5. “Tuning”, the fifth factor, involves adjusting model parameters to improve performance or reduce costs. Fine-tuning, an extensive adaptation of the model, demands significant labelled data and compute resources. In contrast, parameter-efficient fine-tuning achieves task-specific performance without major model alterations, offering a cost-effective alternative.
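The cost gap between full fine-tuning and parameter-efficient fine-tuning can be seen in a back-of-the-envelope parameter count. The sketch below uses LoRA-style low-rank adapters as the example of a parameter-efficient method; the layer size and rank are illustrative assumptions.

```python
# Why parameter-efficient fine-tuning is cheaper: instead of updating a
# full d_in x d_out weight matrix, a LoRA-style adapter trains two small
# low-rank factors. The dimensions below are illustrative assumptions.
def full_params(d_in, d_out):
    """Trainable weights when fine-tuning the full matrix."""
    return d_in * d_out

def lora_params(d_in, d_out, rank):
    """Trainable weights for two low-rank factors:
    (d_in x rank) and (rank x d_out)."""
    return d_in * rank + rank * d_out

d = 4096                            # hidden size of a hypothetical layer
full = full_params(d, d)            # 16,777,216 trainable weights
lora = lora_params(d, d, rank=8)    # 65,536 trainable weights
fraction = lora / full              # ~0.4% of the full matrix
```

Training and storing roughly 0.4% of the weights per adapted layer, rather than all of them, is what makes the parameter-efficient route the cost-effective alternative the paragraph above describes.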
6. “Hosting”, the sixth factor, comes into play when deploying customized or fine-tuned models. Enterprises must decide between utilizing inference APIs for pre-deployment models or hosting their models for more extensive customization. The latter incurs additional costs for model hosting and maintenance.
7. Finally, “deployment” options — SaaS or on-premises — impact cost and flexibility. While SaaS offers predictable subscription fees and shared GPU resources, on-premises deployments provide full control over data and architecture but require purchasing and maintaining GPUs.
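The SaaS-versus-on-premises decision often reduces to a break-even question: how many months of subscription fees would it take to cover the GPU purchase plus ongoing upkeep? Every dollar figure in this sketch is a placeholder assumption for illustration.

```python
# Rough break-even sketch between a SaaS subscription and on-premises
# GPUs. All dollar amounts are placeholder assumptions.
def breakeven_months(saas_monthly, gpu_capex, onprem_monthly_opex):
    """Months after which cumulative on-prem cost drops below SaaS cost."""
    savings_per_month = saas_monthly - onprem_monthly_opex
    if savings_per_month <= 0:
        return None  # on-prem never catches up under these inputs
    return gpu_capex / savings_per_month

# e.g. $20k/month SaaS vs $150k of GPUs plus $5k/month power and upkeep
months = breakeven_months(20_000, 150_000, 5_000)  # 10 months
```

A short break-even horizon favors buying GPUs; a long one (or a `None` result) favors the predictable subscription. In practice the on-prem side should also price in staffing and hardware refresh, which lengthens the real horizon.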
In conclusion, enterprises must navigate a complex landscape of cost factors when integrating generative AI. Partnering with vendors offering customizable solutions and flexibility in deployment can optimize costs while meeting specific business needs. By carefully evaluating these factors, enterprises can harness the transformative potential of generative AI effectively.
Reference: IBM Research, AI Academy series “What Makes Large Language Models Expensive?”