
Chatbot: Rules, NLP, or LLM — Which to Choose
When someone says "let's put a chatbot on our site," the question that should come immediately after is: "what kind of chatbot?" The answer radically changes the cost, maintenance complexity, and quality of the user experience. There are chatbots that cost $3 a month and chatbots that cost $3,000 a month -- and both can be the right choice depending on the context.
There are three main architectures today: rule-based, NLP-based (natural language processing), and LLM-based (large language models like GPT-4 and Claude). Each solves different problems, has different costs, and requires different levels of technical maturity to operate. In this article, we'll demystify all three.
Rule-Based Chatbots: Simple and Predictable
The rule-based chatbot is the oldest and still the most widely used. It works with a fixed set of flows: if the user types X, respond Y. If they type Z, respond W. The logic is a decision tree -- menus, buttons, predefined responses.
Tools like Typebot, Landbot, and ManyChat are the visual version of this approach. You drag blocks, connect conditions, and publish. In a matter of hours you have a functional bot that collects leads, qualifies customers, or answers the ten most frequently asked questions.
The advantages are clear: very low cost (many tools have a free plan), 100% predictable behavior, easy to audit and fix, zero risk of "hallucination," and simple compliance with CCPA/GDPR because the bot never generates its own content.
The limitations are also clear: the bot doesn't understand language variations. If the flow expects "schedule appointment" and the user types "I want to book a time slot," the bot doesn't know what to do. Any deviation from the planned script results in a generic fallback response.
When to use: static FAQ, lead qualification with predefined questions, scheduling with fixed options, structured data collection (name, email, phone). Any flow where the possible answers are known and finite.
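The decision-tree logic behind these builders can be sketched in a few lines of Python. This is a minimal illustration, not tied to any specific tool; all flow names and responses are hypothetical:

```python
# Minimal sketch of a rule-based flow: a decision tree keyed on
# exact user choices. States, prompts, and options are illustrative.
FLOW = {
    "start": {
        "prompt": "Hi! Choose an option: 1) Hours  2) Pricing  3) Talk to a human",
        "options": {"1": "hours", "2": "pricing", "3": "human"},
    },
    "hours": {"prompt": "We're open Mon-Fri, 9am-6pm.", "options": {}},
    "pricing": {"prompt": "Plans start at $29/month.", "options": {}},
    "human": {"prompt": "Connecting you to an agent...", "options": {}},
}


def step(state: str, user_input: str) -> tuple[str, str]:
    """Advance the flow; anything off-script hits a generic fallback."""
    node = FLOW[state]
    next_state = node["options"].get(user_input.strip())
    if next_state is None:
        # The limitation in action: any unplanned input dead-ends here.
        return state, "Sorry, I didn't understand. Please pick an option."
    return next_state, FLOW[next_state]["prompt"]
```

Typing "1" advances to the hours node; typing "I want to book a time slot" just triggers the fallback, which is exactly the limitation described above.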
NLP with Rasa and Dialogflow: Intents and Entities
The jump to NLP solves the language variation problem. Instead of mapping exact phrases, you train the model to recognize intents (what the user wants) and entities (the relevant data in the message).
"I want to schedule for Tuesday at 2pm" --> intent: schedule_appointment, entity: date=Tuesday, entity: time=2:00 PM
"Can you fit me in next week in the morning?" --> same intent: schedule_appointment, entity: date=next week, entity: time=morning
Dialogflow (Google) is the easiest to get started with. Visual interface for creating intents, native English support, direct integration with Google Assistant, Telegram, Slack, and WhatsApp. The CX (Customer Experience) version has more complex conversation flows and is the recommended choice for serious projects. Pricing is based on request volume -- accessible for medium volumes.
Rasa is the open-source alternative. You host it on your own server, train your own models, and have full control of data. It's more complex to set up (requires Python knowledge and ML concepts), but the operational cost is much lower at high volume and data privacy is guaranteed -- nothing leaves your servers.
The weak point of classic NLP is that it still fails on open conversations and unexpected contexts. It recognizes trained intents, but if the user asks a question outside the intent set, the bot can't respond. Maintenance is constant: you need to review logs, identify recognition failures, and add training examples regularly.
LLMs: High Quality, High Cost, and Low Control
Large language models changed what's possible in a chatbot. With GPT-4, Claude, or Gemini, you have a bot that:
- Understands questions in any form
- Maintains context across long conversations
- Responds with fluent natural language
- Can reason over documents, FAQs, and knowledge bases
The most common architecture for LLM-powered support chatbots is RAG (Retrieval-Augmented Generation): you index your documentation, FAQs, and policies in a vector database (Pinecone, Weaviate, pgvector). When the user asks something, the system retrieves the most relevant chunks and passes them to the LLM alongside the question. The model responds based on that context.
```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are the virtual assistant for Company X.
Answer only based on the information provided in the context.
If you don't know the answer, say you'll connect with a human agent.
Always be friendly and concise."""


def respond_to_user(
    question: str,
    retrieved_context: str,
    history: list[dict],
) -> str:
    # Build the prompt: system instructions, prior turns, then the
    # retrieved chunks injected alongside the user's question.
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        *history,
        {
            "role": "user",
            "content": f"Relevant context:\n{retrieved_context}\n\nQuestion: {question}",
        },
    ]
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0.3,  # low for more consistent responses
        max_tokens=500,
    )
    return response.choices[0].message.content
```
The problems with LLMs are real and need active management. Hallucinations happen -- the model can fabricate information with confidence. The cost per message is orders of magnitude higher than classic NLP. Latency is higher. And behavior is not 100% deterministic, making automated testing harder.
Cost at high volume is the biggest blocker: 1 million messages per month with GPT-4o can cost several thousand dollars. For lower volumes and high-value use cases (complex technical support, consultative sales), the ROI justifies it. For basic e-commerce FAQ, it doesn't.
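A back-of-the-envelope calculation shows where that number comes from. The token counts and per-token prices below are illustrative assumptions only -- provider pricing changes frequently, so check current rates before budgeting:

```python
# Rough monthly cost estimate for an LLM chatbot. All defaults are
# illustrative assumptions, not current provider pricing.
def monthly_llm_cost(
    messages_per_month: int,
    input_tokens_per_msg: int = 1500,   # system prompt + RAG context + history
    output_tokens_per_msg: int = 300,
    price_in_per_1m: float = 2.50,      # $ per 1M input tokens (assumed)
    price_out_per_1m: float = 10.00,    # $ per 1M output tokens (assumed)
) -> float:
    cost_in = messages_per_month * input_tokens_per_msg / 1e6 * price_in_per_1m
    cost_out = messages_per_month * output_tokens_per_msg / 1e6 * price_out_per_1m
    return cost_in + cost_out
```

With these assumed numbers, 1 million messages lands around $6,750 per month -- in line with the "several thousand dollars" estimate above -- and note that RAG context dominates the bill, since input tokens far outnumber output tokens.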
Decision Matrix by Use Case
| Use Case | Rules | NLP | LLM |
|---|---|---|---|
| Static FAQ (10-30 questions) | Ideal | Overkill | Overkill |
| Lead qualification | Ideal | Good | Unnecessary |
| Scheduling with fixed options | Ideal | Good | Unnecessary |
| Technical support with variations | Poor | Good | Excellent |
| Free-form customer service | Poor | Fair | Excellent |
| Complex consultative sales | Poor | Poor | Excellent |
| Answering from internal docs | Doesn't work | Fair | Excellent (RAG) |
| High volume (1M+ msgs/month) | Ideal | Good | Expensive |
| Sensitive data (healthcare, legal) | Safe | Safe (Rasa) | Risk |
The core logic: use rules for predictable flows, NLP for natural language variations within known domains, and LLMs for open conversations where response quality is critical and the cost is justifiable.
A hybrid approach often makes more sense: the first tier uses rules for the most common flows (which represent 80% of volume), and the fallback for everything outside the flow uses LLM. This controls cost and maintains quality where it matters.
Conclusion
There is no ideal chatbot -- there is a chatbot appropriate to the context. The decision between rules, NLP, and LLM is an engineering decision that should consider message volume, question complexity, data sensitivity, budget, and error tolerance.
The most expensive mistake is choosing a more complex architecture than necessary -- deploying an LLM to answer "what are your business hours" is waste. The second most expensive mistake is choosing an architecture that's too simple for a complex problem -- putting a rule-based chatbot on SaaS technical support will frustrate customers and solve nothing.
At SystemForge, we do this analysis before writing a single line of code. We map use cases, estimate volume, assess data sensitivity, and recommend the architecture that delivers the most results at the lowest operational cost. Talk to us about your project.
Need Bots and Automation?
We build custom bots and automation workflows for your business.
Learn more →
