How to Build an AI Chatbot: The Complete Step-by-Step Guide (2025)
- Leanware Editorial Team
Building an AI chatbot in 2025 is not the same as it was a few years ago. Earlier systems relied on rigid, hand-crafted rules or intent classifiers that needed extensive training data and setup.
With large language models, you can now create working chatbots much faster, using prompt-driven workflows instead of hand-crafted rules.
TL;DR: Building AI Chatbots in 2025
This guide shows you how to build a production-ready Slack AI chatbot using Google Vertex AI that answers questions from your company's knowledge base.
Two-Part Architecture: A Python-based intranet scraper that extracts content from Google Sites and uploads it to Google Cloud Storage, plus a serverless Flask backend on Google Cloud Run that processes Slack messages.
Key Design Choice: Instead of traditional RAG systems with vector databases, we use large context windows in Vertex AI to feed entire knowledge bases directly to the model, reducing system complexity.
Production Features: Message deduplication with PostgreSQL locks, sliding-window conversation memory, automated CI/CD deployment, and caching for cost and performance optimization.
Included Code: Selenium-based scraper, Flask Slack bot, and deployment automation.
Why is 2025 the Right Time to Build AI Chatbots?
Building chatbots is faster and simpler today with LLMs. Instead of training models or mapping rigid flows, you provide context through prompts, and the model handles natural dialogue. A bot can read documents like policies or manuals and answer questions directly.
Three factors make this practical now:
LLM APIs from OpenAI, Google, and Anthropic are widely available.
Processing costs have dropped, making production use affordable.
No-code platforms and custom frameworks let teams quickly prototype or build production-grade systems with memory, concurrency, and integrations.
Modern LLMs support large context windows, RAG for massive knowledge bases, multiple languages, and natural dialogue.
Getting started:
1. Focus on a single use case: customer support, sales qualification, employee knowledge, or education.
2. Roll out in phases: define scope, set up baseline Q&A, add memory and accuracy checks, then integrate systems.
3. Choose the development approach based on complexity:
No-code for fast prototypes and simple bots (In our case, this approach wasn’t sufficient because we needed a custom scraper and architecture that could handle a larger, growing knowledge base.)
Custom development for advanced workflows, multi-system integration, scalability, and compliance.
Custom Development: Building Production-Grade AI Chatbots
Let’s build an enterprise-grade Slack chatbot using Python, Flask, and Google Vertex AI.
Modern AI Chatbot Architecture
The architecture follows a "simplicity as a feature" principle. Instead of complex RAG systems with vector databases and retrieval logic, we leverage large context windows to feed entire knowledge bases directly to the AI.
Our two-part architecture includes:
Automated Knowledge Scraper: A standalone Python application that uses Selenium to navigate corporate intranets, extract content from pages and embedded Google Docs, convert HTML to clean Markdown, and upload structured data to Google Cloud Storage.
Slack Event Service: A Flask application deployed on Google Cloud Run that handles Slack events, manages conversation context, integrates with Vertex AI for response generation, and implements production features like concurrency control and error handling.
Flow Diagram and Infrastructure: (diagram omitted) The scraper feeds Google Cloud Storage; Slack events reach the Cloud Run service, which loads the knowledge base from Storage, calls Vertex AI, and keeps conversation state in Cloud SQL.
Technology Stack Selection: Python + Flask + Vertex AI
The system is built on Python 3.12+, which offers stable support for AI workflows and integrations.
Flask: lightweight framework for handling HTTP requests.
Slack Bolt SDK: official library for building Slack bots and event handling.
Google Vertex AI (google-cloud-aiplatform): model hosting, scaling, and monitoring.
Google Cloud Storage: stores processed knowledge base files.
Google Cloud SQL (PostgreSQL): stores conversation history and metadata.
Selenium, BeautifulSoup4, html2text: scraping and cleaning intranet content.
PyInstaller: packages the scraper into an executable for non-technical use.
Docker + Google Cloud Run: containerization and deployment.
Setup requires Python 3.12+, a Google Cloud project with Vertex AI, Cloud Storage, and Cloud SQL enabled, and a Slack workspace with bot permissions. Environment variables are managed with .env files.
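As a quick reference, here is a minimal sketch of loading that configuration with python-dotenv. The variable names SLACK_BOT_TOKEN, SLACK_SIGNING_SECRET, GOOGLE_EMAIL, and GOOGLE_PASSWORD appear in the code later in this guide; GCS_BUCKET_NAME is a hypothetical name for the knowledge base bucket.
import os
from dotenv import load_dotenv  # pip install python-dotenv

# Load variables from a local .env file into the process environment
load_dotenv()

SLACK_BOT_TOKEN = os.getenv("SLACK_BOT_TOKEN")
SLACK_SIGNING_SECRET = os.getenv("SLACK_SIGNING_SECRET")
GOOGLE_EMAIL = os.getenv("GOOGLE_EMAIL")        # scraper login
GOOGLE_PASSWORD = os.getenv("GOOGLE_PASSWORD")  # scraper login
GCS_BUCKET_NAME = os.getenv("GCS_BUCKET_NAME")  # hypothetical: knowledge base bucket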
Part 1: The Intranet Scraper
An AI chatbot is only as smart as the data it has access to. The first step is building a reliable system that extracts knowledge from the company’s intranet, hosted on Google Sites, and makes it available to the AI.
Scraping a private Google Site involves challenges like authentication, dynamic content, and access controls. To handle this, we developed a standalone Python script (intranet_scraper.py) that automates browser navigation, extracts content, converts it to clean Markdown, and uploads it to Google Cloud Storage.
Step 1: Scraping the Intranet with Selenium
Our goal is to automate a web browser to log in, navigate pages, and save content. The intranet_scraper/main.py script serves as the entry point for this process, but the core logic resides in utils/intranet_scraper.py.
The scraper needs to:
Authenticate: Handle logging into a Google-powered intranet. It uses environment variables (GOOGLE_EMAIL, GOOGLE_PASSWORD) to manage credentials securely.
Navigate & Discover: It starts from a base URL and recursively finds all linked pages within the same intranet site (a minimal discovery sketch follows this list).
Extract & Format: For each page, it uses Selenium to pull the main textual content and html2text to convert it into clean Markdown. Markdown is perfect for AI because it preserves structure (headings, lists, tables).
Upload: It stores each scraped page as a separate .md file in a designated Google Cloud Storage bucket.
Handle Embedded Content: A key feature is its ability to find embedded Google Docs within the intranet pages, directly download their content as plain text, and append it to the corresponding markdown file.
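To make the Navigate & Discover step concrete, here is a minimal sketch of recursive link discovery with Selenium. It presumes an already-authenticated driver (covered next); the function name _discover_pages and the same-site check are illustrative assumptions, not the exact implementation.
from urllib.parse import urljoin, urlparse

from selenium.webdriver.common.by import By

def _discover_pages(driver, base_url, visited=None):
    """Sketch: recursively collect intranet page URLs reachable from base_url."""
    if visited is None:
        visited = set()
    if base_url in visited:
        return visited
    visited.add(base_url)
    driver.get(base_url)
    # Collect hrefs first; navigating away would invalidate the elements
    hrefs = [a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")]
    site_root = urlparse(base_url).netloc
    for href in hrefs:
        if not href:
            continue
        url = urljoin(base_url, href).split("#")[0]
        # Only follow links that stay on the same intranet site
        if urlparse(url).netloc == site_root and url not in visited:
            _discover_pages(driver, url, visited)
    return visited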
Authentication and Driver Setup (utils/intranet_scraper.py):
Before scraping, the tool authenticates using Selenium. It navigates to the Google sign-in page, enters the credentials stored in the environment variables, and maintains the authenticated session for subsequent requests.
import os
import re
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
class OptimizedMultiPageGoogleScraper:
    # ... (initialization)

    def authenticate_once(self):
        """Authenticate once and return a driver instance for reuse"""
        driver = self._setup_driver()
        try:
            print("🔐 Performing one-time authentication...")
            # Go to Google Sign-in
            driver.get("https://accounts.google.com/signin")
            # Complete email
            email_field = WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.ID, "identifierId"))
            )
            email_field.send_keys(self.email)
            # ... (clicks next)
            # Wait and complete password
            password_field = WebDriverWait(driver, 15).until(
                EC.element_to_be_clickable((By.NAME, "Passwd"))
            )
            password_field.send_keys(self.password)
            # ... (clicks next)
            print("✅ Authentication successful - driver ready for reuse")
            return driver
        except Exception as e:
            print(f"❌ Authentication failed: {e}")
            driver.quit()
            return None
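The _setup_driver helper isn't shown in the listing above. Here is a minimal sketch, assuming headless Chrome with a download directory configured so the Google Doc export trick used later in this section can work; the option values are assumptions, not the exact implementation.
    def _setup_driver(self):
        """Sketch: create a headless Chrome driver with a known download directory."""
        options = webdriver.ChromeOptions()
        options.add_argument("--headless=new")
        options.add_argument("--no-sandbox")
        # Route browser downloads to a predictable directory for the
        # Google Doc export step described later.
        self.downloads_dir = "/tmp/scraper_downloads"
        os.makedirs(self.downloads_dir, exist_ok=True)
        options.add_experimental_option(
            "prefs", {"download.default_directory": self.downloads_dir}
        )
        return webdriver.Chrome(options=options)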
Extracting Content and Handling Embedded Google Docs (utils/intranet_scraper.py):
Once authenticated, the scraper navigates to a URL and begins the extraction process. A key challenge is that much of the content, especially from Google Docs, is embedded within <iframe> elements or linked directly. The scraper systematically finds and processes these.
The _extract_iframe_contents function serves as the orchestrator for this. It first looks for direct links to Google Docs and then processes all other iframes it can find on the page.
    def _extract_iframe_contents(self, driver):
        """Extract content from all iframes in the page"""
        embedded_contents = []
        structured_docs = []
        try:
            # First, specifically look for Google Docs URLs on the page
            doc_contents, doc_structured = self._process_google_docs_urls(driver)
            embedded_contents.extend(doc_contents)
            structured_docs.extend(doc_structured)
            # Then, process all other generic iframes
            iframe_contents = self._process_iframes(driver)
            embedded_contents.extend(iframe_contents)
        except Exception as e:
            print(f"❌ Error extracting iframe contents: {e}")
        return embedded_contents, structured_docs
To get the content from a Google Doc, the scraper doesn't try to parse the complex editor HTML. Instead, it constructs a special export URL to download the document directly as a plain text file. This is a much more reliable method.
    def _get_google_doc_content_directly(self, driver, doc_url):
        """Get Google Doc content by downloading it as a text file"""
        try:
            # Extract the unique document ID from its URL
            doc_id = self._extract_doc_id(doc_url)
            if not doc_id:
                return None
            # Setup a temporary directory for the download and trigger it
            downloads_dir, existing_files = self._setup_doc_download(driver, doc_id)
            if not downloads_dir:
                return None
            # Wait for the .txt file to appear in the directory
            downloaded_file = self._wait_for_download(downloads_dir, existing_files, doc_id)
            if not downloaded_file:
                return None
            # Read the content from the downloaded file and then delete it
            return self._read_and_cleanup_file(downloaded_file)
        except Exception as e:
            print(f"❌ Error downloading Google Doc content: {e}")
            return None

    def _setup_doc_download(self, driver, doc_id):
        """Setup download directory and trigger document download"""
        downloads_dir = getattr(self, "downloads_dir", "/tmp")
        existing_files = set(os.listdir(downloads_dir))
        # This URL format forces a direct download of the Google Doc as plain text
        export_url = f"https://docs.google.com/document/d/{doc_id}/export?format=txt"
        print(f"📄 Downloading Google Doc content from: {export_url}")
        driver.get(export_url)  # This action triggers the browser download
        return downloads_dir, existing_files
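The helpers _extract_doc_id, _wait_for_download, and _read_and_cleanup_file are referenced but not listed. A minimal sketch of what they plausibly look like, relying on the re/time/os imports at the top of the module; the regex and the polling loop are assumptions, not the exact implementation.
    def _extract_doc_id(self, doc_url):
        """Sketch: pull the document ID out of a Google Docs URL."""
        match = re.search(r"/document/d/([a-zA-Z0-9_-]+)", doc_url)
        return match.group(1) if match else None

    def _wait_for_download(self, downloads_dir, existing_files, doc_id, timeout=30):
        """Sketch: poll the download directory until a new .txt file appears."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            new_files = set(os.listdir(downloads_dir)) - existing_files
            for name in new_files:
                if name.endswith(".txt"):
                    return os.path.join(downloads_dir, name)
            time.sleep(0.5)
        return None

    def _read_and_cleanup_file(self, file_path):
        """Sketch: read the downloaded text, then delete the temporary file."""
        with open(file_path, encoding="utf-8") as f:
            content = f.read()
        os.remove(file_path)
        return content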
After all the content (from the main page and any embedded documents) has been extracted, it is converted to Markdown and uploaded to Google Cloud Storage.
Cleaning HTML and Converting to Markdown (utils/html_to_markdown_converter.py):
Before converting the scraped HTML to Markdown, it's crucial to clean it. Raw HTML from a website contains a lot of noise that is irrelevant to the AI, such as scripts, styles, navigation bars, and footers. The clean_html_content function uses the BeautifulSoup library to parse the HTML and systematically remove these unwanted elements.
First, a list of undesirable tags is defined. This includes everything from scripts and styles to common structural elements like <nav> and <footer>.
def _get_unwanted_tags(remove_images: bool) -> list[str]:
    """Get list of unwanted HTML tags to remove."""
    unwanted_tags = [
        "script", "style", "noscript", "object", "embed", "applet",
        "form", "input", "button", "select", "textarea", "canvas",
        "svg", "audio", "video", "map", "area", "base", "link",
        "nav", "aside", "footer", "header",
    ]
    if remove_images:
        unwanted_tags.extend(["img", "picture", "figure", "figcaption"])
    return unwanted_tags
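clean_html_content itself is referenced but not listed. A minimal sketch of how it could apply this tag list with BeautifulSoup; the implementation details are assumed, not taken from the original source.
from bs4 import BeautifulSoup

def clean_html_content(html_content: str, remove_images: bool = True) -> str:
    """Sketch: strip unwanted tags from raw HTML before Markdown conversion."""
    soup = BeautifulSoup(html_content, "html.parser")
    for tag_name in _get_unwanted_tags(remove_images):
        for tag in soup.find_all(tag_name):
            tag.decompose()  # Remove the element and its children entirely
    return str(soup)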
The main conversion function, html_to_markdown, orchestrates the process. It first calls clean_html_content to strip out the noise and then uses the html-to-markdown library to perform the final conversion, ensuring the output is clean, readable, and optimized for the AI model.
def html_to_markdown(
    html_content: str, clean_html: bool = True, remove_images: bool = True, **options
) -> str:
    """
    Convert HTML content to Markdown format.
    """
    try:
        # Clean HTML content first if requested
        if clean_html:
            html_content = clean_html_content(html_content, remove_images=remove_images)
        # ... (default options for conversion)
        markdown = convert_to_markdown(html_content, **conversion_options)
        # ... (post-processing to clean up extra whitespace)
        return "\n".join(cleaned_lines).strip()
    except Exception as e:
        raise RuntimeError(f"Error converting HTML to Markdown: {e}")
This cleaning step is vital for improving the signal-to-noise ratio of the data we feed into the AI, resulting in more accurate and relevant answers.
Part 2: The Slack Event Service
With our knowledge base continuously updated in Google Cloud Storage, we can now build the core of our AI agent: a serverless application that handles Slack messages, queries the AI, and delivers intelligent answers.
This service is built with Flask and will be deployed on Google Cloud Run, making it scalable and cost-effective. Its main responsibility is to listen for direct messages sent to our Slack bot.
Step 1: Handling Incoming Slack Messages
The entry point for our service is views/slack_view.py. It uses the slack-bolt library to handle events. When a user sends a direct message to our bot, the handle_slack_event function receives the payload and starts a background thread to process it via process_slack_message_async. Running this in a separate thread is crucial to immediately acknowledge Slack's request and avoid timeouts.
Event Routing (views/slack_view.py):
This function acts as the main router for incoming Slack events. It verifies the request's signature, handles Slack's challenge request, and delegates direct messages to the asynchronous processor.
import os
import threading

from flask import Blueprint, jsonify, request
from slack_bolt import App
from slack_bolt.adapter.flask import SlackRequestHandler

from controllers.admin_agent_controller import AdminAgentController
import utils.slack as slack_service

slack_blueprint = Blueprint("slack", __name__)

# Initialize Slack App
slack_bolt_app = App(
    token=os.getenv("SLACK_BOT_TOKEN"),
    signing_secret=os.getenv("SLACK_SIGNING_SECRET")
)
slack_bot_user_id = slack_bolt_app.client.auth_test().data["user_id"]
admin_agent_controller = AdminAgentController()
@slack_blueprint.route("/event", methods=["POST"])
def handle_slack_event():
    """
    Handle Slack events, particularly direct messages sent to the bot.
    """
    # Verify request is from Slack
    if not slack_service.verifier.is_valid_request(request.get_data(), request.headers):
        return jsonify({"error": "invalid request signature"}), 403
    data = request.json
    if "challenge" in data:
        return jsonify({"challenge": data["challenge"]})
    # Check if the message is a direct message to our bot
    if slack_service.is_slack_application_direct_message(data, slack_bot_user_id):
        # Process in a background thread to avoid timeouts
        thread = threading.Thread(target=process_slack_message_async, args=(data,))
        thread.daemon = True
        thread.start()
        return jsonify({"message": "OK"}), 200
    # Fall back to the default Slack Bolt handler for other events
    return SlackRequestHandler(slack_bolt_app).handle(request)
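The slack_service helpers used above (verifier, is_slack_application_direct_message, extract_slack_message_info, send_message) live in utils/slack.py, which isn't listed in this guide. A minimal sketch, assuming slack_sdk's SignatureVerifier and the standard Slack event payload shape; the exact filtering logic is an assumption.
import os

from slack_sdk import WebClient
from slack_sdk.signature import SignatureVerifier

verifier = SignatureVerifier(os.getenv("SLACK_SIGNING_SECRET"))
client = WebClient(token=os.getenv("SLACK_BOT_TOKEN"))

def is_slack_application_direct_message(data: dict, bot_user_id: str) -> bool:
    """Sketch: true for DM message events not sent by the bot itself."""
    event = data.get("event", {})
    return (
        event.get("type") == "message"
        and event.get("channel_type") == "im"
        and event.get("user") not in (None, bot_user_id)
        and "bot_id" not in event
    )

def extract_slack_message_info(data: dict) -> tuple:
    """Sketch: pull the fields the async processor needs from the payload."""
    event = data.get("event", {})
    return event.get("text", ""), event.get("user"), event.get("channel"), event.get("ts")

def send_message(channel: str, text: str):
    """Send a message back to the Slack channel."""
    return client.chat_postMessage(channel=channel, text=text)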
The Asynchronous Processor (views/slack_view.py):
The process_slack_message_async function is the heart of our chatbot's real-time logic.
def process_slack_message_async(data: dict):
    # Import here to avoid circular imports
    from app import app
    from models.postgres.conversation_history import ConversationHistory
    from models.postgres.slack_message_lock import SlackMessageLock

    # Create application context for background thread
    with app.app_context():
        (
            message_text,
            user_id,
            channel_id,
            message_ts,
        ) = slack_service.extract_slack_message_info(data)

        # 1. Deduplication and Concurrency Lock
        if SlackMessageLock.is_message_processed(user_id, message_ts):
            print(f"⏭️ Skipping duplicate message from user {user_id} (ts: {message_ts})")
            return  # Skip duplicate message
        if SlackMessageLock.is_user_locked(user_id):
            print(f"🔒 User {user_id} is locked, skipping message (ts: {message_ts})")
            return  # Skip if user has a message in flight
        try:
            SlackMessageLock.create_lock(user_id, message_ts, channel_id, message_text)
            print(f"🔓 Created lock for user {user_id} (ts: {message_ts})")
        except Exception as e:
            print(f"❌ Failed to create lock for user {user_id} (ts: {message_ts}): {e}")
            return  # Handle failure to create lock

        try:
            # 2. Retrieve Conversation History
            conversation_context = ConversationHistory.format_context_for_ai(
                user_id, channel_id, limit=10
            )
            # 3. Detect Language
            detected_language = (
                admin_agent_controller.admin_agent_service.detect_language(message_text)
            )
            # 4. Process Message with AI Core
            result = admin_agent_controller.process_message_with_context(
                message=message_text,
                user_id=user_id,
                channel_id=channel_id,
                language=detected_language,
                conversation_context=conversation_context,
            )
            # 5. Send Response and Cleanup
            if result["success"]:
                response_data = result["data"]["response"]
                message = response_data["message"]
                slack_service.send_message(channel=channel_id, text=message)
                _handle_successful_processing(
                    user_id, message_ts, channel_id, message_text, message, detected_language
                )
            else:
                _handle_processing_error(
                    user_id, message_ts, channel_id, detected_language, result["message"]
                )
        except Exception as e:
            # Handle any unexpected errors
            _safe_fail_processing(user_id, message_ts, str(e))
This function follows a clear, robust sequence:
Deduplication & Locking: It uses a PostgreSQL table (SlackMessageLock) to ensure the same message isn't processed twice and that a user can't send multiple queries simultaneously, preventing race conditions.
Retrieve History: It fetches recent conversation history from the database to provide context for follow-up questions.
Detect Language: It determines the user's language to provide a response in their native tongue.
Process with AI: It passes the question, history, and language to the AdminAgentController, which orchestrates the call to the Vertex AI GenerativeModel.
Respond & Cleanup: It sends the AI-generated message back to the user on Slack and updates the lock and conversation history tables (the cleanup helpers are sketched below).
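The cleanup helpers _handle_successful_processing, _handle_processing_error, and _safe_fail_processing aren't shown in the listing above. A minimal sketch of what they plausibly do, assuming a complete_lock method like the one sketched later in this guide and a save_interaction method on ConversationHistory; all three bodies are assumptions.
def _handle_successful_processing(user_id, message_ts, channel_id,
                                  user_message, bot_response, language):
    """Sketch: release the lock and record the interaction."""
    from models.postgres.conversation_history import ConversationHistory
    from models.postgres.slack_message_lock import SlackMessageLock

    SlackMessageLock.complete_lock(user_id, message_ts, status="completed")  # assumed method
    ConversationHistory.save_interaction(  # assumed method
        user_id, channel_id, user_message, bot_response, language
    )

def _handle_processing_error(user_id, message_ts, channel_id, language, error_message):
    """Sketch: notify the user and mark the lock as failed."""
    from models.postgres.slack_message_lock import SlackMessageLock

    slack_service.send_message(channel=channel_id, text=error_message)
    SlackMessageLock.complete_lock(user_id, message_ts, status="failed")  # assumed method

def _safe_fail_processing(user_id, message_ts, error: str):
    """Sketch: last-resort handler that never raises."""
    try:
        from models.postgres.slack_message_lock import SlackMessageLock
        SlackMessageLock.complete_lock(user_id, message_ts, status="failed")  # assumed method
        print(f"❌ Unexpected error for user {user_id} (ts: {message_ts}): {error}")
    except Exception as cleanup_error:
        print(f"❌ Cleanup also failed: {cleanup_error}")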
Step 2: Integrating with Vertex AI
The actual AI query happens within the controller and service layers. The service layer is responsible for creating the prompt that will be sent to the GenerativeModel. This involves combining the user's question, the conversation history, and the entire knowledge base loaded from Google Cloud Storage.
This approach, made possible by models with large context windows, is powerful because it allows the AI to see all available information at once, enabling it to cross-reference documents and provide comprehensive answers without the complexity of a traditional RAG (Retrieval-Augmented Generation) system.
Production Features: Concurrency and Conversation Context
To move from a simple proof-of-concept to a reliable production service, we need to handle two critical challenges: preventing duplicate responses and maintaining conversational memory. Our application solves these using two specialized PostgreSQL models: SlackMessageLock and ConversationHistory.
Preventing Duplicate Responses with Message Locking
The Problem: Network requests can be unreliable. Slack may sometimes send the same message event more than once if it doesn't receive a timely acknowledgment. Additionally, a user might send multiple questions in quick succession. Without a locking mechanism, our bot could process the same message multiple times, leading to duplicate, confusing responses.
The Solution: The SlackMessageLock model acts as a gatekeeper. Before any processing begins, the application checks if the message has already been seen or if the user currently has another message being processed.
Implementation in views/slack_view.py: At the very beginning of the process_slack_message_async function, we perform these checks:
# 1. Deduplication and Concurrency Lock
# Check if message already processed (deduplication)
if SlackMessageLock.is_message_processed(user_id, message_ts):
    print(f"⏭️ Skipping duplicate message from user {user_id} (ts: {message_ts})")
    return

# Check if user is currently locked (has other processing messages)
if SlackMessageLock.is_user_locked(user_id):
    print(f"🔒 User {user_id} is locked, skipping message (ts: {message_ts})")
    return

# Create lock for this message
try:
    SlackMessageLock.create_lock(user_id, message_ts, channel_id, message_text)
    print(f"🔓 Created lock for user {user_id} (ts: {message_ts})")
except Exception as e:
    # ... handle failure to create lock ...
    return
The SlackMessageLock Model (models/postgres/slack_message_lock.py):
This model uses a unique constraint on user_id and message_ts to prevent duplicates. The is_user_locked method provides a simple way to check for any messages from a specific user that are still in the "processing" state.
class SlackMessageLock(BaseModel):
    __tablename__ = "slack_message_locks"

    user_id = Column(String(50), nullable=False, index=True)
    message_ts = Column(String(30), nullable=False)
    status = Column(String(20), nullable=False, default="processing")  # processing, completed, failed
    # ... other columns

    __table_args__ = (
        # Unique constraint to prevent duplicate processing of same message
        Index("ix_user_message_unique", "user_id", "message_ts", unique=True),
        # ... other indexes
    )

    @classmethod
    def is_user_locked(cls, user_id: str) -> bool:
        """Check if a user has any active (processing) message locks."""
        return (
            cls.query.filter_by(user_id=user_id, status="processing").first()
            is not None
        )
Once processing is finished (either successfully or with an error), the lock's status is updated to completed or failed, releasing the lock for that user.
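The create_lock, is_message_processed, and status-update methods aren't listed. A minimal sketch of how they might look on the same model, assuming a Flask-SQLAlchemy db session and the channel_id/message_text columns elided above; the unique index rejects duplicates at the database level.
    @classmethod
    def is_message_processed(cls, user_id: str, message_ts: str) -> bool:
        """Sketch: true if any lock row exists for this exact message."""
        return cls.query.filter_by(user_id=user_id, message_ts=message_ts).first() is not None

    @classmethod
    def create_lock(cls, user_id: str, message_ts: str, channel_id: str, message_text: str):
        """Sketch: insert a lock row; the unique index raises on duplicates."""
        lock = cls(
            user_id=user_id,
            message_ts=message_ts,
            channel_id=channel_id,
            message_text=message_text,
            status="processing",
        )
        db.session.add(lock)
        db.session.commit()  # An IntegrityError here means another worker won the race
        return lock

    @classmethod
    def complete_lock(cls, user_id: str, message_ts: str, status: str = "completed"):
        """Sketch: mark a lock as completed or failed, releasing the user."""
        lock = cls.query.filter_by(user_id=user_id, message_ts=message_ts).first()
        if lock:
            lock.status = status
            db.session.commit()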
Remembering Conversations with a Sliding Window
The Problem: To handle follow-up questions ("What about international travel?"), the bot needs to remember the recent turns of the conversation.
The Solution: The ConversationHistory model stores each user message and bot response. To keep the context relevant and the database lean, it maintains a "sliding window" of only the last 10 interactions for any given user in a channel.
Implementation in views/slack_view.py: Before calling the AI, the service retrieves the formatted history for the current conversation.
# 2. Retrieve Conversation History
conversation_context = ConversationHistory.format_context_for_ai(
    user_id, channel_id, limit=10
)

# 4. Process Message with AI Core
result = admin_agent_controller.process_message_with_context(
    message=message_text,
    # ...
    conversation_context=conversation_context,
)
The ConversationHistory Model (models/postgres/conversation_history.py):
This model provides a method to fetch recent interactions and format them into a simple text block that can be prepended to the AI's main prompt.
class ConversationHistory(BaseModel):
    # ... (model definition)

    @classmethod
    def format_context_for_ai(
        cls, user_id: str, channel_id: str, limit: int = 10
    ) -> str:
        """Get conversation context formatted for AI prompt."""
        conversation = cls.get_conversation_context(user_id, channel_id, limit)
        if not conversation:
            return ""
        context_lines = ["RECENT CONVERSATION HISTORY:"]
        for interaction in conversation:
            context_lines.append(f"User: {interaction['user_message']}")
            context_lines.append(f"Assistant: {interaction['bot_response']}")
        return "\n".join(context_lines)
After an interaction is complete, a new entry is saved. The model logic then automatically cleans up any records that are older than the last 10 interactions, ensuring the context window "slides" forward.
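The save-and-trim logic isn't listed. A minimal sketch of a save_interaction method that slides the window forward, assuming the same Flask-SQLAlchemy conventions as above plus id and created_at columns; the method name is an assumption.
    @classmethod
    def save_interaction(cls, user_id, channel_id, user_message, bot_response, language=None):
        """Sketch: persist one turn, then trim anything beyond the last 10."""
        entry = cls(
            user_id=user_id,
            channel_id=channel_id,
            user_message=user_message,
            bot_response=bot_response,
            language=language,
        )
        db.session.add(entry)
        db.session.commit()
        # Slide the window: keep only the 10 most recent interactions
        keep_ids = [
            row.id
            for row in cls.query.filter_by(user_id=user_id, channel_id=channel_id)
            .order_by(cls.created_at.desc())
            .limit(10)
        ]
        cls.query.filter(
            cls.user_id == user_id,
            cls.channel_id == channel_id,
            cls.id.notin_(keep_ids),
        ).delete(synchronize_session=False)
        db.session.commit()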
The Power of a Large Context Window
This is where the magic happens. Instead of complex retrieval logic, we give the Vertex AI GenerativeModel all our documents at once, along with the conversation history, and let it find the answer.
Integrating with Vertex AI (services/agent_service.py):
Our AdminAgentService will orchestrate the process. It gets the documents from GCS, constructs a detailed prompt, and calls the Vertex AI API.
from vertexai.generative_models import GenerativeModel

class AdminAgentService:
    def __init__(self):
        self.markdown_processor = MarkdownProcessor(...)  # Your GCS loader
        self.model = GenerativeModel("gemini-1.5-pro-preview-0409")

    def process_query(self, question: str, language: str = "en", history: list = None):
        # 1. Load the entire knowledge base
        documents = self.markdown_processor.get_all_markdown_documents()
        # 2. Create the prompt
        prompt = self.create_search_prompt(documents, question, language, history)
        # 3. Generate the response
        response = self.model.generate_content(prompt)
        # 4. Format and return
        return self.format_response(response.text)

    def create_search_prompt(self, documents: list, question: str, language: str, history: list = None) -> str:
        # Combine all document content into a single string
        all_content = "\n".join(
            [f"=== DOCUMENT: {doc['filename']} ===\n{doc['content']}" for doc in documents]
        )
        history_str = ""
        if history:
            history_str = "CONVERSATION HISTORY:\n" + "\n".join(
                [f"Human: {msg['question']}\nAI: {msg['response']}" for msg in history]
            )
        # The prompt engineering is key. We instruct the model on its persona,
        # provide the complete knowledge, and give it clear formatting rules.
        instructions = f"""
        You are a specialized assistant for Leanware with access to the company's complete knowledge base. Your job is to answer questions about company policies and procedures.
        {history_str}
        COMPLETE KNOWLEDGE BASE CONTENT:
        {all_content}
        USER QUESTION: {question}
        INSTRUCTIONS:
        1. Analyze all available information to answer the user's question.
        2. Provide a complete and accurate response based ONLY on the content provided.
        3. Specifically cite which documents contain the relevant information.
        4. If the information is not available, state that clearly.
        5. Respond in {language}.
        """
        return instructions
Advantages with massive context:
Perfect Recall: The AI sees everything. It can cross-reference information between multiple policies automatically.
Contextual Understanding: By providing the recent conversation history, the AI can correctly interpret ambiguous follow-up questions.
No "I don't have that information" (if it exists): The model can't miss a document if it's in the prompt.
Simplicity: The code is straightforward. The complexity is handled by the model, not our application.
Smart Caching for Performance and Cost
Every call to the Vertex AI API costs money, and loading hundreds of documents from GCS takes time. Caching is essential.
Level 1: Caching the Knowledge Base
Our MarkdownProcessor (the GCS loader) should implement in-memory caching. The first user query after the server starts might take a few seconds as it loads the knowledge base from GCS. Every subsequent query is sub-second because the documents are already in memory. A simple time-based cache (e.g., refresh every hour) is highly effective.
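Here is a minimal sketch of such a time-based cache inside MarkdownProcessor; the method and attribute names are illustrative, though the google-cloud-storage calls are standard.
import time

from google.cloud import storage

class MarkdownProcessor:
    CACHE_TTL_SECONDS = 3600  # refresh the knowledge base every hour

    def __init__(self, bucket_name: str):
        self.bucket_name = bucket_name
        self._cache = None
        self._cache_loaded_at = 0.0

    def get_all_markdown_documents(self) -> list:
        """Return cached documents, reloading from GCS when the TTL expires."""
        if self._cache is None or time.time() - self._cache_loaded_at > self.CACHE_TTL_SECONDS:
            self._cache = self._load_from_gcs()  # expensive: downloads every .md file
            self._cache_loaded_at = time.time()
        return self._cache

    def _load_from_gcs(self) -> list:
        client = storage.Client()
        documents = []
        for blob in client.list_blobs(self.bucket_name):
            if blob.name.endswith(".md"):
                documents.append({"filename": blob.name, "content": blob.download_as_text()})
        return documents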
Level 2: Caching AI Responses (Optional)
For frequently asked questions, you can implement a semantic cache. Instead of caching based on the exact question string, you can use an embedding model to see if a new question is semantically similar to a previously answered one. If it is, you can return the cached answer, saving an expensive LLM call.
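A minimal sketch of a semantic cache using Vertex AI text embeddings and cosine similarity; the class, the 0.95 threshold, and the embedding model name are assumptions, not part of the original system.
import numpy as np
from vertexai.language_models import TextEmbeddingModel

class SemanticCache:
    """Sketch: return a cached answer when a new question is close to an old one."""

    def __init__(self, threshold: float = 0.95):
        self.model = TextEmbeddingModel.from_pretrained("text-embedding-004")
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs; unbounded in this sketch

    def _embed(self, text: str) -> np.ndarray:
        vector = np.array(self.model.get_embeddings([text])[0].values)
        return vector / np.linalg.norm(vector)

    def lookup(self, question: str):
        """Return a cached answer if cosine similarity beats the threshold."""
        query = self._embed(question)
        for embedding, answer in self.entries:
            if float(np.dot(query, embedding)) >= self.threshold:
                return answer  # cache hit: skip the expensive LLM call
        return None

    def store(self, question: str, answer: str):
        self.entries.append((self._embed(question), answer))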
Production-Ready Deployment
We'll use Docker to package our Flask application into a portable container that can run anywhere, making it easy to develop locally and deploy to the cloud.
The Dockerfile for Production:
Our Dockerfile defines the environment for our application. It sets up the Python version, installs dependencies into a virtual environment, and configures the gunicorn web server to handle production traffic.
FROM python:3.12.5
COPY ./requirements.txt /requirements.txt
COPY . /app
WORKDIR /app
RUN python -m venv /py && \
    /py/bin/pip install --upgrade pip && \
    /py/bin/pip install -r /requirements.txt
COPY ./google-application-credentials.json /app/google-application-credentials.json
ENV PATH="/py/bin:$PATH"
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
EXPOSE 5000
ENV FLASK_APP=app
CMD exec gunicorn --bind 0.0.0.0:5000 --workers 1 --threads 8 --timeout 0 'app:app'
This Dockerfile creates a self-contained image with our application and all its dependencies, ready to be deployed. It uses gunicorn as a production-grade web server, listening on port 5000.
Automated Deployment with GitHub Actions
Instead of manually building and pushing images, a production-ready setup uses a CI/CD pipeline to automate this process. Here's how you can do it with GitHub Actions, based on the provided deploy_staging.yml workflow.
The workflow consists of two main jobs:
Build and Push: This job compiles the Docker image and pushes it to a container registry like Google Artifact Registry.
Deploy: This job takes the newly built image from the registry and deploys it to Google Cloud Run.
Example GitHub Actions Workflow (.github/workflows/deploy.yml):
name: Deploy to Cloud Run

on:
  push:
    branches: [ master ]  # Or your main branch

jobs:
  build-and-push:
    name: Build and push to Google Artifact Registry
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Authenticate to Google Cloud
        uses: 'google-github-actions/auth@v1'
        with:
          credentials_json: '${{ secrets.GCP_CREDENTIALS_JSON }}'
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: '${{ vars.GCP_REGION }}-docker.pkg.dev/${{ vars.GCP_PROJECT_ID }}/ai-chatbot/image:${{ github.sha }}'

  deploy:
    name: Deploy to Cloud Run
    runs-on: ubuntu-latest
    needs: build-and-push
    steps:
      - name: Authenticate to Google Cloud
        uses: 'google-github-actions/auth@v1'
        with:
          credentials_json: '${{ secrets.GCP_CREDENTIALS_JSON }}'
      - name: Deploy to Cloud Run
        uses: 'google-github-actions/deploy-cloudrun@v1'
        with:
          service: 'ai-chatbot-service'
          region: '${{ vars.GCP_REGION }}'
          image: '${{ vars.GCP_REGION }}-docker.pkg.dev/${{ vars.GCP_PROJECT_ID }}/ai-chatbot/image:${{ github.sha }}'
This workflow automatically deploys a new version of your service every time you push code to the master branch. You'll need to set up the GCP_CREDENTIALS_JSON secret and the GCP_REGION and GCP_PROJECT_ID variables in your GitHub repository settings.
Wrap Up: The Future of AI Chatbot Development
Modern AI chatbots work best when knowledge management is separate from the query service. A dedicated scraper maintains structured content, and a serverless backend handles requests and scales with demand. Large context windows reduce the need for complex retrieval systems for most enterprise knowledge bases, simplifying the architecture.
The choice of development approach should match the requirements: no-code platforms are sufficient for basic Q&A workflows, while custom development is necessary for complex logic, multiple integrations, or advanced conversation handling. Iterative testing and monitoring are critical to ensure accuracy and stability.
Overall, the quality of a chatbot depends on a precise scope, structured knowledge, and aligning the technology with the intended use case. You can also contact us for guidance on building AI agents that fit your exact requirements.
Good luck!
Frequently Asked Questions
How much does it cost to build an AI chatbot?
Costs vary by approach:
No-code platforms: $50-500 per month, depending on features and usage limits.
Custom development: $5,000-50,000 upfront, plus $200-2,000 per month for hosting, monitoring, and updates.
AI model usage: $0.001-0.01 per interaction, depending on provider and model.
Cloud infrastructure: $50-500 per month, based on traffic and storage.
The real cost depends on the scope. For example, a simple support bot answering FAQs can be done on the low end, while an enterprise knowledge system with multiple integrations will be higher. ROI often comes from reduced support hours, faster response times, and improved customer satisfaction.
Can I build an AI chatbot without programming skills?
Yes, with no-code tools, you can set up basic bots that handle structured Q&A or connect to popular apps. This path works for small teams and straightforward use cases. The trade-off is limited customization, weaker security options, and less control over the model’s behavior.
If you need integrations with internal systems, custom workflows, or strict compliance, you’ll eventually need developer support. A hybrid approach starting with no-code, then extending with code, can reduce costs and shorten timelines.
How long does it take to build a production-ready AI chatbot?
Timelines depend on complexity:
Simple Q&A bot: 1-2 weeks
Customer support automation: 1-2 months
Enterprise knowledge management (similar to the Leanware case): 2-3 months
Multi-platform chatbot with complex integrations: 3-6 months
Delays usually come from preparing a clean knowledge base, designing integrations, and testing for reliability.
What’s the difference between RAG and large context window approaches?
RAG (Retrieval-Augmented Generation): Best for very large document sets (millions of pages). It requires a vector database and retrieval pipeline but scales well.
Large context window models: Work better for smaller datasets (thousands of pages). They’re simpler to implement and usually give more accurate answers for business use cases.
RAG adds complexity and higher engineering cost, while large context models cost more per query but reduce infrastructure overhead. Most mid-sized businesses don’t need RAG unless they manage very large document collections.
How do I ensure my AI chatbot gives accurate responses?
Accuracy depends on:
Quality of knowledge base: Keep content clean, consistent, and updated.
Prompt engineering: Use clear instructions and examples for the model.
Testing: Build a set of “golden questions” with expected answers and evaluate regularly.
Monitoring: Track response accuracy, user feedback, and usage trends.
Iteration: Update prompts, retrain models, or improve the knowledge base based on errors.
Without these processes, accuracy will degrade over time.
Can I integrate my chatbot with existing business systems?
Yes, through APIs and event-driven patterns. Examples:
CRM: Sync customer data or create new leads automatically.
Helpdesk: Log tickets, update status, or fetch case details.
Databases: Run read/write queries for internal data.
Authentication: Enforce SSO or role-based access.
These integrations usually require developer involvement because each system has its own API, data formats, and security rules.