
How to Build an AI Chatbot: The Complete Step-by-Step Guide (2025)

  • Writer: Leanware Editorial Team
  • 18 min read

Building an AI chatbot in 2025 is not the same as it was a few years ago. Earlier systems were rule-based or required extensive training data and hand-tuned setup.


With large language models, you can now create working chatbots much faster, using prompt-driven workflows instead of hand-crafted rules.


TL;DR: Building AI Chatbots in 2025

This guide shows you how to build a production-ready Slack AI chatbot using Google Vertex AI that answers questions from your company's knowledge base.


  • Two-Part Architecture: A Python-based intranet scraper that extracts content from Google Sites and uploads it to Google Cloud Storage, plus a serverless Flask backend on Google Cloud Run that processes Slack messages.

  • Key Design Choice: Instead of traditional RAG systems with vector databases, we use large context windows in Vertex AI to feed entire knowledge bases directly to the model, reducing system complexity.

  • Production Features: Message deduplication with PostgreSQL locks, sliding-window conversation memory, automated CI/CD deployment, and caching for cost and performance optimization.

  • Included Code: Selenium-based scraper, Flask Slack bot, and deployment automation.


Why is 2025 the Right Time to Build AI Chatbots?

Building chatbots is faster and simpler today with LLMs. Instead of training models or mapping rigid flows, you provide context through prompts, and the model handles natural dialogue. A bot can read documents like policies or manuals and answer questions directly.


Three factors make this practical now:


  • Large context windows let models take in entire documents or knowledge bases at once.

  • RAG (Retrieval-Augmented Generation) extends this to massive document sets.

  • Modern LLMs handle multiple languages and natural dialogue out of the box.


Getting started:


1. Focus on a single use case: customer support, sales qualification, employee knowledge, or education.

2. Roll out in phases: define scope, set up baseline Q&A, add memory and accuracy checks, then integrate systems.

3. Choose the development approach based on complexity:


  • No-code for fast prototypes and simple bots (In our case, this approach wasn’t sufficient because we needed a custom scraper and architecture that could handle a larger, growing knowledge base.)

  • Custom development for advanced workflows, multi-system integration, scalability, and compliance.


Custom Development: Building Production-Grade AI Chatbots


Let’s build an enterprise-grade Slack chatbot using Python, Flask, and Google Vertex AI.


Modern AI Chatbot Architecture

The architecture follows a "simplicity as a feature" principle. Instead of complex RAG systems with vector databases and retrieval logic, we leverage large context windows to feed entire knowledge bases directly to the AI.


Our two-part architecture includes:


Automated Knowledge Scraper: A standalone Python application that uses Selenium to navigate corporate intranets, extract content from pages and embedded Google Docs, convert HTML to clean Markdown, and upload structured data to Google Cloud Storage. 


Slack Event Service: A Flask application deployed on Google Cloud Run that handles Slack events, manages conversation context, integrates with Vertex AI for response generation, and implements production features like concurrency control and error handling.


Flow Diagram and Infrastructure:

[Figure: AI chatbot architecture flow diagram and infrastructure]

Technology Stack Selection: Python + Flask + Vertex AI


The system is built on Python 3.12+, which offers stable support for AI workflows and integrations.


  • Flask: lightweight framework for handling HTTP requests.

  • Slack Bolt SDK: official library for building Slack bots and event handling.

  • Google Vertex AI (google-cloud-aiplatform): model hosting, scaling, and monitoring.

  • Google Cloud Storage: stores processed knowledge base files.

  • Google Cloud SQL (PostgreSQL): stores conversation history and metadata.

  • Selenium, BeautifulSoup4, html2text: scraping and cleaning intranet content.

  • PyInstaller: packages the scraper into an executable for non-technical use.

  • Docker + Google Cloud Run: containerization and deployment.


Setup requires Python 3.12+, a Google Cloud project with Vertex AI, Cloud Storage, and Cloud SQL enabled, and a Slack workspace with bot permissions. Environment variables are managed with .env files.
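
For reference, a minimal .env file for this stack might look like the sketch below. GOOGLE_EMAIL, GOOGLE_PASSWORD, SLACK_BOT_TOKEN, and SLACK_SIGNING_SECRET come from this guide; the bucket variable name is a hypothetical placeholder.

# .env (example values only; GCS_BUCKET_NAME is a hypothetical name)
GOOGLE_EMAIL=scraper-account@yourcompany.com
GOOGLE_PASSWORD=your-password
SLACK_BOT_TOKEN=xoxb-...
SLACK_SIGNING_SECRET=...
GCP_PROJECT_ID=your-gcp-project
GCS_BUCKET_NAME=your-knowledge-base-bucket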


Part 1: The Intranet Scraper


An AI chatbot is only as smart as the data it has access to. The first step is building a reliable system that extracts knowledge from the company’s intranet, hosted on Google Sites, and makes it available to the AI.


Scraping a private Google Site involves challenges like authentication, dynamic content, and access controls. To handle this, we developed a standalone Python script (intranet_scraper.py) that automates browser navigation, extracts content, converts it to clean Markdown, and uploads it to Google Cloud Storage.


Step 1: Scraping the Intranet with Selenium


Our goal is to automate a web browser to log in, navigate pages, and save content. The intranet_scraper/main.py script serves as the entry point for this process, but the core logic resides in utils/intranet_scraper.py.


The scraper needs to:


  1. Authenticate: Handle logging into a Google-powered intranet. It uses environment variables (GOOGLE_EMAIL, GOOGLE_PASSWORD) to manage credentials securely.

  2. Navigate & Discover: It starts from a base URL and recursively finds all linked pages within the same intranet site (a minimal sketch of this step follows the list).

  3. Extract & Format: For each page, it uses Selenium to pull the main textual content and html2text to convert it into clean Markdown. Markdown is perfect for AI because it preserves structure (headings, lists, tables).

  4. Upload: It stores each scraped page as a separate .md file in a designated Google Cloud Storage bucket.

  5. Handle Embedded Content: A key feature is its ability to find embedded Google Docs within the intranet pages, directly download their content as plain text, and append it to the corresponding markdown file.
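
To make the discovery step (item 2) concrete, here is a minimal sketch of recursive same-site page discovery with Selenium. discover_pages is a hypothetical helper for illustration; the real logic lives in utils/intranet_scraper.py.

from urllib.parse import urljoin, urlparse
from selenium.webdriver.common.by import By

def discover_pages(driver, base_url: str) -> set[str]:
    """Hypothetical sketch: visit every page on the same site as base_url."""
    site_host = urlparse(base_url).netloc
    to_visit, seen = [base_url], set()
    while to_visit:
        url = to_visit.pop()
        if url in seen:
            continue
        seen.add(url)
        driver.get(url)
        # Collect same-site links and queue any we haven't visited yet
        for link in driver.find_elements(By.TAG_NAME, "a"):
            href = link.get_attribute("href")
            if href and urlparse(href).netloc == site_host:
                to_visit.append(urljoin(url, href))
    return seen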


Authentication and Driver Setup (utils/intranet_scraper.py):

Before scraping, the tool authenticates using Selenium. It navigates to the Google sign-in page, enters the credentials stored in the environment variables, and maintains the authenticated session for subsequent requests.

import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class OptimizedMultiPageGoogleScraper:
    # ... (initialization)

    def authenticate_once(self):
        """Authenticate once and return a driver instance for reuse"""
        driver = self._setup_driver()

        try:
            print("🔐 Performing one-time authentication...")

            # Go to Google Sign-in
            driver.get("https://accounts.google.com/signin")

            # Complete email
            email_field = WebDriverWait(driver, 15).until(
                EC.presence_of_element_located((By.ID, "identifierId"))
            )
            email_field.send_keys(self.email)
            # ... (clicks next)

            # Wait and complete password
            password_field = WebDriverWait(driver, 15).until(
                EC.element_to_be_clickable((By.NAME, "Passwd"))
            )
            password_field.send_keys(self.password)
            # ... (clicks next)

            print("✅ Authentication successful - driver ready for reuse")
            return driver

        except Exception as e:
            print(f"❌ Authentication failed: {e}")
            driver.quit()
            return None

Extracting Content and Handling Embedded Google Docs (utils/intranet_scraper.py):

Once authenticated, the scraper navigates to a URL and begins the extraction process. A key challenge is that much of the content, especially from Google Docs, is embedded within <iframe> elements or linked directly. The scraper systematically finds and processes these.


The _extract_iframe_contents function serves as the orchestrator for this. It first looks for direct links to Google Docs and then processes all other iframes it can find on the page.

def _extract_iframe_contents(self, driver):
    """Extract content from all iframes in the page"""
    embedded_contents = []
    structured_docs = []

    try:
        # First, specifically look for Google Docs URLs on the page
        doc_contents, doc_structured = self._process_google_docs_urls(driver)
        embedded_contents.extend(doc_contents)
        structured_docs.extend(doc_structured)

        # Then, process all other generic iframes
        iframe_contents = self._process_iframes(driver)
        embedded_contents.extend(iframe_contents)

    except Exception as e:
        print(f"❌ Error extracting iframe contents: {e}")

    return embedded_contents, structured_docs

To get the content from a Google Doc, the scraper doesn't try to parse the complex editor HTML. Instead, it constructs a special export URL to download the document directly as a plain text file. This is a much more reliable method.

def _get_google_doc_content_directly(self, driver, doc_url):
    """Get Google Doc content by downloading it as a text file"""
    try:
        # Extract the unique document ID from its URL
        doc_id = self._extract_doc_id(doc_url)
        if not doc_id: return None

        # Setup a temporary directory for the download and trigger it
        downloads_dir, existing_files = self._setup_doc_download(driver, doc_id)
        if not downloads_dir: return None

        # Wait for the .txt file to appear in the directory
        downloaded_file = self._wait_for_download(downloads_dir, existing_files, doc_id)
        if not downloaded_file: return None

        # Read the content from the downloaded file and then delete it
        return self._read_and_cleanup_file(downloaded_file)

    except Exception as e:
        print(f"❌ Error downloading Google Doc content: {e}")
        return None

def _setup_doc_download(self, driver, doc_id):
    """Setup download directory and trigger document download"""
    downloads_dir = getattr(self, "downloads_dir", "/tmp")
    existing_files = set(os.listdir(downloads_dir))

    # This URL format forces a direct download of the Google Doc as plain text
    export_url = f"https://docs.google.com/document/d/{doc_id}/export?format=txt"
    print(f"📄 Downloading Google Doc content from: {export_url}")

    driver.get(export_url) # This action triggers the browser download
    return downloads_dir, existing_files

After all the content (from the main page and any embedded documents) has been extracted, it is converted to Markdown and uploaded to Google Cloud Storage.


Cleaning HTML and Converting to Markdown (utils/html_to_markdown_converter.py):

Before converting the scraped HTML to Markdown, it's crucial to clean it. Raw HTML from a website contains a lot of noise that is irrelevant to the AI, such as scripts, styles, navigation bars, and footers. The clean_html_content function uses the BeautifulSoup library to parse the HTML and systematically remove these unwanted elements.


First, a list of undesirable tags is defined. This includes everything from scripts and styles to common structural elements like <nav> and <footer>.

def _get_unwanted_tags(remove_images: bool) -> list[str]:
    """Get list of unwanted HTML tags to remove."""
    unwanted_tags = [
        "script", "style", "noscript", "object", "embed", "applet",
        "form", "input", "button", "select", "textarea", "canvas",
        "svg", "audio", "video", "map", "area", "base", "link",
        "nav", "aside", "footer", "header",
    ]

    if remove_images:
        unwanted_tags.extend(["img", "picture", "figure", "figcaption"])

    return unwanted_tags
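
The clean_html_content function itself is not shown in full here; a minimal sketch consistent with the description above, using BeautifulSoup's decompose() to strip each unwanted tag:

from bs4 import BeautifulSoup

def clean_html_content(html_content: str, remove_images: bool = True) -> str:
    """Parse the HTML and remove every unwanted tag before Markdown conversion."""
    soup = BeautifulSoup(html_content, "html.parser")
    for tag_name in _get_unwanted_tags(remove_images):
        for element in soup.find_all(tag_name):
            element.decompose()  # drop the tag and everything inside it
    return str(soup)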

The main conversion function, html_to_markdown, orchestrates the process. It first calls clean_html_content to strip out the noise and then uses the html-to-markdown library to perform the final conversion, ensuring the output is clean, readable, and optimized for the AI model.

def html_to_markdown(
    html_content: str, clean_html: bool = True, remove_images: bool = True, **options
) -> str:
    """
    Convert HTML content to Markdown format.
    """
    try:
        # Clean HTML content first if requested
        if clean_html:
            html_content = clean_html_content(html_content, remove_images=remove_images)

        # ... (default options for conversion)

        markdown = convert_to_markdown(html_content, **conversion_options)

        # ... (post-processing to clean up extra whitespace)

        return "\n".join(cleaned_lines).strip()

    except Exception as e:
        raise RuntimeError(f"Error converting HTML to Markdown: {e}")

This cleaning step is vital for improving the signal-to-noise ratio of the data we feed into the AI, resulting in more accurate and relevant answers.


Part 2: The Slack Event Service

With our knowledge base continuously updated in Google Cloud Storage, we can now build the core of our AI agent: a serverless application that handles Slack messages, queries the AI, and delivers intelligent answers.


This service is built with Flask and will be deployed on Google Cloud Run, making it scalable and cost-effective. Its main responsibility is to listen for direct messages sent to our Slack bot.


Step 1: Handling Incoming Slack Messages

The entry point for our service is views/slack_view.py. It uses the slack-bolt library to handle events. When a user sends a direct message to our bot, the handle_slack_event function receives the payload and starts a background thread to process it via process_slack_message_async. Running this in a separate thread is crucial to immediately acknowledge Slack's request and avoid timeouts.


Event Routing (views/slack_view.py):

This function acts as the main router for incoming Slack events. It verifies the request's signature, handles Slack's challenge request, and delegates direct messages to the asynchronous processor.

import os
import threading
from flask import Blueprint, jsonify, request
from slack_bolt import App
from slack_bolt.adapter.flask import SlackRequestHandler
from controllers.admin_agent_controller import AdminAgentController
import utils.slack as slack_service

slack_blueprint = Blueprint("slack", __name__)

# Initialize Slack App
slack_bolt_app = App(
    token=os.getenv("SLACK_BOT_TOKEN"),
    signing_secret=os.getenv("SLACK_SIGNING_SECRET")
)
slack_bot_user_id = slack_bolt_app.client.auth_test().data["user_id"]
admin_agent_controller = AdminAgentController()

@slack_blueprint.route("/event", methods=["POST"])
def handle_slack_event():
    """
    Handle Slack events, particularly app_mention events with file attachments.
    """
    # Verify request is from Slack
    if not slack_service.verifier.is_valid_request(request.get_data(), request.headers):
        return "", 403

    data = request.json
    if "challenge" in data:
        return jsonify({"challenge": data["challenge"]})

    # Check if the message is a direct message to our bot
    if slack_service.is_slack_application_direct_message(data, slack_bot_user_id):
        # Process in a background thread to avoid timeouts
        thread = threading.Thread(target=process_slack_message_async, args=(data,))
        thread.daemon = True
        thread.start()
        return jsonify({"message": "OK"}), 200
        
    # Fallback to the default Slack Bolt handler for other events
    return SlackRequestHandler(slack_bolt_app).handle(request)

The Asynchronous Processor (views/slack_view.py):

The process_slack_message_async function is the heart of our chatbot's real-time logic.

def process_slack_message_async(data: dict):
    # Import here to avoid circular imports
    from app import app
    from models.postgres.conversation_history import ConversationHistory
    from models.postgres.slack_message_lock import SlackMessageLock

    # Create application context for background thread
    with app.app_context():
        (
            message_text,
            user_id,
            channel_id,
            message_ts,
        ) = slack_service.extract_slack_message_info(data)

        # 1. Deduplication and Concurrency Lock
        if SlackMessageLock.is_message_processed(user_id, message_ts):
            print(f"⏭️ Skipping duplicate message from user {user_id} (ts: {message_ts})")
            return # Skip duplicate message
        if SlackMessageLock.is_user_locked(user_id):
            print(f"🔒 User {user_id} is locked, skipping message (ts: {message_ts})")
            return # Skip if user has a message in flight
        
        try:
            SlackMessageLock.create_lock(user_id, message_ts, channel_id, message_text)
            print(f"🔓 Created lock for user {user_id} (ts: {message_ts})")
        except Exception as e:
            print(f"❌ Failed to create lock for user {user_id} (ts: {message_ts}): {e}")
            return # Handle failure to create lock

        try:
            # 2. Retrieve Conversation History
            conversation_context = ConversationHistory.format_context_for_ai(
                user_id, channel_id, limit=10
            )

            # 3. Detect Language
            detected_language = (
                admin_agent_controller.admin_agent_service.detect_language(message_text)
            )
            
            # 4. Process Message with AI Core
            result = admin_agent_controller.process_message_with_context(
                message=message_text,
                user_id=user_id,
                channel_id=channel_id,
                language=detected_language,
                conversation_context=conversation_context,
            )

            # 5. Send Response and Cleanup
            if result["success"]:
                response_data = result["data"]["response"]
                message = response_data["message"]
                slack_service.send_message(channel=channel_id, text=message)
                _handle_successful_processing(
                    user_id, message_ts, channel_id, message_text, message, detected_language
                )
            else:
                _handle_processing_error(
                    user_id, message_ts, channel_id, detected_language, result["message"]
                )

        except Exception as e:
            # Handle any unexpected errors
            _safe_fail_processing(user_id, message_ts, str(e))

This function follows a clear, robust sequence:


  1. Deduplication & Locking: It uses a PostgreSQL table (SlackMessageLock) to ensure the same message isn't processed twice and that a user can't send multiple queries simultaneously, preventing race conditions.

  2. Retrieve History: It fetches recent conversation history from the database to provide context for follow-up questions.

  3. Detect Language: It determines the user's language to provide a response in their native tongue.

  4. Process with AI: It passes the question, history, and language to the AdminAgentController, which orchestrates the call to the Vertex AI GenerativeModel.

  5. Respond & Cleanup: It sends the AI-generated message back to the user on Slack and updates the lock and conversation history tables.


Step 2: Integrating with Vertex AI

The actual AI query happens within the controller and service layers. The service layer is responsible for creating the prompt that will be sent to the GenerativeModel. This involves combining the user's question, the conversation history, and the entire knowledge base loaded from Google Cloud Storage.


This approach, made possible by models with large context windows, is powerful because it allows the AI to see all available information at once, enabling it to cross-reference documents and provide comprehensive answers without the complexity of a traditional RAG (Retrieval-Augmented Generation) system.


Production Features: Concurrency and Conversation Context

To move from a simple proof-of-concept to a reliable production service, we need to handle two critical challenges: preventing duplicate responses and maintaining conversational memory. Our application solves these using two specialized PostgreSQL models: SlackMessageLock and ConversationHistory.


Preventing Duplicate Responses with Message Locking


The Problem: Network requests can be unreliable. Slack may sometimes send the same message event more than once if it doesn't receive a timely acknowledgment. Additionally, a user might send multiple questions in quick succession. Without a locking mechanism, our bot could process the same message multiple times, leading to duplicate, confusing responses.


The Solution: The SlackMessageLock model acts as a gatekeeper. Before any processing begins, the application checks if the message has already been seen or if the user currently has another message being processed.


Implementation in views/slack_view.py: At the very beginning of the process_slack_message_async function, we perform these checks:

# 1. Deduplication and Concurrency Lock
# Check if message already processed (deduplication)
if SlackMessageLock.is_message_processed(user_id, message_ts):
    print(f"⏭️ Skipping duplicate message from user {user_id} (ts: {message_ts})")
    return

# Check if user is currently locked (has other processing messages)
if SlackMessageLock.is_user_locked(user_id):
    print(f"🔒 User {user_id} is locked, skipping message (ts: {message_ts})")
    return

# Create lock for this message
try:
    SlackMessageLock.create_lock(user_id, message_ts, channel_id, message_text)
    print(f"🔓 Created lock for user {user_id} (ts: {message_ts})")
except Exception as e:
    # ... handle failure to create lock ...
    return

The SlackMessageLock Model (models/postgres/slack_message_lock.py): 

This model uses a unique constraint on user_id and message_ts to prevent duplicates. The is_user_locked method provides a simple way to check for any messages from a specific user that are still in the "processing" state.

class SlackMessageLock(BaseModel):
    __tablename__ = "slack_message_locks"

    user_id = Column(String(50), nullable=False, index=True)
    message_ts = Column(String(30), nullable=False)
    status = Column(String(20), nullable=False, default="processing") # processing, completed, failed
    # ... other columns

    __table_args__ = (
        # Unique constraint to prevent duplicate processing of same message
        Index("ix_user_message_unique", "user_id", "message_ts", unique=True),
        # ... other indexes
    )

    @classmethod
    def is_user_locked(cls, user_id: str) -> bool:
        """Check if a user has any active (processing) message locks."""
        return (
            cls.query.filter_by(user_id=user_id, status="processing").first()
            is not None
        )

Once processing is finished (either successfully or with an error), the lock's status is updated to completed or failed, releasing the lock for that user.
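
The release step is not shown above; a sketch of what it could look like, assuming a shared Flask-SQLAlchemy session (db) and the columns from the model:

    @classmethod
    def release_lock(cls, user_id: str, message_ts: str, status: str = "completed"):
        """Hypothetical helper: mark a lock completed/failed so the user is unblocked."""
        lock = cls.query.filter_by(user_id=user_id, message_ts=message_ts).first()
        if lock:
            lock.status = status  # "completed" or "failed"
            db.session.commit()   # db: the Flask-SQLAlchemy instance (assumed)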


Remembering Conversations with a Sliding Window


The Problem: To handle follow-up questions ("What about international travel?"), the bot needs to remember the recent turns of the conversation.

The Solution: The ConversationHistory model stores each user message and bot response. To keep the context relevant and the database lean, it maintains a "sliding window" of only the last 10 interactions for any given user in a channel.

Implementation in views/slack_view.py: Before calling the AI, the service retrieves the formatted history for the current conversation.

# 2. Retrieve Conversation History
conversation_context = ConversationHistory.format_context_for_ai(
    user_id, channel_id, limit=10
)

# 4. Process Message with AI Core
result = admin_agent_controller.process_message_with_context(
    message=message_text,
    # ...
    conversation_context=conversation_context,
)

The ConversationHistory Model (models/postgres/conversation_history.py): 

This model provides a method to fetch recent interactions and format them into a simple text block that can be prepended to the AI's main prompt.

class ConversationHistory(BaseModel):
    # ... (model definition)

    @classmethod
    def format_context_for_ai(
        cls, user_id: str, channel_id: str, limit: int = 10
    ) -> str:
        """Get conversation context formatted for AI prompt."""
        conversation = cls.get_conversation_context(user_id, channel_id, limit)

        if not conversation:
            return ""

        context_lines = ["RECENT CONVERSATION HISTORY:"]
        for interaction in conversation:
            context_lines.append(f"User: {interaction['user_message']}")
            context_lines.append(f"Assistant: {interaction['bot_response']}")
        
        return "\n".join(context_lines)

After an interaction is complete, a new entry is saved. The model logic then automatically cleans up any records that are older than the last 10 interactions, ensuring the context window "slides" forward.
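
The cleanup logic might look something like this sketch, assuming id and created_at columns and a Flask-SQLAlchemy db session (not the exact implementation):

    @classmethod
    def trim_to_window(cls, user_id: str, channel_id: str, limit: int = 10):
        """Hypothetical sketch: keep only the newest `limit` interactions."""
        keep_ids = [
            row.id
            for row in cls.query.filter_by(user_id=user_id, channel_id=channel_id)
            .order_by(cls.created_at.desc())
            .limit(limit)
        ]
        # Delete everything for this user/channel that fell out of the window
        cls.query.filter(
            cls.user_id == user_id,
            cls.channel_id == channel_id,
            cls.id.notin_(keep_ids),
        ).delete(synchronize_session=False)
        db.session.commit()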


The Power of a Large Context Window

This is where the magic happens. Instead of complex retrieval logic, we give the Vertex AI Generative Model all our documents at once, along with the conversation history, and let it find the answer.


Integrating with Vertex AI (services/agent_service.py):

Our AdminAgentService will orchestrate the process. It gets the documents from GCS, constructs a detailed prompt, and calls the Vertex AI API.

from vertexai.generative_models import GenerativeModel

class AdminAgentService:
    def __init__(self):
        self.markdown_processor = MarkdownProcessor(...) # Your GCS loader
        self.model = GenerativeModel("gemini-1.5-pro-preview-0409")

    def process_query(self, question: str, language: str = "en", history: list = None):
        # 1. Load the entire knowledge base
        documents = self.markdown_processor.get_all_markdown_documents()
        
        # 2. Create the prompt
        prompt = self.create_search_prompt(documents, question, language, history)
        
        # 3. Generate the response
        response = self.model.generate_content(prompt)
        
        # 4. Format and return
        return self.format_response(response.text)

    def create_search_prompt(self, documents: list, question: str, language: str, history: list = None) -> str:
        # Combine all document content into a single string
        all_content = "\n".join([f"=== DOCUMENT: {doc['filename']} ===\n{doc['content']}" for doc in documents])
        
        history_str = ""
        if history:
            history_str = "CONVERSATION HISTORY:\n" + "\n".join([f"Human: {msg['question']}\nAI: {msg['response']}" for msg in history])
        
        # The prompt engineering is key. We instruct the model on its persona,
        # provide the complete knowledge, and give it clear formatting rules.
        instructions = f"""
        You are a specialized assistant for Leanware with access to the company's complete knowledge base. Your job is to answer questions about company policies and procedures.

        {history_str}

        COMPLETE KNOWLEDGE BASE CONTENT:
        {all_content}

        USER QUESTION: {question}

        INSTRUCTIONS:
        1. Analyze all available information to answer the user's question.
        2. Provide a complete and accurate response based ONLY on the content provided.
        3. Specifically cite which documents contain the relevant information.
        4. If the information is not available, state that clearly.
        5. Respond in {language}.
        """
        
        return instructions

Advantages with massive context:


  1. Perfect Recall: The AI sees everything. It can cross-reference information between multiple policies automatically.

  2. Contextual Understanding: By providing the recent conversation history, the AI can correctly interpret ambiguous follow-up questions.

  3. No missed retrievals: if the information exists anywhere in the prompt, the model can't fail to find it the way a faulty retrieval step can.

  4. Simplicity: The code is straightforward. The complexity is handled by the model, not our application.


Smart Caching for Performance and Cost


Every call to the Vertex AI API costs money, and loading hundreds of documents from GCS takes time. Caching is essential.


Level 1: Caching the Knowledge Base

Our MarkdownProcessor (the GCS loader) should implement in-memory caching. The first user query after the server starts might take a few seconds as it loads the knowledge base from GCS. Every subsequent query is sub-second because the documents are already in memory. A simple time-based cache (e.g., refresh every hour) is highly effective.
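
A sketch of such a cache inside the loader; the one-hour TTL and the GCS listing approach are assumptions for illustration, not the exact implementation:

import time

class MarkdownProcessor:
    def __init__(self, bucket_name: str, ttl_seconds: int = 3600):
        self.bucket_name = bucket_name
        self.ttl_seconds = ttl_seconds   # refresh interval, assumed 1 hour
        self._cache = None
        self._loaded_at = 0.0

    def get_all_markdown_documents(self) -> list[dict]:
        # Reload from GCS only when the cache is empty or stale
        if self._cache is None or time.time() - self._loaded_at > self.ttl_seconds:
            self._cache = self._load_from_gcs()
            self._loaded_at = time.time()
        return self._cache

    def _load_from_gcs(self) -> list[dict]:
        from google.cloud import storage
        client = storage.Client()
        docs = []
        # Expensive path: list and download every .md blob in the bucket
        for blob in client.list_blobs(self.bucket_name):
            if blob.name.endswith(".md"):
                docs.append({"filename": blob.name, "content": blob.download_as_text()})
        return docs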


Level 2: Caching AI Responses (Optional)

For frequently asked questions, you can implement a semantic cache. Instead of caching based on the exact question string, you can use an embedding model to see if a new question is semantically similar to a previously answered one. If it is, you can return the cached answer, saving an expensive LLM call.
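
A rough sketch of the idea; embed is a stand-in for whatever embedding model you use, and the 0.92 threshold is an arbitrary starting point to tune:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class SemanticCache:
    """Hypothetical sketch: reuse answers for semantically similar questions."""
    def __init__(self, embed, threshold: float = 0.92):
        self.embed = embed          # callable: str -> np.ndarray (your embedding model)
        self.threshold = threshold  # similarity needed to count as a cache hit
        self.entries = []           # list of (embedding, answer) pairs

    def lookup(self, question: str):
        vector = self.embed(question)
        for cached_vector, answer in self.entries:
            if cosine_similarity(vector, cached_vector) >= self.threshold:
                return answer  # close enough: skip the expensive LLM call
        return None

    def store(self, question: str, answer: str):
        self.entries.append((self.embed(question), answer))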


Production-Ready Deployment

We'll use Docker to package our Flask application into a portable container that can run anywhere, making it easy to develop locally and deploy to the cloud.
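
The gunicorn command in the Dockerfile below expects an app:app entry point. A minimal version might look like this sketch (the /slack URL prefix is an assumption):

# app.py: minimal entry point sketch; gunicorn loads this as 'app:app'
from flask import Flask
from views.slack_view import slack_blueprint

app = Flask(__name__)
app.register_blueprint(slack_blueprint, url_prefix="/slack")  # prefix assumed

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)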


The Dockerfile for Production:

Our Dockerfile defines the environment for our application. It sets up the Python version, installs dependencies into a virtual environment, and configures the gunicorn web server to handle production traffic.

FROM python:3.12.5
COPY ./requirements.txt /requirements.txt
COPY . /app
WORKDIR /app

RUN python -m venv /py && \
    /py/bin/pip install --upgrade pip && \
    /py/bin/pip install -r /requirements.txt

COPY ./google-application-credentials.json /app/google-application-credentials.json

ENV PATH="/py/bin:$PATH"
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

EXPOSE 5000

ENV FLASK_APP=app

CMD exec gunicorn --bind 0.0.0.0:5000 --workers 1 --threads 8 --timeout 0 'app:app'

This Dockerfile creates a self-contained image with our application and all its dependencies, ready to be deployed. It uses gunicorn as a production-grade web server, listening on port 5000.
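
To smoke-test the container locally before wiring up CI/CD (assuming the .env file from earlier):

docker build -t ai-chatbot .
docker run --rm -p 5000:5000 --env-file .env ai-chatbot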


Automated Deployment with GitHub Actions

Instead of manually building and pushing images, a production-ready setup uses a CI/CD pipeline to automate this process. Here's how you can do it with GitHub Actions, based on the provided deploy_staging.yml workflow.


The workflow consists of two main jobs:

  1. Build and Push: This job compiles the Docker image and pushes it to a container registry like Google Artifact Registry.

  2. Deploy: This job takes the newly built image from the registry and deploys it to Google Cloud Run.


Example GitHub Actions Workflow (.github/workflows/deploy.yml):

name: Deploy to Cloud Run

on:
  push:
    branches: [ master ] # Or your main branch

jobs:
  build-and-push:
    name: Build and push to Google Artifact Registry
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Authenticate to Google Cloud
        uses: 'google-github-actions/auth@v1'
        with:
          credentials_json: '${{ secrets.GCP_CREDENTIALS_JSON }}'

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: '${{ vars.GCP_REGION }}-docker.pkg.dev/${{ vars.GCP_PROJECT_ID }}/ai-chatbot/image:${{ github.sha }}'

  deploy:
    name: Deploy to Cloud Run
    runs-on: ubuntu-latest
    needs: build-and-push
    steps:
      - name: Authenticate to Google Cloud
        uses: 'google-github-actions/auth@v1'
        with:
          credentials_json: '${{ secrets.GCP_CREDENTIALS_JSON }}'
          
      - name: Deploy to Cloud Run
        uses: 'google-github-actions/deploy-cloudrun@v1'
        with:
          service: 'ai-chatbot-service'
          region: '${{ vars.GCP_REGION }}'
          image: '${{ vars.GCP_REGION }}-docker.pkg.dev/${{ vars.GCP_PROJECT_ID }}/ai-chatbot/image:${{ github.sha }}'

This workflow automatically deploys a new version of your service every time you push code to the master branch. You'll need to set up the GCP_CREDENTIALS_JSON secret and the GCP_REGION and GCP_PROJECT_ID variables in your GitHub repository settings.


Wrap Up: The Future of AI Chatbot Development


Modern AI chatbots work best when knowledge management is separate from the query service. A dedicated scraper maintains structured content, and a serverless backend handles requests and scales with demand. Large context windows reduce the need for complex retrieval systems for most enterprise knowledge bases, simplifying the architecture.


The choice of development approach should match the requirements: no-code platforms are sufficient for basic Q&A workflows, while custom development is necessary for complex logic, multiple integrations, or advanced conversation handling. Iterative testing and monitoring are critical to ensure accuracy and stability.


Overall, the quality of a chatbot depends on a precise scope, structured knowledge, and aligning the technology with the intended use case. You can also contact us for guidance on building AI agents that fit your exact requirements.


Good luck!


Frequently Asked Questions

How much does it cost to build an AI chatbot?

Costs vary by approach:


  • No-code platforms: $50-500 per month, depending on features and usage limits.

  • Custom development: $5,000-50,000 upfront, plus $200-2,000 per month for hosting, monitoring, and updates.

  • AI model usage: $0.001-0.01 per interaction, depending on provider and model.

  • Cloud infrastructure: $50-500 per month, based on traffic and storage.


The real cost depends on the scope. For example, a simple support bot answering FAQs can be done on the low end, while an enterprise knowledge system with multiple integrations will be higher. ROI often comes from reduced support hours, faster response times, and improved customer satisfaction.

Can I build an AI chatbot without programming skills?

Yes, with no-code tools, you can set up basic bots that handle structured Q&A or connect to popular apps. This path works for small teams and straightforward use cases. The trade-off is limited customization, weaker security options, and less control over the model’s behavior.


If you need integrations with internal systems, custom workflows, or strict compliance, you’ll eventually need developer support. A hybrid approach starting with no-code, then extending with code, can reduce costs and shorten timelines.

How long does it take to build a production-ready AI chatbot?

Timelines depend on complexity:


  1. Simple Q&A bot: 1-2 weeks

  2. Customer support automation: 1-2 months

  3. Enterprise knowledge management (similar to the Leanware case): 2-3 months

  4. Multi-platform chatbot with complex integrations: 3-6 months


Delays usually come from preparing a clean knowledge base, designing integrations, and testing for reliability.

What’s the difference between RAG and large context window approaches?

  • RAG (Retrieval-Augmented Generation): Best for very large document sets (millions of pages). It requires a vector database and retrieval pipeline but scales well.

  • Large context window models: Work better for smaller datasets (thousands of pages). They’re simpler to implement and usually give more accurate answers for business use cases.


RAG adds complexity and higher engineering cost, while large context models cost more per query but reduce infrastructure overhead. Most mid-sized businesses don’t need RAG unless they manage very large document collections.

How do I ensure my AI chatbot gives accurate responses?

Accuracy depends on:


  1. Quality of knowledge base: Keep content clean, consistent, and updated.

  2. Prompt engineering: Use clear instructions and examples for the model.

  3. Testing: Build a set of “golden questions” with expected answers and evaluate regularly.

  4. Monitoring: Track response accuracy, user feedback, and usage trends.

  5. Iteration: Update prompts, retrain models, or improve the knowledge base based on errors.


Without these processes, accuracy will degrade over time.

Can I integrate my chatbot with existing business systems?

Yes, through APIs and event-driven patterns. Examples:


  1. CRM: Sync customer data or create new leads automatically.

  2. Helpdesk: Log tickets, update status, or fetch case details.

  3. Databases: Run read/write queries for internal data.

  4. Authentication: Enforce SSO or role-based access.


These integrations usually require developer involvement because each system has its own API, data formats, and security rules.

