LangChain Models: Chat, LLMs & Advanced Usage

  • Writer: Leanware Editorial Team
  • Nov 5
  • 7 min read

The rise of AI orchestration frameworks has fundamentally changed how developers build intelligent applications. Among these frameworks, LangChain stands out for its elegant abstraction layer that manages models across multiple providers, enabling developers to build sophisticated AI systems without vendor lock-in. Understanding LangChain's model system is crucial for anyone building production-grade AI applications that need flexibility, scalability, and maintainability.


What Are Models in LangChain?

In LangChain's context, "models" represent abstract interfaces that standardize interactions with various language models and chat-based systems. This abstraction layer allows developers to switch between providers like OpenAI, Anthropic, or Cohere without rewriting application logic. LangChain treats models as pluggable components that can be composed, chained, and orchestrated to create complex AI workflows.


Chat Models vs LLMs

The distinction between chat models and traditional LLMs is fundamental to LangChain's architecture. Chat models work with structured message formats, accepting and returning message objects that include roles (system, user, assistant) and content. This structure enables rich conversational interactions and maintains context across turns.


```python
# Chat Model Example
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Explain quantum computing"}
]

response = chat_model.invoke(messages)
```


Traditional LLMs, in contrast, operate on raw text input and output. They take a string prompt and return a string completion, making them simpler but less structured:


```python
# LLM Example
from langchain_openai import OpenAI

# text-davinci-003 has been retired; gpt-3.5-turbo-instruct is its completion-style successor
llm = OpenAI(model="gpt-3.5-turbo-instruct")

response = llm.invoke("Complete this sentence: The future of AI is")
```


Why the Distinction Matters for Your Application

Choosing between chat models and LLMs impacts your application's architecture and capabilities. Chat models excel in agent-like systems, customer service bots, and interactive applications where maintaining conversation state is crucial. They handle multi-turn dialogues naturally and support system prompts for behavior modification.


LLMs are better suited for single-turn tasks like summarization, text generation, and completion tasks, where the overhead of message structure isn't necessary. They're often faster and cheaper for bulk processing tasks that don't require conversational context.
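
For instance, bulk single-turn work fits the completion style naturally. The sketch below is illustrative only (the model name and inputs are assumptions, not a prescribed setup):

```python
# Illustrative sketch: a completion-style model for bulk, single-turn summaries.
from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", max_tokens=150)

articles = ["First article text...", "Second article text..."]
prompts = [f"Summarize in one sentence: {text}" for text in articles]

# batch() runs the single-turn completions concurrently; no message
# structure or conversation state is needed for this kind of work.
summaries = llm.batch(prompts)
```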


Overview of Supported Model Types

LangChain's ecosystem encompasses various model types, each serving specific purposes in the AI application stack.


 Chat Models (Messages In → Messages Out)

Modern chat models form the backbone of most LangChain applications. OpenAI's GPT-4 and GPT-3.5 Turbo provide reliable performance with extensive tool support. Anthropic's Claude series offers larger context windows and nuanced reasoning. Google's Gemini models bring multimodal capabilities. Each maintains message history and supports structured interactions:


```python
from langchain_anthropic import ChatAnthropic

claude = ChatAnthropic(model="claude-3-sonnet-20240229")

response = claude.invoke([
    {"role": "user", "content": "What's the weather like?"}
])
```

Text-Completion Models / LLMs (String In → String Out)

While being phased out by many providers, text-completion models still serve important roles. OpenAI's gpt-3.5-turbo-instruct (the successor to its now-retired Davinci completion models), Cohere's generation models, and open-source alternatives like Llama 2 through Replicate maintain the traditional prompt-completion paradigm. They're particularly useful for specific fine-tuned models or when working with older systems.


Embedding Models & Other Specialized Models

Beyond generation, LangChain supports embedding models for vector operations, crucial for RAG systems:


```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vector = embeddings.embed_query("Sample text for embedding")
```

Specialized models include rerankers for improving retrieval quality, classification models for intent detection, and custom models wrapped in LangChain's interface.
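
To make the role of embeddings concrete, here is a minimal sketch of the vector math behind retrieval: documents and a query are embedded, then ranked by cosine similarity. The document texts and query are illustrative placeholders:

```python
import numpy as np
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

docs = [
    "LangChain supports many model providers.",
    "Bogotá is the capital of Colombia.",
]
doc_vectors = np.array(embeddings.embed_documents(docs))
query_vector = np.array(embeddings.embed_query("Which providers does LangChain support?"))

# Cosine similarity between the query and each document vector
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
best_doc = docs[int(scores.argmax())]
```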


Key Features of LangChain Models



Standard Parameters (temperature, max_tokens etc.)

LangChain standardizes common parameters across providers, making model behavior predictable:


```python
model = ChatOpenAI(
    temperature=0.7,        # Creativity level (0-2)
    max_tokens=500,         # Maximum response length
    top_p=0.9,              # Nucleus sampling
    frequency_penalty=0.5,  # Reduce repetition
    presence_penalty=0.5    # Encourage topic diversity
)
```


Default behaviors vary by provider, but LangChain normalizes these where possible, providing consistent interfaces for temperature scaling, token limits, and sampling parameters.
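
As a small illustration, the same standard parameters can be passed unchanged to different providers (the model names here are just examples); provider-specific options such as frequency_penalty remain available where a provider supports them:

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Shared settings applied identically to two providers
common = {"temperature": 0.3, "max_tokens": 300}

openai_model = ChatOpenAI(model="gpt-4", **common)
claude_model = ChatAnthropic(model="claude-3-sonnet-20240229", **common)
```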


Interface Methods: invoke(), stream(), batch()

LangChain provides three primary methods for model interaction, each optimized for different use cases:


```python
# Single invocation
response = model.invoke("What is machine learning?")

# Streaming for real-time output
for chunk in model.stream("Tell me a story"):
    print(chunk.content, end="")

# Batch processing for efficiency
responses = model.batch([
    "Question 1",
    "Question 2",
    "Question 3"
])
```

The invoke() method handles synchronous single requests, stream() enables token-by-token output for responsive UIs, and batch() processes multiple inputs efficiently with automatic concurrency management.
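
Each of these methods also has an async counterpart (ainvoke(), astream(), abatch()), and batch-style calls accept a max_concurrency setting to cap parallel requests. A brief sketch, reusing the model from the example above:

```python
import asyncio

async def main():
    # Single async call
    answer = await model.ainvoke("What is machine learning?")

    # Stream tokens asynchronously
    async for chunk in model.astream("Tell me a story"):
        print(chunk.content, end="")

    # Bound concurrency for larger batches
    answers = await model.abatch(
        ["Question 1", "Question 2", "Question 3"],
        config={"max_concurrency": 2},
    )
    return answer, answers

asyncio.run(main())
```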


Tool Calling: Binding External Tools to Models

LangChain's tool calling capability transforms models into agents that can interact with external systems:


```python
from langchain.tools import Tool

def calculate(expression: str) -> str:
    # eval() is for illustration only; use a safe math parser in production
    return str(eval(expression))

calculator_tool = Tool(
    name="Calculator",
    func=calculate,
    description="Useful for mathematical calculations"
)

model_with_tools = model.bind_tools([calculator_tool])

# The response is an AIMessage describing the calculator call the model wants to make
response = model_with_tools.invoke("What is 25 * 48?")
```


This enables retrieval augmentation, API calls, database queries, and any custom functionality your application needs.
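
Note that bind_tools() does not run the tool itself: the model's reply is an AIMessage whose tool_calls field describes the calls it wants, and your code executes them and sends the results back. A minimal sketch of one round of that loop, reusing model_with_tools and calculate from above:

```python
from langchain_core.messages import HumanMessage, ToolMessage

messages = [HumanMessage(content="What is 25 * 48?")]
ai_message = model_with_tools.invoke(messages)
messages.append(ai_message)

for tool_call in ai_message.tool_calls:
    # The argument key depends on the tool's schema; here we take the single value
    expression = next(iter(tool_call["args"].values()))
    result = calculate(expression)
    messages.append(ToolMessage(content=result, tool_call_id=tool_call["id"]))

# Second pass: the model answers using the tool results
final_answer = model_with_tools.invoke(messages)
```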


Structured Outputs & JSON Modes

Modern applications often need structured data from models. LangChain supports JSON mode and structured outputs across compatible providers:


```python
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    in_stock: bool

parser = PydanticOutputParser(pydantic_object=ProductInfo)

model_with_structure = model.bind(
    response_format={"type": "json_object"}
)

prompt = f"Extract product info: iPhone 15 Pro costs $999 and is available. {parser.get_format_instructions()}"
response = model_with_structure.invoke(prompt)
structured_output = parser.parse(response.content)
```
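
On providers with native structured-output support, with_structured_output() is a more direct route: it returns validated Pydantic objects without manual parsing. A short sketch reusing the ProductInfo model above:

```python
# with_structured_output() returns a validated ProductInfo instance directly
structured_model = model.with_structured_output(ProductInfo)

product = structured_model.invoke(
    "Extract product info: iPhone 15 Pro costs $999 and is available."
)
print(product.name, product.price, product.in_stock)
```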


Integrations & Ecosystem


Providers and Supported Model Libraries

LangChain's extensive provider support includes:


  • OpenAI: GPT-4, GPT-3.5, embeddings, DALL-E

  • Anthropic: Claude 3 family, legacy Claude models

  • Google: Gemini Pro, PaLM, Vertex AI

  • Cohere: Command, Embed, Rerank models

  • Hugging Face: Thousands of open-source models

  • AWS Bedrock: Managed model access

  • Together AI, Replicate: Hosted open-source models

  • Local models: Ollama, LlamaCPP, GPT4All


Each integration maintains provider-specific features while conforming to LangChain's standard interfaces.


Switching Providers: Vendor-Agnostic Architecture

LangChain's abstraction enables seamless provider switching:


```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI

# Easy to switch providers
def create_model(provider="openai"):
    if provider == "openai":
        return ChatOpenAI(model="gpt-4")
    elif provider == "anthropic":
        return ChatAnthropic(model="claude-3-sonnet-20240229")
    elif provider == "google":
        return ChatGoogleGenerativeAI(model="gemini-pro")
    raise ValueError(f"Unknown provider: {provider}")

# Application code remains unchanged
model = create_model(provider="anthropic")
response = model.invoke("Same prompt works everywhere")
```


This vendor-agnostic approach protects against service outages, enables cost optimization, and prevents vendor lock-in.
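
Recent LangChain releases also ship a helper, init_chat_model, that builds a chat model from configuration values, removing the need for a hand-rolled factory. A brief sketch (the model names are examples):

```python
from langchain.chat_models import init_chat_model

# Construct a chat model from a provider/model pair at runtime
model = init_chat_model("claude-3-sonnet-20240229", model_provider="anthropic")
# Or: init_chat_model("gpt-4", model_provider="openai")

response = model.invoke("Same prompt works everywhere")
```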


Multimodality & Context Window Considerations

Modern models support various input types and context sizes. Claude 3 offers 200k token windows, Gemini handles 1M+ tokens, and GPT-4 Turbo supports 128k tokens. Multimodal capabilities allow image, audio, and document processing:


```python
from langchain_core.messages import HumanMessage

# image_url points to an accessible image; multimodal_model is any vision-capable chat model
message = HumanMessage(
    content=[
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": image_url}}
    ]
)

response = multimodal_model.invoke([message])
```


Advanced Topics & Best Practices

 Rate-Limiting, Caching, and Performance Optimization

Enable response caching and implement rate limiting to avoid API throttling:


```python
from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache
import time

# Enable caching so repeated prompts skip the API call
set_llm_cache(InMemoryCache())

# Rate limiting wrapper
class RateLimitedModel:
    def __init__(self, model, requests_per_minute=60):
        self.model = model
        self.rpm = requests_per_minute
        self.last_request = 0

    def invoke(self, *args, **kwargs):
        elapsed = time.time() - self.last_request
        if elapsed < 60 / self.rpm:
            time.sleep(60 / self.rpm - elapsed)
        self.last_request = time.time()
        return self.model.invoke(*args, **kwargs)
```
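
Recent langchain-core releases also include a built-in InMemoryRateLimiter that can be attached directly to a chat model, which is usually simpler than a hand-rolled wrapper. A short sketch (the rate values are illustrative):

```python
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI

limiter = InMemoryRateLimiter(
    requests_per_second=1,      # roughly 60 requests per minute
    check_every_n_seconds=0.1,  # how often to check for an available slot
    max_bucket_size=5,          # allow short bursts
)

model = ChatOpenAI(model="gpt-4", rate_limiter=limiter)
```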


Streaming vs Batch vs Real-Time Usage

Choose the right calling strategy based on your needs:


  • Streaming: Best for chat interfaces, real-time feedback, and long responses

  • Batch: Optimal for bulk processing, analytics, and offline tasks

  • Real-time: Use for synchronous APIs, simple queries, and when latency isn't critical


Choosing the Right Model for Your Use Case

| Use Case | Recommended Model | Key Factors |
|---|---|---|
| Conversational AI | GPT-4, Claude 3 | Context retention, tool support |
| Code Generation | GPT-4, Claude 3 | Reasoning ability, syntax understanding |
| Summarization | GPT-3.5 Turbo, Mixtral | Cost-efficiency, speed |
| Embeddings | text-embedding-3-small | Vector quality, dimension size |
| Local/Private | Llama 2, Mistral | Privacy, no API costs |

Getting Started: A Simple LangChain Model Example


Installing the Library and Setting Up Credentials

```
pip install langchain langchain-openai langchain-anthropic python-dotenv
```


Create a `.env` file:

```
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
```


Instantiating a Chat Model

```python
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize model
model = ChatOpenAI(
    model="gpt-4",
    temperature=0.7,
    max_tokens=500
)
```


Running a Simple Prompt and Interpreting the Output


```python
# Simple invocation
response = model.invoke("Explain neural networks in simple terms")
print(response.content)            # Access the text content
print(response.response_metadata)  # View token usage, model info

# With message history
messages = [
    {"role": "system", "content": "You are a helpful teacher"},
    {"role": "user", "content": "What is machine learning?"}
]
response = model.invoke(messages)

# Streaming example
for chunk in model.stream("Write a haiku about programming"):
    print(chunk.content, end="", flush=True)
```


Conclusion

LangChain’s model ecosystem empowers developers to build flexible, scalable, and provider-agnostic AI systems. By understanding the distinction between chat models and traditional LLMs—and leveraging features like tool calling, structured outputs, and streaming—you can design intelligent workflows that fit your specific use cases. Whether you’re building conversational agents, RAG systems, or multimodal applications, LangChain offers the tools to orchestrate them seamlessly across providers.


Ready to integrate LangChain into your AI stack or explore the right model architecture for your project? Contact Us to get expert guidance and accelerate your development journey.


FAQs

What's the difference between chat models and LLMs in LangChain?

Chat models handle structured message formats with roles (system, user, assistant) and maintain conversation context, making them ideal for interactive applications. LLMs process raw text input and output, better suited for simple completion tasks and bulk processing.

Can I use multiple models in a single LangChain app?

Yes, LangChain's modular architecture encourages using multiple models for different tasks. You can combine OpenAI for generation, Anthropic for complex reasoning, and local models for privacy-sensitive operations, all within the same application.
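
A minimal sketch of that pattern, with one model drafting and another reviewing (the model choices here are purely illustrative):

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

drafting_model = ChatOpenAI(model="gpt-3.5-turbo")              # fast, cheap drafts
review_model = ChatAnthropic(model="claude-3-sonnet-20240229")  # careful review

draft = drafting_model.invoke("Draft a short product description for a smart mug.")
review = review_model.invoke(f"Improve this draft:\n{draft.content}")
```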

What models does LangChain support?

LangChain supports all major providers, including OpenAI (GPT-4, GPT-3.5), Anthropic (Claude 3), Google (Gemini, PaLM), Cohere, Hugging Face models, AWS Bedrock, and local models through Ollama and LlamaCPP. The list continuously expands with new integrations.

How do I switch from OpenAI to Anthropic in LangChain?

Simply change the model constructor and credentials. LangChain's unified interface means your application logic remains unchanged:

```python
# From OpenAI
model = ChatOpenAI(model="gpt-4")

# To Anthropic
model = ChatAnthropic(model="claude-3-sonnet-20240229")

# Same interface
response = model.invoke("Your prompt")
```

Does LangChain support streaming outputs?

Yes, use the stream() method for real-time token-by-token output. Most major providers support streaming, enabling responsive user interfaces and progress indication for long responses.

