LangChain Models: Chat, LLMs & Advanced Usage

  • Writer: Leanware Editorial Team
  • Nov 5
  • 7 min read

The rise of AI orchestration frameworks has fundamentally changed how developers build intelligent applications. Among these frameworks, LangChain stands out for its elegant abstraction layer that manages models across multiple providers, enabling developers to build sophisticated AI systems without vendor lock-in. Understanding LangChain's model system is crucial for anyone building production-grade AI applications that need flexibility, scalability, and maintainability.


What Are Models in LangChain?

In LangChain's context, "models" represent abstract interfaces that standardize interactions with various language models and chat-based systems. This abstraction layer allows developers to switch between providers like OpenAI, Anthropic, or Cohere without rewriting application logic. LangChain treats models as pluggable components that can be composed, chained, and orchestrated to create complex AI workflows.


Chat Models vs LLMs

The distinction between chat models and traditional LLMs is fundamental to LangChain's architecture. Chat models work with structured message formats, accepting and returning message objects that include roles (system, user, assistant) and content. This structure enables rich conversational interactions and maintains context across turns.


```python
# Chat Model Example
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Explain quantum computing"}
]

response = chat_model.invoke(messages)
```


Traditional LLMs, in contrast, operate on raw text input and output. They take a string prompt and return a string completion, making them simpler but less structured:


```python
# LLM Example
from langchain_openai import OpenAI

# text-davinci-003 has been retired; gpt-3.5-turbo-instruct is its completion-style successor
llm = OpenAI(model="gpt-3.5-turbo-instruct")

response = llm.invoke("Complete this sentence: The future of AI is")
```


Why the Distinction Matters for Your Application

Choosing between chat models and LLMs impacts your application's architecture and capabilities. Chat models excel in agent-like systems, customer service bots, and interactive applications where maintaining conversation state is crucial. They handle multi-turn dialogues naturally and support system prompts for behavior modification.


LLMs are better suited for single-turn tasks like summarization, text generation, and completion tasks, where the overhead of message structure isn't necessary. They're often faster and cheaper for bulk processing tasks that don't require conversational context.
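
For instance, bulk single-turn work fits the completion style naturally. The sketch below is illustrative only (the model name and inputs are assumptions, not a prescribed setup):

```python
# Illustrative sketch: a completion-style model for bulk, single-turn summaries.
from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", max_tokens=150)

articles = ["First article text...", "Second article text..."]
prompts = [f"Summarize in one sentence: {text}" for text in articles]

# batch() runs the single-turn completions concurrently; no message
# structure or conversation state is needed for this kind of work.
summaries = llm.batch(prompts)
```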


Overview of Supported Model Types

LangChain's ecosystem encompasses various model types, each serving specific purposes in the AI application stack.


 Chat Models (Messages In → Messages Out)

Modern chat models form the backbone of most LangChain applications. OpenAI's GPT-4 and GPT-3.5 Turbo provide reliable performance with extensive tool support. Anthropic's Claude series offers larger context windows and nuanced reasoning. Google's Gemini models bring multimodal capabilities. Each maintains message history and supports structured interactions:


```python
from langchain_anthropic import ChatAnthropic

claude = ChatAnthropic(model="claude-3-sonnet-20240229")

response = claude.invoke([
    {"role": "user", "content": "What's the weather like?"}
])
```

Text-Completion Models / LLMs (String In → String Out)

While being phased out by many providers, text-completion models still serve important roles. OpenAI's gpt-3.5-turbo-instruct (the successor to its now-retired Davinci completion models), Cohere's generation models, and open-source alternatives like Llama 2 through Replicate maintain the traditional prompt-completion paradigm. They're particularly useful for specific fine-tuned models or when working with older systems.


Embedding Models & Other Specialized Models

Beyond generation, LangChain supports embedding models for vector operations, crucial for RAG systems:


```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vector = embeddings.embed_query("Sample text for embedding")
```

Specialized models include rerankers for improving retrieval quality, classification models for intent detection, and custom models wrapped in LangChain's interface.
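
To make the role of embeddings concrete, here is a minimal sketch of the vector math behind retrieval: documents and a query are embedded, then ranked by cosine similarity. The document texts and query are illustrative placeholders:

```python
import numpy as np
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

docs = [
    "LangChain supports many model providers.",
    "Bogotá is the capital of Colombia.",
]
doc_vectors = np.array(embeddings.embed_documents(docs))
query_vector = np.array(embeddings.embed_query("Which providers does LangChain support?"))

# Cosine similarity between the query and each document vector
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
best_doc = docs[int(scores.argmax())]
```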


Key Features of LangChain Models



Standard Parameters (temperature, max_tokens etc.)

LangChain standardizes common parameters across providers, making model behavior predictable:


```python
model = ChatOpenAI(
    temperature=0.7,        # Creativity level (0-2)
    max_tokens=500,         # Maximum response length
    top_p=0.9,              # Nucleus sampling
    frequency_penalty=0.5,  # Reduce repetition
    presence_penalty=0.5    # Encourage topic diversity
)
```


Default behaviors vary by provider, but LangChain normalizes these where possible, providing consistent interfaces for temperature scaling, token limits, and sampling parameters.
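
As a small illustration, the same standard parameters can be passed unchanged to different providers (the model names here are just examples); provider-specific options such as frequency_penalty remain available where a provider supports them:

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Shared settings applied identically to two providers
common = {"temperature": 0.3, "max_tokens": 300}

openai_model = ChatOpenAI(model="gpt-4", **common)
claude_model = ChatAnthropic(model="claude-3-sonnet-20240229", **common)
```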


Interface Methods: invoke(), stream(), batch()

LangChain provides three primary methods for model interaction, each optimized for different use cases:


```python
# Single invocation
response = model.invoke("What is machine learning?")

# Streaming for real-time output
for chunk in model.stream("Tell me a story"):
    print(chunk.content, end="")

# Batch processing for efficiency
responses = model.batch([
    "Question 1",
    "Question 2",
    "Question 3"
])
```

The invoke() method handles synchronous single requests, stream() enables token-by-token output for responsive UIs, and batch() processes multiple inputs efficiently with automatic concurrency management.
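
Each of these methods also has an async counterpart (ainvoke(), astream(), abatch()), and batch-style calls accept a max_concurrency setting to cap parallel requests. A brief sketch, reusing the model from the example above:

```python
import asyncio

async def main():
    # Single async call
    answer = await model.ainvoke("What is machine learning?")

    # Stream tokens asynchronously
    async for chunk in model.astream("Tell me a story"):
        print(chunk.content, end="")

    # Bound concurrency for larger batches
    answers = await model.abatch(
        ["Question 1", "Question 2", "Question 3"],
        config={"max_concurrency": 2},
    )
    return answer, answers

asyncio.run(main())
```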


Tool Calling: Binding External Tools to Models

LangChain's tool calling capability transforms models into agents that can interact with external systems:


```python
from langchain.tools import Tool

def calculate(expression: str) -> str:
    # eval() is for illustration only; use a safe math parser in production
    return str(eval(expression))

calculator_tool = Tool(
    name="Calculator",
    func=calculate,
    description="Useful for mathematical calculations"
)

model_with_tools = model.bind_tools([calculator_tool])

# The response is an AIMessage describing the calculator call the model wants to make
response = model_with_tools.invoke("What is 25 * 48?")
```


This enables retrieval augmentation, API calls, database queries, and any custom functionality your application needs.
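
Note that bind_tools() does not run the tool itself: the model's reply is an AIMessage whose tool_calls field describes the calls it wants, and your code executes them and sends the results back. A minimal sketch of one round of that loop, reusing model_with_tools and calculate from above:

```python
from langchain_core.messages import HumanMessage, ToolMessage

messages = [HumanMessage(content="What is 25 * 48?")]
ai_message = model_with_tools.invoke(messages)
messages.append(ai_message)

for tool_call in ai_message.tool_calls:
    # The argument key depends on the tool's schema; here we take the single value
    expression = next(iter(tool_call["args"].values()))
    result = calculate(expression)
    messages.append(ToolMessage(content=result, tool_call_id=tool_call["id"]))

# Second pass: the model answers using the tool results
final_answer = model_with_tools.invoke(messages)
```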


Structured Outputs & JSON Modes

Modern applications often need structured data from models. LangChain supports JSON mode and structured outputs across compatible providers:


```python
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel

class ProductInfo(BaseModel):
    name: str
    price: float
    in_stock: bool

parser = PydanticOutputParser(pydantic_object=ProductInfo)

model_with_structure = model.bind(
    response_format={"type": "json_object"}
)

prompt = f"Extract product info: iPhone 15 Pro costs $999 and is available. {parser.get_format_instructions()}"
response = model_with_structure.invoke(prompt)
structured_output = parser.parse(response.content)
```
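
On providers with native structured-output support, with_structured_output() is a more direct route: it returns validated Pydantic objects without manual parsing. A short sketch reusing the ProductInfo model above:

```python
# with_structured_output() returns a validated ProductInfo instance directly
structured_model = model.with_structured_output(ProductInfo)

product = structured_model.invoke(
    "Extract product info: iPhone 15 Pro costs $999 and is available."
)
print(product.name, product.price, product.in_stock)
```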


Integrations & Ecosystem


Providers and Supported Model Libraries

LangChain's extensive provider support includes:


  • OpenAI: GPT-4, GPT-3.5, embeddings, DALL-E

  • Anthropic: Claude 3 family, legacy Claude models

  • Google: Gemini Pro, PaLM, Vertex AI

  • Cohere: Command, Embed, Rerank models

  • Hugging Face: Thousands of open-source models

  • AWS Bedrock: Managed model access

  • Together AI, Replicate: Hosted open-source models

  • Local models: Ollama, LlamaCPP, GPT4All


Each integration maintains provider-specific features while conforming to LangChain's standard interfaces.


Switching Providers: Vendor-Agnostic Architecture

LangChain's abstraction enables seamless provider switching:


```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI

# Easy to switch providers
def create_model(provider="openai"):
    if provider == "openai":
        return ChatOpenAI(model="gpt-4")
    elif provider == "anthropic":
        return ChatAnthropic(model="claude-3-sonnet-20240229")
    elif provider == "google":
        return ChatGoogleGenerativeAI(model="gemini-pro")
    raise ValueError(f"Unknown provider: {provider}")

# Application code remains unchanged
model = create_model(provider="anthropic")
response = model.invoke("Same prompt works everywhere")
```


This vendor-agnostic approach protects against service outages, enables cost optimization, and prevents vendor lock-in.
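
Recent LangChain releases also ship a helper, init_chat_model, that builds a chat model from configuration values, removing the need for a hand-rolled factory. A brief sketch (the model names are examples):

```python
from langchain.chat_models import init_chat_model

# Construct a chat model from a provider/model pair at runtime
model = init_chat_model("claude-3-sonnet-20240229", model_provider="anthropic")
# Or: init_chat_model("gpt-4", model_provider="openai")

response = model.invoke("Same prompt works everywhere")
```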


Multimodality & Context Window Considerations

Modern models support various input types and context sizes. Claude 3 offers 200k token windows, Gemini handles 1M+ tokens, and GPT-4 Turbo supports 128k tokens. Multimodal capabilities allow image, audio, and document processing:


```python
from langchain_core.messages import HumanMessage

# image_url points to an accessible image; multimodal_model is any vision-capable chat model
message = HumanMessage(
    content=[
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": image_url}}
    ]
)

response = multimodal_model.invoke([message])
```


Advanced Topics & Best Practices

 Rate-Limiting, Caching, and Performance Optimization

Enable response caching and implement rate limiting to avoid API throttling:


```python
from langchain.cache import InMemoryCache
from langchain.globals import set_llm_cache
import time

# Enable caching so repeated prompts skip the API call
set_llm_cache(InMemoryCache())

# Rate limiting wrapper
class RateLimitedModel:
    def __init__(self, model, requests_per_minute=60):
        self.model = model
        self.rpm = requests_per_minute
        self.last_request = 0

    def invoke(self, *args, **kwargs):
        elapsed = time.time() - self.last_request
        if elapsed < 60 / self.rpm:
            time.sleep(60 / self.rpm - elapsed)
        self.last_request = time.time()
        return self.model.invoke(*args, **kwargs)
```
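
Recent langchain-core releases also include a built-in InMemoryRateLimiter that can be attached directly to a chat model, which is usually simpler than a hand-rolled wrapper. A short sketch (the rate values are illustrative):

```python
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI

limiter = InMemoryRateLimiter(
    requests_per_second=1,      # roughly 60 requests per minute
    check_every_n_seconds=0.1,  # how often to check for an available slot
    max_bucket_size=5,          # allow short bursts
)

model = ChatOpenAI(model="gpt-4", rate_limiter=limiter)
```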


Streaming vs Batch vs Real-Time Usage

Choose the right calling strategy based on your needs:


  • Streaming: Best for chat interfaces, real-time feedback, and long responses

  • Batch: Optimal for bulk processing, analytics, and offline tasks

  • Real-time: Use for synchronous APIs, simple queries, and when latency isn't critical


Choosing the Right Model for Your Use Case

| Use Case | Recommended Model | Key Factors |
|---|---|---|
| Conversational AI | GPT-4, Claude 3 | Context retention, tool support |
| Code Generation | GPT-4, Claude 3 | Reasoning ability, syntax understanding |
| Summarization | GPT-3.5 Turbo, Mixtral | Cost-efficiency, speed |
| Embeddings | text-embedding-3-small | Vector quality, dimension size |
| Local/Private | Llama 2, Mistral | Privacy, no API costs |

Getting Started: A Simple LangChain Model Example


Installing the Library and Setting Up Credentials

```
pip install langchain langchain-openai langchain-anthropic python-dotenv
```


Create a `.env` file:

```
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
```


Instantiating a Chat Model

```python
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

# Initialize model
model = ChatOpenAI(
    model="gpt-4",
    temperature=0.7,
    max_tokens=500
)
```


Running a Simple Prompt and Interpreting the Output


```python
# Simple invocation
response = model.invoke("Explain neural networks in simple terms")
print(response.content)            # Access the text content
print(response.response_metadata)  # View token usage, model info

# With message history
messages = [
    {"role": "system", "content": "You are a helpful teacher"},
    {"role": "user", "content": "What is machine learning?"}
]
response = model.invoke(messages)

# Streaming example
for chunk in model.stream("Write a haiku about programming"):
    print(chunk.content, end="", flush=True)
```


Conclusion

LangChain’s model ecosystem empowers developers to build flexible, scalable, and provider-agnostic AI systems. By understanding the distinction between chat models and traditional LLMs—and leveraging features like tool calling, structured outputs, and streaming—you can design intelligent workflows that fit your specific use cases. Whether you’re building conversational agents, RAG systems, or multimodal applications, LangChain offers the tools to orchestrate them seamlessly across providers.


Ready to integrate LangChain into your AI stack or explore the right model architecture for your project? Contact Us to get expert guidance and accelerate your development journey.


FAQs

What's the difference between chat models and LLMs in LangChain?

Chat models handle structured message formats with roles (system, user, assistant) and maintain conversation context, making them ideal for interactive applications. LLMs process raw text input and output, better suited for simple completion tasks and bulk processing.

Can I use multiple models in a single LangChain app?

Yes, LangChain's modular architecture encourages using multiple models for different tasks. You can combine OpenAI for generation, Anthropic for complex reasoning, and local models for privacy-sensitive operations, all within the same application.
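
A minimal sketch of that pattern, with one model drafting and another reviewing (the model choices here are purely illustrative):

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

drafting_model = ChatOpenAI(model="gpt-3.5-turbo")              # fast, cheap drafts
review_model = ChatAnthropic(model="claude-3-sonnet-20240229")  # careful review

draft = drafting_model.invoke("Draft a short product description for a smart mug.")
review = review_model.invoke(f"Improve this draft:\n{draft.content}")
```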

What models does LangChain support?

LangChain supports all major providers, including OpenAI (GPT-4, GPT-3.5), Anthropic (Claude 3), Google (Gemini, PaLM), Cohere, Hugging Face models, AWS Bedrock, and local models through Ollama and LlamaCPP. The list continuously expands with new integrations.

How do I switch from OpenAI to Anthropic in LangChain?

Simply change the model constructor and credentials. LangChain's unified interface means your application logic remains unchanged:

```python
# From OpenAI
model = ChatOpenAI(model="gpt-4")

# To Anthropic
model = ChatAnthropic(model="claude-3-sonnet-20240229")

# Same interface
response = model.invoke("Your prompt")
```

Does LangChain support streaming outputs?

Yes, use the stream() method for real-time token-by-token output. Most major providers support streaming, enabling responsive user interfaces and progress indication for long responses.

