LangChain's built-in memory classes — ConversationBufferMemory, ConversationSummaryMemory, and friends — are all in-process. The moment your script ends, your Lambda function returns, or your container restarts, every memory is gone.
What this means in practice: your LangChain agent has no idea it already talked to this user. It can't remember the user's preferences, their project context, or decisions made in prior sessions. Every run starts from zero.
The fix is simple: stop relying on in-process memory classes and start writing important facts to an external store after each run — and reading them back at the start of the next one. Memstore gives you this in two REST calls, with no infrastructure to manage.
Get your free API key at memstore.dev and verify the API works before touching any LangChain code:
```shell
curl -X POST https://memstore.dev/v1/memory/remember \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "User is building a fintech app, prefers Python"}'
```

```shell
curl "https://memstore.dev/v1/memory/recall?q=user+background" \
  -H "Authorization: Bearer YOUR_API_KEY"
# Returns: {"memories":[{"content":"User is building a fintech app...","score":0.94}]}
```
Run `pip install memstore`. This guide assumes LangChain and your LLM provider's SDK are already installed.
1. Create a Memstore instance with your API key. Use `ms.remember()` and `ms.recall()` anywhere in your agent.
2. At the start of each chain or agent invocation, fetch the most relevant memories and inject them into your system message.
3. After the chain returns, persist anything worth remembering — the user's question, key facts surfaced, decisions made.
```shell
pip install memstore
```
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
from memstore import Memstore

ms = Memstore(api_key="am_live_...")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.3)

def chat(user_id: str, user_message: str) -> str:
    # 1. Recall relevant context
    memories = ms.recall(user_message, session=user_id)
    context = "\n".join(f"- {m['content']}" for m in memories)

    # 2. Build prompt with memory injected
    system = f"""You are a helpful assistant.

What you remember about this user:
{context or "Nothing yet — this may be their first message."}

Use this context to give personalised, non-repetitive responses."""

    messages = [
        SystemMessage(content=system),
        HumanMessage(content=user_message),
    ]

    # 3. Run the chain
    response = llm.invoke(messages).content

    # 4. Store anything worth remembering
    ms.remember(f"User said: {user_message[:150]}", session=user_id)
    if "prefer" in user_message.lower() or "always" in user_message.lower():
        ms.remember(user_message, session=user_id)  # store explicit preferences

    return response
```
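The keyword check in step 4 is deliberately crude. If you want to refine it, factoring it into a standalone helper makes it easy to test and extend in isolation. A minimal sketch (`is_worth_remembering` is a hypothetical helper, not part of the Memstore SDK, and the signal list is just an example):

```python
def is_worth_remembering(message: str) -> bool:
    """Crude heuristic: keep messages that state preferences,
    standing instructions, or durable project facts."""
    signals = ("prefer", "always", "never", "i'm building", "my project")
    lowered = message.lower()
    return any(signal in lowered for signal in signals)

print(is_worth_remembering("I always want code samples in Python"))  # True
print(is_worth_remembering("What's the capital of France?"))         # False
```

In production you might replace the keyword list with a cheap LLM classification call, but a heuristic like this costs nothing and catches the obvious cases.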
Always pass a session parameter equal to your user's ID. This isolates one user's memories from another's — no cross-contamination even when thousands of users share the same API key.
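To see why this scoping matters, here is a toy in-memory stand-in for the client (`FakeMemstore` is an illustrative stub, not the real SDK; the real service scores recalls by relevance rather than returning everything):

```python
class FakeMemstore:
    """Toy stand-in mimicking remember()/recall() with per-session isolation."""
    def __init__(self):
        self._store = {}  # session id -> list of memory dicts

    def remember(self, content: str, session: str) -> None:
        self._store.setdefault(session, []).append({"content": content})

    def recall(self, query: str, session: str):
        # The real API ranks memories against the query; this stub
        # just returns everything stored under the same session.
        return list(self._store.get(session, []))

ms = FakeMemstore()
ms.remember("Prefers Python", session="user-alice")
ms.remember("Prefers Go", session="user-bob")

# Alice's recall never sees Bob's memories:
print([m["content"] for m in ms.recall("language", session="user-alice")])  # ['Prefers Python']
```

Because each `session` keys its own memory list, queries for one user can never surface another user's facts.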
If you're using LangChain Expression Language, wrap your chain invocation with the same recall-before / store-after pattern:
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from memstore import Memstore

ms = Memstore(api_key="am_live_...")

prompt = ChatPromptTemplate.from_messages([
    ("system", "User context:\n{memory}\n\nBe concise and helpful."),
    ("human", "{input}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o-mini")

def invoke(user_id: str, user_input: str) -> str:
    memories = ms.recall(user_input, session=user_id)
    result = chain.invoke({
        "memory": "\n".join(m["content"] for m in memories) or "No prior context.",
        "input": user_input,
    })
    ms.remember(user_input, session=user_id)
    return result.content
```
Free tier — 1,000 ops/month. No credit card. Works in 10 minutes.
Get your API key →