[PYTHON/LANGCHAIN] RunnableBinding class: streaming RAG application results with the stream method

■ Shows how to stream the results of a RAG application using the stream method of the RunnableBinding class.

※ The value of the OPENAI_API_KEY environment variable is defined in the .env file.
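
The .env file is expected to hold a line of the form OPENAI_API_KEY=<key>. As a minimal sketch, a check like the following could report a missing key before any request is made. The helper name require_env is hypothetical and is not part of main.py; the demonstration passes an explicit mapping with a placeholder value instead of reading the real environment.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; define it in the .env file")
    return value


# Demonstration with an explicit mapping (placeholder value, not a real key).
print(require_env("OPENAI_API_KEY", {"OPENAI_API_KEY": "sk-example"}))
```

In main.py such a check would be called right after load_dotenv(), using the default os.environ mapping.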

▶ main.py


import bs4

from dotenv                               import load_dotenv
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters             import RecursiveCharacterTextSplitter
from langchain_openai                     import OpenAIEmbeddings
from langchain_chroma                     import Chroma
from langchain_openai                     import ChatOpenAI
from langchain_core.prompts               import ChatPromptTemplate
from langchain.chains.combine_documents   import create_stuff_documents_chain
from langchain.chains                     import create_retrieval_chain

load_dotenv()

webBaseLoader = WebBaseLoader(
    web_paths = ("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs = dict(parse_only = bs4.SoupStrainer(class_ = ("post-content", "post-title", "post-header")))
)

documentList = webBaseLoader.load()

recursiveCharacterTextSplitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)

splitDocumentList = recursiveCharacterTextSplitter.split_documents(documentList)

chroma = Chroma.from_documents(documents = splitDocumentList, embedding = OpenAIEmbeddings())

vectorStoreRetriever = chroma.as_retriever()

chatOpenAI = ChatOpenAI(model = "gpt-4o-mini")

chatPromptTemplate = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, say that you don't know. Use three sentences maximum and keep the answer concise.\n\n{context}"),
        ("human", "{input}")
    ]
)

runnableBinding1 = create_stuff_documents_chain(chatOpenAI, chatPromptTemplate)

# A chain built with the create_retrieval_chain function returns a dict object with the keys "input", "context" and "answer".
# Only the "answer" key is streamed token by token; other components, such as retrieval, do not support token-level streaming.
runnableBinding2 = create_retrieval_chain(vectorStoreRetriever, runnableBinding1)

for addableDict in runnableBinding2.stream({"input" : "What is Task Decomposition?"}):
    print(addableDict)
print("-" * 50)

for addableDict in runnableBinding2.stream({"input" : "What is Task Decomposition?"}):
    # The "answer" value is a str fragment; empty fragments are skipped.
    if answerChunk := addableDict.get("answer"):
        print(f"{answerChunk}|", end = "", flush = True)
print()
print("-" * 50)

"""
{'input': 'What is Task Decomposition?'}
{'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for
enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2)
by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Resources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.'), Document(metadata={'source':
'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content="(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.")]}
{'answer': ''}
{'answer': 'Task'}
{'answer': ' decomposition'}
{'answer': ' is'}
{'answer': ' the'}
{'answer': ' process'}
{'answer': ' of'}
{'answer': ' breaking'}
{'answer': ' down'}
{'answer': ' a'}
{'answer': ' complicated'}
{'answer': ' task'}
{'answer': ' into'}
{'answer': ' smaller'}
{'answer': ','}
{'answer': ' manageable'}
{'answer': ' steps'}
{'answer': '.'}
{'answer': ' Techniques'}
{'answer': ' such'}
{'answer': ' as'}
{'answer': ' Chain'}
{'answer': ' of'}
{'answer': ' Thought'}
{'answer': ' ('}
{'answer': 'Co'}
{'answer': 'T'}
{'answer': ')'}
{'answer': ' and'}
{'answer': ' Tree'}
{'answer': ' of'}
{'answer': ' Thoughts'}
{'answer': ' ('}
{'answer': 'To'}
{'answer': 'T'}
{'answer': ')'}
{'answer': ' help'}
{'answer': ' enhance'}
{'answer': ' model'}
{'answer': ' performance'}
{'answer': ' by'}
{'answer': ' struct'}
{'answer': 'uring'}
{'answer': ' the'}
{'answer': ' reasoning'}
{'answer': ' process'}
{'answer': ','}
{'answer': ' allowing'}
{'answer': ' for'}
{'answer': ' a'}
{'answer': ' clearer'}
{'answer': ' understanding'}
{'answer': ' and'}
{'answer': ' execution'}
{'answer': ' of'}
{'answer': ' complex'}
{'answer': ' tasks'}
{'answer': '.'}
{'answer': ' This'}
{'answer': ' can'}
{'answer': ' involve'}
{'answer': ' simple'}
{'answer': ' prompting'}
{'answer': ','}
{'answer': ' task'}
{'answer': '-specific'}
{'answer': ' instructions'}
{'answer': ','}
{'answer': ' or'}
{'answer': ' human'}
{'answer': ' inputs'}
{'answer': ' to'}
{'answer': ' guide'}
{'answer': ' the'}
{'answer': ' decomposition'}
{'answer': '.'}
{'answer': ''}
--------------------------------------------------
Task| decomposition| is| the| process| of| breaking| down| a| complicated| task| into| smaller|,| more| manageable| steps|.| Techniques| like| Chain| of| Thought| (|Co|T|)| and| Tree| of| Thoughts| allow| models| to| think| through| tasks| in| a| structured| way|,| enhancing| their| performance| by| simplifying| complex| problems|.| This| process| can| be| achieved| through| prompting|,| task|-specific| instructions|,| or| human| input|.|
--------------------------------------------------
"""

▶ requirements.txt


aiohappyeyeballs==2.4.4
aiohttp==3.11.9
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.6.2.post1
asgiref==3.8.1
attrs==24.2.0
backoff==2.2.1
bcrypt==4.2.1
beautifulsoup4==4.12.3
bs4==0.0.2
build==1.2.2.post1
cachetools==5.5.0
certifi==2024.8.30
charset-normalizer==3.4.0
chroma-hnswlib==0.7.6
chromadb==0.5.20
click==8.1.7
colorama==0.4.6
coloredlogs==15.0.1
dataclasses-json==0.6.7
Deprecated==1.2.15
distro==1.9.0
durationpy==0.9
fastapi==0.115.5
filelock==3.16.1
flatbuffers==24.3.25
frozenlist==1.5.0
fsspec==2024.10.0
google-auth==2.36.0
googleapis-common-protos==1.66.0
greenlet==3.1.1
grpcio==1.68.1
h11==0.14.0
httpcore==1.0.7
httptools==0.6.4
httpx==0.28.0
httpx-sse==0.4.0
huggingface-hub==0.26.3
humanfriendly==10.0
idna==3.10
importlib_metadata==8.5.0
importlib_resources==6.4.5
jiter==0.8.0
jsonpatch==1.33
jsonpointer==3.0.0
kubernetes==31.0.0
langchain==0.3.9
langchain-chroma==0.1.4
langchain-community==0.3.8
langchain-core==0.3.21
langchain-openai==0.2.10
langchain-text-splitters==0.3.2
langsmith==0.1.147
markdown-it-py==3.0.0
marshmallow==3.23.1
mdurl==0.1.2
mmh3==5.0.1
monotonic==1.6
mpmath==1.3.0
multidict==6.1.0
mypy-extensions==1.0.0
numpy==1.26.4
oauthlib==3.2.2
onnxruntime==1.20.1
openai==1.55.3
opentelemetry-api==1.28.2
opentelemetry-exporter-otlp-proto-common==1.28.2
opentelemetry-exporter-otlp-proto-grpc==1.28.2
opentelemetry-instrumentation==0.49b2
opentelemetry-instrumentation-asgi==0.49b2
opentelemetry-instrumentation-fastapi==0.49b2
opentelemetry-proto==1.28.2
opentelemetry-sdk==1.28.2
opentelemetry-semantic-conventions==0.49b2
opentelemetry-util-http==0.49b2
orjson==3.10.12
overrides==7.7.0
packaging==24.2
posthog==3.7.4
propcache==0.2.1
protobuf==5.29.0
pyasn1==0.6.1
pyasn1_modules==0.4.1
pydantic==2.10.2
pydantic-settings==2.6.1
pydantic_core==2.27.1
Pygments==2.18.0
PyPika==0.48.9
pyproject_hooks==1.2.0
pyreadline3==3.5.4
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
rich==13.9.4
rsa==4.9
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
soupsieve==2.6
SQLAlchemy==2.0.35
starlette==0.41.3
sympy==1.13.3
tenacity==9.0.0
tiktoken==0.8.0
tokenizers==0.21.0
tqdm==4.67.1
typer==0.14.0
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.3
uvicorn==0.32.1
watchfiles==1.0.0
websocket-client==1.8.0
websockets==14.1
wrapt==1.17.0
yarl==1.18.3
zipp==3.21.0

※ The packages were installed by running the command pip install python-dotenv langchain langchain-community langchain-chroma langchain-openai bs4.