■ Shows how to create a history-aware retriever using the create_history_aware_retriever function.
▶ main.py
import os
import bs4

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import MessagesPlaceholder
from langchain.chains import create_history_aware_retriever

# Set the OpenAI API key.
os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

# Set up the OpenAI chat model.
chatOpenAI = ChatOpenAI(model="gpt-3.5-turbo-0125")

# Set up the question-answering chat prompt template.
questionAnswerChatPromptTemplate = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, say that you don't know. Use three sentences maximum and keep the answer concise.\n\n{context}"),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# Create the question-answering chain.
questionAnswerRunnableBinding = create_stuff_documents_chain(chatOpenAI, questionAnswerChatPromptTemplate)

# Load the web document.
webBaseLoader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header")))
)

documentList = webBaseLoader.load()

# Split the document into chunks.
recursiveCharacterTextSplitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

splitDocumentList = recursiveCharacterTextSplitter.split_documents(documentList)

# Embed the chunks and set up the retriever.
chroma = Chroma.from_documents(documents=splitDocumentList, embedding=OpenAIEmbeddings())

vectorStoreRetriever = chroma.as_retriever()

# Set up the question-reformulation chat prompt template.
# The history placeholder is named chat_history because create_history_aware_retriever
# checks the chat_history input key to decide whether to reformulate the question.
contextualizeQuestionChatPromptTemplate = ChatPromptTemplate.from_messages(
    [
        ("system", "Given a chat history and the latest user question which might reference context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, just reformulate it if needed and otherwise return it as is."),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

# Create the history-aware retriever chain.
historyAwareRetrieverRunnableBinding = create_history_aware_retriever(chatOpenAI, vectorStoreRetriever, contextualizeQuestionChatPromptTemplate)
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
backoff==2.2.1
bcrypt==4.1.3
beautifulsoup4==4.12.3
bs4==0.0.2
build==1.2.1
cachetools==5.3.3
certifi==2024.6.2
charset-normalizer==3.3.2
chroma-hnswlib==0.7.3
chromadb==0.5.0
click==8.1.7
coloredlogs==15.0.1
dataclasses-json==0.6.7
Deprecated==1.2.14
distro==1.9.0
dnspython==2.6.1
email_validator==2.1.1
exceptiongroup==1.2.1
fastapi==0.111.0
fastapi-cli==0.0.4
filelock==3.14.0
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.6.0
google-auth==2.30.0
googleapis-common-protos==1.63.1
greenlet==3.0.3
grpcio==1.64.1
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.7
importlib_metadata==7.1.0
importlib_resources==6.4.0
Jinja2==3.1.4
jsonpatch==1.33
jsonpointer==3.0.0
kubernetes==30.1.0
langchain==0.2.3
langchain-chroma==0.1.1
langchain-community==0.2.4
langchain-core==0.2.5
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.77
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.21.3
mdurl==0.1.2
mmh3==4.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
oauthlib==3.2.2
onnxruntime==1.18.0
openai==1.33.0
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-asgi==0.46b0
opentelemetry-instrumentation-fastapi==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opentelemetry-util-http==0.46b0
orjson==3.10.4
overrides==7.7.0
packaging==23.2
posthog==3.5.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.7.3
pydantic_core==2.18.4
Pygments==2.18.0
PyPika==0.48.9
pyproject_hooks==1.1.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
requests-oauthlib==2.0.0
rich==13.7.1
rsa==4.9
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
SQLAlchemy==2.0.30
starlette==0.37.2
sympy==1.12.1
tenacity==8.3.0
tiktoken==0.7.0
tokenizers==0.19.1
tomli==2.0.1
tqdm==4.66.4
typer==0.12.3
typing-inspect==0.9.0
typing_extensions==4.12.2
ujson==5.10.0
urllib3==2.2.1
uvicorn==0.30.1
uvloop==0.19.0
watchfiles==0.22.0
websocket-client==1.8.0
websockets==12.0
wrapt==1.16.0
yarl==1.9.4
zipp==3.19.2
※ The packages above were installed by running the command pip install langchain langchain-chroma langchain-community langchain-openai bs4.
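※ The following is a minimal, library-free sketch of the behavior a history-aware retriever is generally understood to have. When the chat history is empty, the question goes straight to the retriever. When there is history, the question is first rewritten into a standalone query and that query is used for the search. Every name here (make_history_aware_retriever, fake_rewrite, fake_retrieve) is hypothetical and only stands in for the chat model and the vector store retriever. It is not LangChain's implementation, and the chat_history key is assumed to be the input key that decides which path is taken.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases for this sketch; they are not LangChain types.
Message = Tuple[str, str]  # (role, text)
Rewriter = Callable[[List[Message], str], str]
Retriever = Callable[[str], List[str]]


def make_history_aware_retriever(rewrite: Rewriter, retrieve: Retriever) -> Callable[[Dict], List[str]]:
    """Return a function that reformulates the question only when history exists."""

    def run(inputs: Dict) -> List[str]:
        history = inputs.get("chat_history") or []
        question = inputs["input"]
        if not history:
            # No history: the question is treated as already standalone.
            return retrieve(question)
        # History present: build a standalone query first, then search with it.
        return retrieve(rewrite(history, question))

    return run


def fake_rewrite(history: List[Message], question: str) -> str:
    # Stand-in for the LLM call (assumption): append the last human turn as context.
    human_turns = [text for role, text in history if role == "human"]
    context = human_turns[-1] if human_turns else ""
    return f"{question} (context: {context})"


def fake_retrieve(query: str) -> List[str]:
    # Stand-in for the vector store retriever (assumption): echo the query it received.
    return [f"document matching: {query}"]


history_aware = make_history_aware_retriever(fake_rewrite, fake_retrieve)

no_history_result = history_aware({"input": "What is task decomposition?", "chat_history": []})
with_history_result = history_aware(
    {
        "input": "What are common ways of doing it?",
        "chat_history": [
            ("human", "What is task decomposition?"),
            ("ai", "It splits a task into smaller steps."),
        ],
    }
)

print(no_history_result)
print(with_history_result)
```

In the second call the follow-up question "What are common ways of doing it?" cannot be searched on its own, so the rewrite step runs before retrieval. In the first call the rewrite step is skipped entirely.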