[PYTHON/LANGCHAIN] FAISS class : getting a vector store retriever using the as_retriever method
■ Shows how to get a vector store retriever using the as_retriever method of the FAISS class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. ▶ Example
■ Shows how to get a list of result documents from the embedding vector of a search string using the similarity_search_by_vector method of the Chroma class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file.
■ Shows how to get a list of documents by string-query similarity using the similarity_search method of the FAISS class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file.
■ Shows how to create a FAISS object for an in-memory vector store using the from_documents function. ※ The OPENAI_API_KEY environment variable value is defined in the .env file.
■ Shows how to get the list of prompt templates used by a chain using the get_prompts method of the RunnableSequence class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file.
■ Shows how to print a graph using the print_ascii method of the Graph class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. ▶ main.py
from dotenv import load_dotenv
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

load_dotenv()

faiss = FAISS.from_texts(["harrison worked at kensho"], embedding = OpenAIEmbeddings())

vectorStoreRetriever = faiss.as_retriever()

chatPromptTemplateString = """Answer the question based only on the following context :

{context}

Question : {question}
"""

chatPromptTemplate = ChatPromptTemplate.from_template(chatPromptTemplateString)

chatOpenAI = ChatOpenAI()

runnableSequence = {"context" : vectorStoreRetriever, "question" : RunnablePassthrough()} | chatPromptTemplate | chatOpenAI | StrOutputParser()

graph = runnableSequence.get_graph()

graph.print_ascii()

"""
      +---------------------------------+
      | Parallel<context,question>Input |
      +---------------------------------+
               **              **
            ***                  ***
          **                        **
+----------------------+       +-------------+
| VectorStoreRetriever |       | Passthrough |
+----------------------+       +-------------+
          **                        **
            ***                  ***
               **              **
     +----------------------------------+
     | Parallel<context,question>Output |
     +----------------------------------+
                       *
                       *
                       *
           +--------------------+
           | ChatPromptTemplate |
           +--------------------+
                       *
                       *
                       *
               +------------+
               | ChatOpenAI |
               +------------+
                       *
                       *
                       *
            +-----------------+
            | StrOutputParser |
            +-----------------+
                       *
                       *
                       *
         +-----------------------+
         | StrOutputParserOutput |
         +-----------------------+
"""
■ Shows how to get a Graph object using the get_graph method of the RunnableSequence class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. ▶ main.py
from dotenv import load_dotenv
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

load_dotenv()

faiss = FAISS.from_texts(["harrison worked at kensho"], embedding = OpenAIEmbeddings())

vectorStoreRetriever = faiss.as_retriever()

chatPromptTemplateString = """Answer the question based only on the following context :

{context}

Question : {question}
"""

chatPromptTemplate = ChatPromptTemplate.from_template(chatPromptTemplateString)

chatOpenAI = ChatOpenAI()

runnableSequence = {"context" : vectorStoreRetriever, "question" : RunnablePassthrough()} | chatPromptTemplate | chatOpenAI | StrOutputParser()

graph = runnableSequence.get_graph()
■ Shows how to use the RunnablePassthrough class to pass the user question, alongside a retriever, into a chat prompt template. ※ The OPENAI_API_KEY environment variable value is defined in the .env file.
■ Shows how to extract data from a dictionary using itemgetter objects in the RunnableParallel class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. ▶ main.py
from operator import itemgetter
from dotenv import load_dotenv
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

load_dotenv()

faiss = FAISS.from_texts(["harrison worked at kensho"], embedding = OpenAIEmbeddings())

vectorStoreRetriever = faiss.as_retriever()

chatPromptTemplateString = """Answer the question based only on the following context :

{context}

Question : {question}

Answer in the following language : {language}
"""

chatPromptTemplate = ChatPromptTemplate.from_template(chatPromptTemplateString)

chatOpenAI = ChatOpenAI()

chain = (
    # The { ... } part is automatically converted to a RunnableParallel instance.
    {"context" : itemgetter("question") | vectorStoreRetriever, "question" : itemgetter("question"), "language" : itemgetter("language")}
    | chatPromptTemplate
    | chatOpenAI
    | StrOutputParser()
)

responseString = chain.invoke({"question" : "where did harrison work", "language" : "korean"})

print(responseString)

"""
헤리슨은 켄쇼에서 일했습니다.
"""
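The reason itemgetter can be piped into a chain is that itemgetter("key") is simply a callable that pulls that key out of a dict; this can be checked in plain Python, with no LangChain involved (the dictionary below is illustrative):

```python
from operator import itemgetter

# itemgetter("question") builds a callable that extracts the "question" key,
# which is how each field of the chain input gets routed independently.
inputDictionary = {"question" : "where did harrison work", "language" : "korean"}

getQuestion = itemgetter("question")

print(getQuestion(inputDictionary))
```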
■ Shows how the astream_events method of the RunnableSequence class still receives asynchronous streaming events after a final non-streaming step. ▶ main.py
import asyncio
import os
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import JsonOutputParser

async def main():
    os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

    chatOpenAI = ChatOpenAI(model = "gpt-3.5-turbo-0125")

    jsonOutputParser = JsonOutputParser()

    def getCountryNameList(inputDictionary):
        """A function that does not operate on input streams and therefore breaks streaming."""
        if not isinstance(inputDictionary, dict):
            return ""
        if "countries" not in inputDictionary:
            return ""
        countryList = inputDictionary["countries"]
        if not isinstance(countryList, list):
            return ""
        countryNameList = [country.get("name") for country in countryList if isinstance(country, dict)]
        return countryNameList

    runnableSequence = chatOpenAI | jsonOutputParser | getCountryNameList

    requestString = """output a list of the countries france, spain and japan and their populations in JSON format. \
Use a dict with an outer key of "countries" which contains a list of countries. \
Each country should have the key `name` and `population`"""

    async for chunkList in runnableSequence.astream(requestString):
        print(chunkList, flush = True)

    eventCount = 0

    async for eventDictionary in runnableSequence.astream_events(requestString, version = "v2"):
        kind = eventDictionary["event"]
        if kind == "on_chat_model_stream":
            content = eventDictionary["data"]["chunk"].content
            print(f"Chat model chunk : {content}", flush = True)
        if kind == "on_parser_stream":
            text = eventDictionary["data"]["chunk"]
            print(f"Parser chunk : {text}", flush = True)
        eventCount += 1
        if eventCount > 30:
            print("...")
            break

asyncio.run(main())

"""
Chat model chunk : 
Chat model chunk : {
Parser chunk : {}
Chat model chunk : 
Chat model chunk : "
Chat model chunk : countries
Chat model chunk : ":
Chat model chunk : [
Parser chunk : {'countries': []}
Chat model chunk : 
Chat model chunk : {
Parser chunk : {'countries': [{}]}
Chat model chunk : 
Chat model chunk : "
Chat model chunk : name
Chat model chunk : ":
Chat model chunk : "
Parser chunk : {'countries': [{'name': ''}]}
Chat model chunk : France
Parser chunk : {'countries': [{'name': 'France'}]}
Chat model chunk : ",
Chat model chunk : 
Chat model chunk : "
Chat model chunk : population
Chat model chunk : ":
Chat model chunk : 
Chat model chunk : 652
...
"""
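The helper above only produces a result once it has the complete dict, which is why plain astream stops yielding incremental output at that step while astream_events still surfaces the upstream chunks. The helper's behavior on a finished payload can be checked in pure Python (this mirrors the function in main.py above; the sample payload is illustrative):

```python
def getCountryNameList(inputDictionary):
    # Guard clauses : the function needs the full, well-formed dict to work.
    if not isinstance(inputDictionary, dict):
        return ""
    countryList = inputDictionary.get("countries")
    if not isinstance(countryList, list):
        return ""
    return [country.get("name") for country in countryList if isinstance(country, dict)]

print(getCountryNameList({"countries" : [{"name" : "France"}, {"name" : "Spain"}]}))
```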
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.6.2
charset-normalizer==3.3.2
dataclasses-json==0.6.7
distro==1.9.0
exceptiongroup==1.2.1
faiss-gpu==1.7.2
frozenlist==1.4.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.2.5
langchain-community==0.2.5
langchain-core==0.2.8
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.79
marshmallow==3.21.3
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
openai==1.34.0
orjson==3.10.5
packaging==24.1
pydantic==2.7.4
pydantic_core==2.18.4
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
sniffio==1.3.1
SQLAlchemy==2.0.30
tenacity==8.4.1
tiktoken==0.7.0
tqdm==4.66.4
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.2
yarl==1.9.4
※ pip install langchain
■ Shows how to filter events using the include_tags argument of the astream_events method of the RunnableSequence class. ▶ main.py
import asyncio
import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

async def main():
    os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

    chatPromptTemplateString = """Answer the question based only on the following context :

    {context}

    Question : {question}
    """

    chatPromptTemplate = ChatPromptTemplate.from_template(chatPromptTemplateString)

    faiss = FAISS.from_texts(
        ["harrison worked at kensho", "harrison likes spicy food"],
        embedding = OpenAIEmbeddings(),
    )

    vectorStoreRetriever = faiss.as_retriever()

    chatOpenAI = ChatOpenAI(model = "gpt-3.5-turbo-0125")

    runnableSequence = (
        {
            "context" : vectorStoreRetriever.with_config(run_name = "vectorStoreRetriever"),
            "question" : RunnablePassthrough()
        }
        | chatPromptTemplate
        | (chatOpenAI.with_config(run_name = "chatOpenAI") | StrOutputParser().with_config(run_name = "strOutputParser")).with_config({"tags" : ["chatOpenAI_strOutputParser"]})
    )

    eventCount = 0

    async for eventDictionary in runnableSequence.astream_events(
            "Where did harrison work? Write 3 made up sentences about this place.",
            include_tags = ["chatOpenAI_strOutputParser"],
            version = "v2"):
        print(eventDictionary)
        eventCount += 1
        if eventCount > 10:
            print("...")
            break

asyncio.run(main())

"""
{'event': 'on_chain_start', 'data': {'input': 'Where did harrison work? Write 3 made up sentences about this place.'}, 'name': 'RunnableSequence', 'tags': ['seq:step:3', 'chatOpenAI_strOutputParser'], 'run_id': '8933aa37-93c0-4c63-ae77-1c568c2f5a7a', 'metadata': {}, 'parent_ids': ['bf69804f-1597-4a2a-9ad8-faaff7c767c2']}
{'event': 'on_chat_model_start', 'data': {'input': {'messages': [[HumanMessage(content="Answer the question based only on the following context :\n [Document(page_content='harrison worked at kensho'), Document(page_content='harrison likes spicy food')]\n\n Question : Where did harrison work? Write 3 made up sentences about this place.\n ")]]}}, 'name': 'chatOpenAI', 'tags': ['seq:step:1', 'chatOpenAI_strOutputParser'], 'run_id': '35559cbb-163c-46cf-b278-fbda9d987add', 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['bf69804f-1597-4a2a-9ad8-faaff7c767c2', '8933aa37-93c0-4c63-ae77-1c568c2f5a7a']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='', id='run-35559cbb-163c-46cf-b278-fbda9d987add')}, 'run_id': '35559cbb-163c-46cf-b278-fbda9d987add', 'name': 'chatOpenAI', 'tags': ['seq:step:1', 'chatOpenAI_strOutputParser'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['bf69804f-1597-4a2a-9ad8-faaff7c767c2', '8933aa37-93c0-4c63-ae77-1c568c2f5a7a']}
{'event': 'on_parser_start', 'data': {}, 'name': 'strOutputParser', 'tags': ['seq:step:2', 'chatOpenAI_strOutputParser'], 'run_id': '7c5ca72a-9681-49a0-b33d-23dc26a1e245', 'metadata': {}, 'parent_ids': ['bf69804f-1597-4a2a-9ad8-faaff7c767c2', '8933aa37-93c0-4c63-ae77-1c568c2f5a7a']}
{'event': 'on_parser_stream', 'run_id': '7c5ca72a-9681-49a0-b33d-23dc26a1e245', 'name': 'strOutputParser', 'tags': ['seq:step:2', 'chatOpenAI_strOutputParser'], 'metadata': {}, 'data': {'chunk': ''}, 'parent_ids': ['bf69804f-1597-4a2a-9ad8-faaff7c767c2', '8933aa37-93c0-4c63-ae77-1c568c2f5a7a']}
{'event': 'on_chain_stream', 'run_id': '8933aa37-93c0-4c63-ae77-1c568c2f5a7a', 'name': 'RunnableSequence', 'tags': ['seq:step:3', 'chatOpenAI_strOutputParser'], 'metadata': {}, 'data': {'chunk': ''}, 'parent_ids': ['bf69804f-1597-4a2a-9ad8-faaff7c767c2']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='H', id='run-35559cbb-163c-46cf-b278-fbda9d987add')}, 'run_id': '35559cbb-163c-46cf-b278-fbda9d987add', 'name': 'chatOpenAI', 'tags': ['seq:step:1', 'chatOpenAI_strOutputParser'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['bf69804f-1597-4a2a-9ad8-faaff7c767c2', '8933aa37-93c0-4c63-ae77-1c568c2f5a7a']}
{'event': 'on_parser_stream', 'run_id': '7c5ca72a-9681-49a0-b33d-23dc26a1e245', 'name': 'strOutputParser', 'tags': ['seq:step:2', 'chatOpenAI_strOutputParser'], 'metadata': {}, 'data': {'chunk': 'H'}, 'parent_ids': ['bf69804f-1597-4a2a-9ad8-faaff7c767c2', '8933aa37-93c0-4c63-ae77-1c568c2f5a7a']}
{'event': 'on_chain_stream', 'run_id': '8933aa37-93c0-4c63-ae77-1c568c2f5a7a', 'name': 'RunnableSequence', 'tags': ['seq:step:3', 'chatOpenAI_strOutputParser'], 'metadata': {}, 'data': {'chunk': 'H'}, 'parent_ids': ['bf69804f-1597-4a2a-9ad8-faaff7c767c2']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='arrison', id='run-35559cbb-163c-46cf-b278-fbda9d987add')}, 'run_id': '35559cbb-163c-46cf-b278-fbda9d987add', 'name': 'chatOpenAI', 'tags': ['seq:step:1', 'chatOpenAI_strOutputParser'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['bf69804f-1597-4a2a-9ad8-faaff7c767c2', '8933aa37-93c0-4c63-ae77-1c568c2f5a7a']}
{'event': 'on_parser_stream', 'run_id': '7c5ca72a-9681-49a0-b33d-23dc26a1e245', 'name': 'strOutputParser', 'tags': ['seq:step:2', 'chatOpenAI_strOutputParser'], 'metadata': {}, 'data': {'chunk': 'arrison'}, 'parent_ids': ['bf69804f-1597-4a2a-9ad8-faaff7c767c2', '8933aa37-93c0-4c63-ae77-1c568c2f5a7a']}
...
"""
※ pip install langchain
■ Shows how to filter events using the include_types argument of the astream_events method of the RunnableSequence class. ▶ main.py
import asyncio
import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

async def main():
    os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

    chatPromptTemplateString = """Answer the question based only on the following context :

    {context}

    Question : {question}
    """

    chatPromptTemplate = ChatPromptTemplate.from_template(chatPromptTemplateString)

    faiss = FAISS.from_texts(
        ["harrison worked at kensho", "harrison likes spicy food"],
        embedding = OpenAIEmbeddings(),
    )

    vectorStoreRetriever = faiss.as_retriever()

    chatOpenAI = ChatOpenAI(model = "gpt-3.5-turbo-0125")

    runnableSequence = (
        {
            "context" : vectorStoreRetriever.with_config(run_name = "vectorStoreRetriever"),
            "question" : RunnablePassthrough()
        }
        | chatPromptTemplate
        | chatOpenAI.with_config(run_name = "chatOpenAI")
        | StrOutputParser().with_config(run_name = "strOutputParser")
    )

    eventCount = 0

    async for eventDictionary in runnableSequence.astream_events(
            "Where did harrison work? Write 3 made up sentences about this place.",
            include_types = ["chat_model"],
            version = "v2"):
        print(eventDictionary)
        eventCount += 1
        if eventCount > 10:
            print("...")
            break

asyncio.run(main())

"""
{'event': 'on_chat_model_start', 'data': {'input': 'Where did harrison work? Write 3 made up sentences about this place.'}, 'name': 'chatOpenAI', 'tags': ['seq:step:3'], 'run_id': 'ba5d2b4e-a267-407a-b028-c42ccbcb9d63', 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['be3b9431-47a9-40fd-ac10-4c6dcc29a0d4']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='', id='run-ba5d2b4e-a267-407a-b028-c42ccbcb9d63')}, 'run_id': 'ba5d2b4e-a267-407a-b028-c42ccbcb9d63', 'name': 'chatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['be3b9431-47a9-40fd-ac10-4c6dcc29a0d4']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='H', id='run-ba5d2b4e-a267-407a-b028-c42ccbcb9d63')}, 'run_id': 'ba5d2b4e-a267-407a-b028-c42ccbcb9d63', 'name': 'chatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['be3b9431-47a9-40fd-ac10-4c6dcc29a0d4']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='arrison', id='run-ba5d2b4e-a267-407a-b028-c42ccbcb9d63')}, 'run_id': 'ba5d2b4e-a267-407a-b028-c42ccbcb9d63', 'name': 'chatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['be3b9431-47a9-40fd-ac10-4c6dcc29a0d4']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=' worked', id='run-ba5d2b4e-a267-407a-b028-c42ccbcb9d63')}, 'run_id': 'ba5d2b4e-a267-407a-b028-c42ccbcb9d63', 'name': 'chatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['be3b9431-47a9-40fd-ac10-4c6dcc29a0d4']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=' at', id='run-ba5d2b4e-a267-407a-b028-c42ccbcb9d63')}, 'run_id': 'ba5d2b4e-a267-407a-b028-c42ccbcb9d63', 'name': 'chatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['be3b9431-47a9-40fd-ac10-4c6dcc29a0d4']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=' Kens', id='run-ba5d2b4e-a267-407a-b028-c42ccbcb9d63')}, 'run_id': 'ba5d2b4e-a267-407a-b028-c42ccbcb9d63', 'name': 'chatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['be3b9431-47a9-40fd-ac10-4c6dcc29a0d4']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='ho', id='run-ba5d2b4e-a267-407a-b028-c42ccbcb9d63')}, 'run_id': 'ba5d2b4e-a267-407a-b028-c42ccbcb9d63', 'name': 'chatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['be3b9431-47a9-40fd-ac10-4c6dcc29a0d4']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=',', id='run-ba5d2b4e-a267-407a-b028-c42ccbcb9d63')}, 'run_id': 'ba5d2b4e-a267-407a-b028-c42ccbcb9d63', 'name': 'chatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['be3b9431-47a9-40fd-ac10-4c6dcc29a0d4']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=' a', id='run-ba5d2b4e-a267-407a-b028-c42ccbcb9d63')}, 'run_id': 'ba5d2b4e-a267-407a-b028-c42ccbcb9d63', 'name': 'chatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['be3b9431-47a9-40fd-ac10-4c6dcc29a0d4']}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content=' bustling', id='run-ba5d2b4e-a267-407a-b028-c42ccbcb9d63')}, 'run_id': 'ba5d2b4e-a267-407a-b028-c42ccbcb9d63', 'name': 'chatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['be3b9431-47a9-40fd-ac10-4c6dcc29a0d4']}
...
"""
■ Shows how to filter events using the include_names argument of the astream_events method of the RunnableSequence class. ▶ main.py
import asyncio
import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

async def main():
    os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

    chatPromptTemplateString = """Answer the question based only on the following context :

    {context}

    Question : {question}
    """

    chatPromptTemplate = ChatPromptTemplate.from_template(chatPromptTemplateString)

    faiss = FAISS.from_texts(
        ["harrison worked at kensho", "harrison likes spicy food"],
        embedding = OpenAIEmbeddings(),
    )

    vectorStoreRetriever = faiss.as_retriever()

    chatOpenAI = ChatOpenAI(model = "gpt-3.5-turbo-0125")

    runnableSequence = (
        {
            "context" : vectorStoreRetriever.with_config(run_name = "vectorStoreRetriever"),
            "question" : RunnablePassthrough()
        }
        | chatPromptTemplate
        | chatOpenAI.with_config(run_name = "chatOpenAI")
        | StrOutputParser().with_config(run_name = "strOutputParser")
    )

    eventCount = 0

    async for eventDictionary in runnableSequence.astream_events(
            "Where did harrison work? Write 3 made up sentences about this place.",
            include_names = ["strOutputParser"],
            version = "v2"):
        print(eventDictionary)
        eventCount += 1
        if eventCount > 10:
            print("...")
            break

asyncio.run(main())

"""
{'event': 'on_parser_start', 'data': {'input': 'Where did harrison work? Write 3 made up sentences about this place.'}, 'name': 'strOutputParser', 'tags': ['seq:step:4'], 'run_id': '99be2832-6fb2-4482-af73-93bb583db3df', 'metadata': {}, 'parent_ids': ['735f3142-4099-43ca-bd0e-c141bd5ab020']}
{'event': 'on_parser_stream', 'run_id': '99be2832-6fb2-4482-af73-93bb583db3df', 'name': 'strOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'data': {'chunk': ''}, 'parent_ids': ['735f3142-4099-43ca-bd0e-c141bd5ab020']}
{'event': 'on_parser_stream', 'run_id': '99be2832-6fb2-4482-af73-93bb583db3df', 'name': 'strOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'data': {'chunk': 'H'}, 'parent_ids': ['735f3142-4099-43ca-bd0e-c141bd5ab020']}
{'event': 'on_parser_stream', 'run_id': '99be2832-6fb2-4482-af73-93bb583db3df', 'name': 'strOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'data': {'chunk': 'arrison'}, 'parent_ids': ['735f3142-4099-43ca-bd0e-c141bd5ab020']}
{'event': 'on_parser_stream', 'run_id': '99be2832-6fb2-4482-af73-93bb583db3df', 'name': 'strOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'data': {'chunk': ' worked'}, 'parent_ids': ['735f3142-4099-43ca-bd0e-c141bd5ab020']}
{'event': 'on_parser_stream', 'run_id': '99be2832-6fb2-4482-af73-93bb583db3df', 'name': 'strOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'data': {'chunk': ' at'}, 'parent_ids': ['735f3142-4099-43ca-bd0e-c141bd5ab020']}
{'event': 'on_parser_stream', 'run_id': '99be2832-6fb2-4482-af73-93bb583db3df', 'name': 'strOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'data': {'chunk': ' a'}, 'parent_ids': ['735f3142-4099-43ca-bd0e-c141bd5ab020']}
{'event': 'on_parser_stream', 'run_id': '99be2832-6fb2-4482-af73-93bb583db3df', 'name': 'strOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'data': {'chunk': ' popular'}, 'parent_ids': ['735f3142-4099-43ca-bd0e-c141bd5ab020']}
{'event': 'on_parser_stream', 'run_id': '99be2832-6fb2-4482-af73-93bb583db3df', 'name': 'strOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'data': {'chunk': ' restaurant'}, 'parent_ids': ['735f3142-4099-43ca-bd0e-c141bd5ab020']}
{'event': 'on_parser_stream', 'run_id': '99be2832-6fb2-4482-af73-93bb583db3df', 'name': 'strOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'data': {'chunk': ' called'}, 'parent_ids': ['735f3142-4099-43ca-bd0e-c141bd5ab020']}
...
"""
■ Shows how to handle asynchronous streaming events by event kind when receiving them via the astream_events method of the RunnableSequence class. ▶ main.py
import asyncio
import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

async def main():
    os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

    chatPromptTemplateString = """Answer the question based only on the following context :

    {context}

    Question : {question}
    """

    chatPromptTemplate = ChatPromptTemplate.from_template(chatPromptTemplateString)

    faiss = FAISS.from_texts(
        ["harrison worked at kensho", "harrison likes spicy food"],
        embedding = OpenAIEmbeddings(),
    )

    vectorStoreRetriever = faiss.as_retriever()

    chatOpenAI = ChatOpenAI(model = "gpt-3.5-turbo-0125")

    runnableSequence = (
        {
            "context" : vectorStoreRetriever.with_config(run_name = "vectorStoreRetriever"),
            "question" : RunnablePassthrough()
        }
        | chatPromptTemplate
        | chatOpenAI
        | StrOutputParser()
    )

    eventCount = 0

    async for eventDictionary in runnableSequence.astream_events("Where did harrison work? Write 3 made up sentences about this place.", version = "v2"):
        kind = eventDictionary["event"]
        if kind == "on_chat_model_stream":
            content = eventDictionary["data"]["chunk"].content
            print(f"Chat model chunk : {content}", flush = True)
        if kind == "on_parser_stream":
            text = eventDictionary["data"]["chunk"]
            print(f"Parser chunk : {text}", flush = True)
        eventCount += 1
        if eventCount > 30:
            print("...")
            break

asyncio.run(main())

"""
Chat model chunk : 
Parser chunk : 
Chat model chunk : H
Parser chunk : H
Chat model chunk : arrison
Parser chunk : arrison
Chat model chunk : worked
Parser chunk : worked
Chat model chunk : at
Parser chunk : at
Chat model chunk : a
Parser chunk : a
...
"""
■ Shows how to receive asynchronous streaming events using the astream_events method of the RunnableSequence class. ▶ main.py
import asyncio
import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

async def main():
    os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

    chatPromptTemplateString = """Answer the question based only on the following context :
    {context}

    Question : {question}
    """

    chatPromptTemplate = ChatPromptTemplate.from_template(chatPromptTemplateString)

    faiss = FAISS.from_texts(
        ["harrison worked at kensho", "harrison likes spicy food"],
        embedding = OpenAIEmbeddings(),
    )

    vectorStoreRetriever = faiss.as_retriever()

    chatOpenAI = ChatOpenAI(model = "gpt-3.5-turbo-0125")

    runnableSequence = (
        {
            "context" : vectorStoreRetriever.with_config(run_name = "vectorStoreRetriever"),
            "question" : RunnablePassthrough()
        }
        | chatPromptTemplate
        | chatOpenAI
        | StrOutputParser()
    )

    async for eventDictionary in runnableSequence.astream_events("Where did harrison work? Write 3 made up sentences about this place.", version = "v2"):
        print(eventDictionary)

asyncio.run(main())

"""
{'event': 'on_chain_start', 'data': {'input': 'Where did harrison work? Write 3 made up sentences about this place.'}, 'name': 'RunnableSequence', 'tags': [], 'run_id': 'd897f470-051b-41be-953a-832c5cc06611', 'metadata': {}, 'parent_ids': []}
{'event': 'on_chain_start', 'data': {}, 'name': 'RunnableParallel<context,question>', 'tags': ['seq:step:1'], 'run_id': '98a359fb-0a7b-455e-901a-819e5db32dba', 'metadata': {}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611']}
{'event': 'on_retriever_start', 'data': {'input': {'query': 'Where did harrison work? Write 3 made up sentences about this place.'}}, 'name': 'Docs', 'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'], 'run_id': '892860bc-24a2-41d2-adc8-1572931c2c8c', 'metadata': {}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611', '98a359fb-0a7b-455e-901a-819e5db32dba']}
{'event': 'on_chain_start', 'data': {}, 'name': 'RunnablePassthrough', 'tags': ['map:key:question'], 'run_id': '5343e2d7-1d30-44c8-b599-bf3a11170ffc', 'metadata': {}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611', '98a359fb-0a7b-455e-901a-819e5db32dba']}
{'event': 'on_chain_stream', 'run_id': '5343e2d7-1d30-44c8-b599-bf3a11170ffc', 'name': 'RunnablePassthrough', 'tags': ['map:key:question'], 'metadata': {}, 'data': {'chunk': 'Where did harrison work? Write 3 made up sentences about this place.'}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611', '98a359fb-0a7b-455e-901a-819e5db32dba']}
{'event': 'on_chain_stream', 'run_id': '98a359fb-0a7b-455e-901a-819e5db32dba', 'name': 'RunnableParallel<context,question>', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': {'question': 'Where did harrison work? Write 3 made up sentences about this place.'}}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611']}
{'event': 'on_chain_end', 'data': {'output': 'Where did harrison work? Write 3 made up sentences about this place.', 'input': 'Where did harrison work? Write 3 made up sentences about this place.'}, 'run_id': '5343e2d7-1d30-44c8-b599-bf3a11170ffc', 'name': 'RunnablePassthrough', 'tags': ['map:key:question'], 'metadata': {}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611', '98a359fb-0a7b-455e-901a-819e5db32dba']}
{'event': 'on_retriever_end', 'data': {'output': [Document(page_content='harrison worked at kensho'), Document(page_content='harrison likes spicy food')], 'input': {'query': 'Where did harrison work? Write 3 made up sentences about this place.'}}, 'run_id': '892860bc-24a2-41d2-adc8-1572931c2c8c', 'name': 'Docs', 'tags': ['map:key:context', 'FAISS', 'OpenAIEmbeddings'], 'metadata': {}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611', '98a359fb-0a7b-455e-901a-819e5db32dba']}
{'event': 'on_chain_stream', 'run_id': '98a359fb-0a7b-455e-901a-819e5db32dba', 'name': 'RunnableParallel<context,question>', 'tags': ['seq:step:1'], 'metadata': {}, 'data': {'chunk': {'context': [Document(page_content='harrison worked at kensho'), Document(page_content='harrison likes spicy food')]}}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611']}
...
{'event': 'on_chain_stream', 'run_id': 'd897f470-051b-41be-953a-832c5cc06611', 'name': 'RunnableSequence', 'tags': [], 'metadata': {}, 'data': {'chunk': ' customers'}, 'parent_ids': []}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='.', id='run-8fecc191-56e5-4860-b876-6da901a96e11')}, 'run_id': '8fecc191-56e5-4860-b876-6da901a96e11', 'name': 'ChatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611']}
{'event': 'on_parser_stream', 'run_id': '6959f904-9b48-42d4-8859-d9c1ae0c0b2e', 'name': 'StrOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'data': {'chunk': '.'}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611']}
{'event': 'on_chain_stream', 'run_id': 'd897f470-051b-41be-953a-832c5cc06611', 'name': 'RunnableSequence', 'tags': [], 'metadata': {}, 'data': {'chunk': '.'}, 'parent_ids': []}
{'event': 'on_chat_model_stream', 'data': {'chunk': AIMessageChunk(content='', response_metadata={'finish_reason': 'stop'}, id='run-8fecc191-56e5-4860-b876-6da901a96e11')}, 'run_id': '8fecc191-56e5-4860-b876-6da901a96e11', 'name': 'ChatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611']}
{'event': 'on_parser_stream', 'run_id': '6959f904-9b48-42d4-8859-d9c1ae0c0b2e', 'name': 'StrOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'data': {'chunk': ''}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611']}
{'event': 'on_chain_stream', 'run_id': 'd897f470-051b-41be-953a-832c5cc06611', 'name': 'RunnableSequence', 'tags': [], 'metadata': {}, 'data': {'chunk': ''}, 'parent_ids': []}
{'event': 'on_chat_model_end', 'data': {'output': AIMessageChunk(content='Harrison worked at Kensho, a trendy restaurant known for its fusion cuisine. The atmosphere at Kensho is lively and welcoming, with a modern decor that attracts a hip crowd. The menu at Kensho features a variety of dishes from around the world, all prepared with a spicy twist to cater to the adventurous palate of its customers.', response_metadata={'finish_reason': 'stop'}, id='run-8fecc191-56e5-4860-b876-6da901a96e11'), 'input': {'messages': [[HumanMessage(content="Answer the question based only on the following context :\n [Document(page_content='harrison worked at kensho'), Document(page_content='harrison likes spicy food')]\n\n Question : Where did harrison work? Write 3 made up sentences about this place.\n ")]]}}, 'run_id': '8fecc191-56e5-4860-b876-6da901a96e11', 'name': 'ChatOpenAI', 'tags': ['seq:step:3'], 'metadata': {'ls_provider': 'openai', 'ls_model_name': 'gpt-3.5-turbo-0125', 'ls_model_type': 'chat', 'ls_temperature': 0.7}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611']}
{'event': 'on_parser_end', 'data': {'output': 'Harrison worked at Kensho, a trendy restaurant known for its fusion cuisine. The atmosphere at Kensho is lively and welcoming, with a modern decor that attracts a hip crowd. The menu at Kensho features a variety of dishes from around the world, all prepared with a spicy twist to cater to the adventurous palate of its customers.', 'input': AIMessageChunk(content='Harrison worked at Kensho, a trendy restaurant known for its fusion cuisine. The atmosphere at Kensho is lively and welcoming, with a modern decor that attracts a hip crowd. The menu at Kensho features a variety of dishes from around the world, all prepared with a spicy twist to cater to the adventurous palate of its customers.', response_metadata={'finish_reason': 'stop'}, id='run-8fecc191-56e5-4860-b876-6da901a96e11')}, 'run_id': '6959f904-9b48-42d4-8859-d9c1ae0c0b2e', 'name': 'StrOutputParser', 'tags': ['seq:step:4'], 'metadata': {}, 'parent_ids': ['d897f470-051b-41be-953a-832c5cc06611']}
{'event': 'on_chain_end', 'data': {'output': 'Harrison worked at Kensho, a trendy restaurant known for its fusion cuisine. The atmosphere at Kensho is lively and welcoming, with a modern decor that attracts a hip crowd. The menu at Kensho features a variety of dishes from around the world, all prepared with a spicy twist to cater to the adventurous palate of its customers.'}, 'run_id': 'd897f470-051b-41be-953a-832c5cc06611', 'name': 'RunnableSequence', 'tags': [], 'metadata': {}, 'parent_ids': []}
"""
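The consumption pattern astream_events relies on can be sketched without any LangChain dependency: an async generator yields event dictionaries and the caller filters them by event type. The generator below is a hypothetical stand-in that mimics the shape of the sample output above, not the real chain.

```python
import asyncio

async def fake_event_stream():
    # Hypothetical stand-in for runnableSequence.astream_events(...):
    # yields event dictionaries shaped like the sample output above.
    yield {"event": "on_chain_start", "name": "RunnableSequence", "data": {}}
    yield {"event": "on_chat_model_stream", "name": "ChatOpenAI", "data": {"chunk": "Harrison"}}
    yield {"event": "on_chat_model_stream", "name": "ChatOpenAI", "data": {"chunk": " worked"}}
    yield {"event": "on_chain_end", "name": "RunnableSequence", "data": {"output": "Harrison worked"}}

async def collect_tokens():
    # Keep only the token chunks, the way a caller would when
    # building a streaming UI on top of astream_events.
    tokens = []
    async for event in fake_event_stream():
        if event["event"] == "on_chat_model_stream":
            tokens.append(event["data"]["chunk"])
    return tokens

print(asyncio.run(collect_tokens()))  # ['Harrison', ' worked']
```

In the real API the same filtering can be pushed into astream_events itself via its include_names / include_types arguments instead of an if statement in the loop.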
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.6.2
charset-normalizer==3.3.2
dataclasses-json==0.6.7
distro==1.9.0
exceptiongroup==1.2.1
faiss-gpu==1.7.2
frozenlist==1.4.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.2.5
langchain-community==0.2.5
langchain-core==0.2.8
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.79
marshmallow==3.21.3
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
openai==1.34.0
orjson==3.10.5
packaging==24.1
pydantic==2.7.4
pydantic_core==2.18.4
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
sniffio==1.3.1
SQLAlchemy==2.0.30
tenacity==8.4.1
tiktoken==0.7.0
tqdm==4.66.4
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.2
yarl==1.9.4
※ pip install langchain langchain-community langchain-openai
■ VectorStoreRetriever class : automatically retrieving documents from a vector store based on a user question ▶ main.py
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_community.llms import LlamaCpp
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Load the web document.
webBaseLoader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
documentList = webBaseLoader.load()

# Split the documents.
recursiveCharacterTextSplitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 0)
splitDocumentList = recursiveCharacterTextSplitter.split_documents(documentList)

# Set up the vector store for embedding generation and similarity search.
chroma = Chroma.from_documents(documents = splitDocumentList, embedding = HuggingFaceEmbeddings())

# Set up the vector store retriever.
vectorStoreRetriever = chroma.as_retriever()

# Set up the LLM model.
gpuLayerCount = 1
batchSize = 512 # Must be between 1 and n_ctx.

llamaCpp = LlamaCpp(
    model_path = "./llama-2-13b-chat.Q4_0.gguf", # Set the path of the llama-2-13b-chat.Q4_0.gguf file.
    n_gpu_layers = gpuLayerCount,
    n_batch = batchSize,
    n_ctx = 2048,
    f16_kv = True, # Must be set to True; otherwise problems occur after a few calls.
    verbose = False
)

# Set up the prompt template.
promptTemplate = hub.pull("rlm/rag-prompt")

# Define a function that merges document page contents.
def mergeDocumentPageContent(documentList):
    return "\n\n".join(doc.page_content for doc in documentList)

# Set up the runnable chain.
runnableSequence = (
    {"context" : vectorStoreRetriever | mergeDocumentPageContent, "question" : RunnablePassthrough()}
    | promptTemplate
    | llamaCpp
    | StrOutputParser()
)

# Run the question answering.
resultString = runnableSequence.invoke("What are the approaches to Task Decomposition?")

print(resultString)

"""
The approaches to task decomposition include breaking down large tasks into smaller subgoals, using task-specific instructions, and using human inputs.
"""
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
backoff==2.2.1
bcrypt==4.1.3
beautifulsoup4==4.12.3
bs4==0.0.2
build==1.2.1
cachetools==5.3.3
certifi==2024.6.2
charset-normalizer==3.3.2
chroma-hnswlib==0.7.3
chromadb==0.5.0
click==8.1.7
coloredlogs==15.0.1
dataclasses-json==0.6.7
Deprecated==1.2.14
diskcache==5.6.3
dnspython==2.6.1
email_validator==2.1.1
exceptiongroup==1.2.1
fastapi==0.111.0
fastapi-cli==0.0.4
filelock==3.15.1
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.6.0
google-auth==2.30.0
googleapis-common-protos==1.63.1
greenlet==3.0.3
grpcio==1.64.1
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.7
importlib_metadata==7.1.0
importlib_resources==6.4.0
Jinja2==3.1.4
joblib==1.4.2
jsonpatch==1.33
jsonpointer==3.0.0
kubernetes==30.1.0
langchain==0.2.4
langchain-chroma==0.1.1
langchain-community==0.2.4
langchain-core==0.2.6
langchain-huggingface==0.0.3
langchain-text-splitters==0.2.1
langsmith==0.1.77
llama_cpp_python==0.2.78
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.21.3
mdurl==0.1.2
mmh3==4.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
networkx==3.3
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvtx-cu12==12.1.105
oauthlib==3.2.2
onnxruntime==1.18.0
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-asgi==0.46b0
opentelemetry-instrumentation-fastapi==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opentelemetry-util-http==0.46b0
orjson==3.10.5
overrides==7.7.0
packaging==24.1
pillow==10.3.0
posthog==3.5.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.7.4
pydantic_core==2.18.4
Pygments==2.18.0
PyPika==0.48.9
pyproject_hooks==1.1.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
requests-oauthlib==2.0.0
rich==13.7.1
rsa==4.9
safetensors==0.4.3
scikit-learn==1.5.0
scipy==1.13.1
sentence-transformers==3.0.1
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
SQLAlchemy==2.0.30
starlette==0.37.2
sympy==1.12.1
tenacity==8.3.0
threadpoolctl==3.5.0
tokenizers==0.19.1
tomli==2.0.1
torch==2.3.1
tqdm==4.66.4
transformers==4.41.2
triton==2.3.1
typer==0.12.3
typing-inspect==0.9.0
typing_extensions==4.12.2
ujson==5.10.0
urllib3==2.2.1
uvicorn==0.30.1
uvloop==0.19.0
watchfiles==0.22.0
websocket-client==1.8.0
websockets==12.0
wrapt==1.16.0
yarl==1.9.4
zipp==3.19.2
※ pip install langchain langchain-community langchain-chroma langchain-huggingface llama-cpp-python
■ Shows how to create a FAISS vector store retriever tool using the create_retriever_tool function. ▶ main.py
import ast
import os
import re
from langchain_community.utilities import SQLDatabase
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.agents.agent_toolkits import create_retriever_tool

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

# Initialize the SQLite database.
sqlDatabase = SQLDatabase.from_uri("sqlite:///chinook.db")

# Define a function that fetches SQLite data.
def getSQLiteData(sqlDatabase, sql):
    resultString = sqlDatabase.run(sql)
    resultTupleList = ast.literal_eval(resultString)
    resultList1 = [element for resultTuple in resultTupleList for element in resultTuple if element]
    resultList2 = [re.sub(r"\b\d+\b", "", result).strip() for result in resultList1]
    return list(set(resultList2))

# Get the artist name and album title lists.
artistNameList = getSQLiteData(sqlDatabase, "SELECT Name FROM artists")
albumTitleList = getSQLiteData(sqlDatabase, "SELECT Title FROM albums")

# Create the FAISS vector store.
faiss = FAISS.from_texts(artistNameList + albumTitleList, OpenAIEmbeddings())

# Set up the FAISS vector store retriever.
faissVectorStoreRetriever = faiss.as_retriever(search_kwargs = {"k" : 5})

# Create the FAISS vector store retriever tool.
faissVectorStoreRetrieverTool = create_retriever_tool(
    faissVectorStoreRetriever,
    name = "search_proper_nouns",
    description = "Use to look up values to filter on. Input is an approximate spelling of the proper noun, output is valid proper nouns. Use the noun most similar to the search."
)

# Run the FAISS vector store retriever tool.
resultString = faissVectorStoreRetrieverTool.invoke("Alice Chains")

print(resultString)

"""
Alice In Chains
Alanis Morissette
Pearl Jam
Pearl Jam
Audioslave
"""
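The cleanup inside getSQLiteData is worth a closer look: SQLDatabase.run returns its rows as one string holding a Python literal, which ast.literal_eval parses back into tuples before standalone numbers are stripped and duplicates removed. A minimal sketch with a canned result string (the rows here are made up, no database needed):

```python
import ast
import re

# A canned result string in the shape SQLDatabase.run returns:
# a string containing a Python literal of row tuples (rows are made up).
resultString = "[('AC/DC',), ('Aerosmith',), ('Audioslave',), ('Volume 1',)]"

resultTupleList = ast.literal_eval(resultString)  # back to a list of tuples
resultList1 = [element for resultTuple in resultTupleList for element in resultTuple if element]
resultList2 = [re.sub(r"\b\d+\b", "", result).strip() for result in resultList1]  # drop standalone numbers
uniqueList = sorted(set(resultList2))             # dedupe (sorted here only for stable output)

print(uniqueList)  # ['AC/DC', 'Aerosmith', 'Audioslave', 'Volume']
```

Note how 'Volume 1' becomes 'Volume': the regex removes whole-word digit runs so near-duplicate titles collapse before being fed to the vector store.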
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.6.2
charset-normalizer==3.3.2
dataclasses-json==0.6.7
distro==1.9.0
exceptiongroup==1.2.1
faiss-gpu==1.7.2
frozenlist==1.4.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.2.3
langchain-community==0.2.4
langchain-core==0.2.5
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.77
marshmallow==3.21.3
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
openai==1.33.0
orjson==3.10.4
packaging==23.2
pydantic==2.7.4
pydantic_core==2.18.4
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
sniffio==1.3.1
SQLAlchemy==2.0.30
tenacity==8.3.0
tiktoken==0.7.0
tqdm==4.66.4
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.1
yarl==1.9.4
※ pip install langchain langchain-community langchain-openai
■ Shows how to obtain a vector store retriever using the search_kwargs argument of the FAISS class's as_retriever method. ※ This can also limit the number of documents the retriever returns.
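The effect of `search_kwargs = {"k" : 1}` — rank candidates by distance and keep only the first k — can be illustrated without a vector store or API key. The documents and distances below are made up for the sketch:

```python
def top_k(scored_documents, k):
    # scored_documents: (document, distance) pairs; lower distance = more similar.
    ranked = sorted(scored_documents, key = lambda pair: pair[1])
    return [document for document, distance in ranked[:k]]

# Made-up candidates with made-up distances.
scored = [
    ("harrison worked at kensho", 0.21),
    ("harrison likes spicy food", 0.58),
]

print(top_k(scored, k = 1))  # ['harrison worked at kensho']
```

In the real API the same limit is requested with `faiss.as_retriever(search_kwargs = {"k" : 1})`; the ranking itself happens inside FAISS.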
■ Shows how to create a FAISS vector store using the from_texts function. ▶ main.py
import ast
import os
import re
from langchain_community.utilities import SQLDatabase
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

# Initialize the SQLite database.
sqlDatabase = SQLDatabase.from_uri("sqlite:///chinook.db")

# Define a function that fetches SQLite data.
def getSQLiteData(sqlDatabase, sql):
    resultString = sqlDatabase.run(sql)
    resultTupleList = ast.literal_eval(resultString)
    resultList1 = [element for resultTuple in resultTupleList for element in resultTuple if element]
    resultList2 = [re.sub(r"\b\d+\b", "", result).strip() for result in resultList1]
    return list(set(resultList2))

# Get the artist name and album title lists.
artistNameList = getSQLiteData(sqlDatabase, "SELECT Name FROM artists")
albumTitleList = getSQLiteData(sqlDatabase, "SELECT Title FROM albums")

# Create the FAISS vector store.
faiss = FAISS.from_texts(artistNameList + albumTitleList, OpenAIEmbeddings())
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.6.2
charset-normalizer==3.3.2
dataclasses-json==0.6.7
distro==1.9.0
exceptiongroup==1.2.1
faiss-gpu==1.7.2
frozenlist==1.4.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.2.3
langchain-community==0.2.4
langchain-core==0.2.5
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.77
marshmallow==3.21.3
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
openai==1.33.0
orjson==3.10.4
packaging==23.2
pydantic==2.7.4
pydantic_core==2.18.4
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
sniffio==1.3.1
SQLAlchemy==2.0.30
tenacity==8.3.0
tiktoken==0.7.0
tqdm==4.66.4
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.1
yarl==1.9.4
※ pip install langchain langchain-community langchain-openai
■ Shows how to create a VectorStoreRetriever object using the as_retriever method of the Chroma class. ▶ main.py
import os
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

webBaseLoader = WebBaseLoader(
    web_paths = ("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs = dict(parse_only = bs4.SoupStrainer(class_ = ("post-content", "post-title", "post-header")))
)

documentList = webBaseLoader.load()

recursiveCharacterTextSplitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)

splitDocumentList = recursiveCharacterTextSplitter.split_documents(documentList)

chroma = Chroma.from_documents(documents = splitDocumentList, embedding = OpenAIEmbeddings())

vectorStoreRetriever = chroma.as_retriever()
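The chunk_size / chunk_overlap pair used above can be illustrated with a plain sliding-window split. This is only a simplified model — RecursiveCharacterTextSplitter prefers to break on separators such as paragraph boundaries rather than at fixed offsets — but it shows how each chunk restarts chunk_overlap characters before the previous one ended:

```python
def sliding_window_split(text, chunk_size, chunk_overlap):
    # Simplified model of chunking: fixed-size windows where each window
    # starts chunk_overlap characters before the previous window's end.
    step = chunk_size - chunk_overlap
    return [text[start : start + chunk_size] for start in range(0, len(text), step)]

chunks = sliding_window_split("abcdefghij", chunk_size = 4, chunk_overlap = 2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, which is why the example above uses chunk_overlap = 200 with chunk_size = 1000.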
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
backoff==2.2.1
bcrypt==4.1.3
beautifulsoup4==4.12.3
bs4==0.0.2
build==1.2.1
cachetools==5.3.3
certifi==2024.6.2
charset-normalizer==3.3.2
chroma-hnswlib==0.7.3
chromadb==0.5.0
click==8.1.7
coloredlogs==15.0.1
dataclasses-json==0.6.7
Deprecated==1.2.14
distro==1.9.0
dnspython==2.6.1
email_validator==2.1.1
exceptiongroup==1.2.1
fastapi==0.111.0
fastapi-cli==0.0.4
filelock==3.14.0
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.6.0
google-auth==2.30.0
googleapis-common-protos==1.63.1
greenlet==3.0.3
grpcio==1.64.1
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.7
importlib_metadata==7.1.0
importlib_resources==6.4.0
Jinja2==3.1.4
jsonpatch==1.33
jsonpointer==3.0.0
kubernetes==30.1.0
langchain==0.2.3
langchain-chroma==0.1.1
langchain-community==0.2.4
langchain-core==0.2.5
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.77
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.21.3
mdurl==0.1.2
mmh3==4.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
oauthlib==3.2.2
onnxruntime==1.18.0
openai==1.33.0
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-asgi==0.46b0
opentelemetry-instrumentation-fastapi==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opentelemetry-util-http==0.46b0
orjson==3.10.4
overrides==7.7.0
packaging==23.2
posthog==3.5.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.7.3
pydantic_core==2.18.4
Pygments==2.18.0
PyPika==0.48.9
pyproject_hooks==1.1.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
requests-oauthlib==2.0.0
rich==13.7.1
rsa==4.9
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
SQLAlchemy==2.0.30
starlette==0.37.2
sympy==1.12.1
tenacity==8.3.0
tiktoken==0.7.0
tokenizers==0.19.1
tomli==2.0.1
tqdm==4.66.4
typer==0.12.3
typing-inspect==0.9.0
typing_extensions==4.12.2
ujson==5.10.0
urllib3==2.2.1
uvicorn==0.30.1
uvloop==0.19.0
watchfiles==0.22.0
websocket-client==1.8.0
websockets==12.0
wrapt==1.16.0
yarl==1.9.4
zipp==3.19.2
※ pip install langchain langchain-chroma
■ Shows how to create a VectorStoreRetriever object using the as_retriever method of the Chroma class, with the search_type and search_kwargs arguments. ▶ main.py
import os
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

documentList = [
    Document(
        page_content = "Dogs are great companions, known for their loyalty and friendliness.",
        metadata = {"source" : "mammal-pets-doc"}
    ),
    Document(
        page_content = "Cats are independent pets that often enjoy their own space.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
    Document(
        page_content = "Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata = {"source" : "fish-pets-doc"},
    ),
    Document(
        page_content = "Parrots are intelligent birds capable of mimicking human speech.",
        metadata = {"source" : "bird-pets-doc"},
    ),
    Document(
        page_content = "Rabbits are social animals that need plenty of space to hop around.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
]

chroma = Chroma.from_documents(
    documentList,
    embedding = OpenAIEmbeddings(),
)

vectorStoreRetriever = chroma.as_retriever(
    search_type = "similarity", # "similarity" (default), "mmr" (maximum marginal relevance), "similarity_score_threshold"
    search_kwargs = {"k" : 1},
)
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
backoff==2.2.1
bcrypt==4.1.3
build==1.2.1
cachetools==5.3.3
certifi==2024.6.2
charset-normalizer==3.3.2
chroma-hnswlib==0.7.3
chromadb==0.5.0
click==8.1.7
coloredlogs==15.0.1
Deprecated==1.2.14
distro==1.9.0
dnspython==2.6.1
email_validator==2.1.1
exceptiongroup==1.2.1
fastapi==0.111.0
fastapi-cli==0.0.4
filelock==3.14.0
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.6.0
google-auth==2.30.0
googleapis-common-protos==1.63.1
greenlet==3.0.3
grpcio==1.64.1
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.7
importlib_metadata==7.1.0
importlib_resources==6.4.0
Jinja2==3.1.4
jsonpatch==1.33
jsonpointer==2.4
kubernetes==30.1.0
langchain==0.2.3
langchain-chroma==0.1.1
langchain-core==0.2.5
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.75
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mmh3==4.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.0.5
numpy==1.26.4
oauthlib==3.2.2
onnxruntime==1.18.0
openai==1.33.0
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-asgi==0.46b0
opentelemetry-instrumentation-fastapi==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opentelemetry-util-http==0.46b0
orjson==3.10.3
overrides==7.7.0
packaging==23.2
posthog==3.5.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.7.3
pydantic_core==2.18.4
Pygments==2.18.0
PyPika==0.48.9
pyproject_hooks==1.1.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
requests-oauthlib==2.0.0
rich==13.7.1
rsa==4.9
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
SQLAlchemy==2.0.30
starlette==0.37.2
sympy==1.12.1
tenacity==8.3.0
tiktoken==0.7.0
tokenizers==0.19.1
tomli==2.0.1
tqdm==4.66.4
typer==0.12.3
typing_extensions==4.12.2
ujson==5.10.0
urllib3==2.2.1
uvicorn==0.30.1
uvloop==0.19.0
watchfiles==0.22.0
websocket-client==1.8.0
websockets==12.0
wrapt==1.16.0
yarl==1.9.4
zipp==3.19.2
※ pip install langchain langchain-chroma
■ Shows how to obtain a list of search result documents from the embedding vector of a query string using the similarity_search_by_vector method of the Chroma class. ▶ main.py
import os
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

documentList = [
    Document(
        page_content = "Dogs are great companions, known for their loyalty and friendliness.",
        metadata = {"source" : "mammal-pets-doc"}
    ),
    Document(
        page_content = "Cats are independent pets that often enjoy their own space.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
    Document(
        page_content = "Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata = {"source" : "fish-pets-doc"},
    ),
    Document(
        page_content = "Parrots are intelligent birds capable of mimicking human speech.",
        metadata = {"source" : "bird-pets-doc"},
    ),
    Document(
        page_content = "Rabbits are social animals that need plenty of space to hop around.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
]

openAIEmbeddings = OpenAIEmbeddings()

chroma = Chroma.from_documents(
    documentList,
    embedding = openAIEmbeddings,
)

catEmbeddingList = openAIEmbeddings.embed_query("cat") # catEmbeddingList has 1536 items.

searchResultList = chroma.similarity_search_by_vector(catEmbeddingList)

print(searchResultList)

"""
[
    Document(
        page_content = 'Cats are independent pets that often enjoy their own space.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Dogs are great companions, known for their loyalty and friendliness.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Rabbits are social animals that need plenty of space to hop around.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Parrots are intelligent birds capable of mimicking human speech.',
        metadata = {'source' : 'bird-pets-doc'}
    )
]
"""
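similarity_search_by_vector skips the embedding step and compares the given vector directly against the stored ones. The comparison can be sketched with plain cosine similarity over toy 3-dimensional vectors — the numbers below are made up, and real OpenAI embeddings have 1536 dimensions:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product of a and b divided by the product of their norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy 3-dimensional "embeddings"; the values are invented for illustration.
stored = {
    "Cats are independent pets." : [0.9, 0.1, 0.0],
    "Goldfish are popular pets." : [0.1, 0.9, 0.1],
}
queryVector = [0.8, 0.2, 0.1]  # pretend this came from embed_query("cat")

best = max(stored, key = lambda text: cosine_similarity(queryVector, stored[text]))
print(best)  # Cats are independent pets.
```

The vector store does exactly this comparison (typically as a distance rather than a similarity) against every stored vector, then returns the documents attached to the closest ones.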
▶ requirements.txt
■ Shows how to obtain a list of search result documents for a query string using the similarity_search_with_score method of the Chroma class. (with scores) ▶ main.py
import os
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

documentList = [
    Document(
        page_content = "Dogs are great companions, known for their loyalty and friendliness.",
        metadata = {"source" : "mammal-pets-doc"}
    ),
    Document(
        page_content = "Cats are independent pets that often enjoy their own space.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
    Document(
        page_content = "Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata = {"source" : "fish-pets-doc"},
    ),
    Document(
        page_content = "Parrots are intelligent birds capable of mimicking human speech.",
        metadata = {"source" : "bird-pets-doc"},
    ),
    Document(
        page_content = "Rabbits are social animals that need plenty of space to hop around.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
]

chroma = Chroma.from_documents(
    documentList,
    embedding = OpenAIEmbeddings(),
)

searchResultList = chroma.similarity_search_with_score("cat")

print(searchResultList)

"""
[
    (
        Document(
            page_content = 'Cats are independent pets that often enjoy their own space.',
            metadata = {'source' : 'mammal-pets-doc'}
        ),
        0.375326931476593
    ),
    (
        Document(
            page_content = 'Dogs are great companions, known for their loyalty and friendliness.',
            metadata = {'source' : 'mammal-pets-doc'}
        ),
        0.4833090305328369
    ),
    (
        Document(
            page_content = 'Rabbits are social animals that need plenty of space to hop around.',
            metadata = {'source' : 'mammal-pets-doc'}
        ),
        0.4958883225917816
    ),
    (
        Document(
            page_content = 'Parrots are intelligent birds capable of mimicking human speech.',
            metadata = {'source' : 'bird-pets-doc'}
        ),
        0.4974174499511719
    )
]
"""
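Note that Chroma's scores are distances, so lower means more similar; the results are therefore returned best-first in ascending score order. The ranking can be sketched with a simple sort over (text, score) pairs copied from the sample output:

```python
# (text, distance) pairs taken from the sample output above; lower = more similar.
scored = [
    ("Cats are independent pets that often enjoy their own space.", 0.375326931476593),
    ("Dogs are great companions, known for their loyalty and friendliness.", 0.4833090305328369),
    ("Rabbits are social animals that need plenty of space to hop around.", 0.4958883225917816),
    ("Parrots are intelligent birds capable of mimicking human speech.", 0.4974174499511719),
]

# Ascending distance = best match first.
ranked = sorted(scored, key = lambda pair: pair[1])
print(ranked[0][0])  # the closest match to "cat"
```

If the chosen embedding or distance function ever produced similarities instead of distances, the sort direction would flip, which is why it is worth checking which convention a given vector store uses.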
▶ requirements.txt
■ Shows how to obtain a list of search result documents for a query string using the asimilarity_search method of the Chroma class. (asynchronous) ▶ main.py
import asyncio
import os
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

async def main():
    os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

    documentList = [
        Document(
            page_content = "Dogs are great companions, known for their loyalty and friendliness.",
            metadata = {"source" : "mammal-pets-doc"}
        ),
        Document(
            page_content = "Cats are independent pets that often enjoy their own space.",
            metadata = {"source" : "mammal-pets-doc"},
        ),
        Document(
            page_content = "Goldfish are popular pets for beginners, requiring relatively simple care.",
            metadata = {"source" : "fish-pets-doc"},
        ),
        Document(
            page_content = "Parrots are intelligent birds capable of mimicking human speech.",
            metadata = {"source" : "bird-pets-doc"},
        ),
        Document(
            page_content = "Rabbits are social animals that need plenty of space to hop around.",
            metadata = {"source" : "mammal-pets-doc"},
        ),
    ]

    chroma = Chroma.from_documents(
        documentList,
        embedding = OpenAIEmbeddings(),
    )

    searchResultList = await chroma.asimilarity_search("cat")

    print(searchResultList)

asyncio.run(main())

"""
[
    Document(
        page_content = 'Cats are independent pets that often enjoy their own space.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Dogs are great companions, known for their loyalty and friendliness.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Rabbits are social animals that need plenty of space to hop around.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Parrots are intelligent birds capable of mimicking human speech.',
        metadata = {'source' : 'bird-pets-doc'}
    )
]
"""
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
backoff==2.2.1
bcrypt==4.1.3
build==1.2.1
cachetools==5.3.3
certifi==2024.6.2
charset-normalizer==3.3.2
chroma-hnswlib==0.7.3
chromadb==0.5.0
click==8.1.7
coloredlogs==15.0.1
Deprecated==1.2.14
distro==1.9.0
dnspython==2.6.1
email_validator==2.1.1
exceptiongroup==1.2.1
fastapi==0.111.0
fastapi-cli==0.0.4
filelock==3.14.0
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.6.0
google-auth==2.30.0
googleapis-common-protos==1.63.1
greenlet==3.0.3
grpcio==1.64.1
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.7
importlib_metadata==7.1.0
importlib_resources==6.4.0
Jinja2==3.1.4
jsonpatch==1.33
jsonpointer==2.4
kubernetes==30.1.0
langchain==0.2.3
langchain-chroma==0.1.1
langchain-core==0.2.5
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.75
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mmh3==4.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.0.5
numpy==1.26.4
oauthlib==3.2.2
onnxruntime==1.18.0
openai==1.33.0
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-asgi==0.46b0
opentelemetry-instrumentation-fastapi==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opentelemetry-util-http==0.46b0
orjson==3.10.3
overrides==7.7.0
packaging==23.2
posthog==3.5.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.7.3
pydantic_core==2.18.4
Pygments==2.18.0
PyPika==0.48.9
pyproject_hooks==1.1.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
requests-oauthlib==2.0.0
rich==13.7.1
rsa==4.9
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
SQLAlchemy==2.0.30
starlette==0.37.2
sympy==1.12.1
tenacity==8.3.0
tiktoken==0.7.0
tokenizers==0.19.1
tomli==2.0.1
tqdm==4.66.4
typer==0.12.3
typing_extensions==4.12.2
ujson==5.10.0
urllib3==2.2.1
uvicorn==0.30.1
uvloop==0.19.0
watchfiles==0.22.0
websocket-client==1.8.0
websockets==12.0
wrapt==1.16.0
yarl==1.9.4
zipp==3.19.2
■ Demonstrates how to use the Chroma class's similarity_search method to get a list of matching documents for a search string. ▶ main.py
import os

from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

documentList = [
    Document(
        page_content = "Dogs are great companions, known for their loyalty and friendliness.",
        metadata = {"source" : "mammal-pets-doc"}
    ),
    Document(
        page_content = "Cats are independent pets that often enjoy their own space.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
    Document(
        page_content = "Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata = {"source" : "fish-pets-doc"},
    ),
    Document(
        page_content = "Parrots are intelligent birds capable of mimicking human speech.",
        metadata = {"source" : "bird-pets-doc"},
    ),
    Document(
        page_content = "Rabbits are social animals that need plenty of space to hop around.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
]

chroma = Chroma.from_documents(
    documentList,
    embedding = OpenAIEmbeddings(),
)

searchResultList = chroma.similarity_search("cat")

print(searchResultList)

"""
[
    Document(
        page_content = 'Cats are independent pets that often enjoy their own space.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Dogs are great companions, known for their loyalty and friendliness.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Rabbits are social animals that need plenty of space to hop around.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Parrots are intelligent birds capable of mimicking human speech.',
        metadata = {'source' : 'bird-pets-doc'}
    )
]
"""
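similarity_search can also restrict results by metadata via its filter argument (for example filter = {"source" : "fish-pets-doc"} to return only fish documents). The sketch below shows the same metadata restriction in plain Python, with dicts standing in for Document objects so it runs without an API key; the helper name filter_by_source is an example, not a library function.

```python
# Sketch: the effect of passing filter = {"source" : "fish-pets-doc"}
# to similarity_search. Plain dicts stand in for Document objects and
# no embedding or API call is made.

documentList = [
    {"page_content" : "Dogs are great companions, known for their loyalty and friendliness.",
     "metadata" : {"source" : "mammal-pets-doc"}},
    {"page_content" : "Goldfish are popular pets for beginners, requiring relatively simple care.",
     "metadata" : {"source" : "fish-pets-doc"}},
]

def filter_by_source(documents, source):
    """Keep only documents whose metadata 'source' matches."""
    return [d for d in documents if d["metadata"]["source"] == source]

fishDocs = filter_by_source(documentList, "fish-pets-doc")
print(fishDocs)  # only the Goldfish document remains
```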
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
backoff==2.2.1
bcrypt==4.1.3
build==1.2.1
cachetools==5.3.3
certifi==2024.6.2
charset-normalizer==3.3.2
chroma-hnswlib==0.7.3
chromadb==0.5.0
click==8.1.7
coloredlogs==15.0.1
Deprecated==1.2.14
distro==1.9.0
dnspython==2.6.1
email_validator==2.1.1
exceptiongroup==1.2.1
fastapi==0.111.0
fastapi-cli==0.0.4
filelock==3.14.0
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.6.0
google-auth==2.30.0
googleapis-common-protos==1.63.1
greenlet==3.0.3
grpcio==1.64.1
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.7
importlib_metadata==7.1.0
importlib_resources==6.4.0
Jinja2==3.1.4
jsonpatch==1.33
jsonpointer==2.4
kubernetes==30.1.0
langchain==0.2.3
langchain-chroma==0.1.1
langchain-core==0.2.5
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.75
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mmh3==4.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.0.5
numpy==1.26.4
oauthlib==3.2.2
onnxruntime==1.18.0
openai==1.33.0
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-asgi==0.46b0
opentelemetry-instrumentation-fastapi==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opentelemetry-util-http==0.46b0
orjson==3.10.3
overrides==7.7.0
packaging==23.2
posthog==3.5.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.7.3
pydantic_core==2.18.4
Pygments==2.18.0
PyPika==0.48.9
pyproject_hooks==1.1.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
requests-oauthlib==2.0.0
rich==13.7.1
rsa==4.9
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
SQLAlchemy==2.0.30
starlette==0.37.2
sympy==1.12.1
tenacity==8.3.0
tiktoken==0.7.0
tokenizers==0.19.1
tomli==2.0.1
tqdm==4.66.4
typer==0.12.3
typing_extensions==4.12.2
ujson==5.10.0
urllib3==2.2.1
uvicorn==0.30.1
uvloop==0.19.0
watchfiles==0.22.0
websocket-client==1.8.0
websockets==12.0
wrapt==1.16.0
yarl==1.9.4
zipp==3.19.2