[PYTHON/LANGCHAIN] @tool decorator : parsing Google-style docstrings with the parse_docstring argument
■ Shows how to parse a Google-style docstring using the parse_docstring argument of the @tool decorator. ※ By default, @tool(parse_docstring = True) raises a ValueError if the docstring cannot be parsed correctly.
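The original example file for this entry is not included here; below is a minimal sketch of parse_docstring = True with a Google-style docstring, in which the function name, parameters, and docstring text are illustrative assumptions. ▶ Example sketch (PY)

from langchain_core.tools import tool

# A hypothetical tool; parse_docstring = True parses the Google-style "Args:" section below.
@tool(parse_docstring = True)
def search(query : str, limit : int) -> str:
    """Search for documents.

    Args:
        query: The search query string.
        limit: Maximum number of results to return.
    """
    return f"searching for {query} (limit {limit})"

print(search.description)
print(search.args)

# search.args now contains a per-argument 'description' taken from the Args: section.
# If the docstring is not valid Google style, the decorator raises a ValueError by default.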
■ Shows how to create a tool using the BaseModel class. ▶ main.py
from pydantic import BaseModel
from pydantic import Field

from langchain_core.tools import tool

class CalculatorModel(BaseModel):
    a : int = Field(description = "first number" )
    b : int = Field(description = "second number")

@tool("multiplication-tool", args_schema = CalculatorModel, return_direct = True)
def multiply(a : int, b : int) -> int:
    """Multiply two numbers."""
    return a * b

print(multiply.name)
print(multiply.description)
print(multiply.args)
print(multiply.return_direct)

"""
multiplication-tool
Multiply two numbers.
{'a': {'description': 'first number', 'title': 'A', 'type': 'integer'}, 'b': {'description': 'second number', 'title': 'B', 'type': 'integer'}}
True
"""
▶ requirements.txt
aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 attrs==24.2.0 certifi==2024.8.30 charset-normalizer==3.3.2 frozenlist==1.4.1 greenlet==3.1.0 h11==0.14.0 httpcore==1.0.5 httpx==0.27.2 idna==3.10 jsonpatch==1.33 jsonpointer==3.0.0 langchain==0.3.0 langchain-core==0.3.1 langchain-text-splitters==0.3.0 langsmith==0.1.121 multidict==6.1.0 numpy==1.26.4 orjson==3.10.7 packaging==24.1 pydantic==2.9.2 pydantic_core==2.23.4 PyYAML==6.0.2 requests==2.32.3 sniffio==1.3.1 SQLAlchemy==2.0.35 tenacity==8.5.0 typing_extensions==4.12.2 urllib3==2.2.3 yarl==1.11.1 |
※ The pip install langchain pydantic command was run.
■ Shows how to get the tool schema dictionary using the schema method of the ModelMetaclass class. ▶ main.py
from langchain_core.tools import tool
from typing import Annotated
from typing import List

@tool
def multiply_by_max(
    a : Annotated[str , "scale factor" ],
    b : Annotated[List[int], "list of ints over which to take maximum"]
) -> int:
    """Multiply a by the maximum of b."""
    return a * max(b)

# The multiply_by_max function is of type StructuredTool.
modelMetaclass = multiply_by_max.args_schema

dictionary = modelMetaclass.schema()

print(dictionary)

"""
{
    'description' : 'Multiply a by the maximum of b.',
    'properties' : {
        'a' : {
            'description' : 'scale factor',
            'title' : 'A',
            'type' : 'string'
        },
        'b' : {
            'description' : 'list of ints over which to take maximum',
            'items' : {'type' : 'integer'},
            'title' : 'B',
            'type' : 'array'
        }
    },
    'required' : ['a', 'b'],
    'title' : 'multiply_by_max',
    'type' : 'object'
}
"""
▶ requirements.txt
aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 attrs==24.2.0 certifi==2024.8.30 charset-normalizer==3.3.2 frozenlist==1.4.1 greenlet==3.1.0 h11==0.14.0 httpcore==1.0.5 httpx==0.27.2 idna==3.10 jsonpatch==1.33 jsonpointer==3.0.0 langchain==0.3.0 langchain-core==0.3.1 langchain-text-splitters==0.3.0 langsmith==0.1.121 multidict==6.1.0 numpy==1.26.4 orjson==3.10.7 packaging==24.1 pydantic==2.9.2 pydantic_core==2.23.4 PyYAML==6.0.2 requests==2.32.3 sniffio==1.3.1 SQLAlchemy==2.0.35 tenacity==8.5.0 typing_extensions==4.12.2 urllib3==2.2.3 yarl==1.11.1 |
※ The pip install langchain command was run.
■ Shows how to get the ModelMetaclass object using the args_schema attribute of the StructuredTool class. ▶ main.py
from langchain_core.tools import tool
from typing import Annotated
from typing import List

@tool
def multiply_by_max(
    a : Annotated[str , "scale factor" ],
    b : Annotated[List[int], "list of ints over which to take maximum"]
) -> int:
    """Multiply a by the maximum of b."""
    return a * max(b)

# The multiply_by_max function is of type StructuredTool.
modelMetaclass = multiply_by_max.args_schema
▶ requirements.txt
aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 attrs==24.2.0 certifi==2024.8.30 charset-normalizer==3.3.2 frozenlist==1.4.1 greenlet==3.1.0 h11==0.14.0 httpcore==1.0.5 httpx==0.27.2 idna==3.10 jsonpatch==1.33 jsonpointer==3.0.0 langchain==0.3.0 langchain-core==0.3.1 langchain-text-splitters==0.3.0 langsmith==0.1.121 multidict==6.1.0 numpy==1.26.4 orjson==3.10.7 packaging==24.1 pydantic==2.9.2 pydantic_core==2.23.4 PyYAML==6.0.2 requests==2.32.3 sniffio==1.3.1 SQLAlchemy==2.0.35 tenacity==8.5.0 typing_extensions==4.12.2 urllib3==2.2.3 yarl==1.11.1 |
※ The pip install langchain command was run.
■ Shows how to create a tool from an asynchronous function using the @tool decorator. ▶ main.py
from langchain_core.tools import tool

@tool
async def multiplyAsync(a : int, b : int) -> int:
    """Multiply two numbers."""
    return a * b

print(multiplyAsync.name)
print(multiplyAsync.description)
print(multiplyAsync.args)

"""
multiplyAsync
Multiply two numbers.
{'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}
"""
▶ requirements.txt
※ The pip install langchain command was run.
■ Shows how to create a tool from a function using the @tool decorator. ▶ main.py
from langchain_core.tools import tool

@tool
def multiply(a : int, b : int) -> int:
    """Multiply two numbers."""
    return a * b

print(multiply.name)
print(multiply.description)
print(multiply.args)

"""
multiply
Multiply two numbers.
{'a': {'title': 'A', 'type': 'integer'}, 'b': {'title': 'B', 'type': 'integer'}}
"""
▶ requirements.txt
aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 attrs==24.2.0 certifi==2024.8.30 charset-normalizer==3.3.2 frozenlist==1.4.1 greenlet==3.1.0 h11==0.14.0 httpcore==1.0.5 httpx==0.27.2 idna==3.10 jsonpatch==1.33 jsonpointer==3.0.0 langchain==0.3.0 langchain-core==0.3.1 langchain-text-splitters==0.3.0 langsmith==0.1.121 multidict==6.1.0 numpy==1.26.4 orjson==3.10.7 packaging==24.1 pydantic==2.9.2 pydantic_core==2.23.4 PyYAML==6.0.2 requests==2.32.3 sniffio==1.3.1 SQLAlchemy==2.0.35 tenacity==8.5.0 typing_extensions==4.12.2 urllib3==2.2.3 yarl==1.11.1 |
※ The pip install langchain command was run.
■ Shows how to set a virtual time for the TimeWeightedVectorStoreRetriever class using the mock_now function. ▶ main.py
import faiss

from langchain_openai import OpenAIEmbeddings
from langchain_community.docstore import InMemoryDocstore
from langchain_community.vectorstores import FAISS
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from datetime import datetime
from datetime import timedelta
from langchain_core.documents import Document
from langchain_core.utils import mock_now

openAIEmbeddings = OpenAIEmbeddings()

embeddingSize = 1536

indexFlatL2 = faiss.IndexFlatL2(embeddingSize)

inMemoryDocstore = InMemoryDocstore({})

faiss = FAISS(openAIEmbeddings, indexFlatL2, inMemoryDocstore, {})

timeWeightedVectorStoreRetriever = TimeWeightedVectorStoreRetriever(vectorstore = faiss, decay_rate = 0.999, k = 1)

yesterday = datetime.now() - timedelta(days = 1)

timeWeightedVectorStoreRetriever.add_documents([Document(page_content = "hello world", metadata = {"last_accessed_at" : yesterday})])
timeWeightedVectorStoreRetriever.add_documents([Document(page_content = "hello foo")])

with mock_now(datetime(2024, 9, 16, 10, 11)):
    resultDocumentList = timeWeightedVectorStoreRetriever.get_relevant_documents("hello world")

for resultDocument in resultDocumentList:
    print(resultDocument)
▶ requirements.txt
aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 attrs==24.2.0 certifi==2024.8.30 charset-normalizer==3.3.2 colorama==0.4.6 dataclasses-json==0.6.7 distro==1.9.0 faiss-cpu==1.8.0.post1 frozenlist==1.4.1 greenlet==3.1.0 h11==0.14.0 httpcore==1.0.5 httpx==0.27.2 idna==3.10 jiter==0.5.0 jsonpatch==1.33 jsonpointer==3.0.0 langchain==0.3.0 langchain-community==0.3.0 langchain-core==0.3.0 langchain-openai==0.2.0 langchain-text-splitters==0.3.0 langsmith==0.1.121 marshmallow==3.22.0 multidict==6.1.0 mypy-extensions==1.0.0 numpy==1.26.4 openai==1.45.1 orjson==3.10.7 packaging==24.1 pydantic==2.9.1 pydantic-settings==2.5.2 pydantic_core==2.23.3 python-dotenv==1.0.1 PyYAML==6.0.2 regex==2024.9.11 requests==2.32.3 sniffio==1.3.1 SQLAlchemy==2.0.35 tenacity==8.5.0 tiktoken==0.7.0 tqdm==4.66.5 typing-inspect==0.9.0 typing_extensions==4.12.2 urllib3==2.2.3 yarl==1.11.1 |
※ The pip install langchain-community langchain-openai command was run.
■ Shows how to retrieve documents with a high decay rate using the TimeWeightedVectorStoreRetriever class. ※ With a high decay rate (e.g., several 9s), the recency score quickly drops to 0! See the sketch below.
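The original example file for this entry is not included here; below is a minimal sketch that mirrors the FAISS setup and the decay_rate = 0.999 value used in the mock_now example above, with the document contents and query being illustrative assumptions. ▶ Example sketch (PY)

import faiss

from datetime import datetime
from datetime import timedelta
from langchain_openai import OpenAIEmbeddings
from langchain_community.docstore import InMemoryDocstore
from langchain_community.vectorstores import FAISS
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain_core.documents import Document

openAIEmbeddings = OpenAIEmbeddings()

indexFlatL2 = faiss.IndexFlatL2(1536)

vectorStore = FAISS(openAIEmbeddings, indexFlatL2, InMemoryDocstore({}), {})

# A decay rate close to 1 makes the recency score drop to 0 almost immediately.
timeWeightedVectorStoreRetriever = TimeWeightedVectorStoreRetriever(vectorstore = vectorStore, decay_rate = 0.999, k = 1)

yesterday = datetime.now() - timedelta(days = 1)

timeWeightedVectorStoreRetriever.add_documents([Document(page_content = "hello world", metadata = {"last_accessed_at" : yesterday})])
timeWeightedVectorStoreRetriever.add_documents([Document(page_content = "hello foo")])

# "hello foo" is returned first because the day-old "hello world" has been almost fully forgotten.
print(timeWeightedVectorStoreRetriever.invoke("hello world"))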
■ Shows how to retrieve documents with a low decay rate using the TimeWeightedVectorStoreRetriever class. ※ A low decay rate (here set to an extreme value close to 0) means that memories are "remembered" for longer; the constructor example that follows uses such a near-zero decay rate.
■ Shows how to create a TimeWeightedVectorStoreRetriever object using the vectorstore/decay_rate arguments of the TimeWeightedVectorStoreRetriever constructor. ▶ main.py
import faiss

from langchain_openai import OpenAIEmbeddings
from langchain_community.docstore import InMemoryDocstore
from langchain_community.vectorstores import FAISS
from langchain.retrievers import TimeWeightedVectorStoreRetriever

openAIEmbeddings = OpenAIEmbeddings()

embeddingSize = 1536

indexFlatL2 = faiss.IndexFlatL2(embeddingSize)

inMemoryDocstore = InMemoryDocstore({})

faiss = FAISS(openAIEmbeddings, indexFlatL2, inMemoryDocstore, {})

timeWeightedVectorStoreRetriever = TimeWeightedVectorStoreRetriever(vectorstore = faiss, decay_rate = 0.0000000000000000000000001, k = 1)
▶ requirements.txt
aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 attrs==24.2.0 certifi==2024.8.30 charset-normalizer==3.3.2 colorama==0.4.6 dataclasses-json==0.6.7 distro==1.9.0 faiss-cpu==1.8.0.post1 frozenlist==1.4.1 greenlet==3.1.0 h11==0.14.0 httpcore==1.0.5 httpx==0.27.2 idna==3.10 jiter==0.5.0 jsonpatch==1.33 jsonpointer==3.0.0 langchain==0.3.0 langchain-community==0.3.0 langchain-core==0.3.0 langchain-openai==0.2.0 langchain-text-splitters==0.3.0 langsmith==0.1.121 marshmallow==3.22.0 multidict==6.1.0 mypy-extensions==1.0.0 numpy==1.26.4 openai==1.45.1 orjson==3.10.7 packaging==24.1 pydantic==2.9.1 pydantic-settings==2.5.2 pydantic_core==2.23.3 python-dotenv==1.0.1 PyYAML==6.0.2 regex==2024.9.11 requests==2.32.3 sniffio==1.3.1 SQLAlchemy==2.0.35 tenacity==8.5.0 tiktoken==0.7.0 tqdm==4.66.5 typing-inspect==0.9.0 typing_extensions==4.12.2 urllib3==2.2.3 yarl==1.11.1 |
※ The pip install langchain-community command was run.
■ Shows how to create a StructuredQueryOutputParser object using the from_components static method of the StructuredQueryOutputParser class. ▶ Example code (PY)
from langchain.chains.query_constructor.base import StructuredQueryOutputParser

structuredQueryOutputParser = StructuredQueryOutputParser.from_components()
※ The pip install langchain command was run.
■ Shows how to create a FewShotPromptTemplate object using the get_query_constructor_prompt function. ▶ Example code (PY)
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.chains.query_constructor.base import get_query_constructor_prompt

attributeInfoList = [
    AttributeInfo(
        name = "genre",
        description = "The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
        type = "string"
    ),
    AttributeInfo(
        name = "year",
        description = "The year the movie was released",
        type = "integer"
    ),
    AttributeInfo(
        name = "director",
        description = "The name of the movie director",
        type = "string"
    ),
    AttributeInfo(
        name = "rating",
        description = "A 1-10 rating for the movie",
        type = "float"
    )
]

documentContentDescription = "Brief summary of a movie"

fewShotPromptTemplate = get_query_constructor_prompt(
    documentContentDescription,
    attributeInfoList
)
※ The pip install langchain command was run.
■ Shows how to create Chroma vector database queries using the ChromaTranslator class. ▶ Example code (PY)
from langchain_community.query_constructors.chroma import ChromaTranslator

chromaTranslator = ChromaTranslator()
※ The pip install langchain-community command was run.
■ Shows how to set up an LCEL chain using the query_constructor argument of the SelfQueryRetriever constructor. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. ▶ main.py
from dotenv import load_dotenv
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.chains.query_constructor.base import get_query_constructor_prompt
from langchain.chains.query_constructor.base import StructuredQueryOutputParser
from langchain_openai import ChatOpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain_community.query_constructors.chroma import ChromaTranslator

load_dotenv()

documentList = [
    Document(
        page_content = "A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata = {"year" : 1993, "rating" : 7.7, "genre" : "science fiction"}
    ),
    Document(
        page_content = "Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata = {"year" : 2010, "director" : "Christopher Nolan", "rating" : 8.2}
    ),
    Document(
        page_content = "A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata = {"year" : 2006, "director" : "Satoshi Kon", "rating" : 8.6}
    ),
    Document(
        page_content = "A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata = {"year" : 2019, "director" : "Greta Gerwig", "rating" : 8.3}
    ),
    Document(
        page_content = "Toys come alive and have a blast doing so",
        metadata = {"year" : 1995, "genre" : "animated"}
    ),
    Document(
        page_content = "Three men walk into the Zone, three men walk out of the Zone",
        metadata = {"year" : 1979, "director" : "Andrei Tarkovsky", "genre" : "thriller", "rating" : 9.9}
    )
]

openAIEmbeddings = OpenAIEmbeddings()

chroma = Chroma.from_documents(documentList, openAIEmbeddings)

attributeInfoList = [
    AttributeInfo(
        name = "genre",
        description = "The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
        type = "string"
    ),
    AttributeInfo(
        name = "year",
        description = "The year the movie was released",
        type = "integer"
    ),
    AttributeInfo(
        name = "director",
        description = "The name of the movie director",
        type = "string"
    ),
    AttributeInfo(
        name = "rating",
        description = "A 1-10 rating for the movie",
        type = "float"
    )
]

documentContentDescription = "Brief summary of a movie"

fewShotPromptTemplate = get_query_constructor_prompt(
    documentContentDescription,
    attributeInfoList
)

structuredQueryOutputParser = StructuredQueryOutputParser.from_components()

chatOpenAI = ChatOpenAI(temperature = 0)

runnableSequence = fewShotPromptTemplate | chatOpenAI | structuredQueryOutputParser

selfQueryRetriever = SelfQueryRetriever(
    query_constructor = runnableSequence,
    vectorstore = chroma,
    structured_query_translator = ChromaTranslator(),
)

resultDocumentList = selfQueryRetriever.invoke("What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated")

for resultDocument in resultDocumentList:
    print(resultDocument)
■ Shows how to create a SelfQueryRetriever object using the enable_limit argument of the from_llm static method of the SelfQueryRetriever class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. A sketch follows below.
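The original example file for this entry is not included here; below is a minimal sketch that assumes the chroma vector store, documentContentDescription, and attributeInfoList built in the next example, and the query text is an illustrative assumption. ▶ Example sketch (PY)

from langchain_openai import ChatOpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever

chatOpenAI = ChatOpenAI(temperature = 0)

# enable_limit = True lets the retriever apply a result limit taken from the query itself (e.g., "two movies").
selfQueryRetriever = SelfQueryRetriever.from_llm(
    chatOpenAI,
    chroma,
    documentContentDescription,
    attributeInfoList,
    enable_limit = True
)

resultDocumentList = selfQueryRetriever.invoke("What are two movies about dinosaurs")

for resultDocument in resultDocumentList:
    print(resultDocument.page_content)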
■ Shows how to create a self-querying retriever using the SelfQueryRetriever class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. ▶ main.py
from dotenv import load_dotenv
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains.query_constructor.base import AttributeInfo
from langchain_openai import ChatOpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever

load_dotenv()

documentList = [
    Document(
        page_content = "A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata = {"year" : 1993, "rating" : 7.7, "genre" : "science fiction"}
    ),
    Document(
        page_content = "Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata = {"year" : 2010, "director" : "Christopher Nolan", "rating" : 8.2}
    ),
    Document(
        page_content = "A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata = {"year" : 2006, "director" : "Satoshi Kon", "rating" : 8.6}
    ),
    Document(
        page_content = "A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata = {"year" : 2019, "director" : "Greta Gerwig", "rating" : 8.3}
    ),
    Document(
        page_content = "Toys come alive and have a blast doing so",
        metadata = {"year" : 1995, "genre" : "animated"}
    ),
    Document(
        page_content = "Three men walk into the Zone, three men walk out of the Zone",
        metadata = {"year" : 1979, "director" : "Andrei Tarkovsky", "genre" : "thriller", "rating" : 9.9}
    )
]

openAIEmbeddings = OpenAIEmbeddings()

chroma = Chroma.from_documents(documentList, openAIEmbeddings)

attributeInfoList = [
    AttributeInfo(
        name = "genre",
        description = "The genre of the movie. One of ['science fiction', 'comedy', 'drama', 'thriller', 'romance', 'action', 'animated']",
        type = "string"
    ),
    AttributeInfo(
        name = "year",
        description = "The year the movie was released",
        type = "integer"
    ),
    AttributeInfo(
        name = "director",
        description = "The name of the movie director",
        type = "string"
    ),
    AttributeInfo(
        name = "rating",
        description = "A 1-10 rating for the movie",
        type = "float"
    )
]

documentContentDescription = "Brief summary of a movie"

chatOpenAI = ChatOpenAI(temperature = 0)

selfQueryRetriever = SelfQueryRetriever.from_llm(
    chatOpenAI,
    chroma,
    documentContentDescription,
    attributeInfoList
)

# Specify only a filter.
resultDocumentList = selfQueryRetriever.invoke("I want to watch a movie rated higher than 8.5")

for resultDocument in resultDocumentList:
    print(resultDocument.page_content)

print()

# Specify a query and a filter.
resultDocumentList = selfQueryRetriever.invoke("Has Greta Gerwig directed any movies about women")

for resultDocument in resultDocumentList:
    print(resultDocument.page_content)

print()

# Specify a composite filter.
resultDocumentList = selfQueryRetriever.invoke("What's a highly rated (above 8.5) science fiction film?")

for resultDocument in resultDocumentList:
    print(resultDocument.page_content)

# Specify a query and a composite filter.
resultDocumentList = selfQueryRetriever.invoke("What's a movie after 1990 but before 2005 that's all about toys, and preferably is animated")

print()

for resultDocument in resultDocumentList:
    print(resultDocument.page_content)
■ Shows how to retrieve documents using the parent_splitter/child_splitter arguments of the ParentDocumentRetriever constructor. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. ▶ main.py
from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.storage import InMemoryStore
from langchain.retrievers import ParentDocumentRetriever

load_dotenv()

textLoaderList = [
    TextLoader("paul_graham_essay.txt" , encoding = "utf-8"),
    TextLoader("state_of_the_union.txt", encoding = "utf-8")
]

documentList = []

for textLoader in textLoaderList:
    documentList.extend(textLoader.load())

recursiveCharacterTextSplitter1 = RecursiveCharacterTextSplitter(chunk_size = 2000)
recursiveCharacterTextSplitter2 = RecursiveCharacterTextSplitter(chunk_size = 400 )

openAIEmbeddings = OpenAIEmbeddings()

chroma = Chroma(collection_name = "split_parents", embedding_function = openAIEmbeddings)

inMemoryStore = InMemoryStore()

parentDocumentRetriever = ParentDocumentRetriever(
    vectorstore = chroma,
    docstore = inMemoryStore,
    parent_splitter = recursiveCharacterTextSplitter1,
    child_splitter = recursiveCharacterTextSplitter2
)

parentDocumentRetriever.add_documents(documentList, ids = None)

print(len(list(inMemoryStore.yield_keys())))

resultSplitDocumentList = chroma.similarity_search("justice breyer")

print(len(resultSplitDocumentList[0].page_content))

resultDocumentList = parentDocumentRetriever.invoke("justice breyer")

print(len(resultDocumentList[0].page_content))
■ Shows how to retrieve parent documents using the ParentDocumentRetriever class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. ▶ main.py
from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.storage import InMemoryStore
from langchain.retrievers import ParentDocumentRetriever

load_dotenv()

textLoaderList = [
    TextLoader("paul_graham_essay.txt" , encoding = "utf-8"),
    TextLoader("state_of_the_union.txt", encoding = "utf-8")
]

documentList = []

for textLoader in textLoaderList:
    documentList.extend(textLoader.load())

recursiveCharacterTextSplitter = RecursiveCharacterTextSplitter(chunk_size = 400)

openAIEmbeddings = OpenAIEmbeddings()

chroma = Chroma(collection_name = "full_documents", embedding_function = openAIEmbeddings)

inMemoryStore = InMemoryStore()

parentDocumentRetriever = ParentDocumentRetriever(
    vectorstore = chroma,
    docstore = inMemoryStore,
    child_splitter = recursiveCharacterTextSplitter,
)

parentDocumentRetriever.add_documents(documentList, ids = None)

print(list(inMemoryStore.yield_keys()))

resultSplitDocumentList = chroma.similarity_search("justice breyer")

print(len(resultSplitDocumentList[0].page_content))

resultDocumentList = parentDocumentRetriever.invoke("justice breyer")

print(len(resultDocumentList[0].page_content))
▶ requirements.txt
■ Shows how to improve retrieval by generating hypothetical questions and associating them with documents using the MultiVectorRetriever class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. A sketch follows below.
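The original main.py for this entry is not included here; below is a minimal sketch of the hypothetical-question approach, in which the prompt wording, question count, line-based parsing, and model name are illustrative assumptions (the text file names mirror the other examples). ▶ Example sketch (PY)

import uuid

from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain.storage import InMemoryByteStore
from langchain.retrievers.multi_vector import MultiVectorRetriever

load_dotenv()

idKey = "doc_id"

documentList = []

for textLoader in [TextLoader("paul_graham_essay.txt", encoding = "utf-8"), TextLoader("state_of_the_union.txt", encoding = "utf-8")]:
    documentList.extend(textLoader.load())

splitDocumentList = RecursiveCharacterTextSplitter(chunk_size = 10000).split_documents(documentList)

splitDocumentIDList = [str(uuid.uuid4()) for _ in splitDocumentList]

# Ask the LLM for hypothetical questions each chunk could answer (one question per line).
runnableSequence = (
    {"doc" : lambda document : document.page_content}
    | ChatPromptTemplate.from_template("Generate exactly 3 hypothetical questions, one per line, that the following document could be used to answer :\n\n{doc}")
    | ChatOpenAI(model_name = "gpt-4o-mini")
    | StrOutputParser()
)

questionTextList = runnableSequence.batch(splitDocumentList, {"max_concurrency" : 5})

# Store each question as its own small document, linked to its parent chunk by doc_id.
questionDocumentList = []

for i, questionText in enumerate(questionTextList):
    for question in questionText.splitlines():
        if question.strip():
            questionDocumentList.append(Document(page_content = question.strip(), metadata = {idKey : splitDocumentIDList[i]}))

multiVectorRetriever = MultiVectorRetriever(
    vectorstore = Chroma(collection_name = "hypothetical_questions", embedding_function = OpenAIEmbeddings()),
    byte_store = InMemoryByteStore(),
    id_key = idKey
)

multiVectorRetriever.vectorstore.add_documents(questionDocumentList)

multiVectorRetriever.docstore.mset(list(zip(splitDocumentIDList, splitDocumentList)))

# The questions are matched by similarity, but the full parent chunks are returned.
resultDocumentList = multiVectorRetriever.invoke("justice breyer")

print(len(resultDocumentList[0].page_content))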
■ Shows how to associate summaries with documents for retrieval using the MultiVectorRetriever class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. ▶ main.py
import uuid

from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.storage import InMemoryByteStore
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain_core.documents import Document

load_dotenv()

idKey = "doc_id"

textLoaderList = [
    TextLoader("paul_graham_essay.txt" , encoding = "utf-8"),
    TextLoader("state_of_the_union.txt", encoding = "utf-8")
]

documentList = []

for textLoader in textLoaderList:
    documentList.extend(textLoader.load())

recursiveCharacterTextSplitter = RecursiveCharacterTextSplitter(chunk_size = 10000)

splitDocumentList = recursiveCharacterTextSplitter.split_documents(documentList)

splitDocumentIDList = [str(uuid.uuid4()) for _ in splitDocumentList]

for i, splitDocument in enumerate(splitDocumentList):
    splitDocument.metadata[idKey] = splitDocumentIDList[i]

chatOpenAI = ChatOpenAI(model_name = "gpt-4o-mini")

runnableSequence = (
    {"doc" : lambda document : document.page_content}
    | ChatPromptTemplate.from_template("Summarize the following document :\n\n{doc}")
    | chatOpenAI
    | StrOutputParser()
)

summaryList = runnableSequence.batch(splitDocumentList, {"max_concurrency" : 5})

summaryDocumentList = [
    Document(page_content = summary, metadata = {idKey : splitDocumentIDList[i]})
    for i, summary in enumerate(summaryList)
]

openAIEmbeddings = OpenAIEmbeddings()

chroma = Chroma(collection_name = "summaries", embedding_function = openAIEmbeddings)

inMemoryByteStore = InMemoryByteStore()

multiVectorRetriever = MultiVectorRetriever(
    vectorstore = chroma,
    byte_store = inMemoryByteStore,
    id_key = idKey,
)

multiVectorRetriever.vectorstore.add_documents(summaryDocumentList)

multiVectorRetriever.docstore.mset(list(zip(splitDocumentIDList, splitDocumentList)))

multiVectorRetriever.vectorstore.add_documents(splitDocumentList)

resultDocumentList = multiVectorRetriever.vectorstore.similarity_search("justice breyer")

for resultDocument in resultDocumentList:
    print(resultDocument.metadata)
■ Shows how to load multiple text file documents using the TextLoader class. ▶ Example code (PY)
from langchain_community.document_loaders import TextLoader

textLoaderList = [
    TextLoader("paul_graham_essay.txt" , encoding = "utf-8"),
    TextLoader("state_of_the_union.txt", encoding = "utf-8")
]

documentList = []

for textLoader in textLoaderList:
    documentList.extend(textLoader.load())
※ The pip install langchain-community command was run.
■ Shows how to perform MMR (Max Marginal Relevance) search using the search_type attribute of the MultiVectorRetriever class. ▶ main.py
import uuid

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.storage import InMemoryByteStore
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.retrievers.multi_vector import SearchType

textLoaderList = [
    TextLoader("paul_graham_essay.txt" , encoding = "utf-8"),
    TextLoader("state_of_the_union.txt", encoding = "utf-8")
]

documentList = []

for textLoader in textLoaderList:
    documentList.extend(textLoader.load())

recursiveCharacterTextSplitter1 = RecursiveCharacterTextSplitter(chunk_size = 10000)

splitDocumentList = recursiveCharacterTextSplitter1.split_documents(documentList)

splitDocumentIDList = [str(uuid.uuid4()) for _ in splitDocumentList]

recursiveCharacterTextSplitter2 = RecursiveCharacterTextSplitter(chunk_size = 400)

totalSplitSplitDocumentList = []

for i, splitDocument in enumerate(splitDocumentList):
    splitDocumentID = splitDocumentIDList[i]
    splitSplitDocumentList = recursiveCharacterTextSplitter2.split_documents([splitDocument])
    for splitSplitDocument in splitSplitDocumentList:
        splitSplitDocument.metadata["doc_id"] = splitDocumentID
    totalSplitSplitDocumentList.extend(splitSplitDocumentList)

openAIEmbeddings = OpenAIEmbeddings()

chroma = Chroma(collection_name = "full_documents", embedding_function = openAIEmbeddings)

inMemoryByteStore = InMemoryByteStore()

multiVectorRetriever = MultiVectorRetriever(
    vectorstore = chroma,
    byte_store = inMemoryByteStore,
    id_key = "doc_id"
)

multiVectorRetriever.search_type = SearchType.mmr

multiVectorRetriever.vectorstore.add_documents(totalSplitSplitDocumentList)

multiVectorRetriever.docstore.mset(list(zip(splitDocumentIDList, splitDocumentList)))

resultDocumentList = multiVectorRetriever.invoke("justice breyer")

resultDocument = resultDocumentList[0]

print(resultDocument)
▶ requirements.txt
aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 asgiref==3.8.1 attrs==24.2.0 backoff==2.2.1 bcrypt==4.2.0 build==1.2.2 cachetools==5.5.0 certifi==2024.8.30 charset-normalizer==3.3.2 chroma-hnswlib==0.7.3 chromadb==0.5.3 click==8.1.7 colorama==0.4.6 coloredlogs==15.0.1 dataclasses-json==0.6.7 Deprecated==1.2.14 distro==1.9.0 fastapi==0.114.2 filelock==3.16.0 flatbuffers==24.3.25 frozenlist==1.4.1 fsspec==2024.9.0 google-auth==2.34.0 googleapis-common-protos==1.65.0 greenlet==3.1.0 grpcio==1.66.1 h11==0.14.0 httpcore==1.0.5 httptools==0.6.1 httpx==0.27.2 huggingface-hub==0.24.7 humanfriendly==10.0 idna==3.9 importlib_metadata==8.4.0 importlib_resources==6.4.5 jiter==0.5.0 jsonpatch==1.33 jsonpointer==3.0.0 kubernetes==30.1.0 langchain==0.3.0 langchain-chroma==0.1.4 langchain-community==0.3.0 langchain-core==0.3.0 langchain-openai==0.2.0 langchain-text-splitters==0.3.0 langsmith==0.1.120 markdown-it-py==3.0.0 marshmallow==3.22.0 mdurl==0.1.2 mmh3==4.1.0 monotonic==1.6 mpmath==1.3.0 multidict==6.1.0 mypy-extensions==1.0.0 numpy==1.26.4 oauthlib==3.2.2 onnxruntime==1.19.2 openai==1.45.0 opentelemetry-api==1.27.0 opentelemetry-exporter-otlp-proto-common==1.27.0 opentelemetry-exporter-otlp-proto-grpc==1.27.0 opentelemetry-instrumentation==0.48b0 opentelemetry-instrumentation-asgi==0.48b0 opentelemetry-instrumentation-fastapi==0.48b0 opentelemetry-proto==1.27.0 opentelemetry-sdk==1.27.0 opentelemetry-semantic-conventions==0.48b0 opentelemetry-util-http==0.48b0 orjson==3.10.7 overrides==7.7.0 packaging==24.1 posthog==3.6.5 protobuf==4.25.4 pyasn1==0.6.1 pyasn1_modules==0.4.1 pydantic==2.9.1 pydantic-settings==2.5.2 pydantic_core==2.23.3 Pygments==2.18.0 PyPika==0.48.9 pyproject_hooks==1.1.0 pyreadline3==3.4.3 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 PyYAML==6.0.2 regex==2024.9.11 requests==2.32.3 requests-oauthlib==2.0.0 rich==13.8.1 rsa==4.9 setuptools==74.1.2 shellingham==1.5.4 six==1.16.0 sniffio==1.3.1 SQLAlchemy==2.0.34 starlette==0.38.5 sympy==1.13.2 tenacity==8.5.0 tiktoken==0.7.0 tokenizers==0.20.0 tqdm==4.66.5 typer==0.12.5 typing-inspect==0.9.0 typing_extensions==4.12.2 urllib3==2.2.3 uvicorn==0.30.6 watchfiles==0.24.0 websocket-client==1.8.0 websockets==13.0.1 wrapt==1.16.0 yarl==1.11.1 zipp==3.20.2 |
※ The pip install langchain command was run.
■ Shows how to retrieve parent documents using the invoke method of the MultiVectorRetriever class. ▶ main.py
import uuid

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.storage import InMemoryByteStore
from langchain.retrievers.multi_vector import MultiVectorRetriever

textLoaderList = [
    TextLoader("paul_graham_essay.txt" , encoding = "utf-8"),
    TextLoader("state_of_the_union.txt", encoding = "utf-8")
]

documentList = []

for textLoader in textLoaderList:
    documentList.extend(textLoader.load())

recursiveCharacterTextSplitter1 = RecursiveCharacterTextSplitter(chunk_size = 10000)

splitDocumentList = recursiveCharacterTextSplitter1.split_documents(documentList)

splitDocumentIDList = [str(uuid.uuid4()) for _ in splitDocumentList]

recursiveCharacterTextSplitter2 = RecursiveCharacterTextSplitter(chunk_size = 400)

totalSplitSplitDocumentList = []

for i, splitDocument in enumerate(splitDocumentList):
    splitDocumentID = splitDocumentIDList[i]
    splitSplitDocumentList = recursiveCharacterTextSplitter2.split_documents([splitDocument])
    for splitSplitDocument in splitSplitDocumentList:
        splitSplitDocument.metadata["doc_id"] = splitDocumentID
    totalSplitSplitDocumentList.extend(splitSplitDocumentList)

openAIEmbeddings = OpenAIEmbeddings()

chroma = Chroma(collection_name = "full_documents", embedding_function = openAIEmbeddings)

inMemoryByteStore = InMemoryByteStore()

multiVectorRetriever = MultiVectorRetriever(
    vectorstore = chroma,
    byte_store = inMemoryByteStore,
    id_key = "doc_id"
)

multiVectorRetriever.vectorstore.add_documents(totalSplitSplitDocumentList)

multiVectorRetriever.docstore.mset(list(zip(splitDocumentIDList, splitDocumentList)))

resultDocumentList = multiVectorRetriever.invoke("justice breyer")

resultDocument = resultDocumentList[0]

print(len(resultDocument.page_content))
print(resultDocument.metadata)
▶ requirements.txt
aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 asgiref==3.8.1 attrs==24.2.0 backoff==2.2.1 bcrypt==4.2.0 build==1.2.2 cachetools==5.5.0 certifi==2024.8.30 charset-normalizer==3.3.2 chroma-hnswlib==0.7.3 chromadb==0.5.3 click==8.1.7 colorama==0.4.6 coloredlogs==15.0.1 dataclasses-json==0.6.7 Deprecated==1.2.14 distro==1.9.0 fastapi==0.114.2 filelock==3.16.0 flatbuffers==24.3.25 frozenlist==1.4.1 fsspec==2024.9.0 google-auth==2.34.0 googleapis-common-protos==1.65.0 greenlet==3.1.0 grpcio==1.66.1 h11==0.14.0 httpcore==1.0.5 httptools==0.6.1 httpx==0.27.2 huggingface-hub==0.24.7 humanfriendly==10.0 idna==3.9 importlib_metadata==8.4.0 importlib_resources==6.4.5 jiter==0.5.0 jsonpatch==1.33 jsonpointer==3.0.0 kubernetes==30.1.0 langchain==0.3.0 langchain-chroma==0.1.4 langchain-community==0.3.0 langchain-core==0.3.0 langchain-openai==0.2.0 langchain-text-splitters==0.3.0 langsmith==0.1.120 markdown-it-py==3.0.0 marshmallow==3.22.0 mdurl==0.1.2 mmh3==4.1.0 monotonic==1.6 mpmath==1.3.0 multidict==6.1.0 mypy-extensions==1.0.0 numpy==1.26.4 oauthlib==3.2.2 onnxruntime==1.19.2 openai==1.45.0 opentelemetry-api==1.27.0 opentelemetry-exporter-otlp-proto-common==1.27.0 opentelemetry-exporter-otlp-proto-grpc==1.27.0 opentelemetry-instrumentation==0.48b0 opentelemetry-instrumentation-asgi==0.48b0 opentelemetry-instrumentation-fastapi==0.48b0 opentelemetry-proto==1.27.0 opentelemetry-sdk==1.27.0 opentelemetry-semantic-conventions==0.48b0 opentelemetry-util-http==0.48b0 orjson==3.10.7 overrides==7.7.0 packaging==24.1 posthog==3.6.5 protobuf==4.25.4 pyasn1==0.6.1 pyasn1_modules==0.4.1 pydantic==2.9.1 pydantic-settings==2.5.2 pydantic_core==2.23.3 Pygments==2.18.0 PyPika==0.48.9 pyproject_hooks==1.1.0 pyreadline3==3.4.3 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 PyYAML==6.0.2 regex==2024.9.11 requests==2.32.3 requests-oauthlib==2.0.0 rich==13.8.1 rsa==4.9 setuptools==74.1.2 shellingham==1.5.4 six==1.16.0 sniffio==1.3.1 SQLAlchemy==2.0.34 starlette==0.38.5 sympy==1.13.2 tenacity==8.5.0 tiktoken==0.7.0 tokenizers==0.20.0 tqdm==4.66.5 typer==0.12.5 typing-inspect==0.9.0 typing_extensions==4.12.2 urllib3==2.2.3 uvicorn==0.30.6 watchfiles==0.24.0 websocket-client==1.8.0 websockets==13.0.1 wrapt==1.16.0 yarl==1.11.1 zipp==3.20.2 |
※ The pip install langchain langchain-community command was run.
■ Shows how to run a similarity search over child documents using the vectorstore/docstore attributes of the MultiVectorRetriever class. ▶ main.py
import uuid

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.storage import InMemoryByteStore
from langchain.retrievers.multi_vector import MultiVectorRetriever

textLoaderList = [
    TextLoader("paul_graham_essay.txt" , encoding = "utf-8"),
    TextLoader("state_of_the_union.txt", encoding = "utf-8")
]

documentList = []

for textLoader in textLoaderList:
    documentList.extend(textLoader.load())

recursiveCharacterTextSplitter1 = RecursiveCharacterTextSplitter(chunk_size = 10000)

splitDocumentList = recursiveCharacterTextSplitter1.split_documents(documentList)

splitDocumentIDList = [str(uuid.uuid4()) for _ in splitDocumentList]

recursiveCharacterTextSplitter2 = RecursiveCharacterTextSplitter(chunk_size = 400)

totalSplitSplitDocumentList = []

for i, splitDocument in enumerate(splitDocumentList):
    documentID = splitDocumentIDList[i]
    splitSplitDocumentList = recursiveCharacterTextSplitter2.split_documents([splitDocument])
    for subsidiaryDocument in splitSplitDocumentList:
        subsidiaryDocument.metadata["doc_id"] = documentID
    totalSplitSplitDocumentList.extend(splitSplitDocumentList)

openAIEmbeddings = OpenAIEmbeddings()

chroma = Chroma(collection_name = "full_documents", embedding_function = openAIEmbeddings)

inMemoryByteStore = InMemoryByteStore()

multiVectorRetriever = MultiVectorRetriever(
    vectorstore = chroma,
    byte_store = inMemoryByteStore,
    id_key = "doc_id"
)

multiVectorRetriever.vectorstore.add_documents(totalSplitSplitDocumentList)

multiVectorRetriever.docstore.mset(list(zip(splitDocumentIDList, splitDocumentList)))

resultDocumentList = multiVectorRetriever.vectorstore.similarity_search("justice breyer")

print(resultDocumentList[0])
▶ requirements.txt
aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 asgiref==3.8.1 attrs==24.2.0 backoff==2.2.1 bcrypt==4.2.0 build==1.2.2 cachetools==5.5.0 certifi==2024.8.30 charset-normalizer==3.3.2 chroma-hnswlib==0.7.3 chromadb==0.5.3 click==8.1.7 colorama==0.4.6 coloredlogs==15.0.1 dataclasses-json==0.6.7 Deprecated==1.2.14 distro==1.9.0 fastapi==0.114.2 filelock==3.16.0 flatbuffers==24.3.25 frozenlist==1.4.1 fsspec==2024.9.0 google-auth==2.34.0 googleapis-common-protos==1.65.0 greenlet==3.1.0 grpcio==1.66.1 h11==0.14.0 httpcore==1.0.5 httptools==0.6.1 httpx==0.27.2 huggingface-hub==0.24.7 humanfriendly==10.0 idna==3.9 importlib_metadata==8.4.0 importlib_resources==6.4.5 jiter==0.5.0 jsonpatch==1.33 jsonpointer==3.0.0 kubernetes==30.1.0 langchain==0.3.0 langchain-chroma==0.1.4 langchain-community==0.3.0 langchain-core==0.3.0 langchain-openai==0.2.0 langchain-text-splitters==0.3.0 langsmith==0.1.120 markdown-it-py==3.0.0 marshmallow==3.22.0 mdurl==0.1.2 mmh3==4.1.0 monotonic==1.6 mpmath==1.3.0 multidict==6.1.0 mypy-extensions==1.0.0 numpy==1.26.4 oauthlib==3.2.2 onnxruntime==1.19.2 openai==1.45.0 opentelemetry-api==1.27.0 opentelemetry-exporter-otlp-proto-common==1.27.0 opentelemetry-exporter-otlp-proto-grpc==1.27.0 opentelemetry-instrumentation==0.48b0 opentelemetry-instrumentation-asgi==0.48b0 opentelemetry-instrumentation-fastapi==0.48b0 opentelemetry-proto==1.27.0 opentelemetry-sdk==1.27.0 opentelemetry-semantic-conventions==0.48b0 opentelemetry-util-http==0.48b0 orjson==3.10.7 overrides==7.7.0 packaging==24.1 posthog==3.6.5 protobuf==4.25.4 pyasn1==0.6.1 pyasn1_modules==0.4.1 pydantic==2.9.1 pydantic-settings==2.5.2 pydantic_core==2.23.3 Pygments==2.18.0 PyPika==0.48.9 pyproject_hooks==1.1.0 pyreadline3==3.4.3 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 PyYAML==6.0.2 regex==2024.9.11 requests==2.32.3 requests-oauthlib==2.0.0 rich==13.8.1 rsa==4.9 setuptools==74.1.2 shellingham==1.5.4 six==1.16.0 sniffio==1.3.1 SQLAlchemy==2.0.34 starlette==0.38.5 sympy==1.13.2 tenacity==8.5.0 tiktoken==0.7.0 tokenizers==0.20.0 tqdm==4.66.5 typer==0.12.5 typing-inspect==0.9.0 typing_extensions==4.12.2 urllib3==2.2.3 uvicorn==0.30.6 watchfiles==0.24.0 websocket-client==1.8.0 websockets==13.0.1 wrapt==1.16.0 yarl==1.11.1 zipp==3.20.2 |
※ The pip install langchain command was run.
■ Shows how to create a MultiVectorRetriever object using the vectorstore/byte_store/id_key arguments of the MultiVectorRetriever constructor. ▶ Example code (PY)
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.storage import InMemoryByteStore
from langchain.retrievers.multi_vector import MultiVectorRetriever

openAIEmbeddings = OpenAIEmbeddings()

chroma = Chroma(collection_name = "full_documents", embedding_function = openAIEmbeddings)

inMemoryByteStore = InMemoryByteStore()

multiVectorRetriever = MultiVectorRetriever(
    vectorstore = chroma,
    byte_store = inMemoryByteStore,
    id_key = "doc_id"
)
※ The pip install langchain langchain-chroma command was run.