[PYTHON/LANGCHAIN] from_documents function: creating a Chroma object for an in-memory vector store using the documents/embedding arguments
■ Shows how to use the documents/embedding arguments of the from_documents function to create a Chroma object for an in-memory vector store. ※ The OPENAI_API_KEY environment variable value can be defined in a .env file.
■ Shows how to use the Chroma class's as_retriever method to create a VectorStoreRetriever object. ▶ main.py
import os
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

# Load the blog post, keeping only the post title, header, and content elements.
webBaseLoader = WebBaseLoader(
    web_paths = ("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs = dict(parse_only = bs4.SoupStrainer(class_ = ("post-content", "post-title", "post-header")))
)

documentList = webBaseLoader.load()

# Split the loaded documents into 1,000-character chunks with 200 characters of overlap.
recursiveCharacterTextSplitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap = 200)

splitDocumentList = recursiveCharacterTextSplitter.split_documents(documentList)

# Embed the chunks and store them in an in-memory Chroma vector store.
chroma = Chroma.from_documents(documents = splitDocumentList, embedding = OpenAIEmbeddings())

vectorStoreRetriever = chroma.as_retriever()
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
backoff==2.2.1
bcrypt==4.1.3
beautifulsoup4==4.12.3
bs4==0.0.2
build==1.2.1
cachetools==5.3.3
certifi==2024.6.2
charset-normalizer==3.3.2
chroma-hnswlib==0.7.3
chromadb==0.5.0
click==8.1.7
coloredlogs==15.0.1
dataclasses-json==0.6.7
Deprecated==1.2.14
distro==1.9.0
dnspython==2.6.1
email_validator==2.1.1
exceptiongroup==1.2.1
fastapi==0.111.0
fastapi-cli==0.0.4
filelock==3.14.0
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.6.0
google-auth==2.30.0
googleapis-common-protos==1.63.1
greenlet==3.0.3
grpcio==1.64.1
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.7
importlib_metadata==7.1.0
importlib_resources==6.4.0
Jinja2==3.1.4
jsonpatch==1.33
jsonpointer==3.0.0
kubernetes==30.1.0
langchain==0.2.3
langchain-chroma==0.1.1
langchain-community==0.2.4
langchain-core==0.2.5
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.77
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.21.3
mdurl==0.1.2
mmh3==4.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
oauthlib==3.2.2
onnxruntime==1.18.0
openai==1.33.0
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-asgi==0.46b0
opentelemetry-instrumentation-fastapi==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opentelemetry-util-http==0.46b0
orjson==3.10.4
overrides==7.7.0
packaging==23.2
posthog==3.5.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.7.3
pydantic_core==2.18.4
Pygments==2.18.0
PyPika==0.48.9
pyproject_hooks==1.1.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
requests-oauthlib==2.0.0
rich==13.7.1
rsa==4.9
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
SQLAlchemy==2.0.30
starlette==0.37.2
sympy==1.12.1
tenacity==8.3.0
tiktoken==0.7.0
tokenizers==0.19.1
tomli==2.0.1
tqdm==4.66.4
typer==0.12.3
typing-inspect==0.9.0
typing_extensions==4.12.2
ujson==5.10.0
urllib3==2.2.1
uvicorn==0.30.1
uvloop==0.19.0
watchfiles==0.22.0
websocket-client==1.8.0
websockets==12.0
wrapt==1.16.0
yarl==1.9.4
zipp==3.19.2
※ pip install langchain langchain-chroma langchain-community langchain-openai bs4
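For reference, the VectorStoreRetriever returned by as_retriever is a LangChain Runnable, so it can be queried directly with invoke. A minimal sketch that could be appended to the end of the main.py above; the question string and the variable name resultDocumentList are only illustrations:

# Retrieve the chunks most similar to a question (returns a list of Document objects).
resultDocumentList = vectorStoreRetriever.invoke("What is Task Decomposition?")

print(resultDocumentList[0].page_content)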
■ Shows how to use the Chroma class's as_retriever method, with the search_type and search_kwargs arguments, to create a VectorStoreRetriever object. ▶ main.py
import os
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

documentList = [
    Document(
        page_content = "Dogs are great companions, known for their loyalty and friendliness.",
        metadata = {"source" : "mammal-pets-doc"}
    ),
    Document(
        page_content = "Cats are independent pets that often enjoy their own space.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
    Document(
        page_content = "Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata = {"source" : "fish-pets-doc"},
    ),
    Document(
        page_content = "Parrots are intelligent birds capable of mimicking human speech.",
        metadata = {"source" : "bird-pets-doc"},
    ),
    Document(
        page_content = "Rabbits are social animals that need plenty of space to hop around.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
]

chroma = Chroma.from_documents(
    documentList,
    embedding = OpenAIEmbeddings(),
)

vectorStoreRetriever = chroma.as_retriever(
    search_type   = "similarity",  # "similarity" (default), "mmr" (maximum marginal relevance), "similarity_score_threshold"
    search_kwargs = {"k" : 1},     # return only the single most similar document
)
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
asgiref==3.8.1
async-timeout==4.0.3
attrs==23.2.0
backoff==2.2.1
bcrypt==4.1.3
build==1.2.1
cachetools==5.3.3
certifi==2024.6.2
charset-normalizer==3.3.2
chroma-hnswlib==0.7.3
chromadb==0.5.0
click==8.1.7
coloredlogs==15.0.1
Deprecated==1.2.14
distro==1.9.0
dnspython==2.6.1
email_validator==2.1.1
exceptiongroup==1.2.1
fastapi==0.111.0
fastapi-cli==0.0.4
filelock==3.14.0
flatbuffers==24.3.25
frozenlist==1.4.1
fsspec==2024.6.0
google-auth==2.30.0
googleapis-common-protos==1.63.1
greenlet==3.0.3
grpcio==1.64.1
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.3
humanfriendly==10.0
idna==3.7
importlib_metadata==7.1.0
importlib_resources==6.4.0
Jinja2==3.1.4
jsonpatch==1.33
jsonpointer==2.4
kubernetes==30.1.0
langchain==0.2.3
langchain-chroma==0.1.1
langchain-core==0.2.5
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.75
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mmh3==4.1.0
monotonic==1.6
mpmath==1.3.0
multidict==6.0.5
numpy==1.26.4
oauthlib==3.2.2
onnxruntime==1.18.0
openai==1.33.0
opentelemetry-api==1.25.0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-asgi==0.46b0
opentelemetry-instrumentation-fastapi==0.46b0
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opentelemetry-util-http==0.46b0
orjson==3.10.3
overrides==7.7.0
packaging==23.2
posthog==3.5.0
protobuf==4.25.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.7.3
pydantic_core==2.18.4
Pygments==2.18.0
PyPika==0.48.9
pyproject_hooks==1.1.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
requests-oauthlib==2.0.0
rich==13.7.1
rsa==4.9
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
SQLAlchemy==2.0.30
starlette==0.37.2
sympy==1.12.1
tenacity==8.3.0
tiktoken==0.7.0
tokenizers==0.19.1
tomli==2.0.1
tqdm==4.66.4
typer==0.12.3
typing_extensions==4.12.2
ujson==5.10.0
urllib3==2.2.1
uvicorn==0.30.1
uvloop==0.19.0
watchfiles==0.22.0
websocket-client==1.8.0
websockets==12.0
wrapt==1.16.0
yarl==1.9.4
zipp==3.19.2
※ pip install langchain langchain-chroma langchain-openai
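Because the retriever above was created with search_kwargs = {"k" : 1}, each query returns at most one document. The sketch below, which could be appended to the end of the main.py above, shows how such a retriever might be queried and how the other search_type values could be configured; the score_threshold, fetch_k, and query strings are only illustrations:

# Single query: returns a list containing at most one Document.
print(vectorStoreRetriever.invoke("cat"))

# Multiple queries in one call via the Runnable batch method: one result list per query.
print(vectorStoreRetriever.batch(["cat", "shark"]))

# Alternative retriever that only returns documents above a relevance-score threshold.
thresholdRetriever = chroma.as_retriever(
    search_type   = "similarity_score_threshold",
    search_kwargs = {"score_threshold" : 0.5},
)

# Alternative retriever that uses maximum marginal relevance to diversify results.
mmrRetriever = chroma.as_retriever(
    search_type   = "mmr",
    search_kwargs = {"k" : 1, "fetch_k" : 5},
)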
■ Shows how to use the Chroma class's similarity_search_by_vector method to get a list of result documents from a query string's embedding vector. ▶ main.py
import os
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

documentList = [
    Document(
        page_content = "Dogs are great companions, known for their loyalty and friendliness.",
        metadata = {"source" : "mammal-pets-doc"}
    ),
    Document(
        page_content = "Cats are independent pets that often enjoy their own space.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
    Document(
        page_content = "Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata = {"source" : "fish-pets-doc"},
    ),
    Document(
        page_content = "Parrots are intelligent birds capable of mimicking human speech.",
        metadata = {"source" : "bird-pets-doc"},
    ),
    Document(
        page_content = "Rabbits are social animals that need plenty of space to hop around.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
]

openAIEmbeddings = OpenAIEmbeddings()

chroma = Chroma.from_documents(
    documentList,
    embedding = openAIEmbeddings,
)

catEmbeddingList = openAIEmbeddings.embed_query("cat")  # catEmbeddingList has 1,536 elements

searchResultList = chroma.similarity_search_by_vector(catEmbeddingList)

print(searchResultList)

"""
[
    Document(
        page_content = 'Cats are independent pets that often enjoy their own space.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Dogs are great companions, known for their loyalty and friendliness.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Rabbits are social animals that need plenty of space to hop around.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Parrots are intelligent birds capable of mimicking human speech.',
        metadata = {'source' : 'bird-pets-doc'}
    )
]
"""
▶ requirements.txt : same as the previous example's requirements.txt.
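similarity_search_by_vector also accepts a k argument that limits how many documents come back. A minimal sketch that could be appended to the end of the main.py above:

# Return only the two documents closest to the "cat" embedding vector.
searchResultList = chroma.similarity_search_by_vector(catEmbeddingList, k = 2)

print(searchResultList)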
■ Shows how to use the Chroma class's similarity_search_with_score method to get a list of result documents, with scores, for a query string. ▶ main.py
import os
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

documentList = [
    Document(
        page_content = "Dogs are great companions, known for their loyalty and friendliness.",
        metadata = {"source" : "mammal-pets-doc"}
    ),
    Document(
        page_content = "Cats are independent pets that often enjoy their own space.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
    Document(
        page_content = "Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata = {"source" : "fish-pets-doc"},
    ),
    Document(
        page_content = "Parrots are intelligent birds capable of mimicking human speech.",
        metadata = {"source" : "bird-pets-doc"},
    ),
    Document(
        page_content = "Rabbits are social animals that need plenty of space to hop around.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
]

chroma = Chroma.from_documents(
    documentList,
    embedding = OpenAIEmbeddings(),
)

searchResultList = chroma.similarity_search_with_score("cat")  # list of (Document, score) tuples

print(searchResultList)

"""
[
    (
        Document(
            page_content = 'Cats are independent pets that often enjoy their own space.',
            metadata = {'source' : 'mammal-pets-doc'}
        ),
        0.375326931476593
    ),
    (
        Document(
            page_content = 'Dogs are great companions, known for their loyalty and friendliness.',
            metadata = {'source' : 'mammal-pets-doc'}
        ),
        0.4833090305328369
    ),
    (
        Document(
            page_content = 'Rabbits are social animals that need plenty of space to hop around.',
            metadata = {'source' : 'mammal-pets-doc'}
        ),
        0.4958883225917816
    ),
    (
        Document(
            page_content = 'Parrots are intelligent birds capable of mimicking human speech.',
            metadata = {'source' : 'bird-pets-doc'}
        ),
        0.4974174499511719
    )
]
"""
▶ requirements.txt : same as the previous example's requirements.txt.
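As the output above shows, the score is a distance, so lower values indicate closer matches (the "cat" query scores lowest against the cat document). A minimal sketch of keeping only close matches, which could be appended to the end of the main.py above; the 0.45 cutoff is only an illustration:

# Keep only the documents whose distance score is below the cutoff.
closeDocumentList = [document for document, score in searchResultList if score < 0.45]

print(closeDocumentList)  # with the output above, only the "Cats are independent pets ..." document remains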
■ Shows how to use the Chroma class's asimilarity_search method to asynchronously get a list of result documents for a query string. ▶ main.py
import asyncio
import os
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

async def main():
    os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

    documentList = [
        Document(
            page_content = "Dogs are great companions, known for their loyalty and friendliness.",
            metadata = {"source" : "mammal-pets-doc"}
        ),
        Document(
            page_content = "Cats are independent pets that often enjoy their own space.",
            metadata = {"source" : "mammal-pets-doc"},
        ),
        Document(
            page_content = "Goldfish are popular pets for beginners, requiring relatively simple care.",
            metadata = {"source" : "fish-pets-doc"},
        ),
        Document(
            page_content = "Parrots are intelligent birds capable of mimicking human speech.",
            metadata = {"source" : "bird-pets-doc"},
        ),
        Document(
            page_content = "Rabbits are social animals that need plenty of space to hop around.",
            metadata = {"source" : "mammal-pets-doc"},
        ),
    ]

    chroma = Chroma.from_documents(
        documentList,
        embedding = OpenAIEmbeddings(),
    )

    searchResultList = await chroma.asimilarity_search("cat")

    print(searchResultList)

asyncio.run(main())

"""
[
    Document(
        page_content = 'Cats are independent pets that often enjoy their own space.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Dogs are great companions, known for their loyalty and friendliness.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Rabbits are social animals that need plenty of space to hop around.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Parrots are intelligent birds capable of mimicking human speech.',
        metadata = {'source' : 'bird-pets-doc'}
    )
]
"""
▶ requirements.txt : identical to the requirements.txt listed above for the as_retriever (search_type/search_kwargs) example.
■ Shows how to use the Chroma class's similarity_search method to get a list of result documents for a query string. ▶ main.py
import os
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

documentList = [
    Document(
        page_content = "Dogs are great companions, known for their loyalty and friendliness.",
        metadata = {"source" : "mammal-pets-doc"}
    ),
    Document(
        page_content = "Cats are independent pets that often enjoy their own space.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
    Document(
        page_content = "Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata = {"source" : "fish-pets-doc"},
    ),
    Document(
        page_content = "Parrots are intelligent birds capable of mimicking human speech.",
        metadata = {"source" : "bird-pets-doc"},
    ),
    Document(
        page_content = "Rabbits are social animals that need plenty of space to hop around.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
]

chroma = Chroma.from_documents(
    documentList,
    embedding = OpenAIEmbeddings(),
)

searchResultList = chroma.similarity_search("cat")

print(searchResultList)

"""
[
    Document(
        page_content = 'Cats are independent pets that often enjoy their own space.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Dogs are great companions, known for their loyalty and friendliness.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Rabbits are social animals that need plenty of space to hop around.',
        metadata = {'source' : 'mammal-pets-doc'}
    ),
    Document(
        page_content = 'Parrots are intelligent birds capable of mimicking human speech.',
        metadata = {'source' : 'bird-pets-doc'}
    )
]
"""
▶ requirements.txt : identical to the requirements.txt listed above for the as_retriever (search_type/search_kwargs) example.
※ pip install langchain langchain-chroma langchain-openai
■ Shows how to use the from_documents function to create a Chroma object for an in-memory vector store. ▶ main.py
import os
from langchain_core.documents import Document
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

documentList = [
    Document(
        page_content = "Dogs are great companions, known for their loyalty and friendliness.",
        metadata = {"source" : "mammal-pets-doc"}
    ),
    Document(
        page_content = "Cats are independent pets that often enjoy their own space.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
    Document(
        page_content = "Goldfish are popular pets for beginners, requiring relatively simple care.",
        metadata = {"source" : "fish-pets-doc"},
    ),
    Document(
        page_content = "Parrots are intelligent birds capable of mimicking human speech.",
        metadata = {"source" : "bird-pets-doc"},
    ),
    Document(
        page_content = "Rabbits are social animals that need plenty of space to hop around.",
        metadata = {"source" : "mammal-pets-doc"},
    ),
]

chroma = Chroma.from_documents(
    documentList,
    embedding = OpenAIEmbeddings(),
)
▶ requirements.txt : identical to the requirements.txt listed above for the as_retriever (search_type/search_kwargs) example.
※ pip install langchain langchain-chroma langchain-openai
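The Chroma object above lives only in memory and disappears when the process exits. If the embedded documents should survive restarts, from_documents also accepts a persist_directory argument. A minimal sketch under that assumption; the ./chroma_db path is chosen only for illustration:

# Store the collection on disk instead of only in memory.
chroma = Chroma.from_documents(
    documentList,
    embedding         = OpenAIEmbeddings(),
    persist_directory = "./chroma_db",
)

# A later run can reopen the same collection from disk.
chroma = Chroma(persist_directory = "./chroma_db", embedding_function = OpenAIEmbeddings())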
■ Shows how to install the langchain-chroma package. 1. Open a command prompt. 2. Run the command below in the command prompt. ▶ Command
pip install langchain-chroma
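To confirm the installation, the installed version can be checked afterwards, for example:

pip show langchain-chroma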