■ Shows how to delete entities using the collection_name and filter arguments of the MilvusClient class's delete method. ▶ main.py
import numpy as np

from pymilvus import MilvusClient

milvusClient = MilvusClient("test.db")

hasCollection = milvusClient.has_collection(collection_name = "temp")

if milvusClient.has_collection(collection_name = "temp"):
    milvusClient.drop_collection(collection_name = "temp")

milvusClient.create_collection(
    collection_name = "temp",
    dimension       = 384
)

stringList = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]

stringVectorList = [[np.random.uniform(-1, 1) for _ in range(384)] for _ in range(len(stringList))] # list of random vectors

itemList = [{"id" : i, "vector" : stringVectorList[i], "text" : stringList[i], "subject" : "history"} for i in range(len(stringVectorList))]

milvusClient.insert(
    collection_name = "temp",
    data            = itemList
)

extraList1 = milvusClient.search(
    collection_name = "temp",
    data            = [stringVectorList[0]],
    filter          = "subject == 'history'",
    limit           = 2,
    output_fields   = ["text", "subject"]
)

print(extraList1[0][0])
print(extraList1[0][1])
print("-" * 100)

extraList2 = milvusClient.query(
    collection_name = "temp",
    filter          = "subject == 'history'",
    output_fields   = ["text", "subject"]
)

print(extraList2[0])
print(extraList2[1])
print(extraList2[2])
print("-" * 100)

deletedIDList = milvusClient.delete(
    collection_name = "temp",
    filter          = "subject == 'history'"
) # int list

print(deletedIDList)
print("-" * 100)

"""
{'id': 0, 'distance': 1.0000003576278687, 'entity': {'text': 'Artificial intelligence was founded as an academic discipline in 1956.', 'subject': 'history'}}
{'id': 2, 'distance': 0.09245504438877106, 'entity': {'text': 'Born in Maida Vale, London, Turing was raised in southern England.', 'subject': 'history'}}
{'id': 0, 'text': 'Artificial intelligence was founded
"""
■ Shows how to create a MilvusClient object using the uri and token arguments of the MilvusClient class constructor. ※ Milvus Lite is a good way to get started with a local Python program. ※ For large-scale …
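A minimal sketch, assuming a standalone Milvus server; the uri and token values below are placeholders, not values from the original entry:

from pymilvus import MilvusClient

# Connects to a standalone Milvus server instead of a local Milvus Lite file.
# token follows the "user:password" form; both values here are placeholders.
milvusClient = MilvusClient(
    uri   = "http://localhost:19530",
    token = "root:Milvus"
)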
■ Shows how to delete entities using the collection_name and ids arguments of the MilvusClient class's delete method. ▶ main.py
from pymilvus import MilvusClient
from pymilvus import model

milvusClient = MilvusClient("test.db")

hasCollection = milvusClient.has_collection(collection_name = "temp")

if milvusClient.has_collection(collection_name = "temp"):
    milvusClient.drop_collection(collection_name = "temp")

milvusClient.create_collection(
    collection_name = "temp",
    dimension       = 768
)

onnxEmbeddingFunction = model.DefaultEmbeddingFunction()

stringList1 = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]

stringVectorList1 = onnxEmbeddingFunction.encode_documents(stringList1) # NDArray list

itemList = []

itemList.extend(
    [
        {"id" : i, "vector" : stringVectorList1[i], "text" : stringList1[i], "subject" : "history"}
        for i in range(len(stringVectorList1))
    ]
)

stringList2 = [
    "Machine learning has been used for drug design.",
    "Computational synthesis with AI algorithms predicts molecular properties.",
    "DDR1 is involved in cancers and fibrosis."
]

stringVectorList2 = onnxEmbeddingFunction.encode_documents(stringList2) # NDArray list

itemList.extend(
    [
        {"id" : 3 + i, "vector" : stringVectorList2[i], "text" : stringList2[i], "subject" : "biology"}
        for i in range(len(stringVectorList2))
    ]
)

milvusClient.insert(collection_name = "temp", data = itemList)

deletedIDList = milvusClient.delete(collection_name = "temp", ids = [0, 2]) # int list

print(deletedIDList)

"""
[0, 2]
"""
▶ requirements.txt
certifi==2024.8.30
charset-normalizer==3.3.2
coloredlogs==15.0.1
environs==9.5.0
filelock==3.16.1
flatbuffers==24.3.25
fsspec==2024.9.0
grpcio==1.66.2
huggingface-hub==0.25.1
humanfriendly==10.0
idna==3.10
marshmallow==3.22.0
milvus-lite==2.4.10
milvus-model==0.2.7
mpmath==1.3.0
numpy==2.1.2
onnxruntime==1.19.2
packaging==24.1
pandas==2.2.3
protobuf==5.28.2
pymilvus==2.4.7
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
safetensors==0.4.5
scipy==1.14.1
six==1.16.0
sympy==1.13.3
tokenizers==0.20.0
tqdm==4.66.5
transformers==4.45.1
typing_extensions==4.12.2
tzdata==2024.2
ujson==5.10.0
urllib3==2.2.3
※ pip install pymilvus[model]
■ Shows how to query using the ids argument of the MilvusClient class's query method. ※ Retrieves entities directly by primary key. ▶ main.py
from pymilvus import MilvusClient
from pymilvus import model

milvusClient = MilvusClient("test.db")

hasCollection = milvusClient.has_collection(collection_name = "temp")

if milvusClient.has_collection(collection_name = "temp"):
    milvusClient.drop_collection(collection_name = "temp")

milvusClient.create_collection(
    collection_name = "temp",
    dimension       = 768
)

onnxEmbeddingFunction = model.DefaultEmbeddingFunction()

stringList1 = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]

stringVectorList1 = onnxEmbeddingFunction.encode_documents(stringList1) # NDArray list

itemList = []

itemList.extend(
    [
        {"id" : i, "vector" : stringVectorList1[i], "text" : stringList1[i], "subject" : "history"}
        for i in range(len(stringVectorList1))
    ]
)

stringList2 = [
    "Machine learning has been used for drug design.",
    "Computational synthesis with AI algorithms predicts molecular properties.",
    "DDR1 is involved in cancers and fibrosis."
]

stringVectorList2 = onnxEmbeddingFunction.encode_documents(stringList2) # NDArray list

itemList.extend(
    [
        {"id" : 3 + i, "vector" : stringVectorList2[i], "text" : stringList2[i], "subject" : "biology"}
        for i in range(len(stringVectorList2))
    ]
)

milvusClient.insert(collection_name = "temp", data = itemList)

extraList = milvusClient.query(
    collection_name = "temp",
    ids             = [0, 2],
    output_fields   = ["text", "subject"]
)

print(extraList[0])
print(extraList[1])

"""
{'id': 0, 'text': 'Artificial intelligence was founded as an academic discipline in 1956.', 'subject': 'history'}
{'id': 2, 'text': 'Born in Maida Vale, London, Turing was raised in southern England.', 'subject': 'history'}
"""
▶ requirements.txt
■ Shows how to query using the collection_name, filter, and output_fields arguments of the MilvusClient class's query method. ※ Retrieves all entities that match criteria such as a filter expression or some of the IDs, as sketched below.
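A minimal sketch, reusing the "temp" collection and the "subject" field from the neighboring examples; the collection is assumed to already hold data:

from pymilvus import MilvusClient

milvusClient = MilvusClient("test.db")

# Retrieves every entity whose scalar field matches the filter expression.
extraList = milvusClient.query(
    collection_name = "temp",
    filter          = "subject == 'history'",
    output_fields   = ["text", "subject"]
)

for extra in extraList:
    print(extra)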
■ Shows how to search vectors using the collection_name, data, limit, and output_fields arguments of the MilvusClient class's search method. ※ Milvus accepts one or several vector search requests at the same time.
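A minimal sketch, reusing the "temp" collection and DefaultEmbeddingFunction from the neighboring examples; the collection is assumed to already hold 768-dimensional vectors:

from pymilvus import MilvusClient
from pymilvus import model

milvusClient = MilvusClient("test.db")

onnxEmbeddingFunction = model.DefaultEmbeddingFunction()

queryVectorList = onnxEmbeddingFunction.encode_queries(["Who is Alan Turing?"]) # NDArray list

# Searches with one query vector and returns the two most similar entities.
extraList = milvusClient.search(
    collection_name = "temp",
    data            = queryVectorList,
    limit           = 2,
    output_fields   = ["text", "subject"]
)

for extra in extraList[0]:
    print(extra)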
■ Shows how to insert data using the collection_name and data arguments of the MilvusClient class's insert method. ▶ main.py
from pymilvus import MilvusClient
from pymilvus import model

milvusClient = MilvusClient("test.db")

hasCollection = milvusClient.has_collection(collection_name = "temp")

if milvusClient.has_collection(collection_name = "temp"):
    milvusClient.drop_collection(collection_name = "temp")

milvusClient.create_collection(
    collection_name = "temp",
    dimension       = 768
)

onnxEmbeddingFunction = model.DefaultEmbeddingFunction()

stringList = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]

ndArrayList = onnxEmbeddingFunction.encode_documents(stringList)

itemList = [
    {"id" : i, "vector" : ndArrayList[i], "text" : stringList[i], "subject" : "history"}
    for i in range(len(ndArrayList))
]

omitZeroDict = milvusClient.insert(collection_name = "temp", data = itemList)

print(omitZeroDict)

"""
{'insert_count': 3, 'ids': [0, 1, 2]}
"""
▶ requirements.txt
certifi==2024.8.30
charset-normalizer==3.3.2
coloredlogs==15.0.1
environs==9.5.0
filelock==3.16.1
flatbuffers==24.3.25
fsspec==2024.9.0
grpcio==1.66.2
huggingface-hub==0.25.1
humanfriendly==10.0
idna==3.10
marshmallow==3.22.0
milvus-lite==2.4.10
milvus-model==0.2.7
mpmath==1.3.0
numpy==2.1.2
onnxruntime==1.19.2
packaging==24.1
pandas==2.2.3
protobuf==5.28.2
pymilvus==2.4.7
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
safetensors==0.4.5
scipy==1.14.1
six==1.16.0
sympy==1.13.3
tokenizers==0.20.0
tqdm==4.66.5
transformers==4.45.1
typing_extensions==4.12.2
tzdata==2024.2
ujson==5.10.0
urllib3==2.2.3
※ pip install pymilvus[model]
■ Shows how to create a list of query vectors using the encode_queries method of the OnnxEmbeddingFunction class. ▶ main.py
from pymilvus import model

onnxEmbeddingFunction = model.DefaultEmbeddingFunction()

queryVectorList = onnxEmbeddingFunction.encode_queries(["Who is Alan Turing?"]) # NDArray list
▶ requirements.txt
certifi==2024.8.30
charset-normalizer==3.3.2
coloredlogs==15.0.1
environs==9.5.0
filelock==3.16.1
flatbuffers==24.3.25
fsspec==2024.9.0
grpcio==1.66.2
huggingface-hub==0.25.1
humanfriendly==10.0
idna==3.10
marshmallow==3.22.0
milvus-lite==2.4.10
milvus-model==0.2.7
mpmath==1.3.0
numpy==2.1.2
onnxruntime==1.19.2
packaging==24.1
pandas==2.2.3
protobuf==5.28.2
pymilvus==2.4.7
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
safetensors==0.4.5
scipy==1.14.1
six==1.16.0
sympy==1.13.3
tokenizers==0.20.0
tqdm==4.66.5
transformers==4.45.1
typing_extensions==4.12.2
tzdata==2024.2
ujson==5.10.0
urllib3==2.2.3
※ pip install pymilvus[model]
■ Shows how to represent text with random vectors. ※ If the embedding model cannot be downloaded because of a network problem, random vectors can be used as a stopgap to represent the text.
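A minimal sketch, using the 384-dimensional random vectors that also appear in the delete example at the top of this page:

import numpy as np

stringList = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]

# Each text is represented by a 384-dimensional vector of random values in [-1, 1].
stringVectorList = [[np.random.uniform(-1, 1) for _ in range(384)] for _ in range(len(stringList))]

print(len(stringVectorList), len(stringVectorList[0]))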
■ Shows how to represent text with vectors. ▶ main.py
from pymilvus import model

# If connecting to https://huggingface.co/ fails, uncomment the following lines.
# import os
# os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# Downloads the small embedding model "paraphrase-albert-small-v2" (~50 MB).
onnxEmbeddingFunction = model.DefaultEmbeddingFunction()

stringList = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]

stringVectorList = onnxEmbeddingFunction.encode_documents(stringList) # NDArray list

itemList = [
    {"id" : i, "vector" : stringVectorList[i], "text" : stringList[i], "subject" : "history"}
    for i in range(len(stringVectorList))
]

print("Data has", len(itemList), "entities, each with fields : ", itemList[0].keys())
print("Vector dim :", len(itemList[0]["vector"]))

"""
Data has 3 entities, each with fields :  dict_keys(['id', 'vector', 'text', 'subject'])
Vector dim : 768
"""
▶ requirements.txt
certifi==2024.8.30
charset-normalizer==3.3.2
coloredlogs==15.0.1
environs==9.5.0
filelock==3.16.1
flatbuffers==24.3.25
fsspec==2024.9.0
grpcio==1.66.2
huggingface-hub==0.25.1
humanfriendly==10.0
idna==3.10
marshmallow==3.22.0
milvus-lite==2.4.10
milvus-model==0.2.7
mpmath==1.3.0
numpy==2.1.2
onnxruntime==1.19.2
packaging==24.1
pandas==2.2.3
protobuf==5.28.2
pymilvus==2.4.7
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
safetensors==0.4.5
scipy==1.14.1
six==1.16.0
sympy==1.13.3
tokenizers==0.20.0
tqdm==4.66.5
transformers==4.45.1
typing_extensions==4.12.2
tzdata==2024.2
ujson==5.10.0
urllib3==2.2.3
※ The pip install pymilvus[model] command was run.
■ Shows how to get the dimension using the dim attribute of the OnnxEmbeddingFunction class. ▶ main.py
from pymilvus import model

# If connecting to https://huggingface.co/ fails, uncomment the following lines.
# import os
# os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# Downloads the small embedding model "paraphrase-albert-small-v2" (~50 MB).
onnxEmbeddingFunction = model.DefaultEmbeddingFunction()

stringList = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]

stringVectorList = onnxEmbeddingFunction.encode_documents(stringList) # NDArray list

print("Dimension :", onnxEmbeddingFunction.dim, stringVectorList[0].shape)

"""
Dimension : 768 (768,)
"""
▶ requirements.txt
certifi==2024.8.30
charset-normalizer==3.3.2
coloredlogs==15.0.1
environs==9.5.0
filelock==3.16.1
flatbuffers==24.3.25
fsspec==2024.9.0
grpcio==1.66.2
huggingface-hub==0.25.1
humanfriendly==10.0
idna==3.10
marshmallow==3.22.0
milvus-lite==2.4.10
milvus-model==0.2.7
mpmath==1.3.0
numpy==2.1.2
onnxruntime==1.19.2
packaging==24.1
pandas==2.2.3
protobuf==5.28.2
pymilvus==2.4.7
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
safetensors==0.4.5
scipy==1.14.1
six==1.16.0
sympy==1.13.3
tokenizers==0.20.0
tqdm==4.66.5
transformers==4.45.1
typing_extensions==4.12.2
tzdata==2024.2
ujson==5.10.0
urllib3==2.2.3
※ The pip install pymilvus[model] command was run.
■ Shows how to create a list of vectors using the encode_documents method of the OnnxEmbeddingFunction class. ▶ main.py
from pymilvus import model

# If connecting to https://huggingface.co/ fails, uncomment the following lines.
# import os
# os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# Downloads the small embedding model "paraphrase-albert-small-v2" (~50 MB).
onnxEmbeddingFunction = model.DefaultEmbeddingFunction()

stringList = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England."
]

stringVectorList = onnxEmbeddingFunction.encode_documents(stringList) # NDArray list

print(len(stringVectorList))

"""
3
"""
▶ requirements.txt
certifi==2024.8.30
charset-normalizer==3.3.2
coloredlogs==15.0.1
environs==9.5.0
filelock==3.16.1
flatbuffers==24.3.25
fsspec==2024.9.0
grpcio==1.66.2
huggingface-hub==0.25.1
humanfriendly==10.0
idna==3.10
marshmallow==3.22.0
milvus-lite==2.4.10
milvus-model==0.2.7
mpmath==1.3.0
numpy==2.1.2
onnxruntime==1.19.2
packaging==24.1
pandas==2.2.3
protobuf==5.28.2
pymilvus==2.4.7
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
safetensors==0.4.5
scipy==1.14.1
six==1.16.0
sympy==1.13.3
tokenizers==0.20.0
tqdm==4.66.5
transformers==4.45.1
typing_extensions==4.12.2
tzdata==2024.2
ujson==5.10.0
urllib3==2.2.3
※ The pip install pymilvus[model] command was run.
■ Shows how to create an OnnxEmbeddingFunction object using the DefaultEmbeddingFunction function. ▶ main.py
from pymilvus import model

# If connecting to https://huggingface.co/ fails, uncomment the following lines.
# import os
# os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# Downloads the small embedding model "paraphrase-albert-small-v2" (~50 MB).
onnxEmbeddingFunction = model.DefaultEmbeddingFunction()
▶ requirements.txt
certifi==2024.8.30
charset-normalizer==3.3.2
coloredlogs==15.0.1
environs==9.5.0
filelock==3.16.1
flatbuffers==24.3.25
fsspec==2024.9.0
grpcio==1.66.2
huggingface-hub==0.25.1
humanfriendly==10.0
idna==3.10
marshmallow==3.22.0
milvus-lite==2.4.10
milvus-model==0.2.7
mpmath==1.3.0
numpy==2.1.2
onnxruntime==1.19.2
packaging==24.1
pandas==2.2.3
protobuf==5.28.2
pymilvus==2.4.7
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
safetensors==0.4.5
scipy==1.14.1
six==1.16.0
sympy==1.13.3
tokenizers==0.20.0
tqdm==4.66.5
transformers==4.45.1
typing_extensions==4.12.2
tzdata==2024.2
ujson==5.10.0
urllib3==2.2.3
※ The pip install pymilvus[model] command was run.
■ Shows how to create a collection using the collection_name and dimension arguments of the MilvusClient class's create_collection method. ※ The primary key and vector fields use their default names ("id" and "vector").
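A minimal sketch, reusing the "temp" collection name and the 768 dimension from the neighboring examples:

from pymilvus import MilvusClient

milvusClient = MilvusClient("test.db")

if milvusClient.has_collection(collection_name = "temp"):
    milvusClient.drop_collection(collection_name = "temp")

# Creates a collection whose primary key field is "id" and whose vector field is "vector".
milvusClient.create_collection(
    collection_name = "temp",
    dimension       = 768
)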
■ Shows how to drop a collection using the collection_name argument of the MilvusClient class's drop_collection method. ▶ main.py
from pymilvus import MilvusClient

milvusClient = MilvusClient("test.db")

hasCollection = milvusClient.has_collection(collection_name = "temp")

if milvusClient.has_collection(collection_name = "temp"):
    milvusClient.drop_collection(collection_name = "temp")
▶ requirements.txt
environs==9.5.0
grpcio==1.66.2
marshmallow==3.22.0
milvus-lite==2.4.10
numpy==2.1.2
packaging==24.1
pandas==2.2.3
protobuf==5.28.2
pymilvus==2.4.7
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
six==1.16.0
tqdm==4.66.5
tzdata==2024.2
ujson==5.10.0
※ pip install pymilvus
■ Shows how to check whether a collection exists using the collection_name argument of the MilvusClient class's has_collection method. ▶ main.py
from pymilvus import MilvusClient

milvusClient = MilvusClient("test.db")

hasCollection = milvusClient.has_collection(collection_name = "temp")

print(hasCollection)
▶ requirements.txt
environs==9.5.0
grpcio==1.66.2
marshmallow==3.22.0
milvus-lite==2.4.10
numpy==2.1.2
packaging==24.1
pandas==2.2.3
protobuf==5.28.2
pymilvus==2.4.7
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
six==1.16.0
tqdm==4.66.5
tzdata==2024.2
ujson==5.10.0
※ pip install pymilvus
■ Shows how to create a MilvusClient object using the MilvusClient class constructor. ※ To create a local Milvus vector database, specify a file name, such as "test.db", that will store all the data.
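A minimal sketch:

from pymilvus import MilvusClient

# Passing a file name creates or opens a local Milvus Lite database stored in that file.
milvusClient = MilvusClient("test.db")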
■ Shows how to create a POSTGRESQL database connection using the connection function. ▶ ./.streamlit/secrets.toml
[connections.posgreSQL]
type="sql"
dialect="postgresql"
username="user1"
password="password1"
host="127.0.0.1"
port=5432
database="test"
※ postgreSQL : connection name ※ postgresql : database dialect
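A minimal main.py sketch (assumed, not part of the original entry); the name passed to st.connection must match the [connections.…] section name in secrets.toml:

import streamlit as st

# "posgreSQL" matches the [connections.posgreSQL] section spelled in the secrets.toml above.
sqlConnection = st.connection("posgreSQL", type = "sql")

st.write(sqlConnection)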
■ Shows how to query data using the query method of the SqlConnection class. ▶ ./.streamlit/secrets.toml
[connections.posgreSQL]
type="sql"
dialect="postgresql"
username="user1"
password="password1"
host="127.0.0.1"
port=5432
database="test"
※ postgreSQL : connection name ※ postgresql : database dialect
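A minimal main.py sketch (assumed, not part of the original entry; "item" is a placeholder table name):

import streamlit as st

sqlConnection = st.connection("posgreSQL", type = "sql")

# query runs the SQL statement and returns the result as a pandas DataFrame, cached for ttl seconds.
dataFrame = sqlConnection.query("SELECT * FROM item;", ttl = 600)

st.dataframe(dataFrame)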
■ Shows how to set a SQLiteCache object as the LLM cache using the set_llm_cache function. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. ▶ main.py
from dotenv import load_dotenv
from langchain.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
from langchain_openai import ChatOpenAI

load_dotenv()

sqliteCache = SQLiteCache(database_path = "cache.db")

set_llm_cache(sqliteCache)

chatOpenAI = ChatOpenAI(model = "gpt-3.5-turbo-0125")

llmResult1 = chatOpenAI.invoke("고양이 울음소리는?")

print(llmResult1)

"""
content='"야옹"이라고 합니다.' response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 18, 'total_tokens': 29}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-70cc38f8-64a1-43a0-b7a5-6e0b7dce0be9-0' usage_metadata={'input_tokens': 18, 'output_tokens': 11, 'total_tokens': 29}
"""

llmResult2 = chatOpenAI.invoke("고양이 울음소리는?")

print(llmResult2)

"""
content='"야옹"이라고 합니다.' response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 18, 'total_tokens': 29}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-70cc38f8-64a1-43a0-b7a5-6e0b7dce0be9-0' usage_metadata={'input_tokens': 18, 'output_tokens': 11, 'total_tokens': 29}
"""

llmResult3 = chatOpenAI.invoke("까마귀 울음소리는?")

print(llmResult3)

"""
content='까악 까악!' response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 20, 'total_tokens': 30}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-61db093c-e5d5-462f-b6a8-702eadcb522c-0' usage_metadata={'input_tokens': 20, 'output_tokens': 10, 'total_tokens': 30}
"""
▶
■ Shows how to store chat message history in SQLITE using the SQLChatMessageHistory class. ※ The OPENAI_API_KEY environment variable value is defined in the .env file. ▶ main.py
from dotenv import load_dotenv
from langchain_community.chat_message_histories import SQLChatMessageHistory
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables.history import RunnableWithMessageHistory

load_dotenv()

def getSessionHistory(sessionID):
    return SQLChatMessageHistory(sessionID, "sqlite:///memory.db")

chatOpenAI = ChatOpenAI(model = "gpt-3.5-turbo-0125")

runnableSequence = chatOpenAI | StrOutputParser()

runnableWithMessageHistory = RunnableWithMessageHistory(runnableSequence, getSessionHistory)
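The snippet above stops after building the history-aware runnable; a hedged sketch of a typical invocation follows ("session1" is a placeholder session ID, not from the original entry):

# Each invocation loads and appends messages in the SQLite-backed history for the given session_id.
answer = runnableWithMessageHistory.invoke(
    "고양이 울음소리는?",
    config = {"configurable" : {"session_id" : "session1"}}
)

print(answer)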
■ Shows how to create a FAISS vector store retriever tool using the create_retriever_tool function. ▶ main.py
import ast
import os
import re

from langchain_community.utilities import SQLDatabase
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.agents.agent_toolkits import create_retriever_tool

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

# Initialize the SQLITE database.
sqlDatabase = SQLDatabase.from_uri("sqlite:///chinook.db")

# Define a function that fetches SQLite data.
def getSQLiteData(sqlDatabase, sql):
    resultString    = sqlDatabase.run(sql)
    resultTupleList = ast.literal_eval(resultString)
    resultList1     = [element for resultTuple in resultTupleList for element in resultTuple if element]
    resultList2     = [re.sub(r"\b\d+\b", "", result).strip() for result in resultList1]
    return list(set(resultList2))

# Get the artist name and album title lists.
artistNameList = getSQLiteData(sqlDatabase, "SELECT Name FROM artists")
albumTitleList = getSQLiteData(sqlDatabase, "SELECT Title FROM albums")

# Create the FAISS vector store.
faiss = FAISS.from_texts(artistNameList + albumTitleList, OpenAIEmbeddings())

# Set up the FAISS vector store retriever.
faissVectorStoreRetriever = faiss.as_retriever(search_kwargs = {"k" : 5})

# Create the FAISS vector store retriever tool.
faissVectorStoreRetrieverTool = create_retriever_tool(
    faissVectorStoreRetriever,
    name        = "search_proper_nouns",
    description = "Use to look up values to filter on. Input is an approximate spelling of the proper noun, output is valid proper nouns. Use the noun most similar to the search."
)

# Run the FAISS vector store retriever tool.
resultString = faissVectorStoreRetrieverTool.invoke("Alice Chains")

print(resultString)

"""
Alice In Chains
Alanis Morissette
Pearl Jam
Pearl Jam
Audioslave
"""
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.6.2
charset-normalizer==3.3.2
dataclasses-json==0.6.7
distro==1.9.0
exceptiongroup==1.2.1
faiss-gpu==1.7.2
frozenlist==1.4.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.2.3
langchain-community==0.2.4
langchain-core==0.2.5
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.77
marshmallow==3.21.3
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
openai==1.33.0
orjson==3.10.4
packaging==23.2
pydantic==2.7.4
pydantic_core==2.18.4
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
sniffio==1.3.1
SQLAlchemy==2.0.30
tenacity==8.3.0
tiktoken==0.7.0
tqdm==4.66.4
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.1
yarl==1.9.4
※ pip install langchain
■ Shows how to get a vector store retriever using the search_kwargs argument of the FAISS class's as_retriever method. ※ The number of documents the retriever returns can also be limited, as sketched below.
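A minimal sketch (the texts are placeholders and OPENAI_API_KEY is assumed to be set in the environment):

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

faiss = FAISS.from_texts(["Alice In Chains", "Audioslave", "Pearl Jam"], OpenAIEmbeddings())

# search_kwargs = {"k" : 1} limits the retriever to returning a single document per query.
faissVectorStoreRetriever = faiss.as_retriever(search_kwargs = {"k" : 1})

documentList = faissVectorStoreRetriever.invoke("Alice Chains")

print(documentList[0].page_content)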
■ Shows how to create a FAISS vector store using the from_texts function. ▶ main.py
import ast
import os
import re

from langchain_community.utilities import SQLDatabase
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

os.environ["OPENAI_API_KEY"] = "<OPENAI_API_KEY>"

# Initialize the SQLITE database.
sqlDatabase = SQLDatabase.from_uri("sqlite:///chinook.db")

# Define a function that fetches SQLite data.
def getSQLiteData(sqlDatabase, sql):
    resultString    = sqlDatabase.run(sql)
    resultTupleList = ast.literal_eval(resultString)
    resultList1     = [element for resultTuple in resultTupleList for element in resultTuple if element]
    resultList2     = [re.sub(r"\b\d+\b", "", result).strip() for result in resultList1]
    return list(set(resultList2))

# Get the artist name and album title lists.
artistNameList = getSQLiteData(sqlDatabase, "SELECT Name FROM artists")
albumTitleList = getSQLiteData(sqlDatabase, "SELECT Title FROM albums")

# Create the FAISS vector store.
faiss = FAISS.from_texts(artistNameList + albumTitleList, OpenAIEmbeddings())
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.6.2
charset-normalizer==3.3.2
dataclasses-json==0.6.7
distro==1.9.0
exceptiongroup==1.2.1
faiss-gpu==1.7.2
frozenlist==1.4.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.2.3
langchain-community==0.2.4
langchain-core==0.2.5
langchain-openai==0.1.8
langchain-text-splitters==0.2.1
langsmith==0.1.77
marshmallow==3.21.3
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
openai==1.33.0
orjson==3.10.4
packaging==23.2
pydantic==2.7.4
pydantic_core==2.18.4
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
sniffio==1.3.1
SQLAlchemy==2.0.30
tenacity==8.3.0
tiktoken==0.7.0
tqdm==4.66.4
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.1
yarl==1.9.4
※ pip install langchain langchain-community langchain-openai
■ Shows how to fetch SQLITE data using the run method of the SQLDatabase class. ▶ main.py
import ast
import re

from langchain_community.utilities import SQLDatabase

sqlDatabase = SQLDatabase.from_uri("sqlite:///chinook.db")

def getSQLiteData(sqlDatabase, sql):
    resultString    = sqlDatabase.run(sql)
    resultTupleList = ast.literal_eval(resultString)
    resultList1     = [element for resultTuple in resultTupleList for element in resultTuple if element]
    resultList2     = [re.sub(r"\b\d+\b", "", result).strip() for result in resultList1]
    return list(set(resultList2))

artistNameList = getSQLiteData(sqlDatabase, "SELECT Name FROM artists")
albumTitleList = getSQLiteData(sqlDatabase, "SELECT Title FROM albums")

print(artistNameList[:5])
print(albumTitleList[:5])

"""
['Fretwork', 'Aaron Copland & London Symphony Orchestra', 'Raimundos', 'Seu Jorge', 'Ney Matogrosso']
['St. Anger', 'Mezmerize', 'Speak of the Devil', 'Allegri: Miserere', 'Machine Head']
"""
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.6.2
charset-normalizer==3.3.2
dataclasses-json==0.6.7
frozenlist==1.4.1
greenlet==3.0.3
idna==3.7
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.2.3
langchain-community==0.2.4
langchain-core==0.2.5
langchain-text-splitters==0.2.1
langsmith==0.1.77
marshmallow==3.21.3
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
orjson==3.10.4
packaging==23.2
pydantic==2.7.4
pydantic_core==2.18.4
PyYAML==6.0.1
requests==2.32.3
SQLAlchemy==2.0.30
tenacity==8.3.0
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.1
yarl==1.9.4
※ The pip install langchain-community command was run.