[PYTHON/LANGCHAIN] OllamaEmbeddings class : Computing vector similarity between strings

■ Shows how to compute vector similarity between strings using the OllamaEmbeddings class.

※ Run the ollama run chatfire/bge-m3:q8_0 command beforehand to download the model.

▶ main.py


import numpy as np

from langchain_ollama.embeddings import OllamaEmbeddings

ollamaEmbeddings = OllamaEmbeddings(model = "chatfire/bge-m3:q8_0")

# Query (Korean) : "Please tell me about LangChain in detail."
query = "LangChain 에 대해서 상세히 알려주세요."

queryVector = ollamaEmbeddings.embed_query(query) # Number of elements : 1024

# Candidate texts, mixing Korean and English sentences.
textList = [
    # "Hi, nice to meet you."
    "안녕, 만나서 반가워.",
    "LangChain simplifies the process of building applications with large language models",
    # "The LangChain Korean tutorial is built on LangChain's official documentation, cookbook,
    # and various practical examples so that users can use LangChain more easily and effectively."
    "랭체인 한국어 튜토리얼은 LangChain의 공식 문서, cookbook 및 다양한 실용 예제를 바탕으로 하여 사용자가 LangChain을 더 쉽고 효과적으로 활용할 수 있도록 구성되어 있습니다. ",
    # "LangChain simplifies the process of building applications with very large language models."
    "LangChain은 초거대 언어모델로 애플리케이션을 구축하는 과정을 단순화합니다.",
    "Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses."
]

textVectorList = ollamaEmbeddings.embed_documents(textList)

# Dot product of the query vector with each text vector.
# This equals cosine similarity only when the vectors have unit length.
similarity = np.array(queryVector) @ np.array(textVectorList).T

# Indices of the texts sorted from most similar to least similar.
sortedTextVectorIndexList = similarity.argsort()[::-1]

print(query)
print("-" * 50)

for index, sortedTextVectorIndex in enumerate(sortedTextVectorIndexList):
    print(f"[{index}] Similarity : {similarity[sortedTextVectorIndex]:.3f} | {textList[sortedTextVectorIndex]}")
    print()

"""
LangChain 에 대해서 상세히 알려주세요.
--------------------------------------------------
[0] Similarity : 0.926 | LangChain은 초거대 언어모델로 애플리케이션을 구축하는 과정을 단순화합니다.

[1] Similarity : 0.842 | 랭체인 한국어 튜토리얼은 LangChain의 공식 문서, cookbook 및 다양한 실용 예제를 바탕으로 하여 사용자가 LangChain을 더 쉽고 효과적으로 활용할 수 있도록 구성되어 있습니다.

[2] Similarity : 0.744 | LangChain simplifies the process of building applications with large language models

[3] Similarity : 0.706 | 안녕, 만나서 반가워.

[4] Similarity : 0.424 | Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.
"""

▶ requirements.txt


annotated-types==0.7.0
anyio==4.8.0
certifi==2024.12.14
charset-normalizer==3.4.1
exceptiongroup==1.2.2
h11==0.14.0
httpcore==1.0.7
httpx==0.27.2
idna==3.10
jsonpatch==1.33
jsonpointer==3.0.0
langchain-core==0.3.29
langchain-ollama==0.2.2
langsmith==0.2.10
numpy==2.2.1
ollama==0.4.6
orjson==3.10.14
packaging==24.2
pydantic==2.10.5
pydantic_core==2.27.2
PyYAML==6.0.2
requests==2.32.3
requests-toolbelt==1.0.0
sniffio==1.3.1
tenacity==9.0.0
typing_extensions==4.12.2
urllib3==2.3.0

※ The pip install langchain_ollama numpy command was run.

icodebroker
