[PYTHON/LANGCHAIN] LlamaCpp class : Building a local RAG application with the LLaMA2 model

■ Shows how to load and run a LLaMA2 chat model locally with the LlamaCpp class, which serves as the LLM component of a local RAG application.

▶ main.py


from langchain_community.llms import LlamaCpp

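llamaCpp = LlamaCpp(
    model_path   = "./llama-2-13b-chat.Q4_0.gguf", # Path to the llama-2-13b-chat.Q4_0.gguf model file.
    n_gpu_layers = 1,                              # Number of model layers offloaded to GPU memory; here only one layer goes to the GPU and the rest run on the CPU (for Apple Metal, 1 is enough).
    n_batch      = 512,                            # Number of tokens the model processes in parallel; choose a value between 1 and n_ctx.
    n_ctx        = 2048,                           # Token context window; the model considers up to 2048 tokens (prompt plus output) at a time.
    f16_kv       = True,                           # Whether the model uses half precision for the key/value cache.
                                                   # Half precision is more memory-efficient.
                                                   # Must be set to True; otherwise problems occur after a few calls.
    verbose      = False                           # Suppress llama.cpp's verbose log output on stderr.
)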

resultString = llamaCpp.invoke("Simulate a rap battle between Stephen Colbert and John Oliver")

print(resultString)

"""
Introduction:
Welcome to the ultimate showdown of wit, humor, and intelligence as Stephen Colbert and John Oliver go head-to-head in a rap battle. The stakes are high, the crowd is rowdy, and the rhymes are blazing hot. So without further ado, let's get this battle started!

Verse 1 (Stephen Colbert):
Listen up, y'all, I'm the king of the game
My satire's so sharp, it'll leave you in shame
I've got the flow of a genius, the style of a pro
I'm the one they call Stephen, the Colbert-ino

Verse 2 (John Oliver):
Oh no, not another pretender to the throne
I've been here for years, and I'm still going strong
My jokes are so sharp, they'll cut like a knife
I'm the one they call John, the Oliver-life

Chorus:
It's Colbert vs. Oliver, the ultimate rap battle
Two comedic giants, going head to head
Who will come out on top?
"""

※ The llama-2-13b-chat.Q4_0.gguf model file can be downloaded through the [Desktop Chat Client] program, which is available from the https://gpt4all.io website.
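Loading fails if the path is wrong or the download did not produce a real model file. A small pre-check can catch this before LlamaCpp is constructed. The sketch below assumes the file uses the GGUF container, whose header starts with the ASCII bytes `GGUF`; `is_gguf_file` is a hypothetical helper, not part of LangChain or llama-cpp-python.

```python
from pathlib import Path

# GGUF files start with these 4 magic bytes.
GGUF_MAGIC = b"GGUF"


def is_gguf_file(path: str) -> bool:
    """Return True if path is an existing file whose first 4 bytes are the GGUF magic."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC
```

For example, main.py could call `is_gguf_file("./llama-2-13b-chat.Q4_0.gguf")` before creating `LlamaCpp` and stop with a clear message when it returns False. The check reads only the header, so it stays cheap even for a multi-gigabyte 13B file, but it does not verify the rest of the file.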

▶ requirements.txt


aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.6.2
charset-normalizer==3.3.2
dataclasses-json==0.6.7
diskcache==5.6.3
frozenlist==1.4.1
greenlet==3.0.3
idna==3.7
Jinja2==3.1.4
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.2.3
langchain-community==0.2.4
langchain-core==0.2.5
langchain-text-splitters==0.2.1
langsmith==0.1.77
llama_cpp_python==0.2.78
MarkupSafe==2.1.5
marshmallow==3.21.3
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
orjson==3.10.4
packaging==23.2
pydantic==2.7.4
pydantic_core==2.18.4
PyYAML==6.0.1
requests==2.32.3
SQLAlchemy==2.0.30
tenacity==8.3.0
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.1
yarl==1.9.4

※ The packages were installed by running the pip install langchain-community llama-cpp-python command.
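Running pip install without version pins installs the latest releases, which can differ from the versions pinned in requirements.txt. The sketch below compares the two, assuming requirements.txt contains only simple `name==version` lines as above. `parse_pins` and `find_mismatches` are hypothetical helpers built on the standard `importlib.metadata.version` function; the lookup is injectable so the logic can be tested without the packages installed.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version
from typing import Callable


def parse_pins(text: str) -> dict[str, str]:
    """Map package name -> pinned version for every 'name==version' line."""
    pins: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(
    pins: dict[str, str], lookup: Callable[[str], str] = version
) -> dict[str, tuple[str, str | None]]:
    """Return {name: (pinned, installed)} for each differing pin; installed is None if missing."""
    mismatches: dict[str, tuple[str, str | None]] = {}
    for name, pinned in pins.items():
        try:
            installed: str | None = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

For example, `find_mismatches(parse_pins(Path("requirements.txt").read_text()))` (with `from pathlib import Path`) returns an empty dict when the environment matches, and `pip install -r requirements.txt` installs exactly the pinned versions. Names are looked up as written in the file, so on some Python versions a name like `llama_cpp_python` may not be treated as equivalent to `llama-cpp-python`.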

icodebroker
