[PYTHON/LANGCHAIN] Generating an answer, then annotating it with quote citations in a follow-up model call

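■ Shows how to generate an answer and then have a follow-up model call annotate that same answer with verbatim quote citations from the retrieved Wikipedia articles.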

※ The value of the OPENAI_API_KEY environment variable is defined in the .env file.
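
The .env file holds a line of the form OPENAI_API_KEY=... that load_dotenv() copies into the process environment. A small guard run right after load_dotenv() can make a missing or empty key fail in one obvious place with a message that names the .env file. This is a minimal sketch; require_env is a hypothetical helper, not part of python-dotenv or LangChain.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the non-empty value of environment variable `name`.

    `environ` defaults to os.environ; passing a dict makes the check easy to test.
    """
    source = os.environ if environ is None else environ
    value = source.get(name, "").strip()
    if not value:
        # Fail early with a message that points at the .env file
        raise RuntimeError(f"{name} is not set; add a line such as {name}=... to the .env file")
    return value
```

In main.py it would be called as require_env("OPENAI_API_KEY") immediately after load_dotenv().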

▶ main.py


from dotenv                         import load_dotenv
from langchain_community.retrievers import WikipediaRetriever
from langchain_core.prompts         import ChatPromptTemplate
from langchain_core.prompts         import MessagesPlaceholder
from langchain_openai               import ChatOpenAI
from pydantic                       import BaseModel
from typing                         import List
from pydantic                       import Field
from langchain_core.runnables       import RunnableParallel
from langchain_core.runnables       import RunnablePassthrough

load_dotenv()

wikipediaRetriever = WikipediaRetriever(top_k_results = 6, doc_content_chars_max = 2000)

systemString = """You're a helpful AI assistant. Given a user question and some Wikipedia article snippets, answer the user question and provide citations. If none of the articles answer the question, just say you don't know.

Remember, you must return both an answer and citations. A citation consists of a VERBATIM quote that justifies the answer and the ID of the quoted article. Return a citation for every quote across all articles that justify the answer. Use the following format for your final output :

<cited_answer>
    <answer></answer>
    <citations>
        <citation><source_id></source_id><quote></quote></citation>
        <citation><source_id></source_id><quote></quote></citation>
        ...
    </citations>
</cited_answer>

Here are the Wikipedia articles : {context}"""

chatPromptTemplate = ChatPromptTemplate.from_messages(
    [
        ("system", systemString),
        ("human" , "{question}"),
        MessagesPlaceholder("chat_history", optional = True)
    ]
)

chatOpenAI = ChatOpenAI(model = "gpt-4o-mini")

# First call : prompt -> model, returns an AIMessage whose content is the <cited_answer> text
runnableSequence1 = chatPromptTemplate | chatOpenAI

class Citation(BaseModel):
    source_id : int = Field(..., description = "The integer ID of a SPECIFIC source which justifies the answer."        )
    quote     : str = Field(..., description = "The VERBATIM quote from the specified source that justifies the answer.")

class AnnotatedAnswer(BaseModel):
    """Annotate the answer to the user question with quote citations that justify the answer."""

    citations : List[Citation] = Field(..., description = "Citations from the given sources that justify the answer.")

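# Bind the AnnotatedAnswer schema so the model returns a parsed AnnotatedAnswer object
runnableSequence2 = chatOpenAI.with_structured_output(AnnotatedAnswer)

# Second call : same prompt, but chat_history now holds the first answer, which the model annotates with citations
runnableSequence3 = chatPromptTemplate | runnableSequence2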

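def formatDocumentsWithId(documentList) -> str:
    # Number each retrieved snippet so the model can cite it by source_id (0-based position in documentList)
    formattedList = [
        f"Source ID: {index}\nArticle Title: {document.metadata['title']}\nArticle Snippet: {document.page_content}"
        for index, document in enumerate(documentList)
    ]

    return "\n\n" + "\n\n".join(formattedList)

runnableSequence4 = (
    # Extract the question string and retrieve Wikipedia documents for it in parallel
    RunnableParallel(question = lambda x : x["input"], documents = (lambda x : x["input"]) | wikipediaRetriever)
    # Build {context} with numbered sources (the built-in format() would only stringify the whole input dictionary)
    .assign(context      = lambda x : formatDocumentsWithId(x["documents"]))
    # First call : generate the answer
    .assign(ai_message   = runnableSequence1)
    # Feed the answer back as chat history so the second call can annotate it
    .assign(chat_history = (lambda x : [x["ai_message"]]), answer = (lambda x : x["ai_message"].content))
    # Second call : structured citations for the answer
    .assign(annotations  = runnableSequence3)
    .pick(["answer", "documents", "annotations"])
)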

responseDictionary = runnableSequence4.invoke({"input" : "How fast are cheetahs?"})

print(responseDictionary["answer"])
print("-" * 50)

print(responseDictionary["annotations"])
print("-" * 50)

"""
<cited_answer>
    <answer>Cheetahs are capable of running at speeds of 93 to 104 km/h (58 to 65 mph).</answer>
    <citations>
        <citation><source_id>1</source_id><quote>The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph); it has evolved specialized adaptations for speed, including a light build, long thin legs and a long tail.</quote></citation>
        <citation><source_id>4</source_id><quote>The fastest land animal is the cheetah.</quote></citation>
    </citations>
</cited_answer>
--------------------------------------------------
citations = [
    Citation(
        source_id = 1,
        quote     = 'The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph); it has evolved specialized adaptations for speed, including a light build, long thin legs and a long tail.'
    ),
    Citation(
        source_id = 3,
        quote     = 'The fastest land animal is the cheetah.'
    )
]
--------------------------------------------------
"""

▶ requirements.txt


aiohappyeyeballs==2.4.4
aiohttp==3.11.9
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.6.2.post1
attrs==24.2.0
beautifulsoup4==4.12.3
certifi==2024.8.30
charset-normalizer==3.4.0
colorama==0.4.6
dataclasses-json==0.6.7
distro==1.9.0
frozenlist==1.5.0
greenlet==3.1.1
h11==0.14.0
httpcore==1.0.7
httpx==0.28.0
httpx-sse==0.4.0
idna==3.10
jiter==0.8.0
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.3.9
langchain-community==0.3.9
langchain-core==0.3.21
langchain-openai==0.2.10
langchain-text-splitters==0.3.2
langsmith==0.1.147
marshmallow==3.23.1
multidict==6.1.0
mypy-extensions==1.0.0
numpy==2.1.3
openai==1.56.0
orjson==3.10.12
packaging==24.2
propcache==0.2.1
pydantic==2.10.2
pydantic-settings==2.6.1
pydantic_core==2.27.1
python-dotenv==1.0.1
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
requests-toolbelt==1.0.0
sniffio==1.3.1
soupsieve==2.6
SQLAlchemy==2.0.36
tenacity==9.0.0
tiktoken==0.8.0
tqdm==4.67.1
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.3
wikipedia==1.4.0
yarl==1.18.3

※ The packages were installed by running pip install python-dotenv langchain langchain-community langchain-openai wikipedia.
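
Because both prompts demand VERBATIM quotes, a cheap post-check is to confirm that each returned quote really occurs in the snippet it cites. The sketch below assumes each snippet was numbered by its 0-based position in responseDictionary["documents"] when the context string was built. find_unsupported_citations and the local Citation dataclass are hypothetical stand-ins; the dataclass only mirrors the two fields of the Pydantic Citation model in main.py, so the Pydantic objects can be passed in directly.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Citation:
    # Mirrors the source_id / quote fields of the Pydantic Citation model in main.py
    source_id: int
    quote: str


def find_unsupported_citations(citations: Sequence[Citation], snippets: Sequence[str]) -> List[Citation]:
    """Return citations whose source_id is out of range or whose quote is not a
    verbatim substring of the snippet at that index."""
    unsupported = []
    for citation in citations:
        in_range = 0 <= citation.source_id < len(snippets)
        if not in_range or citation.quote not in snippets[citation.source_id]:
            unsupported.append(citation)
    return unsupported
```

Usage would look like snippets = [document.page_content for document in responseDictionary["documents"]] followed by find_unsupported_citations(responseDictionary["annotations"].citations, snippets). Exact substring matching is deliberately strict: a quote whose whitespace or punctuation the model altered is flagged as well.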