■ Shows how to use the similarity_search method of the FAISS class to retrieve a list of documents ranked by similarity to a string query.
※ The OPENAI_API_KEY environment variable is defined in the .env file.
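※ A minimal sketch of a guard that fails early when the key is missing. The helper name require_env is hypothetical and not part of python-dotenv or LangChain; it only reads os.environ, so in main.py it would be called after load_dotenv().

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; it is not a library API.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; define it in .env or in the shell")
    return value


# Intended usage in main.py, after load_dotenv():
#     require_env("OPENAI_API_KEY")
```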
▶ main.py
from dotenv import load_dotenv

from langchain_community.document_loaders import PyPDFLoader
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Load OPENAI_API_KEY from the .env file into the environment.
load_dotenv()

# Load the PDF and split it into chunked documents.
pyPDFLoader = PyPDFLoader("./nke-10k-2023.pdf")

sourceDocumentList = pyPDFLoader.load_and_split()

# Embed the chunks and build an in-memory FAISS vector store.
openAIEmbeddings = OpenAIEmbeddings()

faiss = FAISS.from_documents(sourceDocumentList, openAIEmbeddings)

# Retrieve the 2 chunks most similar to the query string.
targetDocumentList = faiss.similarity_search("What is manufacturing?", k=2)

# Print the page number and the first 300 characters of each result.
for targetDocument in targetDocumentList:
    print(str(targetDocument.metadata["page"]) + ":", targetDocument.page_content[:300])

"""
22: changes in labor standards, whether government mandated or otherwise, and increases in compliance costs due to governmental regulation concerning certain metals, fabrics or raw materials used in the manufacturing of our products. In addition, we cannot be certain that manufacturers that we do not

7: of total NIKE Brand apparel, respectively. For fiscal 2023, one apparel contract manufacturer accounted for more than 10% of apparel production, and the top five contract manufacturers in the aggregate accounted for approximately 52% of NIKE Brand apparel production. NIKE's contract manufacturers
"""
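※ A conceptual sketch of what the similarity_search call in main.py does: embed the query, compare it against every stored document vector, and return the k closest documents. To my understanding, a LangChain FAISS store built with from_documents compares vectors by L2 (Euclidean) distance by default, so the sketch ranks by squared L2 distance. The bag-of-words embed function and the sample sentences are toy assumptions standing in for OpenAIEmbeddings and the PDF chunks; this is not how FAISS is implemented internally.

```python
from collections import Counter


def embed(text, vocabulary):
    """Toy bag-of-words embedding (assumed stand-in for OpenAIEmbeddings)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def similarity_search(query, documents, k=2):
    """Return the k documents whose vectors are closest to the query vector."""
    vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
    query_vector = embed(query, vocabulary)
    scored = [(squared_l2(query_vector, embed(doc, vocabulary)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0])  # smaller distance means more similar
    return [doc for _, doc in scored[:k]]


documents = [
    "contract manufacturers produce apparel",
    "revenue grew in fiscal 2023",
    "manufacturing costs and labor standards",
]

results = similarity_search("manufacturing labor", documents, k=2)
for document in results:
    print(document)
```

The third sentence ranks first because it shares both query words, which gives it the smallest distance. With real embeddings the ranking reflects semantic closeness rather than exact word overlap.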
▶ requirements.txt
aiohttp==3.9.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.4.0
async-timeout==4.0.3
attrs==23.2.0
certifi==2024.6.2
charset-normalizer==3.3.2
dataclasses-json==0.6.7
distro==1.9.0
exceptiongroup==1.2.1
faiss-gpu==1.7.2
frozenlist==1.4.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
idna==3.7
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.2.6
langchain-community==0.2.6
langchain-core==0.2.10
langchain-openai==0.1.10
langchain-text-splitters==0.2.2
langsmith==0.1.82
marshmallow==3.21.3
multidict==6.0.5
mypy-extensions==1.0.0
numpy==1.26.4
openai==1.35.6
orjson==3.10.5
packaging==24.1
pydantic==2.7.4
pydantic_core==2.18.4
pypdf==4.2.0
python-dotenv==1.0.1
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
sniffio==1.3.1
SQLAlchemy==2.0.31
tenacity==8.4.2
tiktoken==0.7.0
tqdm==4.66.4
typing-inspect==0.9.0
typing_extensions==4.12.2
urllib3==2.2.2
yarl==1.9.4
※ The packages were installed by running: pip install python-dotenv langchain-community langchain-openai pypdf faiss-gpu