■ SemanticChunker 클래스의 생성자에서 breakpoint_threshold_type 인자를 사용해 문서 분할 임계값 기준을 설정하는 방법을 보여준다.
※ breakpoint_threshold_type 인자의 디폴트 값은 "percentile"이고, "percentile", "standard_deviation", "interquartile", "gradient" 값 중 하나를 선택할 수 있다.
※ OPENAI_API_KEY 환경 변수 값은 .env 파일에 정의한다.
▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
from dotenv import load_dotenv from langchain_openai.embeddings import OpenAIEmbeddings from langchain_experimental.text_splitter import SemanticChunker load_dotenv() with open("state_of_the_union.txt") as textIOWrapper: fileContent = textIOWrapper.read() openAIEmbeddings = OpenAIEmbeddings() semanticChunker = SemanticChunker(openAIEmbeddings, breakpoint_threshold_type = "percentile") documentList = semanticChunker.create_documents([fileContent]) print(len(documentList)) print() print(documentList[0].page_content) """ 26 Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. Last year COVID-19 kept us apart. This year we are finally together again. Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. With a duty to one another to the American people to the Constitution. And with an unwavering resolve that freedom will always triumph over tyranny. Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. He met the Ukrainian people. From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight. Let each of us here tonight in this Chamber send an unmistakable signal to Ukraine and to the world. Please rise if you are able and show that, Yes, we the United States of America stand with the Ukrainian people. Throughout our history we’ve learned this lesson when dictators do not pay a price for their aggression they cause more chaos. They keep moving. """ |
▶ requirements.txt
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 |
aiohttp==3.9.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.4.0 async-timeout==4.0.3 attrs==23.2.0 certifi==2024.6.2 charset-normalizer==3.3.2 dataclasses-json==0.6.7 distro==1.9.0 exceptiongroup==1.2.1 frozenlist==1.4.1 greenlet==3.0.3 h11==0.14.0 httpcore==1.0.5 httpx==0.27.0 idna==3.7 jsonpatch==1.33 jsonpointer==3.0.0 langchain==0.2.6 langchain-community==0.2.6 langchain-core==0.2.10 langchain-experimental==0.0.62 langchain-openai==0.1.13 langchain-text-splitters==0.2.2 langsmith==0.1.82 marshmallow==3.21.3 multidict==6.0.5 mypy-extensions==1.0.0 numpy==1.26.4 openai==1.35.7 orjson==3.10.5 packaging==24.1 pydantic==2.7.4 pydantic_core==2.18.4 python-dotenv==1.0.1 PyYAML==6.0.1 regex==2024.5.15 requests==2.32.3 sniffio==1.3.1 SQLAlchemy==2.0.31 tenacity==8.4.2 tiktoken==0.7.0 tqdm==4.66.4 typing-inspect==0.9.0 typing_extensions==4.12.2 urllib3==2.2.2 yarl==1.9.4 |
※ pip install python-dotenv langchain-experimental langchain-openai 명령을 실행했다.