[PYTHON/LANGCHAIN] TokenTextSplitter class: Getting a list of strings from a string with the split_text method

icodebroker LANGCHAIN 2024-11-23

■ Shows how to use the split_text method of the TokenTextSplitter class to split a string into a list of strings, each holding at most chunk_size tokens.

▶ main.py


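from langchain_text_splitters import TokenTextSplitter

# Read the whole source document into a single string.
with open("state_of_the_union.txt", encoding = "utf-8") as textIOWrapper:
    fileContent = textIOWrapper.read()

# Each chunk holds at most 10 tokens, and consecutive chunks share no tokens.
tokenTextSplitter = TokenTextSplitter(chunk_size = 10, chunk_overlap = 0)

# split_text returns a list of strings, one per chunk.
stringList = tokenTextSplitter.split_text(fileContent)

# Print the first three chunks, separated by blank lines.
for string in stringList[:3]:
    print(string)
    print()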

"""
Madam Speaker, Madam Vice President, our

 First Lady and Second Gentleman. Members of Congress and

 the Cabinet. Justices of the Supreme Court.
"""
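
※ The chunks above are cut by token count, not by character or sentence, which is why the second and third chunks begin with a space. The following is only a minimal sketch of the sliding-window idea behind chunk_size and chunk_overlap. It is not the library's implementation: the split_on_tokens function is a hypothetical helper, and whitespace-separated words stand in for the tokens that tiktoken would actually produce.

```python
def split_on_tokens(tokens, chunk_size, chunk_overlap):
    """Cut a token list into windows of at most chunk_size tokens.

    Hypothetical helper for illustration only. Each new window starts
    chunk_size - chunk_overlap tokens after the previous one.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0

    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
        start += step

    return chunks


# Assumption: words are used as stand-in tokens; real tokens are sub-word units.
words = "Madam Speaker, Madam Vice President, our First Lady and Second Gentleman.".split()

for chunk in split_on_tokens(words, chunk_size = 4, chunk_overlap = 1):
    print(" ".join(chunk))
```

With chunk_overlap = 0 the windows do not share any tokens, as in main.py. With a positive chunk_overlap, each window repeats the last tokens of the previous one, which can help keep context across chunk boundaries.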

▶ requirements.txt


annotated-types==0.7.0
certifi==2024.6.2
charset-normalizer==3.3.2
idna==3.7
jsonpatch==1.33
jsonpointer==3.0.0
langchain-core==0.2.10
langchain-text-splitters==0.2.2
langsmith==0.1.82
orjson==3.10.5
packaging==24.1
pydantic==2.7.4
pydantic_core==2.18.4
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
tenacity==8.4.2
tiktoken==0.7.0
typing_extensions==4.12.2
urllib3==2.2.2

※ The packages were installed by running the pip install langchain-text-splitters tiktoken command.
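
※ To compare a local environment against the pinned versions in requirements.txt, a small check like the following may help. It is a sketch that uses only the standard library; the installed_versions function is a hypothetical helper, not part of LangChain.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packageNameList):
    """Return a dict mapping each package name to its installed version.

    Hypothetical helper: packages that are not installed map to None.
    """
    resultDictionary = {}

    for packageName in packageNameList:
        try:
            resultDictionary[packageName] = version(packageName)
        except PackageNotFoundError:
            resultDictionary[packageName] = None

    return resultDictionary


# The two packages installed directly by the pip command in this post.
for packageName, packageVersion in installed_versions(["langchain-text-splitters", "tiktoken"]).items():
    print(packageName, packageVersion)
```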

state_of_the_union.zip
