■ This example shows how to get the number of tokens in a text with the count_tokens method of the SentenceTransformersTokenTextSplitter class.
▶ main.py
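```python
from langchain_text_splitters import SentenceTransformersTokenTextSplitter

# Read the whole sample document into a string.
with open("state_of_the_union.txt", encoding = "utf-8") as textIOWrapper:
    fileContent = textIOWrapper.read()

# Create a splitter that uses the tokenizer of the all-mpnet-base-v2 model.
# tokens_per_chunk must not exceed the model's maximum sequence length (384 for this model).
sentenceTransformersTokenTextSplitter = SentenceTransformersTokenTextSplitter(
    model_name       = "sentence-transformers/all-mpnet-base-v2",
    tokens_per_chunk = 384,
    chunk_overlap    = 32
)

# count_tokens takes the text as a keyword-only argument and returns the number of token IDs.
tokenCount = sentenceTransformersTokenTextSplitter.count_tokens(text = fileContent)

print(tokenCount)

"""
8093
"""
```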
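With the settings above, the number of chunks the same splitter would produce can be estimated by hand. The sketch below uses a hypothetical helper, `chunk_windows` (not a LangChain function), that mimics an overlapping sliding window: each chunk holds at most `tokens_per_chunk` tokens, and the next chunk starts `tokens_per_chunk - chunk_overlap` = 352 tokens later. It also assumes that the 8093 reported by `count_tokens` includes one start and one end special token that do not belong to any chunk, which leaves 8091 tokens. The library's actual `split_text` may differ in edge cases.

```python
from typing import List, Tuple


def chunk_windows(token_count: int, tokens_per_chunk: int, chunk_overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) token index pairs for an overlapping sliding window.

    Hypothetical, simplified sketch of token chunking with overlap: the window
    advances by tokens_per_chunk - chunk_overlap tokens on every step.
    """
    if not 0 <= chunk_overlap < tokens_per_chunk:
        raise ValueError("chunk_overlap must be in [0, tokens_per_chunk)")
    stride = tokens_per_chunk - chunk_overlap
    windows: List[Tuple[int, int]] = []
    start = 0
    while start < token_count:
        end = min(start + tokens_per_chunk, token_count)
        windows.append((start, end))
        if end == token_count:
            break  # the last window already reaches the end of the text
        start += stride
    return windows


# Assumption: count_tokens() counted one start and one end special token,
# so 8093 - 2 = 8091 content tokens are left for splitting.
content_tokens = 8093 - 2
windows = chunk_windows(content_tokens, tokens_per_chunk = 384, chunk_overlap = 32)
print(len(windows))                          # 23
print(windows[0], windows[1], windows[-1])   # (0, 384) (352, 736) (7744, 8091)
```

Under these assumptions the document would yield 23 chunks, and each pair of neighboring chunks would share 32 tokens.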
▶ requirements.txt
```text
annotated-types==0.7.0
certifi==2024.6.2
charset-normalizer==3.3.2
filelock==3.15.4
fsspec==2024.6.1
huggingface-hub==0.23.4
idna==3.7
Jinja2==3.1.4
joblib==1.4.2
jsonpatch==1.33
jsonpointer==3.0.0
langchain-core==0.2.10
langchain-text-splitters==0.2.2
langsmith==0.1.82
MarkupSafe==2.1.5
mpmath==1.3.0
networkx==3.3
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvtx-cu12==12.1.105
orjson==3.10.5
packaging==24.1
pillow==10.3.0
pydantic==2.7.4
pydantic_core==2.18.4
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
scikit-learn==1.5.0
scipy==1.14.0
sentence-transformers==3.0.1
sympy==1.12.1
tenacity==8.4.2
threadpoolctl==3.5.0
tokenizers==0.19.1
torch==2.3.1
tqdm==4.66.4
transformers==4.42.3
triton==2.3.1
typing_extensions==4.12.2
urllib3==2.2.2
```
※ The packages above were installed by running pip install langchain-text-splitters sentence-transformers.