■ This example shows how to load a Microsoft PowerPoint document using the load method of the UnstructuredPowerPointLoader class.
▶ main.py
from langchain_community.document_loaders import UnstructuredPowerPointLoader

# Create a loader for the PowerPoint file in the working directory.
unstructuredPowerPointLoader = UnstructuredPowerPointLoader("sample-ppt.pptx")

# In the default mode, the whole presentation is returned as a single Document.
documentList = unstructuredPowerPointLoader.load()

print(len(documentList))

"""
1
"""
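The list returned by load holds Document objects, each exposing page_content (the extracted text) and metadata (a dict). The sketch below is one way to preview such a list. StandInDocument and preview_documents are hypothetical names introduced here so the snippet runs without the library or the .pptx file, and the "source" metadata key is an assumption about what the loader records.

```python
from dataclasses import dataclass, field


@dataclass
class StandInDocument:
    """Hypothetical stand-in with the same page_content/metadata attributes as a Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def preview_documents(documents, width=40):
    """Return one summary line per document: index, source, and the first `width` characters."""
    lines = []
    for index, document in enumerate(documents):
        # "source" is assumed to be present; fall back to "unknown" if it is not.
        source = document.metadata.get("source", "unknown")
        # Collapse newlines and repeated spaces so the preview fits on one line.
        text = " ".join(document.page_content.split())
        lines.append(f"[{index}] {source}: {text[:width]}")
    return lines


# With the real loader, pass documentList from main.py instead of this stand-in list.
sampleList = [StandInDocument("Title slide\n\nFirst bullet", {"source": "sample-ppt.pptx"})]
for line in preview_documents(sampleList):
    print(line)
```

Because preview_documents only reads page_content and metadata, it should work unchanged on the documentList produced in main.py.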
▶ requirements.txt
aiofiles==24.1.0
aiohappyeyeballs==2.4.4
aiohttp==3.11.11
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.8.0
async-timeout==4.0.3
attrs==24.3.0
backoff==2.2.1
beautifulsoup4==4.12.3
certifi==2024.12.14
cffi==1.17.1
chardet==5.2.0
charset-normalizer==3.4.1
click==8.1.8
cryptography==44.0.0
dataclasses-json==0.6.7
emoji==2.14.0
eval_type_backport==0.2.2
exceptiongroup==1.2.2
filetype==1.2.0
frozenlist==1.5.0
greenlet==3.1.1
h11==0.14.0
html5lib==1.1
httpcore==1.0.7
httpx==0.28.1
httpx-sse==0.4.0
idna==3.10
joblib==1.4.2
jsonpatch==1.33
jsonpath-python==1.0.6
jsonpointer==3.0.0
langchain==0.3.14
langchain-community==0.3.14
langchain-core==0.3.29
langchain-text-splitters==0.3.5
langdetect==1.0.9
langsmith==0.2.10
lxml==5.3.0
marshmallow==3.25.1
multidict==6.1.0
mypy-extensions==1.0.0
ndjson==0.3.1
nest-asyncio==1.6.0
nltk==3.9.1
numpy==1.26.4
olefile==0.47
orjson==3.10.14
packaging==24.2
pillow==11.1.0
propcache==0.2.1
psutil==6.1.1
pycparser==2.22
pydantic==2.9.2
pydantic-settings==2.7.1
pydantic_core==2.23.4
pypdf==5.1.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-iso639==2024.10.22
python-magic==0.4.27
python-oxmsg==0.0.1
python-pptx==1.0.2
PyYAML==6.0.2
RapidFuzz==3.11.0
regex==2024.11.6
requests==2.32.3
requests-toolbelt==1.0.0
six==1.17.0
sniffio==1.3.1
soupsieve==2.6
SQLAlchemy==2.0.37
tenacity==9.0.0
tqdm==4.67.1
typing-inspect==0.9.0
typing_extensions==4.12.2
unstructured==0.16.12
unstructured-client==0.28.1
urllib3==2.3.0
webencodings==0.5.1
wrapt==1.17.1
XlsxWriter==3.2.0
yarl==1.18.3
※ The packages were installed by running the command pip install langchain langchain-community unstructured python-pptx.
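To check whether a local environment matches the pinned versions, something like the following sketch may help. It uses only the standard library's importlib.metadata. parse_pins, find_mismatches, and pinText are hypothetical names introduced for illustration, and only three of the pins are shown; in practice the text would be read from requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks, comments, and unpinned lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for packages that differ from their pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed in this environment
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# A few pins copied from requirements.txt; read the whole file in real use.
pinText = "langchain-community==0.3.14\nunstructured==0.16.12\npython-pptx==1.0.2\n"
for name, (pinned, installed) in find_mismatches(parse_pins(pinText)).items():
    print(f"{name}: pinned {pinned}, installed {installed}")
```

Nothing is printed when every listed package is installed at exactly its pinned version.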