■ Shows how to create a MarkdownHeaderTextSplitter object by passing the headers_to_split_on and strip_headers arguments to the MarkdownHeaderTextSplitter constructor.
▶ Example code (PY)
from langchain_text_splitters import MarkdownHeaderTextSplitter

# Markdown input containing three header levels.
codeString = "# Foo\n\n ## Bar\n\nHi this is Jim\n\nHi this is Joe\n\n ### Boo \n\n Hi this is Lance \n\n ## Baz\n\n Hi this is Molly"

# (header marker, metadata key) pairs that define where to split.
headerTupleListToSplitOn = [
    ("#"  , "Header 1"),
    ("##" , "Header 2"),
    ("###", "Header 3")
]

# strip_headers = False keeps the header lines inside page_content.
markdownHeaderTextSplitter = MarkdownHeaderTextSplitter(headers_to_split_on = headerTupleListToSplitOn, strip_headers = False)

documentList = markdownHeaderTextSplitter.split_text(codeString)

for document in documentList:
    print(document)

"""
page_content='# Foo
## Bar
Hi this is Jim
Hi this is Joe' metadata={'Header 1': 'Foo', 'Header 2': 'Bar'}
page_content='### Boo
Hi this is Lance' metadata={'Header 1': 'Foo', 'Header 2': 'Bar', 'Header 3': 'Boo'}
page_content='## Baz
Hi this is Molly' metadata={'Header 1': 'Foo', 'Header 2': 'Baz'}
"""
※ The package was installed beforehand with the pip install langchain-text-splitters command.
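To make the roles of headers_to_split_on and strip_headers easier to see, the following is a minimal pure-Python sketch of header-based splitting. It is not the library's implementation: split_by_headers and everything inside it are hypothetical names introduced here for illustration, and details such as how lines are joined are assumptions. It only mimics the behavior visible in the output above, where each chunk carries the titles of the headers currently in scope as metadata.

```python
def split_by_headers(text, headers_to_split_on, strip_headers=True):
    """Hypothetical, simplified stand-in for header-based splitting."""
    # Try longer markers first so "##" is not matched by "#".
    markers = sorted(headers_to_split_on, key=lambda pair: len(pair[0]), reverse=True)
    chunks = []
    stack = []        # (depth, metadata key, title) for the headers currently in scope
    lines = []        # lines collected for the chunk being built
    has_body = False  # True once the current chunk holds non-header text

    def flush():
        metadata = {key: title for _, key, title in stack}
        chunks.append({"page_content": "\n".join(lines), "metadata": metadata})
        lines.clear()

    for raw in text.split("\n"):
        line = raw.strip()
        if not line:
            continue
        header = None
        for marker, key in markers:
            rest = line[len(marker):]
            if line.startswith(marker) and (rest == "" or rest.startswith(" ")):
                header = (len(marker), key, rest.strip())
                break
        if header is None:
            lines.append(line)
            has_body = True
            continue
        # A header that follows body text closes the current chunk.
        if has_body:
            flush()
            has_body = False
        # A new header replaces every header at the same or a deeper level.
        while stack and stack[-1][0] >= header[0]:
            stack.pop()
        stack.append(header)
        if not strip_headers:
            lines.append(line)

    if has_body:
        flush()
    return chunks


text = "# Foo\n\n ## Bar\n\nHi this is Jim\n\nHi this is Joe\n\n ### Boo \n\n Hi this is Lance \n\n ## Baz\n\n Hi this is Molly"
headers = [("#", "Header 1"), ("##", "Header 2"), ("###", "Header 3")]

kept = split_by_headers(text, headers, strip_headers=False)
stripped = split_by_headers(text, headers, strip_headers=True)

for chunk in kept + stripped:
    print(chunk)
```

With strip_headers=False the header lines stay in page_content, as in the example above. With strip_headers=True only the body text remains, while the metadata is the same in both cases.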