■ 엑셀에서 OLLAMA에게 질문을 하는 사용자 함수를 추가하는 방법을 보여준다. ▶ TestLibrary.csproj
|
<Project Sdk="Microsoft.NET.Sdk"> <PropertyGroup> <TargetFramework>net6.0-windows</TargetFramework> <ImplicitUsings>disable</ImplicitUsings> <Nullable>disable</Nullable> </PropertyGroup> <ItemGroup> <PackageReference Include="ExcelDna.AddIn" Version="1.8.0" /> </ItemGroup> </Project> |
▶ launchSettings.json
|
{ "profiles": { "Excel": { "commandName": "Executable", "executablePath": "C:\\Program Files\\Microsoft Office\\root\\Office16\\EXCEL.EXE", "commandLineArgs": "/x \"TestLibrary-AddIn64.xll\"" } } } |
▶ OllamaClient.cs
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101
|
using System.Net.Http; using System.Text; using System.Text.Json; namespace TestLibrary; /// <summary> /// OLLAMA 클라이언트 /// </summary> public class OllamaClient { //////////////////////////////////////////////////////////////////////////////////////////////////// Field ////////////////////////////////////////////////////////////////////////////////////////// Private #region Field /// <summary> /// 기본 URL /// </summary> private readonly string baseURL; /// <summary> /// 모델명 /// </summary> private readonly string modelName; /// <summary> /// HTTP 클라이언트 /// </summary> private readonly HttpClient client; #endregion //////////////////////////////////////////////////////////////////////////////////////////////////// Constructor ////////////////////////////////////////////////////////////////////////////////////////// Public #region 생성자 - OllamaClient(baseURL, modelName) /// <summary> /// 생성자 /// </summary> /// <param name="baseURL">기본 URL</param> /// <param name="modelName">모델명</param> public OllamaClient(string baseURL = "http://localhost:11434", string modelName = "bnksys/eeve-yanolja-v1:latest") { this.baseURL = baseURL; this.modelName = modelName; this.client = new HttpClient(); } #endregion //////////////////////////////////////////////////////////////////////////////////////////////////// Method ////////////////////////////////////////////////////////////////////////////////////////// Public #region 질문하기 - Ask(question) /// <summary> /// 질문하기 /// </summary> /// <param name="question">질문</param> /// <returns>답변</returns> public string Ask(string question) { var requestContent = new { model = this.modelName, prompt = question, stream = false, raw = false }; string json = JsonSerializer.Serialize(requestContent); StringContent stringContent = new StringContent(json, Encoding.UTF8, "application/json"); using(HttpResponseMessage httpResponseMessage = client.PostAsync($"{baseURL}/api/generate", stringContent).Result) { httpResponseMessage.EnsureSuccessStatusCode(); string respnseJSON = httpResponseMessage.Content.ReadAsStringAsync().Result; using(JsonDocument jsonElement = JsonDocument.Parse(respnseJSON)) { JsonElement rootJSONElement = jsonElement.RootElement; if(rootJSONElement.TryGetProperty("response", out JsonElement responseElement)) { return responseElement.GetString() ?? string.Empty; } return string.Empty; } } } #endregion } |
▶ CustomFunction.cs
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
|
using ExcelDna.Integration; namespace TestLibrary; /// <summary> /// 커스텀 함수 /// </summary> public static class CustomFunction { //////////////////////////////////////////////////////////////////////////////////////////////////// Method ////////////////////////////////////////////////////////////////////////////////////////// Public #region OLLAMA 질문하기 - AskOllama(question) /// <summary> /// OLLAMA 질문하기 /// </summary> /// <param name="question">질문</param> /// <returns>답변</returns> [ExcelFunction(Description = "Ollama에게 질문을 합니다.")] public static string AskOllama(string question) { OllamaClient ollamaClient = new OllamaClient(); string answer = ollamaClient.Ask(question); return answer; } #endregion } |
더 읽기
■ ExcelDna.AddIn 누겟을 설치하는 방법을 보여준다. 1. Visual Studio를 실행한다. 2. [도구] / [NuGet 패키지 관리자] / [패키지 관리자 콘솔] 메뉴를 실행한다.
더 읽기
■ read_excel 함수의 index_col/na_values 인자를 사용해 EXCEL 파일 데이터를 로드하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56
|
import pandas as pd import numpy as np datetimeIndex = pd.date_range("2000/01/01", periods = 1000) dataFrame1 = pd.DataFrame( np.random.randn(1000, 4), index = datetimeIndex, columns = ["A", "B", "C", "D"] ) dataFrame1.to_excel("test.xlsx", sheet_name = "Sheet1") print(dataFrame1) """ A B C D 2000-01-01 0.053874 0.285479 2.200199 -0.497760 2000-01-02 0.382564 0.032303 2.176588 0.970032 2000-01-03 -0.960918 0.380824 -0.187437 -0.691698 2000-01-04 -0.854058 -1.280247 -1.469207 -1.037654 2000-01-05 0.141451 -0.025518 1.005225 -0.792409 ... ... ... ... ... 2002-09-22 -0.827884 0.614906 -0.871884 1.219365 2002-09-23 0.083260 0.468676 -0.803836 0.798946 2002-09-24 0.908064 -0.526946 0.538067 0.067274 2002-09-25 1.715925 -1.253660 -0.327901 -2.050811 2002-09-26 -0.224802 0.335902 -1.158583 -0.413148 [1000 rows x 4 columns] """ print() dataFrame2 = pd.read_excel("test.xlsx", "Sheet1", index_col = None, na_values = ["NA"]) print(dataFrame2) """ Unnamed: 0 A B C D 0 2000-01-01 0.053874 0.285479 2.200199 -0.497760 1 2000-01-02 0.382564 0.032303 2.176588 0.970032 2 2000-01-03 -0.960918 0.380824 -0.187437 -0.691698 3 2000-01-04 -0.854058 -1.280247 -1.469207 -1.037654 4 2000-01-05 0.141451 -0.025518 1.005225 -0.792409 .. ... ... ... ... ... 995 2002-09-22 -0.827884 0.614906 -0.871884 1.219365 996 2002-09-23 0.083260 0.468676 -0.803836 0.798946 997 2002-09-24 0.908064 -0.526946 0.538067 0.067274 998 2002-09-25 1.715925 -1.253660 -0.327901 -2.050811 999 2002-09-26 -0.224802 0.335902 -1.158583 -0.413148 [1000 rows x 5 columns] """ |
▶ requirements.txt
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.3 odfpy==1.4.1 openpyxl==3.1.5 packaging==24.2 pandas==2.2.3 python-calamine==0.3.1 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install "pandas[excel]"
더 읽기
■ StringMethods 클래스에서 [] 연산자를 사용해 문자열을 구하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
|
import pandas as pd dataFrame = pd.read_excel("tips.xlsx", index_col = 0) stringMethods = dataFrame["sex"].str series = stringMethods[0:1] print(series) """ 0 F 1 M 2 M 3 M 4 F .. 239 M 240 F 241 M 242 M 243 F Name: sex, Length: 244, dtype: object """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.3 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install "pandas[excel]" 명령을 실행했다.
더 읽기
■ StringMethods 클래스의 find 메소드를 사용해 해당 문자열을 찾는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
|
import pandas as pd dataFrame = pd.read_excel("tips.xlsx", index_col = 0) stringMethods = dataFrame["sex"].str series = stringMethods.find("ale") # 없는 경우 -1을 반환한다. print(series) """ 0 3 1 1 2 1 3 1 4 3 .. 239 1 240 3 241 1 242 1 243 3 Name: sex, Length: 244, dtype: int64 """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.3 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install "pandas[excel]" 명령을
더 읽기
■ StringMethods 클래스의 rstrip 메소드를 사용해 문자열에서 후행 공백을 제거하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
|
import pandas as pd dataFrame = pd.read_excel("tips.xlsx", index_col = 0) stringMethods1 = dataFrame["time"].str series = stringMethods1.rstrip() stringMethods2 = series.str dataFrame["time_length"] = stringMethods2.len() print(dataFrame) """ total_bill tip sex smoker day time size time_length 0 16.99 1.01 Female No Sun Dinner 2 6 1 10.34 1.66 Male No Sun Dinner 3 6 2 21.01 3.50 Male No Sun Dinner 3 6 3 23.68 3.31 Male No Sun Dinner 2 6 4 24.59 3.61 Female No Sun Dinner 4 6 .. ... ... ... ... ... ... ... ... 239 29.03 5.92 Male No Sat Dinner 3 6 240 27.18 2.00 Female Yes Sat Dinner 2 6 241 22.67 2.00 Male Yes Sat Dinner 2 6 242 17.82 1.75 Male No Sat Dinner 2 6 243 18.78 3.00 Female No Thur Dinner 2 6 [244 rows x 8 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.3 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install "pandas[excel]"
더 읽기
■ StringMethods 클래스의 len 메소드를 사용해 DataFrame 객체에 문자열 길이를 갖는 신규 컬럼을 추가하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
|
import pandas as pd dataFrame = pd.read_excel("tips.xlsx", index_col = 0) stringMethods = dataFrame["time"].str dataFrame["time_length"] = stringMethods.len() print(dataFrame) """ total_bill tip sex smoker day time size time_length 0 16.99 1.01 Female No Sun Dinner 2 6 1 10.34 1.66 Male No Sun Dinner 3 6 2 21.01 3.50 Male No Sun Dinner 3 6 3 23.68 3.31 Male No Sun Dinner 2 6 4 24.59 3.61 Female No Sun Dinner 4 6 .. ... ... ... ... ... ... ... ... 239 29.03 5.92 Male No Sat Dinner 3 6 240 27.18 2.00 Female Yes Sat Dinner 2 6 241 22.67 2.00 Male Yes Sat Dinner 2 6 242 17.82 1.75 Male No Sat Dinner 2 6 243 18.78 3.00 Female No Thur Dinner 2 6 [244 rows x 8 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.3 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
더 읽기
■ DatetimeProperties 클래스의 to_period 메소드를 사용해 DataFrame 객체에서 두 날짜 컬럼 값 사이의 월 수를 갖는 신규 컬럼을 추가하는 방법을 보여준다. ▶
더 읽기
■ MonthBegin 클래스를 사용해 DataFrame 객체에서 날짜 컬럼의 값을 기준으로 다음 달 시작일을 갖는 신규 컬럼을 추가하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
|
import pandas as pd dataFrame = pd.read_excel("tips.xlsx", index_col = 0) dataFrame["date1"] = pd.Timestamp("2013-01-15") dataFrame["date1_next" ] = dataFrame["date1"] + pd.offsets.MonthBegin() print(dataFrame) """ total_bill tip sex smoker day time size date1 date1_next 0 16.99 1.01 Female No Sun Dinner 2 2013-01-15 2013-02-01 1 10.34 1.66 Male No Sun Dinner 3 2013-01-15 2013-02-01 2 21.01 3.50 Male No Sun Dinner 3 2013-01-15 2013-02-01 3 23.68 3.31 Male No Sun Dinner 2 2013-01-15 2013-02-01 4 24.59 3.61 Female No Sun Dinner 4 2013-01-15 2013-02-01 .. ... ... ... ... ... ... ... ... ... 239 29.03 5.92 Male No Sat Dinner 3 2013-01-15 2013-02-01 240 27.18 2.00 Female Yes Sat Dinner 2 2013-01-15 2013-02-01 241 22.67 2.00 Male Yes Sat Dinner 2 2013-01-15 2013-02-01 242 17.82 1.75 Male No Sat Dinner 2 2013-01-15 2013-02-01 243 18.78 3.00 Female No Thur Dinner 2 2013-01-15 2013-02-01 [244 rows x 9 columns] """ |
더 읽기
■ DatetimeProperties 클래스의 year/month/day 속성을 사용해 DataFrame 객체 컬럼의 값을 설정하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
|
import pandas as pd dataFrame = pd.read_excel("tips.xlsx", index_col = 0) dataFrame["date1"] = pd.Timestamp("2013-01-15") datetimeProperties = dataFrame["date1"].dt dataFrame["date1_year" ] = datetimeProperties.year dataFrame["date1_month"] = datetimeProperties.month dataFrame["date1_day" ] = datetimeProperties.day print(dataFrame) """ total_bill tip sex smoker day time size date1 date1_year date1_month date1_day 0 16.99 1.01 Female No Sun Dinner 2 2013-01-15 2013 1 15 1 10.34 1.66 Male No Sun Dinner 3 2013-01-15 2013 1 15 2 21.01 3.50 Male No Sun Dinner 3 2013-01-15 2013 1 15 3 23.68 3.31 Male No Sun Dinner 2 2013-01-15 2013 1 15 4 24.59 3.61 Female No Sun Dinner 4 2013-01-15 2013 1 15 .. ... ... ... ... ... ... ... ... ... ... ... 239 29.03 5.92 Male No Sat Dinner 3 2013-01-15 2013 1 15 240 27.18 2.00 Female Yes Sat Dinner 2 2013-01-15 2013 1 15 241 22.67 2.00 Male Yes Sat Dinner 2 2013-01-15 2013 1 15 242 17.82 1.75 Male No Sat Dinner 2 2013-01-15 2013 1 15 243 18.78 3.00 Female No Thur Dinner 2 2013-01-15 2013 1 15 [244 rows x 11 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.3 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install
더 읽기
■ where 함수를 사용해 특정 조건에 따라 값을 설정하는 DataFrame 객체의 신규 컬럼을 추가하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
|
import pandas as pd import numpy as np dataFrame = pd.read_excel("tips.xlsx", index_col = 0) dataFrame["bucket"] = np.where(dataFrame["total_bill"] < 10, "low", "high") print(dataFrame) """ total_bill tip sex smoker day time size bucket 0 16.99 1.01 Female No Sun Dinner 2 high 1 10.34 1.66 Male No Sun Dinner 3 high 2 21.01 3.50 Male No Sun Dinner 3 high 3 23.68 3.31 Male No Sun Dinner 2 high 4 24.59 3.61 Female No Sun Dinner 4 high .. ... ... ... ... ... ... ... ... 239 29.03 5.92 Male No Sat Dinner 3 high 240 27.18 2.00 Female Yes Sat Dinner 2 high 241 22.67 2.00 Male Yes Sat Dinner 2 high 242 17.82 1.75 Male No Sat Dinner 2 high 243 18.78 3.00 Female No Thur Dinner 2 high [244 rows x 8 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.3 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
더 읽기
■ read_excel 함수의 index_col 인자를 사용해 엑셀 파일 데이터를 로드하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
|
import pandas as pd dataFrame = pd.read_excel("tips.xlsx", index_col = 0) print(dataFrame) """ total_bill tip sex smoker day time size 0 16.99 1.01 Female No Sun Dinner 2 1 10.34 1.66 Male No Sun Dinner 3 2 21.01 3.50 Male No Sun Dinner 3 3 23.68 3.31 Male No Sun Dinner 2 4 24.59 3.61 Female No Sun Dinner 4 .. ... ... ... ... ... ... ... 239 29.03 5.92 Male No Sat Dinner 3 240 27.18 2.00 Female Yes Sat Dinner 2 241 22.67 2.00 Male Yes Sat Dinner 2 242 17.82 1.75 Male No Sat Dinner 2 243 18.78 3.00 Female No Thur Dinner 2 [244 rows x 7 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.3 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install "pandas[excel]"
더 읽기
■ DataFrame 클래스의 to_excel 메소드를 사용해 엑셀 파일을 생성하는 방법을 보여준다. ▶ main.py
|
import pandas as pd dataFrame = pd.read_table("tips.csv", sep = ",", header = [0]) dataFrame.to_excel("tips.xlsx") |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.3 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install "pandas[excel]" 명령을
더 읽기
■ DataFrame 클래스의 iloc 속성을 사용해 값을 설정하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
|
import pandas as pd dataFrame = pd.read_excel("titanic.xlsx", sheet_name = "passengers") dataFrame.iloc[0:3, 3] = "anonymous" print(dataFrame) """ PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked 0 1 0 3 anonymous male 22.0 1 0 A/5 21171 7.2500 NaN S 1 2 1 1 anonymous female 38.0 1 0 PC 17599 71.2833 C85 C 2 3 1 3 anonymous female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S 3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S 4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S .. ... ... ... ... ... ... ... ... ... ... ... ... 886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S 887 888 1 1 Graham, Miss Margaret Edith female 19.0 0 0 112053 30.0000 B42 S 888 889 0 3 Johnston, Miss Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S 889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C 890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q [891 rows x 12 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.2 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install "pandas[excel]" 명령을 실행했다.
더 읽기
■ DataFrame 클래스의 iloc 속성을 사용해 부분 데이터 DataFrame 객체를 구하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
|
import pandas as pd dataFrame1 = pd.read_excel("titanic.xlsx", sheet_name = "passengers") dataFrame2 = dataFrame1.iloc[9:25, 2:5] print(dataFrame2) """ Pclass Name Sex 9 2 Nasser, Mrs. Nicholas (Adele Achem) female 10 3 Sandstrom, Miss Marguerite Rut female 11 1 Bonnell, Miss Elizabeth female 12 3 Saundercock, Mr. William Henry male 13 3 Andersson, Mr. Anders Johan male 14 3 Vestrom, Miss Hulda Amanda Adolfina female 15 2 Hewlett, Mrs. (Mary D Kingcome) female 16 3 Rice, Master Eugene male 17 2 Williams, Mr. Charles Eugene male 18 3 Vander Planke, Mrs. Julius (Emelia Maria Vande... female 19 3 Masselmani, Mrs. Fatima female 20 2 Fynney, Mr. Joseph J male 21 2 Beesley, Mr. Lawrence male 22 3 McGowan, Miss Anna "Annie" female 23 1 Sloper, Mr. William Thompson male 24 3 Palsson, Miss Torborg Danira female """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.2 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install
더 읽기
■ DataFrame 클래스의 loc 속성을 사용해 특정 조건을 만족하는 특정 컬럼 Series 객체를 구하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
|
import pandas as pd dataFrame = pd.read_excel("titanic.xlsx", sheet_name = "passengers") series = dataFrame.loc[dataFrame["Age"] > 35, "Name"] print(series) """ 1 Cumings, Mrs. John Bradley (Florence Briggs Th... 6 McCarthy, Mr. Timothy J 11 Bonnell, Miss Elizabeth 13 Andersson, Mr. Anders Johan 15 Hewlett, Mrs. (Mary D Kingcome) ... 865 Bystrom, Mrs. (Karolina) 871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) 873 Vander Cruyssen, Mr. Victor 879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 885 Rice, Mrs. William (Margaret Norton) Name: Name, Length: 217, dtype: object """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.2 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
더 읽기
■ DataFrame 클래스에서 [] 연산자를 사용해 특정 컬럼에서 값을 갖는 DataFrame 객체를 구하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
|
import pandas as pd dataFrame1 = pd.read_excel("titanic.xlsx", sheet_name = "passengers") dataFrame2 = dataFrame[dataFrame1["Age"].notna()] print(dataFrame2) """ PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C 2 3 1 3 Heikkinen, Miss Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S 3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S 4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S .. ... ... ... ... ... ... ... ... ... ... ... ... 885 886 0 3 Rice, Mrs. William (Margaret Norton) female 39.0 0 5 382652 29.1250 NaN Q 886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S 887 888 1 1 Graham, Miss Margaret Edith female 19.0 0 0 112053 30.0000 B42 S 889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C 890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q [714 rows x 12 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.2 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※
더 읽기
■ DataFrame 클래스에서 [] 연산자를 사용해 특정 컬럼에서 값을 갖는 DataFrame 객체를 구하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
|
import pandas as pd dataFrame1 = pd.read_excel("titanic.xlsx", sheet_name = "passengers") dataFrame2 = dataFrame[dataFrame1["Age"].notna()] print(dataFrame2) """ PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C 2 3 1 3 Heikkinen, Miss Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S 3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000 C123 S 4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S .. ... ... ... ... ... ... ... ... ... ... ... ... 885 886 0 3 Rice, Mrs. William (Margaret Norton) female 39.0 0 5 382652 29.1250 NaN Q 886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S 887 888 1 1 Graham, Miss Margaret Edith female 19.0 0 0 112053 30.0000 B42 S 889 890 1 1 Behr, Mr. Karl Howell male 26.0 0 0 111369 30.0000 C148 C 890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q [714 rows x 12 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.2 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※
더 읽기
■ Series 클래스의 notna 메소드를 사용해 값 여부 Series 객체를 구하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
|
import pandas as pd dataFrame = pd.read_excel("titanic.xlsx", sheet_name = "passengers") series1 = dataFrame["Age"] series2 = series1.notna() print(series2) """ 0 True 1 True 2 True 3 True 4 True ... 886 True 887 True 888 False 889 True 890 True Name: Age, Length: 891, dtype: bool """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.2 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install
더 읽기
■ DataFrame 클래스에서 [] 연산자를 사용해 특정 컬럼에서 특정 값들을 갖는 DataFrame 객체를 구하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
|
import pandas as pd dataFrame1 = pd.read_excel("titanic.xlsx", sheet_name = "passengers") series = dataFrame1["Pclass"] dataFrame2 = dataFrame1[(dataFrame1["Pclass"] == 2) | (dataFrame1["Pclass"] == 3)] print(dataFrame2) """ PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 2 3 1 3 Heikkinen, Miss Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S 4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S 5 6 0 3 Moran, Mr. James male NaN 0 0 330877 8.4583 NaN Q 7 8 0 3 Palsson, Master Gosta Leonard male 2.0 3 1 349909 21.0750 NaN S .. ... ... ... ... ... ... ... ... ... ... ... ... 884 885 0 3 Sutehall, Mr. Henry Jr male 25.0 0 0 SOTON/OQ 392076 7.0500 NaN S 885 886 0 3 Rice, Mrs. William (Margaret Norton) female 39.0 0 5 382652 29.1250 NaN Q 886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S 888 889 0 3 Johnston, Miss Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S 890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q [675 rows x 12 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.2 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
더 읽기
■ DataFrame 클래스에서 [] 연산자를 사용해 특정 컬럼에서 특정 값들을 갖는 DataFrame 객체를 구하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
|
import pandas as pd dataFrame1 = pd.read_excel("titanic.xlsx", sheet_name = "passengers") series = dataFrame1["Pclass"] dataFrame2 = dataFrame1[series.isin([2, 3])] print(dataFrame2) """ PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked 0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500 NaN S 2 3 1 3 Heikkinen, Miss Laina female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S 4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500 NaN S 5 6 0 3 Moran, Mr. James male NaN 0 0 330877 8.4583 NaN Q 7 8 0 3 Palsson, Master Gosta Leonard male 2.0 3 1 349909 21.0750 NaN S .. ... ... ... ... ... ... ... ... ... ... ... ... 884 885 0 3 Sutehall, Mr. Henry Jr male 25.0 0 0 SOTON/OQ 392076 7.0500 NaN S 885 886 0 3 Rice, Mrs. William (Margaret Norton) female 39.0 0 5 382652 29.1250 NaN Q 886 887 0 2 Montvila, Rev. Juozas male 27.0 0 0 211536 13.0000 NaN S 888 889 0 3 Johnston, Miss Catherine Helen "Carrie" female NaN 1 2 W./C. 6607 23.4500 NaN S 890 891 0 3 Dooley, Mr. Patrick male 32.0 0 0 370376 7.7500 NaN Q [675 rows x 12 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.2 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
더 읽기
■ Series 클래스의 isin 메소드를 사용해 지정 값 리스트에 포함 여부를 갖는 Series 객체를 구하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
|
import pandas as pd dataFrame = pd.read_excel("titanic.xlsx", sheet_name = "passengers") series1 = dataFrame["Pclass"] series2 = series1.isin([2, 3]) print(series2) """ 0 True 1 False 2 True 3 False 4 True ... 886 True 887 False 888 True 889 False 890 True Name: Pclass, Length: 891, dtype: bool """ |
▶ requirements.txt
더 읽기
■ DataFrame 클래스에서 특정 조건에 해당하는 데이터를 갖는 DataFrame 객체를 구하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
|
import pandas as pd dataFrame1 = pd.read_excel("titanic.xlsx", sheet_name = "passengers") dataFrame2 = dataFrame1[dataFrame1["Age"] > 35] print(dataFrame2) """ PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin Embarked 1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833 C85 C 6 7 0 1 McCarthy, Mr. Timothy J male 54.0 0 0 17463 51.8625 E46 S 11 12 1 1 Bonnell, Miss Elizabeth female 58.0 0 0 113783 26.5500 C103 S 13 14 0 3 Andersson, Mr. Anders Johan male 39.0 1 5 347082 31.2750 NaN S 15 16 1 2 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0 0 248706 16.0000 NaN S .. ... ... ... ... ... ... ... ... ... ... ... ... 865 866 1 2 Bystrom, Mrs. (Karolina) female 42.0 0 0 236852 13.0000 NaN S 871 872 1 1 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47.0 1 1 11751 52.5542 D35 S 873 874 0 3 Vander Cruyssen, Mr. Victor male 47.0 0 0 345765 9.0000 NaN S 879 880 1 1 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 1 11767 83.1583 C50 C 885 886 0 3 Rice, Mrs. William (Margaret Norton) female 39.0 0 5 382652 29.1250 NaN Q [217 rows x 12 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.2 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install
더 읽기
■ DataFrame 클래스에서 [] 연산자를 사용해 특정 컬럼들의 데이터를 갖는 DataFrame 객체를 구하는 방법을 보여준다. ▶ main.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
|
import pandas as pd dataFrame1 = pd.read_excel("titanic.xlsx", sheet_name = "passengers") dataFrame2 = dataFrame1[["Age", "Sex"]] print(dataFrame2) """ Age Sex 0 22.0 male 1 38.0 female 2 26.0 female 3 35.0 female 4 35.0 male .. ... ... 886 27.0 male 887 19.0 female 888 NaN female 889 26.0 male 890 32.0 male [891 rows x 2 columns] """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.2 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※
더 읽기
■ Series 클래스의 shape 속성을 사용해 데이터 차원을 구하는 방법을 보여준다. ▶ main.py
|
import pandas as pd dataFrame = pd.read_excel("titanic.xlsx", sheet_name = "passengers") series = dataFrame["Age"] shapeTuple = series.shape print(shapeTuple) """ (891,) """ |
▶ requirements.txt
|
defusedxml==0.7.1 et_xmlfile==2.0.0 numpy==2.1.2 odfpy==1.4.1 openpyxl==3.1.5 pandas==2.2.3 python-calamine==0.2.3 python-dateutil==2.9.0.post0 pytz==2024.2 pyxlsb==1.0.10 six==1.16.0 tzdata==2024.2 xlrd==2.0.1 XlsxWriter==3.2.0 |
※ pip install "pandas[excel]" 명령을
더 읽기