■ Shows how to use the transform method of the DataFrameGroupBy class to aggregate data by group.
▶ main.py
import pandas as pd
import numpy as np

dataFrame1 = pd.DataFrame({"key" : ["A", "B", "C", "D"], "value" : np.random.randn(4)})
dataFrame2 = pd.DataFrame({"key" : ["B", "D", "D", "E"], "value" : np.random.randn(4)})

dataFrame3 = dataFrame1.merge(dataFrame2, on = ["key"], how = "outer")

dataFrameGroupBy = dataFrame3.groupby("key")

series1 = dataFrameGroupBy.transform("mean" )["value_x"]
series2 = dataFrameGroupBy.transform("mean" )["value_y"]
series3 = dataFrameGroupBy.transform("sum"  )["value_x"]
series4 = dataFrameGroupBy.transform("sum"  )["value_y"]
series5 = dataFrameGroupBy.transform("count")["value_x"]
series6 = dataFrameGroupBy.transform("count")["value_y"]

dataFrame3 = pd.DataFrame(
    {
        'mean_x'  : series1,
        'mean_y'  : series2,
        'sum_x'   : series3,
        'sum_y'   : series4,
        'count_x' : series5,
        'count_y' : series6
    }
)

print(dataFrame3)

"""
     mean_x    mean_y     sum_x     sum_y  count_x  count_y
0  1.054826       NaN  1.054826  0.000000        1        0
1  0.780643  0.105718  0.780643  0.105718        1        1
2 -0.780831       NaN -0.780831  0.000000        1        0
3 -0.709413 -1.636002 -1.418825 -3.272004        2        2
4 -0.709413 -1.636002 -1.418825 -3.272004        2        2
5       NaN  1.348015  0.000000  1.348015        0        1
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
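The output above has one row per input row, not one row per group. A minimal sketch of that difference, using fixed hypothetical data (the `value` numbers are invented for reproducibility and are not from the original post):

```python
import pandas as pd

# Hypothetical fixed data so the result is reproducible.
df = pd.DataFrame({"key": ["A", "B", "B", "C"], "value": [1.0, 2.0, 4.0, 8.0]})
grouped = df.groupby("key")["value"]

per_group = grouped.agg("mean")      # one value per group, indexed by key
per_row = grouped.transform("mean")  # group mean broadcast back to every row

print(per_group)
print(per_row)
```

Because transform keeps the original index, its result can be assigned straight back as a new column of the source DataFrame.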
■ Shows how to use the agg method of the DataFrameGroupBy class to compute the mean, sum, and count of grouped data.
▶ main.py
import pandas as pd
import numpy as np

dataFrame1 = pd.DataFrame({"key" : ["A", "B", "C", "D"], "value" : np.random.randn(4)})
dataFrame2 = pd.DataFrame({"key" : ["B", "D", "D", "E"], "value" : np.random.randn(4)})

dataFrame3 = dataFrame1.merge(dataFrame2, on = ["key"], how = "outer")

dataFrameGroupBy = dataFrame3.groupby("key")

dataFrame4 = dataFrameGroupBy.agg(["mean", "sum", "count"])

print(dataFrame4)

"""
      value_x                     value_y
         mean       sum count        mean       sum count
key
A    0.052300  0.052300     1         NaN  0.000000     0
B   -0.128339 -0.128339     1    0.874078  0.874078     1
C    0.868257  0.868257     1         NaN  0.000000     0
D    0.095844  0.191688     2   -0.356354 -0.712709     2
E         NaN  0.000000     0    0.843800  0.843800     1
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
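Passing a list to agg yields two-level column labels, as in the output above. If flat column names are preferred, pandas named aggregation is one option. The sketch below uses hypothetical fixed data and hypothetical output names (`mean_x`, `sum_y`, `count_y`):

```python
import pandas as pd

# Hypothetical fixed data; group "A" has no value_y at all.
df = pd.DataFrame(
    {
        "key": ["A", "B", "B"],
        "value_x": [1.0, 2.0, 4.0],
        "value_y": [float("nan"), 3.0, 5.0],
    }
)

# Named aggregation: output_name=(source_column, function).
summary = df.groupby("key").agg(
    mean_x=("value_x", "mean"),
    sum_y=("value_y", "sum"),
    count_y=("value_y", "count"),
)

print(summary)
```

As in the original output, an all-NaN group sums to 0.0 and counts as 0.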
■ Shows how to pass a custom aggregation function to the transform method of the DataFrameGroupBy class.
▶ main.py
import pandas as pd
import numpy as np

dataFrame1 = pd.DataFrame({"key" : ["A", "B", "C", "D"], "value" : np.random.randn(4)})
dataFrame2 = pd.DataFrame({"key" : ["B", "D", "D", "E"], "value" : np.random.randn(4)})

dataFrame3 = dataFrame1.merge(dataFrame2, on = ["key"], how = "outer")

dataFrameGroupBy = dataFrame3.groupby("key")

def customFunction(x):
    return x - x.mean()

dataFrame4 = dataFrameGroupBy.transform(customFunction)

print(dataFrame4)

"""
   value_x   value_y
0      0.0       NaN
1      0.0  0.000000
2      0.0       NaN
3      0.0  0.993518
4      0.0 -0.993518
5      NaN  0.000000
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
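In the output above, value_x is 0.0 wherever it is defined because each group holds either a single value or the same value repeated by the merge, so subtracting the group mean leaves nothing. A small sketch with hypothetical fixed data (and a hypothetical function name `demean`) makes the centering visible:

```python
import pandas as pd

# Hypothetical fixed data: group "A" has one row, group "D" has two different values.
df = pd.DataFrame({"key": ["A", "D", "D"], "value": [5.0, 1.0, 3.0]})

def demean(x):
    # x is the Series of values belonging to one group.
    return x - x.mean()

centered = df.groupby("key")["value"].transform(demean)

print(centered)
```

Here the single-row group "A" becomes 0.0, while the two rows of "D" become -1.0 and 1.0 around their mean of 2.0.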
■ Shows how to use a lambda function as the custom aggregation function in the transform method of the DataFrameGroupBy class.
▶ main.py
import pandas as pd
import numpy as np

dataFrame1 = pd.DataFrame({"key" : ["A", "B", "C", "D"], "value" : np.random.randn(4)})
dataFrame2 = pd.DataFrame({"key" : ["B", "D", "D", "E"], "value" : np.random.randn(4)})

dataFrame3 = dataFrame1.merge(dataFrame2, on = ["key"], how = "outer")

dataFrameGroupBy = dataFrame3.groupby("key")

dataFrame4 = dataFrameGroupBy.transform(lambda x : x - x.mean())

print(dataFrame4)

"""
   value_x   value_y
0      0.0       NaN
1      0.0  0.000000
2      0.0       NaN
3      0.0 -1.167042
4      0.0  1.167042
5      NaN  0.000000
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip
■ Shows how to use the transform method of the DataFrameGroupBy class to aggregate data by group.
▶ main.py
import pandas as pd
import numpy as np

dataFrame1 = pd.DataFrame({"key" : ["A", "B", "C", "D"], "value" : np.random.randn(4)})
dataFrame2 = pd.DataFrame({"key" : ["B", "D", "D", "E"], "value" : np.random.randn(4)})

dataFrame3 = dataFrame1.merge(dataFrame2, on = ["key"], how = "outer")

dataFrameGroupBy = dataFrame3.groupby("key")

dataFrame04 = dataFrameGroupBy.transform("mean"  ) # mean
dataFrame05 = dataFrameGroupBy.transform("sum"   ) # sum
dataFrame06 = dataFrameGroupBy.transform("count" ) # count
dataFrame07 = dataFrameGroupBy.transform("min"   ) # minimum
dataFrame08 = dataFrameGroupBy.transform("max"   ) # maximum
dataFrame09 = dataFrameGroupBy.transform("median") # median
dataFrame10 = dataFrameGroupBy.transform("std"   ) # standard deviation
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
■ Shows how to use the format argument of the read_sas function to load SAS data files.
▶ main.py
import pandas as pd

dataFrame1 = pd.read_sas("transport-file.xpt"  )
dataFrame2 = pd.read_sas("binary-file.sas7bdat")

dataFrame3 = pd.read_sas("transport-file.xpt"  , format = "xport"   )
dataFrame4 = pd.read_sas("binary-file.sas7bdat", format = "sas7bdat")
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
■ Shows how to use the first method of the DataFrameGroupBy class to get the first entry of each group.
▶ main.py
import pandas as pd

url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

dataFrame1 = pd.read_csv(url)

dataFrameGroupBy = dataFrame1.groupby(["sex", "smoker"])

dataFrame2 = dataFrameGroupBy.first()

print(dataFrame2)

"""
               total_bill   tip  day    time  size
sex    smoker
Female No           16.99  1.01  Sun  Dinner     2
       Yes           3.07  1.00  Sat  Dinner     1
Male   No           10.34  1.66  Sun  Dinner     3
       Yes          38.01  3.00  Sat  Dinner     4
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
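One detail worth knowing: by default, first returns the first non-null entry of each column within a group, which is not always the group's first row. The tips data has no missing values, so the sketch below uses hypothetical fixed data to show the difference from head(1):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row of group "a" has a missing x.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b"],
        "x": [np.nan, 2.0, 3.0],
        "y": ["p", "q", "r"],
    }
)

first_non_null = df.groupby("group").first()  # per column, first non-null value
first_rows = df.groupby("group").head(1)      # the literal first row of each group

print(first_non_null)
print(first_rows)
```

For group "a", first combines x from the second row with y from the first row, whereas head(1) keeps the first row as it is, NaN included.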
■ Shows how to use the transform method of the SeriesGroupBy class to compute the mean of each group.
▶ main.py
import pandas as pd

url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

dataFrame = pd.read_csv(url)

seriesGroupBy = dataFrame.groupby("smoker")["total_bill"]

series = seriesGroupBy.transform("mean")

print(series)

"""
0      19.188278
1      19.188278
2      19.188278
3      19.188278
4      19.188278
         ...
239    19.188278
240    20.756344
241    20.756344
242    19.188278
243    19.188278
Name: total_bill, Length: 244, dtype: float64
"""

print()

dataFrame["adj_total_bill"] = dataFrame["total_bill"] - series

print(dataFrame)

"""
     total_bill   tip     sex smoker   day    time  size  adj_total_bill
0         16.99  1.01  Female     No   Sun  Dinner     2       -2.198278
1         10.34  1.66    Male     No   Sun  Dinner     3       -8.848278
2         21.01  3.50    Male     No   Sun  Dinner     3        1.821722
3         23.68  3.31    Male     No   Sun  Dinner     2        4.491722
4         24.59  3.61  Female     No   Sun  Dinner     4        5.401722
..          ...   ...     ...    ...   ...     ...   ...             ...
239       29.03  5.92    Male     No   Sat  Dinner     3        9.841722
240       27.18  2.00  Female    Yes   Sat  Dinner     2        6.423656
241       22.67  2.00    Male    Yes   Sat  Dinner     2        1.913656
242       17.82  1.75    Male     No   Sat  Dinner     2       -1.368278
243       18.78  3.00  Female     No  Thur  Dinner     2       -0.408278

[244 rows x 8 columns]
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
■ Shows how to use the DataFrame class to aggregate the values of specific columns over groups formed from specific columns.
▶ main.py
import pandas as pd

url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

dataFrame1 = pd.read_csv(url)

dataFrame2 = dataFrame1.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()

print(dataFrame2)

"""
               total_bill     tip
sex    smoker
Female No          977.68  149.77
       Yes         593.27   96.74
Male   No         1919.75  302.00
       Yes        1337.07  183.07
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip
■ Shows how to use the fillna method of the Series class to replace NaN values with a specified value.
▶ main.py
import pandas as pd
import numpy as np

dataFrame1 = pd.DataFrame({"key" : ["A", "B", "C", "D"], "value" : np.random.randn(4)})
dataFrame2 = pd.DataFrame({"key" : ["B", "D", "D", "E"], "value" : np.random.randn(4)})

dataFrame3 = dataFrame1.merge(dataFrame2, on = ["key"], how = "outer")

print(dataFrame3)

"""
  key   value_x   value_y
0   A  0.372651       NaN
1   B -0.690978 -0.519133
2   C -0.125689       NaN
3   D  1.271387 -0.428321
4   D  1.271387  0.245696
5   E       NaN  0.102741
"""

print()

meanValue = dataFrame3["value_x"].mean() # numpy.float64

print(f"Mean Value : {meanValue}")

"""
Mean Value : 0.4197517418509344
"""

series = dataFrame3["value_x"].fillna(meanValue)

print(series)

"""
0    0.372651
1   -0.690978
2   -0.125689
3    1.271387
4    1.271387
5    0.419752
Name: value_x, dtype: float64
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip install
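fillna also accepts a Series, which is aligned on the index. Combined with a grouped transform, this gives a per-group fill instead of one global mean. This is a variation on the example above rather than something the original post does, and the data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data with one missing value per group.
df = pd.DataFrame(
    {
        "key": ["A", "A", "B", "B"],
        "value": [1.0, np.nan, 10.0, np.nan],
    }
)

# Mean of each group, broadcast to every row (NaN values are skipped).
group_mean = df.groupby("key")["value"].transform("mean")

# Each NaN is replaced by the mean of its own group.
filled = df["value"].fillna(group_mean)

print(filled)
```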
■ Shows how to use the ffill method of the DataFrame class to fill NaN values forward from the previous row.
▶ main.py
import pandas as pd
import numpy as np

dataFrame1 = pd.DataFrame({"key" : ["A", "B", "C", "D"], "value" : np.random.randn(4)})
dataFrame2 = pd.DataFrame({"key" : ["B", "D", "D", "E"], "value" : np.random.randn(4)})

dataFrame3 = dataFrame1.merge(dataFrame2, on = ["key"], how = "outer")

print(dataFrame3)

"""
  key   value_x   value_y
0   A  1.980287       NaN
1   B  1.004949 -2.342287
2   C  0.175712       NaN
3   D  0.497413 -1.295190
4   D  0.497413 -0.349127
5   E       NaN -0.017074
"""

print()

dataFrame4 = dataFrame3.ffill()

print(dataFrame4)

"""
  key   value_x   value_y
0   A  1.980287       NaN
1   B  1.004949 -2.342287
2   C  0.175712 -2.342287
3   D  0.497413 -1.295190
4   D  0.497413 -0.349127
5   E  0.497413 -0.017074
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip
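Row 0 of value_y stays NaN in the output above because there is no earlier row to copy from. The sketch below, on a hypothetical Series, shows that behavior alongside the limit argument and the backward counterpart bfill:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a leading NaN and a run of two NaN values.
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

forward = s.ffill()         # the leading NaN has nothing before it and stays NaN
limited = s.ffill(limit=1)  # fill at most one consecutive NaN per gap
backward = s.bfill()        # fill from the next valid value instead

print(forward)
print(limited)
print(backward)
```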
■ Shows how to use the sum method of the Series class to sum values.
▶ main.py
import pandas as pd
import numpy as np

dataFrame1 = pd.DataFrame({"key" : ["A", "B", "C", "D"], "value" : np.random.randn(4)})
dataFrame2 = pd.DataFrame({"key" : ["B", "D", "D", "E"], "value" : np.random.randn(4)})

dataFrame3 = dataFrame1.merge(dataFrame2, on = ["key"], how = "outer")

print(dataFrame3)

"""
  key   value_x   value_y
0   A  1.760074       NaN
1   B -0.959353  0.390332
2   C -0.410334       NaN
3   D -0.157667 -0.996457
4   D -0.157667 -0.214345
5   E       NaN  1.238488
"""

print()

totalValue = dataFrame3["value_x"].sum()

print(totalValue) # numpy.float64

"""
0.07505333274381987
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
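The total above ignores the NaN in row 5, since sum skips missing values by default. A short sketch on hypothetical data shows the skipna and min_count arguments that control this:

```python
import numpy as np
import pandas as pd

# Hypothetical Series containing one missing value.
s = pd.Series([1.5, np.nan, 2.5])

total_default = s.sum()             # NaN is skipped
total_strict = s.sum(skipna=False)  # any NaN makes the result NaN

all_missing = pd.Series([np.nan])
empty_sum = all_missing.sum()                # 0.0 when nothing valid remains
guarded_sum = all_missing.sum(min_count=1)   # NaN unless at least one valid value

print(total_default, total_strict, empty_sum, guarded_sum)
```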
■ Shows how to use the + operator on the Series class to add Series objects together.
▶ main.py
import pandas as pd
import numpy as np

dataFrame1 = pd.DataFrame({"key" : ["A", "B", "C", "D"], "value" : np.random.randn(4)})
dataFrame2 = pd.DataFrame({"key" : ["B", "D", "D", "E"], "value" : np.random.randn(4)})

dataFrame3 = dataFrame1.merge(dataFrame2, on = ["key"], how = "outer")

print(dataFrame3)

"""
  key   value_x   value_y
0   A -0.802146       NaN
1   B  1.121540  1.202377
2   C  0.885321       NaN
3   D  0.640868 -0.420476
4   D  0.640868  0.438782
5   E       NaN -0.762258
"""

print()

series = dataFrame3["value_x"] + dataFrame3["value_y"]

print(series)

"""
0         NaN
1    2.323917
2         NaN
3    0.220392
4    1.079650
5         NaN
dtype: float64
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
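With the + operator, a NaN on either side produces NaN, as rows 0, 2, and 5 show above. If a missing operand should instead count as zero, Series.add with fill_value is one option. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical operands with missing values in different positions.
x = pd.Series([1.0, np.nan, 3.0])
y = pd.Series([np.nan, np.nan, 5.0])

plain = x + y                    # NaN wherever either side is NaN
filled = x.add(y, fill_value=0)  # a lone NaN is treated as 0

print(plain)
print(filled)
```

When both operands are missing at the same position, the result is still NaN even with fill_value.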
■ Shows how to use the to_csv method of the DataFrame class to save a CSV file.
▶ main.py
import pandas as pd

url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

dataFrame = pd.read_csv(url)

dataFrame.to_csv("tips.csv")
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
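By default to_csv also writes the row index as an unnamed first column, which reappears as an extra column when the file is read back. The sketch below uses a small hypothetical DataFrame and an in-memory string rather than a file to show the index argument:

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for the tips data.
df = pd.DataFrame({"total_bill": [16.5, 10.25], "tip": [1.0, 1.5]})

with_index = df.to_csv()                # no path given: returns the CSV text
without_index = df.to_csv(index=False)  # omit the index column

round_trip = pd.read_csv(io.StringIO(without_index))

print(with_index)
print(without_index)
```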
■ Shows how to use the == operator on the DataFrame class to compare values.
▶ main.py
import pandas as pd

url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

dataFrame1 = pd.read_csv(url)

print(dataFrame1)

"""
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]
"""

print()

dataFrame2 = dataFrame1 == "Sun"

print(dataFrame2)

"""
     total_bill    tip    sex  smoker    day   time   size
0         False  False  False   False   True  False  False
1         False  False  False   False   True  False  False
2         False  False  False   False   True  False  False
3         False  False  False   False   True  False  False
4         False  False  False   False   True  False  False
..          ...    ...    ...     ...    ...    ...    ...
239       False  False  False   False  False  False  False
240       False  False  False   False  False  False  False
241       False  False  False   False  False  False  False
242       False  False  False   False  False  False  False
243       False  False  False   False  False  False  False

[244 rows x 7 columns]
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
■ Shows how to use the replace method of the Series class to change the values of a specific column in bulk.
▶ main.py
import pandas as pd

url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

dataFrame1 = pd.read_csv(url)

print(dataFrame1)

"""
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]
"""

print()

dataFrame2 = dataFrame1.copy()

series = dataFrame2["day"]

dataFrame2["day"] = series.replace("Thur", "Thursday")

print(dataFrame2)

"""
     total_bill   tip     sex smoker       day    time  size
0         16.99  1.01  Female     No       Sun  Dinner     2
1         10.34  1.66    Male     No       Sun  Dinner     3
2         21.01  3.50    Male     No       Sun  Dinner     3
3         23.68  3.31    Male     No       Sun  Dinner     2
4         24.59  3.61  Female     No       Sun  Dinner     4
..          ...   ...     ...    ...       ...     ...   ...
239       29.03  5.92    Male     No       Sat  Dinner     3
240       27.18  2.00  Female    Yes       Sat  Dinner     2
241       22.67  2.00    Male    Yes       Sat  Dinner     2
242       17.82  1.75    Male     No       Sat  Dinner     2
243       18.78  3.00  Female     No  Thursday  Dinner     2
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip install
■ Shows how to use the copy method of the DataFrame class to copy a DataFrame object.
▶ main.py
import pandas as pd

url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

dataFrame1 = pd.read_csv(url)

print(dataFrame1)

"""
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]
"""

print()

dataFrame2 = dataFrame1.copy()

print(dataFrame2)

"""
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
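The point of copy is that the new object can be modified without touching the original. A minimal sketch on a hypothetical two-row frame (the names `original` and `duplicate` are illustrative only):

```python
import pandas as pd

# Hypothetical stand-in for the tips data.
original = pd.DataFrame({"day": ["Sun", "Thur"]})

duplicate = original.copy()  # deep copy by default

# Change a value in the copy only.
duplicate.loc[1, "day"] = "Thursday"

print(original)
print(duplicate)
```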
■ Shows how to use the replace method of the DataFrame class to replace strings.
▶ main.py
import pandas as pd

url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

dataFrame1 = pd.read_csv(url)

print(dataFrame1)

"""
     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]
"""

print()

dataFrame2 = dataFrame1.replace("Thur", "Thursday")

print(dataFrame2)

"""
     total_bill   tip     sex smoker       day    time  size
0         16.99  1.01  Female     No       Sun  Dinner     2
1         10.34  1.66    Male     No       Sun  Dinner     3
2         21.01  3.50    Male     No       Sun  Dinner     3
3         23.68  3.31    Male     No       Sun  Dinner     2
4         24.59  3.61  Female     No       Sun  Dinner     4
..          ...   ...     ...    ...       ...     ...   ...
239       29.03  5.92    Male     No       Sat  Dinner     3
240       27.18  2.00  Female    Yes       Sat  Dinner     2
241       22.67  2.00    Male    Yes       Sat  Dinner     2
242       17.82  1.75    Male     No       Sat  Dinner     2
243       18.78  3.00  Female     No  Thursday  Dinner     2
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
■ Shows how to use the values/index/columns/aggfunc arguments of the pivot_table function to compute per-group means.
▶ main.py
import pandas as pd
import numpy as np

url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

dataFrame1 = pd.read_csv(url)

dataFrame2 = pd.pivot_table(dataFrame1, values = "tip", index = ["size"], columns = ["sex"], aggfunc = np.average)

print(dataFrame2)

"""
sex     Female      Male
size
1     1.276667  1.920000
2     2.528448  2.614184
3     3.250000  3.476667
4     4.021111  4.172143
5     5.140000  3.750000
6     4.600000  5.850000
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
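To see how the cells of a pivot table are computed, it helps to try data small enough to check by hand. The sketch below uses hypothetical fixed values and passes the string "mean" as aggfunc instead of np.average, which is an assumption of this sketch and not what the original code does:

```python
import pandas as pd

# Hypothetical data: no Male row exists for size 2.
df = pd.DataFrame(
    {
        "size": [1, 1, 2, 2],
        "sex": ["Female", "Male", "Female", "Female"],
        "tip": [1.0, 2.0, 3.0, 5.0],
    }
)

# Rows come from "size", columns from "sex", cells are the mean of "tip".
table = pd.pivot_table(df, values="tip", index="size", columns="sex", aggfunc="mean")

print(table)
```

A size/sex combination with no rows shows up as NaN in the table.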
■ Shows how to use the drop_duplicates method of the DataFrame class to remove duplicate data.
▶ main.py
import pandas as pd

dataFrame = pd.DataFrame(
    {
        "class"         : ["A", "A", "A", "B", "C", "D"],
        "student_count" : [42, 35, 42, 50, 47, 45],
        "all_pass"      : ["Yes", "Yes", "Yes", "No", "No", "Yes"]
    }
)

print(dataFrame)

"""
  class  student_count all_pass
0     A             42      Yes
1     A             35      Yes
2     A             42      Yes
3     B             50       No
4     C             47       No
5     D             45      Yes
"""

print()

dataFrame.drop_duplicates(["class", "student_count"], inplace = True)

print(dataFrame)

"""
  class  student_count all_pass
0     A             42      Yes
1     A             35      Yes
3     B             50       No
4     C             47       No
5     D             45      Yes
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
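The example above keeps the first of each duplicated pair, which is the default. A sketch of the keep argument on hypothetical data, returning new frames instead of using inplace:

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 are duplicates on both columns.
df = pd.DataFrame(
    {
        "class": ["A", "A", "A", "B"],
        "student_count": [42, 35, 42, 50],
    }
)

subset = ["class", "student_count"]

first_kept = df.drop_duplicates(subset)               # keep="first" (default)
last_kept = df.drop_duplicates(subset, keep="last")   # keep the last occurrence
none_kept = df.drop_duplicates(subset, keep=False)    # drop every duplicated row

print(first_kept.index.tolist())
print(last_kept.index.tolist())
print(none_kept.index.tolist())
```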
■ Shows how to use the [] operator on the loc attribute of the DataFrame class to change the values of a specific column.
▶ main.py
import pandas as pd

dataFrame = pd.DataFrame({"AAA" : [1] * 8, "BBB" : list(range(0, 8))})

list1 = list(range(1, 5))

dataFrame.loc[2:5, "AAA"] = list1 # 2:5 selects row labels 2, 3, 4, and 5 (the end label is included).

print(dataFrame)

"""
   AAA  BBB
0    1    0
1    1    1
2    1    2
3    2    3
4    3    4
5    4    5
6    1    6
7    1    7
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip
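The inclusive end label is specific to loc. Position-based slicing with iloc follows the usual Python rule and excludes the end. A short sketch on the same shape of data as above:

```python
import pandas as pd

df = pd.DataFrame({"AAA": [1] * 8, "BBB": list(range(0, 8))})

by_label = df.loc[2:5, "BBB"]    # labels 2, 3, 4, 5 (end included)
by_position = df.iloc[2:5, 1]    # positions 2, 3, 4 (end excluded)

print(by_label.tolist())
print(by_position.tolist())
```

This is why a four-element list fits the loc[2:5, "AAA"] assignment in the original example.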
■ Shows how to use the upper/lower/title methods of the StringMethods class to add new columns holding case-converted values of a string column.
▶ main.py
import pandas as pd

dataFrame = pd.DataFrame({"String" : ["John Smith", "Jane Cook"]})

stringMethods = dataFrame["String"].str

dataFrame["upper"] = stringMethods.upper()
dataFrame["lower"] = stringMethods.lower()
dataFrame["title"] = stringMethods.title()

print(dataFrame)

"""
       String       upper       lower       title
0  John Smith  JOHN SMITH  john smith  John Smith
1   Jane Cook   JANE COOK   jane cook   Jane Cook
"""
▶ requirements.txt
numpy==2.1.3
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
■ Shows how to use the expand argument of the split/rsplit methods of the StringMethods class to split a string column's values into new columns.
▶ main.py
import pandas as pd

dataFrame = pd.DataFrame({"String" : ["John Smith", "Jane Cook"]})

stringMethods = dataFrame["String"].str

dataFrame["First_Name"] = stringMethods.split (" ", expand = True)[0]
dataFrame["Last_Name" ] = stringMethods.rsplit(" ", expand = True)[1]

print(dataFrame)

"""
       String First_Name Last_Name
0  John Smith       John     Smith
1   Jane Cook       Jane      Cook
"""
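With two-word names, split and rsplit give the same pieces. They differ once the number of splits is capped with the n argument. The three-word name in this sketch is hypothetical and not part of the original data:

```python
import pandas as pd

# Hypothetical names; the second one has three words.
names = pd.Series(["John Smith", "Mary Jane Cook"])

# n=1 performs at most one split: from the left for split, from the right for rsplit.
parts = names.str.split(" ", n=1, expand=True)
rparts = names.str.rsplit(" ", n=1, expand=True)

print(parts)
print(rparts)
```

split keeps everything after the first space together, while rsplit keeps everything before the last space together.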
▶ requirements.txt
■ Shows how to use the [] operator on the StringMethods class to extract a substring.
▶ main.py
import pandas as pd

dataFrame = pd.read_excel("tips.xlsx", index_col = 0)

stringMethods = dataFrame["sex"].str

series = stringMethods[0:1]

print(series)

"""
0      F
1      M
2      M
3      M
4      F
      ..
239    M
240    F
241    M
242    M
243    F
Name: sex, Length: 244, dtype: object
"""
▶ requirements.txt
defusedxml==0.7.1
et_xmlfile==2.0.0
numpy==2.1.3
odfpy==1.4.1
openpyxl==3.1.5
pandas==2.2.3
python-calamine==0.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
pyxlsb==1.0.10
six==1.16.0
tzdata==2024.2
xlrd==2.0.1
XlsxWriter==3.2.0
※ Ran the pip install "pandas[excel]" command.
■ Shows how to use the find method of the StringMethods class to locate a given substring.
▶ main.py
import pandas as pd

dataFrame = pd.read_excel("tips.xlsx", index_col = 0)

stringMethods = dataFrame["sex"].str

series = stringMethods.find("ale") # Returns -1 when the substring is not found.

print(series)

"""
0      3
1      1
2      1
3      1
4      3
      ..
239    1
240    3
241    1
242    1
243    3
Name: sex, Length: 244, dtype: int64
"""
▶ requirements.txt
defusedxml==0.7.1
et_xmlfile==2.0.0
numpy==2.1.3
odfpy==1.4.1
openpyxl==3.1.5
pandas==2.2.3
python-calamine==0.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
pyxlsb==1.0.10
six==1.16.0
tzdata==2024.2
xlrd==2.0.1
XlsxWriter==3.2.0
※ Ran the pip install "pandas[excel]" command.
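find reports the position of the match, 3 for "Female" and 1 for "Male" above. When only a yes/no answer is needed, str.contains may be more direct. A sketch on a hypothetical Series that includes a non-matching value:

```python
import pandas as pd

# Hypothetical values; "Other" does not contain the substring.
s = pd.Series(["Female", "Male", "Other"])

positions = s.str.find("ale")                   # index of the first match, or -1
has_match = s.str.contains("ale", regex=False)  # plain substring test

print(positions.tolist())
print(has_match.tolist())
```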