[C#/COMMON/.NET8] Merging large CSV files
■ Shows how to merge large CSV files. ※ The CSV files must have a header line. ※ The CSV files to be merged have the same header line
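※ The original post targets C#/.NET 8 and its code is not included in this listing. As a rough sketch only, in Python like the rest of the examples on this page, the two conditions above (a header line exists, and every file has the same header) could be handled by streaming rows and writing the header once. The function name merge_csv_files and the sample file names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def merge_csv_files(source_paths, target_path):
    """Stream rows from several CSV files into one file, writing the header once."""
    expected_header = None
    with open(target_path, "w", newline="", encoding="utf-8") as target:
        writer = csv.writer(target)
        for source_path in source_paths:
            with open(source_path, newline="", encoding="utf-8") as source:
                reader = csv.reader(source)
                header = next(reader, None)
                if header is None:
                    continue  # empty file: nothing to merge
                if expected_header is None:
                    expected_header = header
                    writer.writerow(header)
                elif header != expected_header:
                    raise ValueError(f"header mismatch in {source_path}")
                # Rows are copied one at a time, so memory use stays flat.
                writer.writerows(reader)


# Small demonstration with two temporary files (hypothetical sample data).
with tempfile.TemporaryDirectory() as folder:
    first = Path(folder) / "a.csv"
    second = Path(folder) / "b.csv"
    merged = Path(folder) / "merged.csv"
    first.write_text("id,name\n1,kim\n", encoding="utf-8")
    second.write_text("id,name\n2,lee\n", encoding="utf-8")
    merge_csv_files([first, second], merged)
    merged_text = merged.read_text(encoding="utf-8")
```

Because rows are never collected into a list, the approach should scale to files larger than memory.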
■ Shows how to use the first method of the DataFrameGroupBy class to get the first data of each group.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    # Group by two columns, then take the first value of every other column.
    dataFrameGroupBy = dataFrame1.groupby(["sex", "smoker"])

    dataFrame2 = dataFrameGroupBy.first()

    print(dataFrame2)

    """
                   total_bill   tip  day    time  size
    sex    smoker
    Female No           16.99  1.01  Sun  Dinner     2
           Yes           3.07  1.00  Sat  Dinner     1
    Male   No           10.34  1.66  Sun  Dinner     3
           Yes          38.01  3.00  Sat  Dinner     4
    """
▶ requirements.txt
    numpy==2.1.3
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip install pandas
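※ One detail worth knowing: first returns the first non-null value of each column per group, which is not always the first row. The sketch below uses a small hypothetical frame (not the tips data) to contrast it with head(1), which keeps whole rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing tip in the first Female row.
frame = pd.DataFrame({
    "sex": ["Female", "Female", "Male", "Male"],
    "tip": [np.nan, 1.5, 2.0, 3.0],
    "day": ["Sun", "Sat", "Sun", "Sat"],
})

# first(): per column, the first non-null value in each group.
first_values = frame.groupby("sex").first()

# head(1): the first row of each group, NaN included, original index kept.
first_rows = frame.groupby("sex").head(1)

print(first_values)
print(first_rows)
```

In the Female group, first combines tip 1.5 from the second row with day "Sun" from the first row, while head(1) returns row 0 unchanged.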
■ Shows how to use the transform method of the SeriesGroupBy class to compute the mean of each group.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame = pd.read_csv(url)

    seriesGroupBy = dataFrame.groupby("smoker")["total_bill"]

    # transform returns one value per original row (the mean of that row's group).
    series = seriesGroupBy.transform("mean")

    print(series)

    """
    0      19.188278
    1      19.188278
    2      19.188278
    3      19.188278
    4      19.188278
             ...
    239    19.188278
    240    20.756344
    241    20.756344
    242    19.188278
    243    19.188278
    Name: total_bill, Length: 244, dtype: float64
    """

    print()

    # Subtract the group mean from each row's total_bill.
    dataFrame["adj_total_bill"] = dataFrame["total_bill"] - series

    print(dataFrame)

    """
         total_bill   tip     sex smoker   day    time  size  adj_total_bill
    0         16.99  1.01  Female     No   Sun  Dinner     2       -2.198278
    1         10.34  1.66    Male     No   Sun  Dinner     3       -8.848278
    2         21.01  3.50    Male     No   Sun  Dinner     3        1.821722
    3         23.68  3.31    Male     No   Sun  Dinner     2        4.491722
    4         24.59  3.61  Female     No   Sun  Dinner     4        5.401722
    ..          ...   ...     ...    ...   ...     ...   ...             ...
    239       29.03  5.92    Male     No   Sat  Dinner     3        9.841722
    240       27.18  2.00  Female    Yes   Sat  Dinner     2        6.423656
    241       22.67  2.00    Male    Yes   Sat  Dinner     2        1.913656
    242       17.82  1.75    Male     No   Sat  Dinner     2       -1.368278
    243       18.78  3.00  Female     No  Thur  Dinner     2       -0.408278

    [244 rows x 8 columns]
    """
▶ requirements.txt
    numpy==2.1.3
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip install pandas
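※ To make the difference between aggregating and transforming concrete, here is a minimal sketch on a hypothetical four-row frame: mean returns one value per group, while transform("mean") returns one value per original row, which is why it can be subtracted from the column directly.

```python
import pandas as pd

# Hypothetical sample data, small enough to check by hand.
frame = pd.DataFrame({
    "smoker": ["No", "Yes", "No", "Yes"],
    "total_bill": [10.0, 20.0, 30.0, 40.0],
})

grouped = frame.groupby("smoker")["total_bill"]

per_group = grouped.mean()            # 2 values, indexed by smoker
per_row = grouped.transform("mean")   # 4 values, indexed like frame

# Index alignment makes the row-wise subtraction straightforward.
frame["adj_total_bill"] = frame["total_bill"] - per_row

print(per_group)
print(frame)
```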
■ Shows how to use the DataFrame class to aggregate the values of specific columns for a specific group of columns.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    # Sum total_bill and tip for every (sex, smoker) combination.
    dataFrame2 = dataFrame1.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()

    print(dataFrame2)

    """
                   total_bill     tip
    sex    smoker
    Female No          977.68  149.77
           Yes         593.27   96.74
    Male   No         1919.75  302.00
           Yes        1337.07  183.07
    """
▶ requirements.txt
    numpy==2.1.3
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip
■ Shows how to use the to_csv function to save a CSV file.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame = pd.read_csv(url)

    # Writes tips.csv to the current working directory.
    dataFrame.to_csv("tips.csv")
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ The pip install pandas command was run.
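※ By default to_csv also writes the row index as an unnamed first column. If the file should contain only the data columns, index=False can be passed. A minimal sketch, using an in-memory buffer and hypothetical sample values instead of a file on disk:

```python
import io

import pandas as pd

# Hypothetical sample data.
frame = pd.DataFrame({"total_bill": [1.5, 2.0], "tip": [0.5, 1.0]})

with_index = io.StringIO()
frame.to_csv(with_index)                    # index written as first column

without_index = io.StringIO()
frame.to_csv(without_index, index=False)    # data columns only

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]

# Reading the index-free text back yields the same column layout.
round_trip = pd.read_csv(io.StringIO(without_index.getvalue()))

print(header_with)
print(header_without)
```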
■ Shows how to use the == operator on the DataFrame class to compare values.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    print(dataFrame1)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """

    print()

    # Element-wise comparison: every cell is compared with "Sun".
    dataFrame2 = dataFrame1 == "Sun"

    print(dataFrame2)

    """
         total_bill    tip    sex  smoker    day   time   size
    0         False  False  False   False   True  False  False
    1         False  False  False   False   True  False  False
    2         False  False  False   False   True  False  False
    3         False  False  False   False   True  False  False
    4         False  False  False   False   True  False  False
    ..          ...    ...    ...     ...    ...    ...    ...
    239       False  False  False   False  False  False  False
    240       False  False  False   False  False  False  False
    241       False  False  False   False  False  False  False
    242       False  False  False   False  False  False  False
    243       False  False  False   False  False  False  False

    [244 rows x 7 columns]
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ The pip install pandas command was run.
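※ The boolean frame produced by == is mostly useful as a mask. As a minimal sketch on hypothetical data, any(axis=1) collapses it to one flag per row, and comparing a single column gives the same row selection more directly.

```python
import pandas as pd

# Hypothetical sample data.
frame = pd.DataFrame({
    "day": ["Sun", "Sat", "Sun"],
    "time": ["Dinner", "Dinner", "Lunch"],
    "size": [2, 3, 4],
})

mask = frame == "Sun"                      # element-wise boolean frame
rows_with_sun = frame[mask.any(axis=1)]    # rows where any cell equals "Sun"
sunday_rows = frame[frame["day"] == "Sun"] # same rows, single-column test

print(rows_with_sun)
```

Numeric columns simply compare as False against a string, as the size column does here.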
■ Shows how to use the replace method of the Series class to change the values of a specific column in bulk.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    print(dataFrame1)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """

    print()

    # Work on a copy so that dataFrame1 stays unchanged.
    dataFrame2 = dataFrame1.copy()

    series = dataFrame2["day"]

    # Replace only within the day column.
    dataFrame2["day"] = series.replace("Thur", "Thursday")

    print(dataFrame2)

    """
         total_bill   tip     sex smoker       day    time  size
    0         16.99  1.01  Female     No       Sun  Dinner     2
    1         10.34  1.66    Male     No       Sun  Dinner     3
    2         21.01  3.50    Male     No       Sun  Dinner     3
    3         23.68  3.31    Male     No       Sun  Dinner     2
    4         24.59  3.61  Female     No       Sun  Dinner     4
    ..          ...   ...     ...    ...       ...     ...   ...
    239       29.03  5.92    Male     No       Sat  Dinner     3
    240       27.18  2.00  Female    Yes       Sat  Dinner     2
    241       22.67  2.00    Male    Yes       Sat  Dinner     2
    242       17.82  1.75    Male     No       Sat  Dinner     2
    243       18.78  3.00  Female     No  Thursday  Dinner     2
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip install
■ Shows how to use the copy method of the DataFrame class to copy a DataFrame object.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    print(dataFrame1)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """

    print()

    # copy() returns a new DataFrame with its own data (deep copy by default).
    dataFrame2 = dataFrame1.copy()

    print(dataFrame2)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2
    """
▶ requirements.txt
    numpy==2.1.3
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ The pip install pandas command
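※ Why copy matters: plain assignment only creates a second name for the same object, so edits made through either name show up in both. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data.
original = pd.DataFrame({"day": ["Sun", "Thur"]})

alias = original               # same object under a second name
independent = original.copy()  # separate object with its own data

# Editing the copy leaves the original untouched.
independent.loc[1, "day"] = "Thursday"

# Editing through the alias changes the original.
alias.loc[0, "day"] = "Sunday"

print(original)
print(independent)
```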
■ Shows how to use the replace method of the DataFrame class to change a string.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    print(dataFrame1)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """

    print()

    # Replace matching values in every column and return a new DataFrame.
    dataFrame2 = dataFrame1.replace("Thur", "Thursday")

    print(dataFrame2)

    """
         total_bill   tip     sex smoker       day    time  size
    0         16.99  1.01  Female     No       Sun  Dinner     2
    1         10.34  1.66    Male     No       Sun  Dinner     3
    2         21.01  3.50    Male     No       Sun  Dinner     3
    3         23.68  3.31    Male     No       Sun  Dinner     2
    4         24.59  3.61  Female     No       Sun  Dinner     4
    ..          ...   ...     ...    ...       ...     ...   ...
    239       29.03  5.92    Male     No       Sat  Dinner     3
    240       27.18  2.00  Female    Yes       Sat  Dinner     2
    241       22.67  2.00    Male    Yes       Sat  Dinner     2
    242       17.82  1.75    Male     No       Sat  Dinner     2
    243       18.78  3.00  Female     No  Thursday  Dinner     2
    """
▶ requirements.txt
    numpy==2.1.3
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ The pip install pandas command was run.
■ Shows how to use the values/index/columns/aggfunc arguments of the pivot_table function to get the mean data of each group.
▶ main.py
    import pandas as pd
    import numpy as np

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    # Rows: size, columns: sex, cells: average tip.
    dataFrame2 = pd.pivot_table(dataFrame1, values = "tip", index = ["size"], columns = ["sex"], aggfunc = np.average)

    print(dataFrame2)

    """
    sex     Female      Male
    size
    1     1.276667  1.920000
    2     2.528448  2.614184
    3     3.250000  3.476667
    4     4.021111  4.172143
    5     5.140000  3.750000
    6     4.600000  5.850000
    """
▶ requirements.txt
    numpy==2.1.3
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip install pandas
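※ For readers who think in terms of groupby, a pivot table of means can also be built as a groupby followed by unstack. The sketch below uses a small hypothetical frame and the string "mean" as the aggregation. It illustrates the idea and is not the original post's code.

```python
import pandas as pd

# Hypothetical sample data; there is no Male row for size 2.
frame = pd.DataFrame({
    "size": [1, 1, 2, 2],
    "sex": ["Female", "Male", "Female", "Female"],
    "tip": [1.0, 2.0, 3.0, 5.0],
})

pivoted = pd.pivot_table(frame, values="tip", index="size", columns="sex", aggfunc="mean")

# Same numbers: aggregate per (size, sex), then move sex into the columns.
grouped = frame.groupby(["size", "sex"])["tip"].mean().unstack("sex")

print(pivoted)
print(grouped)
```

Combinations that never occur, such as Male with size 2 here, appear as NaN in both results.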
■ Shows how to use the sep/header arguments of the read_table function to load CSV file data.
▶ main.py
    import pandas as pd

    dataFrame = pd.read_table("tips.csv", sep = ",", header = [0]) # header = [0] uses row 0 as the header.

    print(dataFrame)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip install pandas
■ Shows how to use the sep/header arguments of the read_csv function to load CSV file data.
▶ main.py
    import pandas as pd

    dataFrame = pd.read_csv("tips.csv", sep = ",", header = [0]) # header = [0] uses row 0 as the header.

    print(dataFrame)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip install pandas
■ Shows how to use the inplace argument of the sort_values method of the DataFrame class to sort by a specific column.
▶ main.py
    import pandas as pd
    import numpy as np

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame = pd.read_csv(url)

    print(dataFrame)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """

    print()

    # inplace = True sorts dataFrame itself instead of returning a new object.
    dataFrame.sort_values("tip", inplace = True)

    print(dataFrame)

    """
         total_bill    tip     sex smoker   day    time  size
    92         5.75   1.00  Female    Yes   Fri  Dinner     2
    111        7.25   1.00  Female     No   Sat  Dinner     1
    67         3.07   1.00  Female    Yes   Sat  Dinner     1
    236       12.60   1.00    Male    Yes   Sat  Dinner     2
    0         16.99   1.01  Female     No   Sun  Dinner     2
    ..          ...    ...     ...    ...   ...     ...   ...
    141       34.30   6.70    Male     No  Thur   Lunch     6
    59        48.27   6.73    Male     No   Sat  Dinner     4
    23        39.42   7.58    Male     No   Sat  Dinner     4
    212       48.33   9.00    Male     No   Sat  Dinner     4
    170       50.81  10.00    Male    Yes   Sat  Dinner     3

    [244 rows x 7 columns]
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip
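※ A common stumbling block with inplace is the return value. The sketch below, on hypothetical data, shows that sort_values returns a new frame by default, returns None with inplace=True, and can renumber the rows with ignore_index=True.

```python
import pandas as pd

# Hypothetical sample data.
frame = pd.DataFrame({"tip": [3.0, 1.0, 2.0]})

# Default: a sorted copy is returned and frame keeps its row order.
sorted_copy = frame.sort_values("tip")
index_before = frame.index.tolist()

# inplace=True: frame itself is reordered and the call returns None.
result = frame.sort_values("tip", inplace=True)
index_after = frame.index.tolist()

# ignore_index=True: the sorted result gets a fresh 0..n-1 index.
renumbered = frame.sort_values("tip", ignore_index=True)

print(index_before, index_after, result)
```

Writing frame = frame.sort_values("tip", inplace=True) would therefore leave frame set to None.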
■ Shows how to get the top N rows per group with the DataFrame class.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    print(dataFrame1)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """

    print()

    dataFrameGroupBy = dataFrame1.groupby(["sex"])

    seriesGroupBy = dataFrameGroupBy["tip"]

    # Rank tips within each sex group; tied values share the lowest rank.
    series = seriesGroupBy.rank(method = "min")

    print(series)

    """
    0        4.0
    1       20.0
    2      109.0
    3      103.0
    4       70.0
           ...
    239    150.0
    240     18.0
    241     29.0
    242     23.0
    243     49.0
    Name: tip, Length: 244, dtype: float64
    """

    print()

    # Keep only the rows whose tip is below 2.
    dataFrame2 = dataFrame1[dataFrame1["tip"] < 2]

    print(dataFrame2)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    8         15.04  1.96    Male     No   Sun  Dinner     2
    10        10.27  1.71    Male     No   Sun  Dinner     2
    12        15.42  1.57    Male     No   Sun  Dinner     2
    16        10.33  1.67  Female     No   Sun  Dinner     3
    30         9.55  1.45    Male     No   Sat  Dinner     2
    43         9.68  1.32    Male     No   Sun  Dinner     2
    53         9.94  1.56    Male     No   Sun  Dinner     2
    57        26.41  1.50  Female     No   Sat  Dinner     2
    58        11.24  1.76    Male    Yes   Sat  Dinner     2
    62        11.02  1.98    Male    Yes   Sat  Dinner     2
    67         3.07  1.00  Female    Yes   Sat  Dinner     1
    70        12.02  1.97    Male     No   Sat  Dinner     2
    75        10.51  1.25    Male     No   Sat  Dinner     2
    82        10.07  1.83  Female     No  Thur   Lunch     1
    92         5.75  1.00  Female    Yes   Fri  Dinner     2
    97        12.03  1.50    Male    Yes   Fri  Dinner     2
    99        12.46  1.50    Male     No   Fri  Dinner     2
    105       15.36  1.64    Male    Yes   Sat  Dinner     2
    111        7.25  1.00  Female     No   Sat  Dinner     1
    117       10.65  1.50  Female     No  Thur   Lunch     2
    118       12.43  1.80  Female     No  Thur   Lunch     2
    121       13.42  1.68  Female     No  Thur   Lunch     2
    126        8.52  1.48    Male     No  Thur   Lunch     2
    130       19.08  1.50    Male     No  Thur   Lunch     2
    132       11.17  1.50  Female     No  Thur   Lunch     2
    135        8.51  1.25  Female     No  Thur   Lunch     2
    145        8.35  1.50  Female     No  Thur   Lunch     2
    146       18.64  1.36  Female     No  Thur   Lunch     3
    147       11.87  1.63  Female     No  Thur   Lunch     2
    148        9.78  1.73    Male     No  Thur   Lunch     2
    168       10.59  1.61  Female    Yes   Sat  Dinner     2
    190       15.69  1.50    Male    Yes   Sun  Dinner     2
    195        7.56  1.44    Male     No  Thur   Lunch     2
    215       12.90  1.10  Female    Yes   Sat  Dinner     2
    217       11.59  1.50    Male    Yes   Sat  Dinner     2
    218        7.74  1.44    Male    Yes   Sat  Dinner     2
    222        8.58  1.92    Male    Yes   Fri   Lunch     1
    224       13.42  1.58    Male    Yes   Fri   Lunch     2
    233       10.77  1.47    Male     No   Sat  Dinner     2
    235       10.07  1.25    Male     No   Sat  Dinner     2
    236       12.60  1.00    Male    Yes   Sat  Dinner     2
    237       32.83  1.17    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    """

    print()

    # Attach the rank as a new column; values are aligned by index.
    dataFrame3 = dataFrame2.assign(rnk_min = series)

    print(dataFrame3)

    """
         total_bill   tip     sex smoker   day    time  size  rnk_min
    0         16.99  1.01  Female     No   Sun  Dinner     2      4.0
    1         10.34  1.66    Male     No   Sun  Dinner     3     20.0
    8         15.04  1.96    Male     No   Sun  Dinner     2     26.0
    10        10.27  1.71    Male     No   Sun  Dinner     2     21.0
    12        15.42  1.57    Male     No   Sun  Dinner     2     17.0
    16        10.33  1.67  Female     No   Sun  Dinner     3     14.0
    30         9.55  1.45    Male     No   Sat  Dinner     2      8.0
    43         9.68  1.32    Male     No   Sun  Dinner     2      5.0
    53         9.94  1.56    Male     No   Sun  Dinner     2     16.0
    57        26.41  1.50  Female     No   Sat  Dinner     2      8.0
    58        11.24  1.76    Male    Yes   Sat  Dinner     2     24.0
    62        11.02  1.98    Male    Yes   Sat  Dinner     2     28.0
    67         3.07  1.00  Female    Yes   Sat  Dinner     1      1.0
    70        12.02  1.97    Male     No   Sat  Dinner     2     27.0
    75        10.51  1.25    Male     No   Sat  Dinner     2      3.0
    82        10.07  1.83  Female     No  Thur   Lunch     1     17.0
    92         5.75  1.00  Female    Yes   Fri  Dinner     2      1.0
    97        12.03  1.50    Male    Yes   Fri  Dinner     2     11.0
    99        12.46  1.50    Male     No   Fri  Dinner     2     11.0
    105       15.36  1.64    Male    Yes   Sat  Dinner     2     19.0
    111        7.25  1.00  Female     No   Sat  Dinner     1      1.0
    117       10.65  1.50  Female     No  Thur   Lunch     2      8.0
    118       12.43  1.80  Female     No  Thur   Lunch     2     16.0
    121       13.42  1.68  Female     No  Thur   Lunch     2     15.0
    126        8.52  1.48    Male     No  Thur   Lunch     2     10.0
    130       19.08  1.50    Male     No  Thur   Lunch     2     11.0
    132       11.17  1.50  Female     No  Thur   Lunch     2      8.0
    135        8.51  1.25  Female     No  Thur   Lunch     2      6.0
    145        8.35  1.50  Female     No  Thur   Lunch     2      8.0
    146       18.64  1.36  Female     No  Thur   Lunch     3      7.0
    147       11.87  1.63  Female     No  Thur   Lunch     2     13.0
    148        9.78  1.73    Male     No  Thur   Lunch     2     22.0
    168       10.59  1.61  Female    Yes   Sat  Dinner     2     12.0
    190       15.69  1.50    Male    Yes   Sun  Dinner     2     11.0
    195        7.56  1.44    Male     No  Thur   Lunch     2      6.0
    215       12.90  1.10  Female    Yes   Sat  Dinner     2      5.0
    217       11.59  1.50    Male    Yes   Sat  Dinner     2     11.0
    218        7.74  1.44    Male    Yes   Sat  Dinner     2      6.0
    222        8.58  1.92    Male    Yes   Fri   Lunch     1     25.0
    224       13.42  1.58    Male    Yes   Fri   Lunch     2     18.0
    233       10.77  1.47    Male     No   Sat  Dinner     2      9.0
    235       10.07  1.25    Male     No   Sat  Dinner     2      3.0
    236       12.60  1.00    Male    Yes   Sat  Dinner     2      1.0
    237       32.83  1.17    Male    Yes   Sat  Dinner     2      2.0
    242       17.82  1.75    Male     No   Sat  Dinner     2     23.0
    """

    print()

    # Keep ranks 1 and 2 within each group.
    dataFrame4 = dataFrame3.query("rnk_min < 3")

    print(dataFrame4)

    """
         total_bill   tip     sex smoker  day    time  size  rnk_min
    67         3.07  1.00  Female    Yes  Sat  Dinner     1      1.0
    92         5.75  1.00  Female    Yes  Fri  Dinner     2      1.0
    111        7.25  1.00  Female     No  Sat  Dinner     1      1.0
    236       12.60  1.00    Male    Yes  Sat  Dinner     2      1.0
    237       32.83  1.17    Male    Yes  Sat  Dinner     2      2.0
    """

    print()

    dataFrame5 = dataFrame4.sort_values(["sex", "rnk_min"])

    print(dataFrame5)

    """
         total_bill   tip     sex smoker  day    time  size  rnk_min
    67         3.07  1.00  Female    Yes  Sat  Dinner     1      1.0
    92         5.75  1.00  Female    Yes  Fri  Dinner     2      1.0
    111        7.25  1.00  Female     No  Sat  Dinner     1      1.0
    236       12.60  1.00    Male    Yes  Sat  Dinner     2      1.0
    237       32.83  1.17    Male    Yes  Sat  Dinner     2      2.0
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ The pip install pandas command
■ Shows how to use the method/ascending arguments of the rank method of the SeriesGroupBy class to get the order of items within a group.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    print(dataFrame1)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """

    print()

    dataFrameGroupBy = dataFrame1.groupby(["day"])

    seriesGroupBy = dataFrameGroupBy["total_bill"]

    # Largest total_bill in each day gets rank 1; ties are broken by order of appearance.
    series = seriesGroupBy.rank(method = "first", ascending = False)

    print(series)

    """
    0      49.0
    1      68.0
    2      34.0
    3      27.0
    4      23.0
           ...
    239    13.0
    240    16.0
    241    26.0
    242    47.0
    243    20.0
    Name: total_bill, Length: 244, dtype: float64
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※
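※ The method argument decides how tied values are ranked. As a minimal sketch on hypothetical data with one tie, the frame below puts four of the available methods side by side ("average" is the default).

```python
import pandas as pd

# Hypothetical sample data: the two 20.0 bills on Sat are tied.
frame = pd.DataFrame({
    "day": ["Sat", "Sat", "Sat", "Sun"],
    "total_bill": [20.0, 20.0, 10.0, 5.0],
})

grouped = frame.groupby("day")["total_bill"]

ranks = pd.DataFrame({
    "average": grouped.rank(ascending=False),                 # ties share the mean rank
    "min": grouped.rank(method="min", ascending=False),       # ties share the lowest rank
    "first": grouped.rank(method="first", ascending=False),   # ties ranked by appearance
    "dense": grouped.rank(method="dense", ascending=False),   # like min, without gaps
})

print(ranks)
```

Only "first" guarantees distinct ranks within a group, which is what a strict "top N rows" filter needs.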
■ Shows how to use the cumcount method of the DataFrameGroupBy class to get the order of items within a group.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    print(dataFrame1)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """

    print()

    dataFrame2 = dataFrame1.sort_values(["total_bill"], ascending = False)

    print(dataFrame2)

    """
         total_bill    tip     sex smoker   day    time  size
    170       50.81  10.00    Male    Yes   Sat  Dinner     3
    212       48.33   9.00    Male     No   Sat  Dinner     4
    59        48.27   6.73    Male     No   Sat  Dinner     4
    156       48.17   5.00    Male     No   Sun  Dinner     6
    182       45.35   3.50    Male    Yes   Sun  Dinner     3
    ..          ...    ...     ...    ...   ...     ...   ...
    149        7.51   2.00    Male     No  Thur   Lunch     2
    111        7.25   1.00  Female     No   Sat  Dinner     1
    172        7.25   5.15    Male    Yes   Sun  Dinner     2
    92         5.75   1.00  Female    Yes   Fri  Dinner     2
    67         3.07   1.00  Female    Yes   Sat  Dinner     1

    [244 rows x 7 columns]
    """

    print()

    dataFrameGroupBy = dataFrame2.groupby(["day"])

    series = dataFrameGroupBy.cumcount() + 1 # Returns the cumulative count (order) of each item within its group.

    print(series)

    print()

    """
    170     1
    212     2
    59      3
    156     1
    182     2
           ..
    149    62
    111    86
    172    76
    92     19
    67     87
    Length: 244, dtype: int64
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip install
■ Shows how to get the top N rows per group with the DataFrame class.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    print(dataFrame1)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """

    print()

    dataFrameGroupBy = dataFrame1.groupby(["day"])

    seriesGroupBy = dataFrameGroupBy["total_bill"]

    # Rank total_bill within each day, largest first.
    series = seriesGroupBy.rank(method = "first", ascending = False)

    print(series)

    """
    0      49.0
    1      68.0
    2      34.0
    3      27.0
    4      23.0
           ...
    239    13.0
    240    16.0
    241    26.0
    242    47.0
    243    20.0
    Name: total_bill, Length: 244, dtype: float64
    """

    print()

    dataFrame2 = dataFrame1.assign(rank = series)

    print(dataFrame2)

    """
         total_bill   tip     sex smoker   day    time  size  rank
    0         16.99  1.01  Female     No   Sun  Dinner     2  49.0
    1         10.34  1.66    Male     No   Sun  Dinner     3  68.0
    2         21.01  3.50    Male     No   Sun  Dinner     3  34.0
    3         23.68  3.31    Male     No   Sun  Dinner     2  27.0
    4         24.59  3.61  Female     No   Sun  Dinner     4  23.0
    ..          ...   ...     ...    ...   ...     ...   ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3  13.0
    240       27.18  2.00  Female    Yes   Sat  Dinner     2  16.0
    241       22.67  2.00    Male    Yes   Sat  Dinner     2  26.0
    242       17.82  1.75    Male     No   Sat  Dinner     2  47.0
    243       18.78  3.00  Female     No  Thur  Dinner     2  20.0

    [244 rows x 8 columns]
    """

    print()

    # Keep the two largest bills of each day.
    dataFrame3 = dataFrame2.query("rank < 3")

    print(dataFrame3)

    """
         total_bill    tip     sex smoker   day    time  size  rank
    90        28.97   3.00    Male    Yes   Fri  Dinner     2   2.0
    95        40.17   4.73    Male    Yes   Fri  Dinner     4   1.0
    142       41.19   5.00    Male     No  Thur   Lunch     5   2.0
    156       48.17   5.00    Male     No   Sun  Dinner     6   1.0
    170       50.81  10.00    Male    Yes   Sat  Dinner     3   1.0
    182       45.35   3.50    Male    Yes   Sun  Dinner     3   2.0
    197       43.11   5.00  Female    Yes  Thur   Lunch     4   1.0
    212       48.33   9.00    Male     No   Sat  Dinner     4   2.0
    """

    print()

    dataFrame4 = dataFrame3.sort_values(["day", "rank"])

    print(dataFrame4)

    """
         total_bill    tip     sex smoker   day    time  size  rank
    95        40.17   4.73    Male    Yes   Fri  Dinner     4   1.0
    90        28.97   3.00    Male    Yes   Fri  Dinner     2   2.0
    170       50.81  10.00    Male    Yes   Sat  Dinner     3   1.0
    212       48.33   9.00    Male     No   Sat  Dinner     4   2.0
    156       48.17   5.00    Male     No   Sun  Dinner     6   1.0
    182       45.35   3.50    Male    Yes   Sun  Dinner     3   2.0
    197       43.11   5.00  Female    Yes  Thur   Lunch     4   1.0
    142       41.19   5.00    Male     No  Thur   Lunch     5   2.0
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ The pip install pandas command
■ Shows how to get the top N rows per group with the DataFrame class.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    print(dataFrame1)

    """
         total_bill   tip     sex smoker   day    time  size
    0         16.99  1.01  Female     No   Sun  Dinner     2
    1         10.34  1.66    Male     No   Sun  Dinner     3
    2         21.01  3.50    Male     No   Sun  Dinner     3
    3         23.68  3.31    Male     No   Sun  Dinner     2
    4         24.59  3.61  Female     No   Sun  Dinner     4
    ..          ...   ...     ...    ...   ...     ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3
    240       27.18  2.00  Female    Yes   Sat  Dinner     2
    241       22.67  2.00    Male    Yes   Sat  Dinner     2
    242       17.82  1.75    Male     No   Sat  Dinner     2
    243       18.78  3.00  Female     No  Thur  Dinner     2

    [244 rows x 7 columns]
    """

    print()

    dataFrame2 = dataFrame1.sort_values(["total_bill"], ascending = False)

    print(dataFrame2)

    """
         total_bill    tip     sex smoker   day    time  size
    170       50.81  10.00    Male    Yes   Sat  Dinner     3
    212       48.33   9.00    Male     No   Sat  Dinner     4
    59        48.27   6.73    Male     No   Sat  Dinner     4
    156       48.17   5.00    Male     No   Sun  Dinner     6
    182       45.35   3.50    Male    Yes   Sun  Dinner     3
    ..          ...    ...     ...    ...   ...     ...   ...
    149        7.51   2.00    Male     No  Thur   Lunch     2
    111        7.25   1.00  Female     No   Sat  Dinner     1
    172        7.25   5.15    Male    Yes   Sun  Dinner     2
    92         5.75   1.00  Female    Yes   Fri  Dinner     2
    67         3.07   1.00  Female    Yes   Sat  Dinner     1

    [244 rows x 7 columns]
    """

    print()

    dataFrameGroupBy = dataFrame2.groupby(["day"])

    series = dataFrameGroupBy.cumcount() + 1 # Returns the cumulative count (order) of each item within its group.

    print(series)

    print()

    """
    170     1
    212     2
    59      3
    156     1
    182     2
           ..
    149    62
    111    86
    172    76
    92     19
    67     87
    Length: 244, dtype: int64
    """

    # assign aligns the series by index, so the original row order is kept.
    dataFrame3 = dataFrame1.assign(rank = series)

    print(dataFrame3)

    """
         total_bill   tip     sex smoker   day    time  size  rank
    0         16.99  1.01  Female     No   Sun  Dinner     2    49
    1         10.34  1.66    Male     No   Sun  Dinner     3    68
    2         21.01  3.50    Male     No   Sun  Dinner     3    34
    3         23.68  3.31    Male     No   Sun  Dinner     2    27
    4         24.59  3.61  Female     No   Sun  Dinner     4    23
    ..          ...   ...     ...    ...   ...     ...   ...   ...
    239       29.03  5.92    Male     No   Sat  Dinner     3    13
    240       27.18  2.00  Female    Yes   Sat  Dinner     2    16
    241       22.67  2.00    Male    Yes   Sat  Dinner     2    26
    242       17.82  1.75    Male     No   Sat  Dinner     2    47
    243       18.78  3.00  Female     No  Thur  Dinner     2    20

    [244 rows x 8 columns]
    """

    print()

    dataFrame4 = dataFrame3.query("rank < 3")

    print(dataFrame4)

    """
         total_bill    tip     sex smoker   day    time  size  rank
    90        28.97   3.00    Male    Yes   Fri  Dinner     2     2
    95        40.17   4.73    Male    Yes   Fri  Dinner     4     1
    142       41.19   5.00    Male     No  Thur   Lunch     5     2
    156       48.17   5.00    Male     No   Sun  Dinner     6     1
    170       50.81  10.00    Male    Yes   Sat  Dinner     3     1
    182       45.35   3.50    Male    Yes   Sun  Dinner     3     2
    197       43.11   5.00  Female    Yes  Thur   Lunch     4     1
    212       48.33   9.00    Male     No   Sat  Dinner     4     2
    """

    print()

    dataFrame5 = dataFrame4.sort_values(["day", "rank"])

    print(dataFrame5)

    """
         total_bill    tip     sex smoker   day    time  size  rank
    95        40.17   4.73    Male    Yes   Fri  Dinner     4     1
    90        28.97   3.00    Male    Yes   Fri  Dinner     2     2
    170       50.81  10.00    Male    Yes   Sat  Dinner     3     1
    212       48.33   9.00    Male     No   Sat  Dinner     4     2
    156       48.17   5.00    Male     No   Sun  Dinner     6     1
    182       45.35   3.50    Male    Yes   Sun  Dinner     3     2
    197       43.11   5.00  Female    Yes  Thur   Lunch     4     1
    142       41.19   5.00    Male     No  Thur   Lunch     5     2
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ The pip install pandas command
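※ When the rank column itself is not needed, the same "top N per group" selection can be written more compactly by sorting first and then taking head(N) per group. This is an alternative sketch on hypothetical data, not the method used in the post.

```python
import pandas as pd

# Hypothetical sample data.
frame = pd.DataFrame({
    "day": ["Fri", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.0, 40.0, 28.0, 50.0, 48.0],
})

top_two = (
    frame.sort_values("total_bill", ascending=False)   # largest bills first
    .groupby("day")
    .head(2)                                           # first 2 rows of each day
    .sort_values(["day", "total_bill"], ascending=[True, False])
)

print(top_two)
```

head keeps the original index labels, so the selected rows can still be traced back to the source frame.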
■ Shows how to use the agg method of the DataFrameGroupBy class to aggregate the mean/count of group data.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    dataFrameGroupBy = dataFrame1.groupby(["smoker", "day"])

    # Apply two aggregations (row count and mean) to the tip column.
    dataFrame2 = dataFrameGroupBy.agg({"tip" : ["size", "mean"]})

    print(dataFrame2)

    """
                 tip
                size      mean
    smoker day
    No     Fri     4  2.812500
           Sat    45  3.102889
           Sun    57  3.167895
           Thur   45  2.673778
    Yes    Fri    15  2.714000
           Sat    42  2.875476
           Sun    19  3.516842
           Thur   17  3.030000
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip install pandas
■ Shows how to use the agg method of the DataFrameGroupBy class to aggregate the mean/count of group data.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    dataFrameGroupBy = dataFrame1.groupby("day")

    # Mean tip and number of rows for each day.
    dataFrame2 = dataFrameGroupBy.agg({"tip" : "mean", "day" : "size"})

    print(dataFrame2)

    """
               tip  day
    day
    Fri   2.734737   19
    Sat   2.993103   87
    Sun   3.255132   76
    Thur  2.771452   62
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip install pandas
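※ The dictionary form above names the output columns after the input columns, which can be confusing (the count ends up in a column called day). As a sketch of one alternative, pandas named aggregation lets each output column get its own name. The names tip_mean and row_count and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
frame = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat"],
    "tip": [2.0, 4.0, 3.0],
})

# Each keyword is an output column: (input column, aggregation).
summary = frame.groupby("day").agg(
    tip_mean=("tip", "mean"),
    row_count=("tip", "size"),
)

print(summary)
```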
■ Shows how to use the count method of the SeriesGroupBy class to get the number of values of a specific column for a group column. (NaN values excluded)
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame = pd.read_csv(url)

    dataFrameGroupBy = dataFrame.groupby("sex")

    seriesGroupBy = dataFrameGroupBy["total_bill"]

    # Number of non-NaN total_bill values per group.
    series = seriesGroupBy.count()

    print(series)

    """
    sex
    Female     87
    Male      157
    Name: total_bill, dtype: int64
    """
■ Shows how to use the count method of the DataFrameGroupBy class to get the number of values of every column for a group column. (NaN values excluded)
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    dataFrameGroupBy = dataFrame1.groupby("sex")

    # Number of non-NaN values in each column, per group.
    dataFrame2 = dataFrameGroupBy.count()

    print(dataFrame2)

    """
            total_bill  tip  smoker  day  time  size
    sex
    Female          87   87      87   87    87    87
    Male           157  157     157  157   157   157
    """
■ Shows how to use the size method of the DataFrameGroupBy class to get the number of values in a group column.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame = pd.read_csv(url)

    dataFrameGroupBy = dataFrame.groupby("sex")

    # Number of rows per group.
    series = dataFrameGroupBy.size()

    print(series)

    """
    sex
    Female     87
    Male      157
    dtype: int64
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
※ pip install
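※ On the tips data count and size give the same numbers because no values are missing. The difference only shows up with NaN: count skips missing values, size counts rows. A minimal sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing total_bill.
frame = pd.DataFrame({
    "sex": ["Female", "Female", "Male"],
    "total_bill": [16.99, np.nan, 10.34],
})

grouped = frame.groupby("sex")

non_null = grouped["total_bill"].count()   # NaN excluded
rows = grouped.size()                      # every row counted

print(non_null)
print(rows)
```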
■ Shows how to use the [] operator on the DataFrame class to get a DataFrame object whose rows have specific values in specific columns.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    # OR condition: party size of at least 5, or total_bill above 45.
    dataFrame2 = dataFrame1[(dataFrame1["size"] >= 5) | (dataFrame1["total_bill"] > 45)]

    print(dataFrame2)

    """
         total_bill    tip     sex smoker   day    time  size
    59        48.27   6.73    Male     No   Sat  Dinner     4
    125       29.80   4.20  Female     No  Thur   Lunch     6
    141       34.30   6.70    Male     No  Thur   Lunch     6
    142       41.19   5.00    Male     No  Thur   Lunch     5
    143       27.05   5.00  Female     No  Thur   Lunch     6
    155       29.85   5.14  Female     No   Sun  Dinner     5
    156       48.17   5.00    Male     No   Sun  Dinner     6
    170       50.81  10.00    Male    Yes   Sat  Dinner     3
    182       45.35   3.50    Male    Yes   Sun  Dinner     3
    185       20.69   5.00    Male     No   Sun  Dinner     5
    187       30.46   2.00    Male    Yes   Sun  Dinner     5
    212       48.33   9.00    Male     No   Sat  Dinner     4
    216       28.15   3.00    Male    Yes   Sat  Dinner     5
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
■ Shows how to use the [] operator on the DataFrame class to get a DataFrame object whose rows have specific values in specific columns.
▶ main.py
    import pandas as pd

    url = "https://raw.githubusercontent.com/pandas-dev/pandas/main/pandas/tests/io/data/csv/tips.csv"

    dataFrame1 = pd.read_csv(url)

    # AND condition: dinner rows with a tip above 5.00.
    dataFrame2 = dataFrame1[(dataFrame1["time"] == "Dinner") & (dataFrame1["tip"] > 5.00)]

    print(dataFrame2)

    """
         total_bill    tip     sex smoker  day    time  size
    23        39.42   7.58    Male     No  Sat  Dinner     4
    44        30.40   5.60    Male     No  Sun  Dinner     4
    47        32.40   6.00    Male     No  Sun  Dinner     4
    52        34.81   5.20  Female     No  Sun  Dinner     4
    59        48.27   6.73    Male     No  Sat  Dinner     4
    116       29.93   5.07    Male     No  Sun  Dinner     4
    155       29.85   5.14  Female     No  Sun  Dinner     5
    170       50.81  10.00    Male    Yes  Sat  Dinner     3
    172        7.25   5.15    Male    Yes  Sun  Dinner     2
    181       23.33   5.65    Male    Yes  Sun  Dinner     2
    183       23.17   6.50    Male    Yes  Sun  Dinner     4
    211       25.89   5.16    Male    Yes  Sat  Dinner     4
    212       48.33   9.00    Male     No  Sat  Dinner     4
    214       28.17   6.50  Female    Yes  Sat  Dinner     3
    239       29.03   5.92    Male     No  Sat  Dinner     3
    """
▶ requirements.txt
    numpy==2.1.2
    pandas==2.2.3
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.16.0
    tzdata==2024.2
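※ Two notes on the condition syntax, sketched on hypothetical data: the parentheses around each comparison are required because & and | bind more tightly than == and >, and the same filter can be written as a query string if that reads better.

```python
import pandas as pd

# Hypothetical sample data.
frame = pd.DataFrame({
    "time": ["Dinner", "Lunch", "Dinner", "Dinner"],
    "tip": [7.58, 6.70, 3.00, 5.60],
})

# Boolean-mask form: each comparison is wrapped in parentheses.
by_mask = frame[(frame["time"] == "Dinner") & (frame["tip"] > 5.00)]

# Equivalent query-string form.
by_query = frame.query("time == 'Dinner' and tip > 5.00")

print(by_mask)
```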