■ Shows how to get a DatetimeIndex object from the index attribute of a DataFrame.
▶ main.py
import pandas as pd

sourceDataFrame = pd.read_csv("air_quality_no2_long.csv", parse_dates = ["date.utc"])
renameDataFrame = sourceDataFrame.rename(columns = {"date.utc" : "datetime"})
pivotDataFrame = renameDataFrame.pivot(index = "datetime", columns = "location", values = "value")
datetimeIndexResampler = pivotDataFrame.resample("ME")
targetDataFrame = datetimeIndexResampler.max()
datetimeIndex = targetDataFrame.index

print(datetimeIndex.freq)

"""
<MonthEnd>
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
■ Shows how to get the maximum value of each interval of a time series using the max method of the DatetimeIndexResampler class.
▶ main.py
import pandas as pd

sourceDataFrame = pd.read_csv("air_quality_no2_long.csv", parse_dates = ["date.utc"])
renameDataFrame = sourceDataFrame.rename(columns = {"date.utc" : "datetime"})
pivotDataFrame = renameDataFrame.pivot(index = "datetime", columns = "location", values = "value")
datetimeIndexResampler = pivotDataFrame.resample("ME")
targetDataFrame = datetimeIndexResampler.max()

print(targetDataFrame)

"""
location                   BETR801  FR04014  London Westminster
datetime
2019-05-31 00:00:00+00:00     74.5     97.0                97.0
2019-06-30 00:00:00+00:00     52.5     84.7                52.0
"""
※ In addition to the max method, the DatetimeIndexResampler class …
■ Shows how to use the resample method of the DataFrame class to create a DatetimeIndexResampler object that resamples a time series to a different frequency.
▶ main.py
import pandas as pd

sourceDataFrame = pd.read_csv("air_quality_no2_long.csv", parse_dates = ["date.utc"])
renameDataFrame = sourceDataFrame.rename(columns = {"date.utc" : "datetime"})
pivotDataFrame = renameDataFrame.pivot(index = "datetime", columns = "location", values = "value")
datetimeIndexResampler = pivotDataFrame.resample("ME")
※ The resample method of the DataFrame class …
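A small sketch, assuming hypothetical readings every six hours, showing that resample only builds the resampler object and that the aggregation runs when a method such as sum is called:

```python
import pandas as pd

# Hypothetical readings every 6 hours over two days (UTC).
stamps = pd.date_range("2019-05-07", periods=8, freq="6h", tz="UTC")
readings = pd.DataFrame({"value": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}, index=stamps)

resampler = readings.resample("D")      # nothing is aggregated yet
print(type(resampler).__name__)         # DatetimeIndexResampler

daily_total = resampler.sum()           # the aggregation happens here: one row per day
print(daily_total)
```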
■ Shows how to draw a LINE chart using the pivot and plot methods of the DataFrame class.
▶ main.py
import pandas as pd
import matplotlib.pyplot as plt

sourceDataFrame = pd.read_csv("air_quality_no2_long.csv", parse_dates = ["date.utc"])
renameDataFrame = sourceDataFrame.rename(columns = {"date.utc" : "datetime"})
pivotDataFrame = renameDataFrame.pivot(index = "datetime", columns = "location", values = "value")

pivotDataFrame["2019-05-20":"2019-05-21"].plot()
plt.show()
▶ requirements.txt
contourpy==1.3.0
cycler==0.12.1
fonttools==4.54.1
kiwisolver==1.4.7
matplotlib==3.9.2
numpy==2.1.2
packaging==24.1
pandas==2.2.3
pillow==11.0.0
pyparsing==3.2.0
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip install pandas matplotlib
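The slice in the post selects whole days on the DatetimeIndex. This hedged sketch on a hypothetical hourly index checks that both end dates are included in full; it uses the equivalent .loc form and leaves out plotting so it runs without a display:

```python
import pandas as pd

# Hypothetical hourly index covering four full days (2019-05-19 .. 2019-05-22, UTC).
hours = pd.date_range("2019-05-19", periods=96, freq="h", tz="UTC")
wide = pd.DataFrame({"STATION_A": [float(i) for i in range(96)]}, index=hours)

# Partial date strings select whole days; both ends of the slice are inclusive.
window = wide.loc["2019-05-20":"2019-05-21"]
print(len(window))                        # 48
print(window.index[0], window.index[-1])  # first and last hour kept
```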
■ Shows how to get WIDE-format data from LONG-format data using the index/columns/values arguments of the DataFrame pivot method.
▶ main.py
import pandas as pd

sourceDataFrame = pd.read_csv("air_quality_no2_long.csv", parse_dates = ["date.utc"])
renameDataFrame = sourceDataFrame.rename(columns = {"date.utc" : "datetime"})
pivotDataFrame = renameDataFrame.pivot(index = "datetime", columns = "location", values = "value")

print(pivotDataFrame)

"""
location                   BETR801  FR04014  London Westminster
datetime
2019-05-07 01:00:00+00:00     50.5     25.0                23.0
2019-05-07 02:00:00+00:00     45.0     27.7                19.0
2019-05-07 03:00:00+00:00      NaN     50.4                19.0
2019-05-07 04:00:00+00:00      NaN     61.9                16.0
2019-05-07 05:00:00+00:00      NaN     72.4                 NaN
...                            ...      ...                 ...
2019-06-20 20:00:00+00:00      NaN     21.4                 NaN
2019-06-20 21:00:00+00:00      NaN     24.9                 NaN
2019-06-20 22:00:00+00:00      NaN     26.5                 NaN
2019-06-20 23:00:00+00:00      NaN     21.8                 NaN
2019-06-21 00:00:00+00:00      NaN     20.0                 NaN

[1033 rows x 3 columns]
"""
▶ requirements.txt
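A minimal sketch on hypothetical long-format rows (STATION_A and STATION_B are made-up names). It also shows a limitation worth knowing: pivot does not aggregate, so a repeated (index, column) pair raises a ValueError, in which case pivot_table with an aggfunc is the usual alternative:

```python
import pandas as pd

# Hypothetical long-format rows: one measurement per (datetime, location) pair.
long_df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2019-05-07 01:00", "2019-05-07 01:00", "2019-05-07 02:00"], utc=True
    ),
    "location": ["STATION_A", "STATION_B", "STATION_A"],
    "value": [50.5, 25.0, 45.0],
})

# Long -> wide: one row per datetime, one column per location; missing pairs become NaN.
wide_df = long_df.pivot(index="datetime", columns="location", values="value")
print(wide_df)

# pivot does not aggregate, so a repeated (datetime, location) pair is rejected.
duplicated = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
try:
    duplicated.pivot(index="datetime", columns="location", values="value")
    pivot_rejected_duplicates = False
except ValueError as err:
    pivot_rejected_duplicates = True
    print("pivot failed:", err)
```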
■ Shows how to draw a BAR chart from time-series data in a DataFrame.
▶ main.py
import pandas as pd
import matplotlib.pyplot as plt

sourceDataFrame = pd.read_csv("air_quality_no2_long.csv", parse_dates = ["date.utc"])
renameDataFrame = sourceDataFrame.rename(columns = {"date.utc" : "datetime"})
targetDataFrameGroupBy = renameDataFrame.groupby([renameDataFrame["datetime"].dt.hour])
targetSeriesGroupBy = targetDataFrameGroupBy["value"]
targetSeries = targetSeriesGroupBy.mean()

figure, axes = plt.subplots(figsize = (12, 4)) # 12 inches wide, 4 inches tall
targetSeries.plot(kind = "bar", rot = 0, ax = axes) # bar chart, x-axis labels not rotated (0 degrees)
plt.xlabel("Hour of the day")
plt.ylabel("$NO_2 (µg/m^3)$")
plt.show()
▶ requirements.txt
contourpy==1.3.0
cycler==0.12.1
fonttools==4.54.1
kiwisolver==1.4.7
matplotlib==3.9.2
numpy==2.1.2
packaging==24.1
pandas==2.2.3
pillow==11.0.0
pyparsing==3.2.0
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
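※ pip install pandas matplotlib (air_quality_no2_long.csv is a data file, not a pip package; it must be available in the working directory.)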
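The grouping step behind the chart can be checked without matplotlib. A sketch on hypothetical readings, grouping only by hour of day so that readings from different dates fall into the same bar:

```python
import pandas as pd

# Hypothetical readings at 01:00 and 02:00 on two different days (UTC).
readings = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2019-05-07 01:00", "2019-05-08 01:00", "2019-05-07 02:00", "2019-05-08 02:00"],
        utc=True,
    ),
    "value": [20.0, 30.0, 10.0, 50.0],
})

# Group by the hour of day only, ignoring the date, then average each group.
hourly_mean = readings.groupby(readings["datetime"].dt.hour)["value"].mean()
print(hourly_mean)
```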
■ Shows how to compute the mean value of each group defined by specific columns using the groupby method of the DataFrame class.
▶ main.py
import pandas as pd

sourceDataFrame = pd.read_csv("air_quality_no2_long.csv", parse_dates = ["date.utc"])
renameDataFrame = sourceDataFrame.rename(columns = {"date.utc" : "datetime"})
targetDataFrameGroupBy = renameDataFrame.groupby([renameDataFrame["datetime"].dt.weekday, "location"])
targetSeriesGroupBy = targetDataFrameGroupBy["value"]
targetSeries = targetSeriesGroupBy.mean()

print(targetSeries)

"""
datetime  location
0         BETR801               27.875000
          FR04014               24.856250
          London Westminster    23.969697
1         BETR801               22.214286
          FR04014               30.999359
          London Westminster    24.885714
2         BETR801               21.125000
          FR04014               29.165753
          London Westminster    23.460432
3         BETR801               27.500000
          FR04014               28.600690
          London Westminster    24.780142
4         BETR801               28.400000
          FR04014               31.617986
          London Westminster    26.446809
5         BETR801               33.500000
          FR04014               25.266154
          London Westminster    24.977612
6         BETR801               21.896552
          FR04014               23.274306
          London Westminster    24.859155
Name: value, dtype: float64
"""
※ In renameDataFrame["datetime"].dt.weekday, 0 means Monday and 6 means Sunday.
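A quick check of that numbering on two known dates (2019-05-06 was a Monday, 2019-05-12 a Sunday), with dt.day_name() added for readable labels; the timestamps are hypothetical:

```python
import pandas as pd

# Hypothetical timestamps: 2019-05-06 was a Monday, 2019-05-12 a Sunday.
stamps = pd.Series(pd.to_datetime(["2019-05-06 08:00", "2019-05-12 08:00"], utc=True))

weekday_table = pd.DataFrame({
    "weekday": stamps.dt.weekday,       # Monday = 0 ... Sunday = 6
    "day_name": stamps.dt.day_name(),   # readable labels for the same days
})
print(weekday_table)
```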
■ Shows how to add a month column to a DataFrame object using the dt accessor of the Series class.
▶ main.py
import pandas as pd

airQualityNO2DataFrame = pd.read_csv("air_quality_no2_long.csv", parse_dates = ["date.utc"])
renameAirQualityNO2DataFrame = airQualityNO2DataFrame.rename(columns = {"date.utc" : "datetime"})
datetimeSeries = renameAirQualityNO2DataFrame["datetime"]
renameAirQualityNO2DataFrame["month"] = datetimeSeries.dt.month

print(renameAirQualityNO2DataFrame)

"""
        city country                   datetime            location parameter  value   unit  month
0      Paris      FR  2019-06-21 00:00:00+00:00             FR04014       no2   20.0  µg/m³      6
1      Paris      FR  2019-06-20 23:00:00+00:00             FR04014       no2   21.8  µg/m³      6
2      Paris      FR  2019-06-20 22:00:00+00:00             FR04014       no2   26.5  µg/m³      6
3      Paris      FR  2019-06-20 21:00:00+00:00             FR04014       no2   24.9  µg/m³      6
4      Paris      FR  2019-06-20 20:00:00+00:00             FR04014       no2   21.4  µg/m³      6
...      ...     ...                        ...                 ...       ...    ...    ...    ...
2063  London      GB  2019-05-07 06:00:00+00:00  London Westminster       no2   26.0  µg/m³      5
2064  London      GB  2019-05-07 04:00:00+00:00  London Westminster       no2   16.0  µg/m³      5
2065  London      GB  2019-05-07 03:00:00+00:00  London Westminster       no2   19.0  µg/m³      5
2066  London      GB  2019-05-07 02:00:00+00:00  London Westminster       no2   19.0  µg/m³      5
2067  London      GB  2019-05-07 01:00:00+00:00  London Westminster       no2   23.0  µg/m³      5

[2068 rows x 8 columns]
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip install
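A sketch on hypothetical timestamps showing a few more dt fields alongside month. It also shows why the column has to be parsed as datetime first: on a plain string column, the dt accessor raises AttributeError:

```python
import pandas as pd

# Hypothetical timestamps straddling a month boundary (UTC).
frame = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-05-31 23:00", "2019-06-01 00:00"], utc=True)
})
frame["month"] = frame["datetime"].dt.month
frame["day"] = frame["datetime"].dt.day
frame["hour"] = frame["datetime"].dt.hour
print(frame)

# The same accessor on an unparsed string column fails.
text_dates = pd.Series(["2019-05-31 23:00"])
try:
    text_dates.dt.month
    dt_failed_on_strings = False
except AttributeError:
    dt_failed_on_strings = True
print(dt_failed_on_strings)  # True
```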
■ Shows how to load a specific column as a datetime type using the parse_dates argument of the read_csv function.
▶ main.py
import pandas as pd

airQualityNO2DataFrame = pd.read_csv("air_quality_no2_long.csv", parse_dates = ["date.utc"])
renameAirQualityNO2DataFrame = airQualityNO2DataFrame.rename(columns = {"date.utc" : "datetime"})

print(renameAirQualityNO2DataFrame["datetime"].dtype) # datetime64[ns, UTC]
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip install
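A self-contained sketch that reads a hypothetical two-row excerpt from a string instead of the CSV file. It contrasts a list passed to parse_dates with parse_dates = True, which only tries to parse the index (the form the later posts combine with index_col):

```python
import io

import pandas as pd

# Hypothetical two-row excerpt in the same shape as air_quality_no2_long.csv.
csv_text = """city,date.utc,value
Paris,2019-06-21 00:00:00+00:00,20.0
Paris,2019-06-20 23:00:00+00:00,21.8
"""

as_text = pd.read_csv(io.StringIO(csv_text))                                   # no parsing
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date.utc"])        # named column parsed
as_index = pd.read_csv(io.StringIO(csv_text), index_col="date.utc", parse_dates=True)  # index parsed

print(as_text["date.utc"].dtype)      # object: still plain strings
print(as_dates["date.utc"].dtype)     # timezone-aware datetime64
print(type(as_index.index).__name__)  # DatetimeIndex
```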
■ Shows how to convert a specific column of a DataFrame object to a datetime type using the to_datetime function.
▶ main.py
import pandas as pd

airQualityNO2DataFrame = pd.read_csv("air_quality_no2_long.csv")
renameAirQualityNO2DataFrame = airQualityNO2DataFrame.rename(columns = {"date.utc" : "datetime"})

print(renameAirQualityNO2DataFrame["datetime"].dtype) # object

renameAirQualityNO2DataFrame["datetime"] = pd.to_datetime(renameAirQualityNO2DataFrame["datetime"])

print(renameAirQualityNO2DataFrame["datetime"].dtype) # datetime64[ns, UTC]
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip install
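A hedged sketch for messier input (the malformed entry is hypothetical): errors = "coerce" turns unparseable values into NaT instead of raising, and utc = True returns timezone-aware UTC timestamps:

```python
import pandas as pd

# Hypothetical raw column: one valid timestamp string and one malformed entry.
raw = pd.Series(["2019-06-21 00:00:00+00:00", "not a date"])

# errors="coerce" replaces unparseable entries with NaT instead of raising;
# utc=True makes the result timezone-aware in UTC.
parsed = pd.to_datetime(raw, errors="coerce", utc=True)
print(parsed)
```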
■ Shows how to get the unique values (returned as a NumPy array) using the unique method of the Series class.
▶ main.py
import pandas as pd

airQualityNO2DataFrame = pd.read_csv("air_quality_no2_long.csv")
renameAirQualityNO2DataFrame = airQualityNO2DataFrame.rename(columns = {"date.utc" : "datetime"})
citySeries = renameAirQualityNO2DataFrame.city
cityNDArray = citySeries.unique()

print(cityNDArray)

"""
['Paris' 'Antwerpen' 'London']
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip install pandas
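A sketch on a hypothetical city column: unique keeps the order of first appearance and returns a NumPy array, while nunique only counts the distinct values:

```python
import pandas as pd

# Hypothetical city column with repeats.
cities = pd.Series(["Paris", "Paris", "Antwerpen", "London", "London"], name="city")

unique_cities = cities.unique()   # NumPy array, in order of first appearance
print(unique_cities)
print(cities.nunique())           # number of distinct values
```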
■ Shows how to get a specific column of a DataFrame as a Series object.
▶ main.py
import pandas as pd

airQualityNO2DataFrame = pd.read_csv("air_quality_no2_long.csv")
renameAirQualityNO2DataFrame = airQualityNO2DataFrame.rename(columns = {"date.utc" : "datetime"})
citySeries = renameAirQualityNO2DataFrame.city
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
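Attribute access such as .city only works when the column name is a valid Python identifier. A sketch on a hypothetical two-row frame comparing it with bracket access, including the dotted column name date.utc:

```python
import pandas as pd

# Hypothetical frame with one plain column name and one dotted column name.
frame = pd.DataFrame({
    "city": ["Paris", "London"],
    "date.utc": ["2019-06-21 00:00:00+00:00", "2019-05-07 01:00:00+00:00"],
})

by_attribute = frame.city        # works: "city" is a valid identifier
by_bracket = frame["city"]       # always works
as_frame = frame[["city"]]       # a list of names returns a one-column DataFrame
dotted = frame["date.utc"]       # dotted names need brackets

# Python reads frame.date.utc as (frame.date).utc, and there is no "date" column.
try:
    frame.date.utc
    attribute_failed = False
except AttributeError:
    attribute_failed = True

print(type(by_attribute).__name__, type(as_frame).__name__, attribute_failed)
```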
■ Shows how to rename columns using the columns argument of the DataFrame rename method.
▶ main.py
import pandas as pd

airQualityNO2DataFrame = pd.read_csv("air_quality_no2_long.csv")
renameAirQualityNO2DataFrame = airQualityNO2DataFrame.rename(columns = {"date.utc" : "datetime"})

print(renameAirQualityNO2DataFrame)

"""
        city country                   datetime            location parameter  value   unit
0      Paris      FR  2019-06-21 00:00:00+00:00             FR04014       no2   20.0  µg/m³
1      Paris      FR  2019-06-20 23:00:00+00:00             FR04014       no2   21.8  µg/m³
2      Paris      FR  2019-06-20 22:00:00+00:00             FR04014       no2   26.5  µg/m³
3      Paris      FR  2019-06-20 21:00:00+00:00             FR04014       no2   24.9  µg/m³
4      Paris      FR  2019-06-20 20:00:00+00:00             FR04014       no2   21.4  µg/m³
...      ...     ...                        ...                 ...       ...    ...    ...
2063  London      GB  2019-05-07 06:00:00+00:00  London Westminster       no2   26.0  µg/m³
2064  London      GB  2019-05-07 04:00:00+00:00  London Westminster       no2   16.0  µg/m³
2065  London      GB  2019-05-07 03:00:00+00:00  London Westminster       no2   19.0  µg/m³
2066  London      GB  2019-05-07 02:00:00+00:00  London Westminster       no2   19.0  µg/m³
2067  London      GB  2019-05-07 01:00:00+00:00  London Westminster       no2   23.0  µg/m³

[2068 rows x 7 columns]
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip install pandas
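A sketch on a hypothetical one-row frame: rename returns a new DataFrame, and by default it silently ignores keys that match no column, so a misspelled key goes unnoticed unless errors = "raise" is passed:

```python
import pandas as pd

# Hypothetical one-row frame with the original column name.
frame = pd.DataFrame({"date.utc": ["2019-06-21 00:00:00+00:00"], "value": [20.0]})

renamed = frame.rename(columns={"date.utc": "datetime"})     # returns a new DataFrame
misspelled = frame.rename(columns={"date_utc": "datetime"})  # unknown key: silently ignored

# errors="raise" turns an unknown key into a KeyError instead.
try:
    frame.rename(columns={"date_utc": "datetime"}, errors="raise")
    strict_rename_failed = False
except KeyError:
    strict_rename_failed = True

print(list(renamed.columns), list(misspelled.columns), strict_rename_failed)
```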
■ Shows how to join data using the how/left_on/right_on arguments of the merge function.
▶ main.py
import pandas as pd

airQualityParameterDataFrame = pd.read_csv("air_quality_parameters.csv")
# parse_dates = True only tries to parse the index, so "date.utc" stays a string column here;
# every timestamp uses the same +00:00 offset, so the text still sorts chronologically.
airQualityNO2DataFrame = pd.read_csv("air_quality_no2_long.csv", parse_dates = True)[["date.utc", "location", "parameter", "value"]]
airQualityPM25DataFrame = pd.read_csv("air_quality_pm25_long.csv", parse_dates = True)[["date.utc", "location", "parameter", "value"]]
airQualityDataFrame = pd.concat([airQualityPM25DataFrame, airQualityNO2DataFrame], axis = 0).sort_values("date.utc")
targetDataFrame = pd.merge(airQualityDataFrame, airQualityParameterDataFrame, how = "left", left_on = "parameter", right_on = "id")

print(targetDataFrame)

"""
                       date.utc            location parameter  value    id                                        description   name
0     2019-05-07 01:00:00+00:00  London Westminster       no2   23.0   no2                                   Nitrogen Dioxide    NO2
1     2019-05-07 01:00:00+00:00             FR04014       no2   25.0   no2                                   Nitrogen Dioxide    NO2
2     2019-05-07 01:00:00+00:00             BETR801      pm25   12.5  pm25  Particulate matter less than 2.5 micrometers i...  PM2.5
3     2019-05-07 01:00:00+00:00             BETR801       no2   50.5   no2                                   Nitrogen Dioxide    NO2
4     2019-05-07 01:00:00+00:00  London Westminster      pm25    8.0  pm25  Particulate matter less than 2.5 micrometers i...  PM2.5
...                         ...                 ...       ...    ...   ...                                                ...    ...
3173  2019-06-20 22:00:00+00:00             FR04014       no2   26.5   no2                                   Nitrogen Dioxide    NO2
3174  2019-06-20 23:00:00+00:00  London Westminster      pm25    7.0  pm25  Particulate matter less than 2.5 micrometers i...  PM2.5
3175  2019-06-20 23:00:00+00:00             FR04014       no2   21.8   no2                                   Nitrogen Dioxide    NO2
3176  2019-06-21 00:00:00+00:00  London Westminster      pm25    7.0  pm25  Particulate matter less than 2.5 micrometers i...  PM2.5
3177  2019-06-21 00:00:00+00:00             FR04014       no2   20.0   no2                                   Nitrogen Dioxide    NO2

[3178 rows x 7 columns]
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
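A minimal sketch of the same left join on hypothetical rows; the so2 parameter is made up so that one row has no match. Because the key columns have different names, both parameter and id are kept in the result:

```python
import pandas as pd

# Hypothetical measurements; "so2" has no entry in the parameter table below.
measurements = pd.DataFrame({
    "location": ["BETR801", "FR04014", "BETR801"],
    "parameter": ["no2", "no2", "so2"],
    "value": [50.5, 25.0, 3.0],
})
parameters = pd.DataFrame({"id": ["no2", "pm25"], "name": ["NO2", "PM2.5"]})

# Left join: every measurement row is kept; unmatched rows get NaN in the right-hand columns.
joined = pd.merge(measurements, parameters, how="left", left_on="parameter", right_on="id")
print(joined)
```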
■ Shows how to join data using the how/on arguments of the merge function.
▶ main.py
import pandas as pd

airQualityStationDataFrame = pd.read_csv("air_quality_stations.csv")
airQualityNO2DataFrame = pd.read_csv("air_quality_no2_long.csv", parse_dates = True)[["date.utc", "location", "parameter", "value"]]
airQualityPM25DataFrame = pd.read_csv("air_quality_pm25_long.csv", parse_dates = True)[["date.utc", "location", "parameter", "value"]]
airQualityDataFrame = pd.concat([airQualityPM25DataFrame, airQualityNO2DataFrame], axis = 0).sort_values("date.utc")
targetDataFrame = pd.merge(airQualityDataFrame, airQualityStationDataFrame, how = "left", on = "location")

print(targetDataFrame)

"""
                       date.utc            location parameter  value  coordinates.latitude  coordinates.longitude
0     2019-05-07 01:00:00+00:00  London Westminster       no2   23.0              51.49467               -0.13193
1     2019-05-07 01:00:00+00:00             FR04014       no2   25.0              48.83724                2.39390
2     2019-05-07 01:00:00+00:00             FR04014       no2   25.0              48.83722                2.39390
3     2019-05-07 01:00:00+00:00             BETR801      pm25   12.5              51.20966                4.43182
4     2019-05-07 01:00:00+00:00             BETR801       no2   50.5              51.20966                4.43182
...                         ...                 ...       ...    ...                   ...                    ...
4177  2019-06-20 23:00:00+00:00             FR04014       no2   21.8              48.83724                2.39390
4178  2019-06-20 23:00:00+00:00             FR04014       no2   21.8              48.83722                2.39390
4179  2019-06-21 00:00:00+00:00  London Westminster      pm25    7.0              51.49467               -0.13193
4180  2019-06-21 00:00:00+00:00             FR04014       no2   20.0              48.83724                2.39390
4181  2019-06-21 00:00:00+00:00             FR04014       no2   20.0              48.83722                2.39390

[4182 rows x 6 columns]
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
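The combined readings have 3178 rows in the left_on/right_on example, yet this join returns 4182, and FR04014 appears with two latitudes (48.83724 and 48.83722). That suggests the station table lists FR04014 twice, so a left join repeats every matching reading. A sketch of the effect on hypothetical rows, plus merge's validate argument to catch it:

```python
import pandas as pd

# Hypothetical readings plus a station table that lists FR04014 twice.
readings = pd.DataFrame({"location": ["FR04014", "BETR801"], "value": [25.0, 50.5]})
stations = pd.DataFrame({
    "location": ["FR04014", "FR04014", "BETR801"],
    "coordinates.latitude": [48.83724, 48.83722, 51.20966],
})

joined = pd.merge(readings, stations, how="left", on="location")
print(len(readings), "->", len(joined))  # 2 -> 3: the FR04014 reading is duplicated

# validate="many_to_one" asks pandas to check that the right-hand keys are unique.
try:
    pd.merge(readings, stations, how="left", on="location", validate="many_to_one")
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True
print(duplicate_keys_detected)  # True
```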
■ Shows how to move one index level (level 0, the outer level) into a column using the level argument of the DataFrame reset_index method; the inner level stays as the index.
▶ main.py
import pandas as pd

dataFrame1 = pd.read_csv("air_quality_no2_long.csv", parse_dates = True)
dataFrame2 = dataFrame1[["date.utc", "location", "parameter", "value"]]
dataFrame3 = pd.read_csv("air_quality_pm25_long.csv", parse_dates = True)
dataFrame4 = dataFrame3[["date.utc", "location", "parameter", "value"]]
dataFrame5 = pd.concat([dataFrame2, dataFrame4], keys = ["NO2", "PM25"])
dataFrame6 = dataFrame5.reset_index(level = 0)

print(dataFrame6.head())

"""
  level_0                   date.utc location parameter  value
0     NO2  2019-06-21 00:00:00+00:00  FR04014       no2   20.0
1     NO2  2019-06-20 23:00:00+00:00  FR04014       no2   21.8
2     NO2  2019-06-20 22:00:00+00:00  FR04014       no2   26.5
3     NO2  2019-06-20 21:00:00+00:00  FR04014       no2   24.9
4     NO2  2019-06-20 20:00:00+00:00  FR04014       no2   21.4
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
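A sketch on hypothetical blocks contrasting level = 0 with resetting every level. Passing names to concat labels the levels; without it, the moved outer level gets the default name level_0, as in the output above:

```python
import pandas as pd

# Hypothetical blocks standing in for the NO2 and PM25 frames.
no2 = pd.DataFrame({"value": [20.0, 21.8]})
pm25 = pd.DataFrame({"value": [7.0]})

# names= labels the two index levels of the concatenated result.
stacked = pd.concat([no2, pm25], keys=["NO2", "PM25"], names=["pollutant", "row"])

outer_only = stacked.reset_index(level=0)  # "pollutant" -> column, "row" stays the index
all_levels = stacked.reset_index()         # both levels -> columns, new 0..n-1 index
print(outer_only)
print(all_levels)
```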
■ Shows how to concatenate N DataFrame objects in order using the keys argument of the concat function, which labels each input with an outer index level.
▶ main.py
import pandas as pd

no2DataFrame1 = pd.read_csv("air_quality_no2_long.csv", parse_dates = True)
no2DataFrame2 = no2DataFrame1[["date.utc", "location", "parameter", "value"]]
pm25DataFrame1 = pd.read_csv("air_quality_pm25_long.csv", parse_dates = True)
pm25DataFrame2 = pm25DataFrame1[["date.utc", "location", "parameter", "value"]]
concatDataFrame = pd.concat([no2DataFrame2, pm25DataFrame2], keys = ["NO2", "PM25"])

print(concatDataFrame.head())

"""
                        date.utc location parameter  value
NO2 0  2019-06-21 00:00:00+00:00  FR04014       no2   20.0
    1  2019-06-20 23:00:00+00:00  FR04014       no2   21.8
    2  2019-06-20 22:00:00+00:00  FR04014       no2   26.5
    3  2019-06-20 21:00:00+00:00  FR04014       no2   24.9
    4  2019-06-20 20:00:00+00:00  FR04014       no2   21.4
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip install
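A sketch on hypothetical blocks showing what the keys provide: the result has a two-level index, and selecting a key with .loc returns that block with its original index:

```python
import pandas as pd

# Hypothetical blocks standing in for the NO2 and PM25 frames.
no2 = pd.DataFrame({"value": [20.0, 21.8]})
pm25 = pd.DataFrame({"value": [7.0, 8.0]})

stacked = pd.concat([no2, pm25], keys=["NO2", "PM25"])
print(stacked.index.nlevels)  # 2: the key level plus each frame's original index

pm25_block = stacked.loc["PM25"]  # selecting a key returns that block with its original index
print(pm25_block)
```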
■ Shows how to concatenate N DataFrame objects in order using the concat function.
▶ main.py
import pandas as pd

no2DataFrame1 = pd.read_csv("air_quality_no2_long.csv", parse_dates = True)
no2DataFrame2 = no2DataFrame1[["date.utc", "location", "parameter", "value"]]
pm25DataFrame1 = pd.read_csv("air_quality_pm25_long.csv", parse_dates = True)
pm25DataFrame2 = pm25DataFrame1[["date.utc", "location", "parameter", "value"]]
concatDataFrame = pd.concat([no2DataFrame2, pm25DataFrame2], axis = 0) # axis: 0 (rows), 1 (columns)

print(concatDataFrame.head())

"""
                    date.utc location parameter  value
0  2019-06-21 00:00:00+00:00  FR04014       no2   20.0
1  2019-06-20 23:00:00+00:00  FR04014       no2   21.8
2  2019-06-20 22:00:00+00:00  FR04014       no2   26.5
3  2019-06-20 21:00:00+00:00  FR04014       no2   24.9
4  2019-06-20 20:00:00+00:00  FR04014       no2   21.4
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
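A sketch on hypothetical frames: axis = 0 keeps each input's original row labels (so they repeat), ignore_index = True renumbers them, and axis = 1 places the frames side by side, aligned on the index:

```python
import pandas as pd

# Hypothetical two-row frames.
no2 = pd.DataFrame({"value": [20.0, 21.8]})
pm25 = pd.DataFrame({"value": [7.0, 8.0]})

stacked = pd.concat([no2, pm25], axis=0)                        # row labels 0, 1, 0, 1
renumbered = pd.concat([no2, pm25], axis=0, ignore_index=True)  # row labels 0..3
side_by_side = pd.concat(
    [no2.rename(columns={"value": "no2"}), pm25.rename(columns={"value": "pm25"})],
    axis=1,
)                                                               # columns no2, pm25

print(stacked.index.tolist(), renumbered.index.tolist())
print(side_by_side)
```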
■ Shows how to use the id_vars/value_vars/value_name/var_name arguments of the DataFrame melt method.
▶ main.py
import pandas as pd

dataFrame1 = pd.read_csv("air_quality_long.csv", index_col = "date.utc", parse_dates = True) # 5272 rows
dataFrame2 = dataFrame1[dataFrame1["parameter"] == "no2"] # 3447 rows
dataFrame3 = dataFrame2.pivot(columns = "location", values = "value")
dataFrame4 = dataFrame3.reset_index()

print(dataFrame4)

"""
location                   date.utc  BETR801  FR04014  London Westminster
0         2019-04-09 01:00:00+00:00     22.5     24.4                 NaN
1         2019-04-09 02:00:00+00:00     53.5     27.4                67.0
2         2019-04-09 03:00:00+00:00     54.5     34.2                67.0
3         2019-04-09 04:00:00+00:00     34.5     48.5                41.0
4         2019-04-09 05:00:00+00:00     46.5     59.5                41.0
...                             ...      ...      ...                 ...
1700      2019-06-20 20:00:00+00:00      NaN     21.4                 NaN
1701      2019-06-20 21:00:00+00:00      NaN     24.9                 NaN
1702      2019-06-20 22:00:00+00:00      NaN     26.5                 NaN
1703      2019-06-20 23:00:00+00:00      NaN     21.8                 NaN
1704      2019-06-21 00:00:00+00:00      NaN     20.0                 NaN

[1705 rows x 4 columns]
"""

print()

dataFrame5 = dataFrame4.melt(
    id_vars = "date.utc",
    value_vars = ["BETR801", "FR04014", "London Westminster"],
    value_name = "NO_2",
    var_name = "id_location"
)

print(dataFrame5)

"""
                      date.utc         id_location  NO_2
0    2019-04-09 01:00:00+00:00             BETR801  22.5
1    2019-04-09 02:00:00+00:00             BETR801  53.5
2    2019-04-09 03:00:00+00:00             BETR801  54.5
3    2019-04-09 04:00:00+00:00             BETR801  34.5
4    2019-04-09 05:00:00+00:00             BETR801  46.5
...                        ...                 ...   ...
5110 2019-06-20 20:00:00+00:00  London Westminster   NaN
5111 2019-06-20 21:00:00+00:00  London Westminster   NaN
5112 2019-06-20 22:00:00+00:00  London Westminster   NaN
5113 2019-06-20 23:00:00+00:00  London Westminster   NaN
5114 2019-06-21 00:00:00+00:00  London Westminster   NaN

[5115 rows x 3 columns]
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
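A sketch on hypothetical values showing that melt keeps the NaN cells of the wide table as rows; dropping them and pivoting back recovers the wide layout:

```python
import pandas as pd

# Hypothetical wide table with one missing cell.
wide = pd.DataFrame({
    "date.utc": pd.to_datetime(["2019-04-09 01:00", "2019-04-09 02:00"], utc=True),
    "BETR801": [22.5, None],
    "FR04014": [24.4, 27.4],
})

long_df = wide.melt(
    id_vars="date.utc",
    value_vars=["BETR801", "FR04014"],
    var_name="id_location",
    value_name="NO_2",
)
long_clean = long_df.dropna(subset=["NO_2"])  # drop rows that only carried a NaN
back_to_wide = long_clean.pivot(index="date.utc", columns="id_location", values="NO_2")

print(len(long_df), len(long_clean))  # 4 3
print(back_to_wide)
```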
■ Shows how to get LONG-format data from WIDE-format data using the id_vars argument of the DataFrame melt method.
▶ main.py
import pandas as pd

dataFrame1 = pd.read_csv("air_quality_long.csv", index_col = "date.utc", parse_dates = True) # 5272 rows
dataFrame2 = dataFrame1[dataFrame1["parameter"] == "no2"] # 3447 rows
dataFrame3 = dataFrame2.pivot(columns = "location", values = "value")
dataFrame4 = dataFrame3.reset_index()

print(dataFrame4)

"""
location                   date.utc  BETR801  FR04014  London Westminster
0         2019-04-09 01:00:00+00:00     22.5     24.4                 NaN
1         2019-04-09 02:00:00+00:00     53.5     27.4                67.0
2         2019-04-09 03:00:00+00:00     54.5     34.2                67.0
3         2019-04-09 04:00:00+00:00     34.5     48.5                41.0
4         2019-04-09 05:00:00+00:00     46.5     59.5                41.0
...                             ...      ...      ...                 ...
1700      2019-06-20 20:00:00+00:00      NaN     21.4                 NaN
1701      2019-06-20 21:00:00+00:00      NaN     24.9                 NaN
1702      2019-06-20 22:00:00+00:00      NaN     26.5                 NaN
1703      2019-06-20 23:00:00+00:00      NaN     21.8                 NaN
1704      2019-06-21 00:00:00+00:00      NaN     20.0                 NaN

[1705 rows x 4 columns]
"""

print()

dataFrame5 = dataFrame4.melt(id_vars = "date.utc")

print(dataFrame5)

"""
                      date.utc            location  value
0    2019-04-09 01:00:00+00:00             BETR801   22.5
1    2019-04-09 02:00:00+00:00             BETR801   53.5
2    2019-04-09 03:00:00+00:00             BETR801   54.5
3    2019-04-09 04:00:00+00:00             BETR801   34.5
4    2019-04-09 05:00:00+00:00             BETR801   46.5
...                        ...                 ...    ...
5110 2019-06-20 20:00:00+00:00  London Westminster    NaN
5111 2019-06-20 21:00:00+00:00  London Westminster    NaN
5112 2019-06-20 22:00:00+00:00  London Westminster    NaN
5113 2019-06-20 23:00:00+00:00  London Westminster    NaN
5114 2019-06-21 00:00:00+00:00  London Westminster    NaN

[5115 rows x 3 columns]
"""
▶ requirements.txt
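The variable column is named location above because melt falls back to the name of the column axis when var_name is not given, and to "variable" when that name is unset as well. A sketch on a hypothetical one-row frame:

```python
import pandas as pd

# Hypothetical one-row wide table.
wide = pd.DataFrame({"date.utc": ["2019-04-09 01:00"], "BETR801": [22.5], "FR04014": [24.4]})

plain = wide.melt(id_vars="date.utc")                                  # columns.name unset
named = wide.rename_axis(columns="location").melt(id_vars="date.utc")  # columns.name = "location"

print(list(plain.columns))  # ['date.utc', 'variable', 'value']
print(list(named.columns))  # ['date.utc', 'location', 'value']
```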
■ Shows how to get WIDE-format data from LONG-format data using the columns/values arguments of the DataFrame pivot method.
▶ main.py
import pandas as pd

dataFrame1 = pd.read_csv("air_quality_long.csv", index_col = "date.utc", parse_dates = True) # 5272 rows
dataFrame2 = dataFrame1[dataFrame1["parameter"] == "no2"] # 3447 rows
dataFrame3 = dataFrame2.pivot(columns = "location", values = "value")

print(dataFrame3)

"""
location                   BETR801  FR04014  London Westminster
date.utc
2019-04-09 01:00:00+00:00     22.5     24.4                 NaN
2019-04-09 02:00:00+00:00     53.5     27.4                67.0
2019-04-09 03:00:00+00:00     54.5     34.2                67.0
2019-04-09 04:00:00+00:00     34.5     48.5                41.0
2019-04-09 05:00:00+00:00     46.5     59.5                41.0
...                            ...      ...                 ...
2019-06-20 20:00:00+00:00      NaN     21.4                 NaN
2019-06-20 21:00:00+00:00      NaN     24.9                 NaN
2019-06-20 22:00:00+00:00      NaN     26.5                 NaN
2019-06-20 23:00:00+00:00      NaN     21.8                 NaN
2019-06-21 00:00:00+00:00      NaN     20.0                 NaN

[1705 rows x 3 columns]
"""
▶ requirements.txt
■ Shows how to use the DataFrame reset_index method to move the current index into a regular column and add a new default integer index.
▶ main.py
import pandas as pd

dataFrame1 = pd.read_csv("air_quality_long.csv", index_col = "date.utc", parse_dates = True) # 5272 rows
dataFrame2 = dataFrame1.pivot_table(values = "value", index = "location", columns = "parameter", aggfunc = "mean", margins = True)
dataFrame3 = dataFrame2.reset_index()

print(dataFrame3)

"""
parameter            location        no2       pm25        All
0                     BETR801  26.950920  23.169492  24.982353
1                     FR04014  29.374284        NaN  29.374284
2          London Westminster  29.740050  13.443568  21.491708
3                         All  29.430316  14.386849  24.222743
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
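A sketch on a hypothetical table indexed by location, contrasting the default behavior with drop = True, which discards the old index instead of turning it into a column:

```python
import pandas as pd

# Hypothetical summary table indexed by location, like the pivot_table result above.
table = pd.DataFrame(
    {"no2": [26.95, 29.37]},
    index=pd.Index(["BETR801", "FR04014"], name="location"),
)

as_column = table.reset_index()         # "location" becomes a column, new 0..n-1 index
dropped = table.reset_index(drop=True)  # old index discarded instead of kept as a column
print(as_column)
print(dropped)
```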
■ Shows how to build pivot-style summary data using the groupby method of the DataFrame class, next to the equivalent pivot_table call.
▶ main.py
import pandas as pd

dataFrame1 = pd.read_csv("air_quality_long.csv", index_col = "date.utc", parse_dates = True) # 5272 rows
dataFrame2 = dataFrame1.pivot_table(values = "value", index = "location", columns = "parameter", aggfunc = "mean", margins = True)

print(dataFrame2)

"""
parameter                 no2       pm25        All
location
BETR801             26.950920  23.169492  24.982353
FR04014             29.374284        NaN  29.374284
London Westminster  29.740050  13.443568  21.491708
All                 29.430316  14.386849  24.222743
"""

print()

dataFrame3 = dataFrame1.groupby(["parameter", "location"])[["value"]].mean()

print(dataFrame3)

"""
                                  value
parameter location
no2       BETR801             26.950920
          FR04014             29.374284
          London Westminster  29.740050
pm25      BETR801             23.169492
          London Westminster  13.443568
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ Ran the pip install pandas command.
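The groupby result above is long (one row per parameter and location pair), whereas pivot_table spreads parameter across columns. A sketch on hypothetical rows showing that unstack bridges the two:

```python
import pandas as pd

# Hypothetical long-format measurements for two made-up locations, A and B.
long_df = pd.DataFrame({
    "location": ["A", "A", "B", "B", "A"],
    "parameter": ["no2", "pm25", "no2", "no2", "no2"],
    "value": [10.0, 5.0, 20.0, 40.0, 30.0],
})

pivoted = long_df.pivot_table(values="value", index="location", columns="parameter", aggfunc="mean")

# groupby yields a Series with a (location, parameter) MultiIndex;
# unstack moves the parameter level into columns, matching pivot_table's layout.
grouped = long_df.groupby(["location", "parameter"])["value"].mean()
reshaped = grouped.unstack("parameter")

print(pivoted)
print(reshaped)
```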
■ Shows how to display margins (the "All" row and column) using the margins argument of the DataFrame pivot_table method.
▶ main.py
import pandas as pd

dataFrame1 = pd.read_csv("air_quality_long.csv", index_col = "date.utc", parse_dates = True) # 5272 rows
dataFrame2 = dataFrame1.pivot_table(values = "value", index = "location", columns = "parameter", aggfunc = "mean", margins = True)

print(dataFrame2)

"""
parameter                 no2       pm25        All
location
BETR801             26.950920  23.169492  24.982353
FR04014             29.374284        NaN  29.374284
London Westminster  29.740050  13.443568  21.491708
All                 29.430316  14.386849  24.222743
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip install pandas
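Because aggfunc is "mean", the All entries are not sums of the cells: pivot_table applies the aggfunc to all underlying rows of each margin. A sketch on hypothetical rows where the difference is visible (location A's All value is the mean of its three readings, not the mean of its two cells):

```python
import pandas as pd

# Hypothetical measurements: location A has three readings, location B has two.
long_df = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "parameter": ["no2", "no2", "pm25", "no2", "no2"],
    "value": [10.0, 30.0, 5.0, 20.0, 40.0],
})

table = long_df.pivot_table(
    values="value", index="location", columns="parameter", aggfunc="mean", margins=True
)
print(table)
# A / All   = (10 + 30 + 5) / 3 = 15.0, not (20 + 5) / 2
# All / All = (10 + 30 + 5 + 20 + 40) / 5 = 21.0
```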
■ Shows how to build pivoted data using the values/index/columns/aggfunc arguments of the DataFrame pivot_table method.
▶ main.py
import pandas as pd

dataFrame1 = pd.read_csv("air_quality_long.csv", index_col = "date.utc", parse_dates = True) # 5272 rows
dataFrame2 = dataFrame1.pivot_table(values = "value", index = "location", columns = "parameter", aggfunc = "mean")

print(dataFrame2)

"""
parameter                 no2       pm25
location
BETR801             26.950920  23.169492
FR04014             29.374284        NaN
London Westminster  29.740050  13.443568
"""
▶ requirements.txt
numpy==2.1.2
pandas==2.2.3
python-dateutil==2.9.0.post0
pytz==2024.2
six==1.16.0
tzdata==2024.2
※ pip install