Week 4 TIL - Getting Started with Data Visualization!


게임취업하고싶은 사람 2025. 1. 2. 20:49

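# Import the required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Load the data
# Use the Titanic dataset that ships with Seaborn
df = sns.load_dataset('titanic')

# 2. Inspect the data
print(df.head())  # Show the first 5 rows
df.info()  # Show the structure and non-null counts (info() prints by itself and returns None)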

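# 3. Preprocess the data
# Drop rows with missing values in the columns used below
df = df.dropna(subset=['age', 'sex', 'pclass', 'survived']).copy()

# Define new age groups (0-10: children, 11-20: teenagers, 21-30: young adults, etc.)
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80]
labels = ['0-10', '11-20', '21-30', '31-40', '41-50', '51-60', '61-70', '71-80']
# Right-closed bins (the default) match the labels; right=False would put age 10
# into '11-20' and leave an age of exactly 80 without a group
df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels, include_lowest=True)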

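# 4. Breakdown of survivors by sex
# Note: this is each sex's share of all survivors, not the survival rate within each sex
gender_survival = df[df['survived'] == 1]['sex'].value_counts()
# Pick colors by label so they do not depend on the order value_counts() returns
color_by_sex = {'male': 'lightblue', 'female': 'pink'}
plt.figure(figsize=(6, 6))
plt.pie(gender_survival, labels=gender_survival.index,
        autopct='%1.1f%%', startangle=90,
        colors=[color_by_sex[s] for s in gender_survival.index])
plt.title('Share of Survivors by Gender')
plt.show()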

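# 5. Breakdown of survivors by age group
# sort_index() keeps the age groups in order instead of ordering them by count
age_survival = df[df['survived'] == 1]['age_group'].value_counts().sort_index()
age_survival = age_survival[age_survival > 0]  # Drop empty groups so the pie has no 0% slices
plt.figure(figsize=(8, 8))
plt.pie(age_survival, labels=age_survival.index,
        autopct='%1.1f%%', startangle=90,
        colors=sns.color_palette('pastel', len(age_survival)))
plt.title('Share of Survivors by Age Group')
plt.show()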

# 6. Breakdown of survivors by passenger class
# value_counts() orders by count, so sort by class number to line the slices up with the labels
pclass_survival = df[df['survived'] == 1]['pclass'].value_counts().sort_index()
pclass_labels = ['1st Class', '2nd Class', '3rd Class']
plt.figure(figsize=(6, 6))
plt.pie(pclass_survival, labels=pclass_labels,
        autopct='%1.1f%%', startangle=90,
        colors=sns.color_palette('cool', len(pclass_survival)))
plt.title('Share of Survivors by Pclass')
plt.show()
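
One thing worth noting about the charts above: they filter to survivors first, so each slice is a group's share of all survivors, which is not the same as the survival rate inside that group. Here is a minimal sketch of how the per-group rate could be computed instead. The helper name `survival_rate_by` and the six-row `sample` table are made up for illustration (they are not real Titanic rows), so treat this as one possible approach rather than the only way to do it.

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Return the fraction of passengers who survived within each value of `column`."""
    # The mean of a 0/1 column is the share of ones, i.e. the survival rate.
    return df.groupby(column, observed=True)['survived'].mean()


# Tiny made-up sample (not real Titanic rows), just to show the difference.
sample = pd.DataFrame({
    'sex': ['male', 'male', 'male', 'male', 'female', 'female'],
    'survived': [1, 1, 0, 0, 1, 0],
})

# Survival rate within each sex: survivors divided by all passengers of that sex.
rates = survival_rate_by(sample, 'sex')

# Share of survivors: what the pie charts show (each sex's part of all survivors).
shares = sample[sample['survived'] == 1]['sex'].value_counts(normalize=True)

print(rates)
print(shares)
```

In this sample both sexes survive at the same rate (half of each group), yet males make up two thirds of the survivors simply because there are more of them. The same helper should work on the Titanic frame with `'sex'`, `'age_group'`, or `'pclass'`, and a bar chart (for example `rates.plot(kind='bar')`) is usually a better fit than a pie for rates, since they do not add up to 100%.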
