## 📄 train_model.py - Concept Document

### ✅ Purpose

This script builds a training dataset from the CDS (Complete Data Set), trains an XGBoost model on it, and saves the model so that the higher-level analysis pipeline can use it.

---

### 📂 Input Data

- `cds_dir`: directory containing one CDS file per ticker, following the `_ohlcv.csv` naming pattern
  - e.g. `AAPL_ohlcv.csv`, `MSFT_ohlcv.csv`
  - Each file holds an OHLCV (open, high, low, close, volume) time series

---

### ⚙️ Key Features

1. **Data loading and integration**
   - Reads every per-ticker CDS file and converts it into features (X) and a target (y)
   - Calls `build_dataset()`, which can add features such as technical indicators
2. **Class distribution check and imbalance correction**
   - Prints the class ratio (up/down) of `y_total`
   - Adjusts `scale_pos_weight` automatically so that training is robust to class imbalance
3. **XGBoost training with hyperparameter tuning**
   - Searches for the best parameters with `GridSearchCV`
   - Tuned parameters: `max_depth`, `learning_rate`, `n_estimators`
4. **Evaluation metrics**
   - Accuracy (`accuracy_score`)
   - AUC (`roc_auc_score`)
   - F1 score (`f1_score`)
   - Log loss (`log_loss`)
   - Precision@TopN (e.g. P@50: the fraction of actual positives among the N samples with the highest predicted probability)
5. **Model saving with versioning**
   - Save path: `data_analysis_engine/models/model_YYYY-MM-DD.json`
   - Versions are managed automatically by date
6. **Automatic training log**
   - Appends the date, performance metrics, parameters, and sample count to `train_log.csv`

---

### 🧪 Usage

```bash
python -m data_analysis_engine.train_model
```

Alternatively, import the module from other code and call `train_model("data")`.

---

### 🧠 Possible Future Extensions

- Automatic hyperparameter search with Optuna
- K-fold cross-validation for evaluation
- Training on prediction-based ROI features (more diverse targets)
- Evaluation on an external test set and generation of model comparison reports

---

### ⚠️ Cautions

- Every CDS file must be non-empty and follow the `_ohlcv.csv` naming suffix
- If the number of features changes, the model must be retrained

---

### 📌 Related Files

- `dataset_builder.py`: builds the preprocessed X and y, including technical-indicator features
- `xgboost_model.py`: model class definition and save/load logic
- `X_total.csv`, `y_total.csv`: combined training data
- `model_YYYY-MM-DD.json`: trained XGBoost model
- `train_log.csv`: cumulative log of training results
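
The data-loading step (Key Features, item 1) and the file rules under Cautions can be illustrated with a minimal plain-Python sketch. It is not the actual `build_dataset()`. The helper names (`find_cds_files`, `build_rows`, `load_all`), the lowercase `open`/`high`/`low`/`close`/`volume` column names, and the up/down target (1 if the next close is higher) are all hypothetical assumptions made for illustration.

```python
import csv
from pathlib import Path

SUFFIX = "_ohlcv.csv"
# Hypothetical column names; the real CDS header may differ.
FEATURES = ("open", "high", "low", "close", "volume")


def find_cds_files(cds_dir):
    """Return {ticker: path} for files named <TICKER>_ohlcv.csv in cds_dir."""
    files = {}
    for path in sorted(Path(cds_dir).glob(f"*{SUFFIX}")):
        # Cautions: every CDS file must be non-empty, so fail loudly.
        if path.stat().st_size == 0:
            raise ValueError(f"empty CDS file: {path.name}")
        files[path.name[: -len(SUFFIX)]] = path
    return files


def build_rows(path):
    """Turn one OHLCV file into feature rows and up/down labels.

    Hypothetical target: 1 if the next row's close is higher, else 0.
    The last row has no next close, so it gets no label and is dropped.
    """
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"CDS file has a header but no data: {path.name}")
    X = [[float(r[c]) for c in FEATURES] for r in rows[:-1]]
    closes = [float(r["close"]) for r in rows]
    y = [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]
    return X, y


def load_all(cds_dir):
    """Concatenate every ticker's rows into X_total and y_total."""
    X_total, y_total = [], []
    for path in find_cds_files(cds_dir).values():
        X, y = build_rows(path)
        X_total.extend(X)
        y_total.extend(y)
    return X_total, y_total
```

Raising on an empty file, rather than silently skipping it, mirrors the requirement that every CDS file be non-empty. The real script may handle this case differently.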
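
One common way to derive `scale_pos_weight` for the class-imbalance step (Key Features, item 2) is the typical value the XGBoost parameter documentation suggests: sum(negative instances) / sum(positive instances). This document does not state whether `train_model.py` uses exactly this formula. The sketch assumes label 1 means "up" and 0 means "down", and the function name `class_balance` is hypothetical.

```python
from collections import Counter


def class_balance(y):
    """Print the up/down class ratio and return a scale_pos_weight value.

    Assumes label 1 = up (positive class) and 0 = down (negative class).
    """
    counts = Counter(y)
    n_pos, n_neg = counts.get(1, 0), counts.get(0, 0)
    total = n_pos + n_neg
    if total == 0:
        raise ValueError("y contains no 0/1 labels")
    print(f"up: {n_pos} ({n_pos / total:.1%}), down: {n_neg} ({n_neg / total:.1%})")
    if n_pos == 0:
        raise ValueError("no positive samples, so scale_pos_weight is undefined")
    # Typical value from the XGBoost parameter docs:
    # sum(negative instances) / sum(positive instances)
    return n_neg / n_pos
```

The returned value could then be passed to the classifier, for example `XGBClassifier(scale_pos_weight=class_balance(y_total))`.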
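
Precision@TopN (Key Features, item 4) takes only a few lines of plain Python. The sketch below assumes binary labels (1 = up) and predicted probabilities for the positive class. The function name `precision_at_top_n` and the tie-breaking rule (earlier samples win ties) are assumptions.

```python
def precision_at_top_n(y_true, y_score, n=50):
    """Fraction of actual positives among the n highest-scored samples.

    Hypothetical helper. If fewer than n samples exist, all are used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Rank indices by predicted probability, highest first; Python's sort is
    # stable even with reverse=True, so tied scores keep their original order.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    top = order[:n]
    if not top:
        return 0.0
    return sum(y_true[i] for i in top) / len(top)
```

Calling it with `n=50` gives the P@50 value mentioned above.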
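
Model saving and logging (Key Features, items 5 and 6) can be sketched with the standard library alone. Only the `model_YYYY-MM-DD.json` naming and the logged fields (date, metrics, parameters, sample count) come from this document. The helper names `versioned_model_path` and `append_train_log` and the storage of parameters as a single JSON string column are hypothetical.

```python
import csv
import json
from datetime import date
from pathlib import Path


def versioned_model_path(model_dir, today=None):
    """Return model_dir/model_YYYY-MM-DD.json, creating model_dir if needed."""
    today = today or date.today()
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    return model_dir / f"model_{today.isoformat()}.json"


def append_train_log(log_path, metrics, params, n_samples, today=None):
    """Append one row (date, metrics, params, sample count) to a CSV log.

    Assumes every run logs the same metric names, so the header written
    on the first run still matches later rows.
    """
    today = today or date.today()
    row = {
        "date": today.isoformat(),
        **metrics,
        # Hypothetical encoding: all parameters stored as one JSON string.
        "params": json.dumps(params, sort_keys=True),
        "n_samples": n_samples,
    }
    log_path = Path(log_path)
    write_header = not log_path.exists() or log_path.stat().st_size == 0
    with open(log_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return row
```

With a fitted XGBoost model, the returned path could be passed to `save_model()`, which writes JSON when the file name ends in `.json`. The version is the date alone, so two training runs on the same day target the same model file and the later one overwrites the earlier. In this sketch, `train_log.csv` still keeps a row for each run.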