Connecting…
Select a session…
Idle
Step — / — —%
📊 Live Metrics
Loss
Current training loss
Learning Rate
Current LR
GPU Util
Compute utilization
GPU Memory
VRAM usage
GPU Temp
Temperature °C
GPU Power
Watts drawn
Epoch
Current / Total
Batch Size
BS × GA = effective
Steps / sec
Training throughput
Tokens / sec
Estimated tok/s
MFU
Model FLOPs util
ETA
Time remaining
Cost
Estimated GPU cost
Progress
Overall completion
Disk
Last Checkpoint
Most recent save
🤖 AI Guardian & Analysis
LATEST ANALYSIS
AI analysis will appear here once training data is available…
ACTIVITY LOG

Loading activity…

💬 Ask Training AI
🎮 Training Controls
⚙️ GPU Memory & Training Settings
80%
70% Safe80% Balanced90% Aggressive
Lower = safer (avoids OOM) · Higher = faster (larger effective batch)
📈 Training Charts
Training Metrics
Training Loss
Loss Stability (Rolling Std Dev)
Learning Rate
GPU Memory
GPU Memory (GB)
GPU Memory %
GPU Memory Breakdown (Model · Grad · Optim · Activ · Cache · Free)
GPU Memory Timeline
GPU Health
GPU Utilization %
GPU Temperature
GPU Power Draw
Performance & Efficiency
Steps / sec
Tokens / sec
MFU %
Cost Tracker ($)
Advanced Analysis
🔬 Training Scattergram (interactive)
vs colored by
Points colored from early → mid → late in training
Gradient Norm
Loss vs Learning Rate
ETA Remaining (hours)
Multi-GPU Balance (Utilization %)
Checkpoint Timeline
🧹 Data Quality Metrics
Dataset Size
Training samples
Avg Sample Len
Tokens per sample
Max Seq Length
Context window
Total Tokens
In training set
Data Quality
Overall score
📜 Training Log
[--:--:--]Waiting for training to start…
💾 Checkpoints

Select a session to view checkpoints…

📦 Model Downloads
Checking status…
Preparing… 0%