Notification System
Relax integrates the Apprise notification library to send training progress updates and alerts to a variety of notification services.
Features
1. Training Startup Notification
When training starts, a startup notification is sent containing the project name and experiment name.
2. Training Metrics Updates
Periodically sends training metrics reports, including:
- Current training step
- Current values of key metrics
- Changes compared to the previous update (with sign)
Tracked key metrics:
- rollout/raw_reward: Raw reward
- rollout/reward: Processed reward
- train/grad_norm: Gradient norm
- train/entropy_loss: Entropy loss
- train/policy_loss: Policy loss
- train/value_loss: Value loss
- train/learning_rate: Learning rate
- train/kl_divergence: KL divergence
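To make the "changes compared to the previous update (with sign)" behavior concrete, here is a minimal sketch of how such a report could be assembled. The function name `format_report` and the `LABELS` mapping are hypothetical and are not taken from the Relax code base; only the metric keys and the report layout come from this document.

```python
from typing import Dict, Optional

# Hypothetical display labels; the keys are metric names listed in this document.
LABELS = {
    "rollout/raw_reward": "Raw Reward",
    "train/grad_norm": "Grad Norm",
    "train/entropy_loss": "Entropy Loss",
    "train/policy_loss": "Policy Loss",
}


def format_report(
    step: int,
    current: Dict[str, float],
    previous: Optional[Dict[str, float]] = None,
) -> str:
    """Build a Markdown report with a signed delta for each tracked metric."""
    lines = [f"# 📊 Training Report - Step {step}"]
    for key, label in LABELS.items():
        if key not in current:
            continue  # skip metrics that were not logged at this step
        line = f"- **{label}**: {current[key]:.2f}"
        if previous is not None and key in previous:
            delta = current[key] - previous[key]
            line += f" ({delta:+.2f})"  # "+" format flag forces an explicit sign
        lines.append(line)
    return "\n".join(lines)
```

The first report of a run has no previous values, so the sketch simply omits the delta in that case.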
3. Training Completion Notification
When training completes normally, a completion notification is sent with a summary of final metrics.
4. Exception Alerts
If training exits abnormally (without calling finish properly), an alert notification is automatically sent.
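This document does not describe how Relax detects an abnormal exit. One common way to get this behavior in Python is an `atexit` hook that fires unless a "finished" flag was set. The sketch below illustrates that pattern under that assumption; the class `ExitWatcher` and the `send` callback are hypothetical names.

```python
import atexit
from typing import Callable


class ExitWatcher:
    """Send an alert at interpreter exit unless finish() was called first."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        self._send = send          # hypothetical callback: send(title, body)
        self._finished = False
        atexit.register(self._on_exit)

    def finish(self) -> None:
        """Mark the run as completed and send the completion notification."""
        self._finished = True
        self._send("Training Completed", "Training completed successfully")

    def _on_exit(self) -> None:
        # Runs at normal interpreter shutdown, including after an uncaught exception.
        if not self._finished:
            self._send(
                "Training Terminated Abnormally",
                "Training process exited abnormally. Please check logs.",
            )
```

Note that `atexit` handlers do not run when the process is killed by an unhandled signal or exits through `os._exit()`, so a hook like this cannot cover every failure mode.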
Usage
Install Dependencies
pip install apprise
Configure Notification URLs
Specify one or more notification service URLs using the --notify-urls parameter (comma-separated):
python relax/entrypoints/train.py \
--notify-urls "redcity://your_webhook_key?msgtype=markdown&freq=10" \
... other training parameters ...
Multiple Notification Services
You can configure multiple notification services simultaneously:
--notify-urls "redcity://webhook1?msgtype=markdown,mailto://user:pass@gmail.com,slack://tokenA/tokenB/tokenC"
Supported Notification Services
Apprise supports 80+ notification services, including:
Enterprise Internal Services
- RedCity: redcity://webhook_key?msgtype=markdown&freq=10
Email Services
- Gmail: mailto://user:pass@gmail.com
- Outlook: mailto://user:pass@outlook.com
Instant Messaging
- Feishu/Lark: feishu://token_id/token_secret
- Slack: slack://tokenA/tokenB/tokenC
- Discord: discord://webhook_id/webhook_token
- Microsoft Teams: msteams://TokenA/TokenB/TokenC
- Telegram: tgram://bot_token/chat_id
Other Services
- Webhook: json://hostname/path or xml://hostname/path
- For more services, see the Apprise documentation
Notification Format Examples
Training Startup Notification
# 🚀 Training Started
**Project**: MyProject
**Experiment**: Experiment-001
**Status**: Started
---
*Training metrics will be updated periodically*
Training Metrics Report
# 📊 Training Report - Step 100
- **Raw Reward**: 0.75 (+0.05)
- **Grad Norm**: 1.23 (+0.15)
- **Entropy Loss**: 0.45 (-0.02)
- **Policy Loss**: 0.32 (-0.01)
Training Completion Notification
# ✅ Training Completed
**Status**: Training completed successfully
**Final Metrics**:
- Raw Reward: 0.85
- Grad Norm: 1.10
- Entropy Loss: 0.40
- Policy Loss: 0.28
Exception Alert
# ⚠️ Training Terminated Abnormally
**Status**: Training process exited abnormally
Please check logs for detailed information.
Feishu/Lark Integration Example
1. Create a Feishu Custom Bot
- In a Feishu group chat, go to Settings → Group Bots → Add Bot → Custom Bot
- Copy the Webhook URL, which looks like: https://open.feishu.cn/open-apis/bot/v2/hook/{token}
- (Optional) Configure signature verification and obtain the token_secret
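The webhook URL copied from Feishu has to be turned into the `feishu://{token}` or `feishu://{token}/{token_secret}` form that this guide passes to `--notify-urls`. The helper below is a small sketch of that conversion; `feishu_notify_url` is a hypothetical name, and the path prefix is taken from the example webhook URL shown above.

```python
from typing import Optional
from urllib.parse import urlparse

# Path prefix of the Feishu custom-bot webhook URL shown in this guide.
HOOK_PREFIX = "/open-apis/bot/v2/hook/"


def feishu_notify_url(webhook_url: str, token_secret: Optional[str] = None) -> str:
    """Convert a Feishu webhook URL into a feishu:// notification URL."""
    path = urlparse(webhook_url).path
    if not path.startswith(HOOK_PREFIX):
        raise ValueError(f"not a Feishu bot webhook URL: {webhook_url!r}")
    token = path[len(HOOK_PREFIX):].strip("/")
    if not token:
        raise ValueError("webhook URL does not contain a token")
    url = f"feishu://{token}"
    if token_secret:
        # Only appended when signature verification is enabled on the bot.
        url += f"/{token_secret}"
    return url
```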
2. Configure Notification URL
# Without signature verification
python relax/entrypoints/train.py \
--notify-urls "feishu://{token}" \
...
# With signature verification
python relax/entrypoints/train.py \
--notify-urls "feishu://{token}/{token_secret}" \
...
3. Combine with Other Notification Services
--notify-urls "feishu://{token}/{token_secret}?freq=10,redcity://webhook_key?msgtype=markdown"
Configuration Recommendations
Frequency Control
For high-frequency training (multiple updates per second), it is recommended to:
- Use the freq parameter to control notification frequency (e.g., freq=10 means send 1 out of every 10)
- Or control the call frequency at the application level
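For the application-level option, a simple counter is usually enough. The sketch below mirrors the "1 out of every 10" semantics described for `freq=10`. The class name `EveryNth` is hypothetical, and this document does not say whether Relax sends the first or the last update of each group; this version sends the last.

```python
class EveryNth:
    """Allow one notification out of every `freq` calls."""

    def __init__(self, freq: int) -> None:
        if freq < 1:
            raise ValueError("freq must be >= 1")
        self.freq = freq
        self._count = 0

    def should_send(self) -> bool:
        """Return True on every freq-th call, False otherwise."""
        self._count += 1
        return self._count % self.freq == 0
```

A training loop would call `should_send()` once per metrics update and skip the notification whenever it returns False.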
Message Format
- Services that support Markdown, such as RedCity, should use msgtype=markdown
- For plain-text services, the message format is converted automatically
Security
- Do not hardcode sensitive information (such as passwords, tokens) in code
- It is recommended to pass them through environment variables or configuration files
- Example: --notify-urls "$NOTIFY_URL"
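If training is launched from a Python wrapper rather than a shell, the same idea applies: read the URL from the environment instead of embedding it in the script. The sketch below assumes the variable is named `NOTIFY_URL`, as in the example above. Both helper names are hypothetical, and `redact` is an optional extra for keeping tokens out of logs.

```python
import os
from typing import List, Mapping


def notify_args_from_env(
    env: Mapping[str, str] = os.environ, var: str = "NOTIFY_URL"
) -> List[str]:
    """Return the --notify-urls arguments, or an empty list if the variable is unset."""
    value = env.get(var, "").strip()
    if not value:
        return []
    return ["--notify-urls", value]


def redact(url: str) -> str:
    """Keep only the scheme so tokens and passwords never reach the logs."""
    scheme, sep, _ = url.partition("://")
    return f"{scheme}://***" if sep else "***"
```

The returned list can be appended to the argument list of `python relax/entrypoints/train.py` when starting the process, for example with `subprocess.run`.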
Troubleshooting
Notifications Not Sent
- Check if Apprise is installed: pip list | grep apprise
- Check logs for error messages
- Verify that the notification URL format is correct
- Test if the notification service is reachable
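A quick way to rule out formatting mistakes is to split the `--notify-urls` value on commas and check that every entry has the `scheme://rest` shape. The helper below is a rough sketch of such a check (`check_notify_urls` is a hypothetical name). It only validates the shape of each entry, not whether the service accepts the credentials.

```python
from typing import List, Tuple


def check_notify_urls(raw: str) -> Tuple[List[str], List[str]]:
    """Split a comma-separated URL string into (well_formed, malformed) lists."""
    ok: List[str] = []
    bad: List[str] = []
    for part in raw.split(","):
        url = part.strip()
        if not url:
            continue  # ignore empty entries such as a trailing comma
        scheme, sep, rest = url.partition("://")
        if sep and scheme.isalnum() and rest:
            ok.append(url)
        else:
            bad.append(url)
    return ok, bad
```

Because entries are split on commas, a URL that itself contains a comma would be reported as two entries.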
Notification Frequency Too High
- Use the freq parameter to limit frequency
- Or add conditional checks in code to send only at specific steps
Format Issues
- Ensure the notification service supports Markdown (like RedCity)
- For services that don't support it, Apprise will automatically downgrade to plain text
