
Configuration Reference

The parameters are divided into three categories:

  1. Megatron parameters: Relax reads all parameters defined by the Megatron installation found on the PYTHONPATH. Configure them with their standard Megatron flags, such as --tensor-model-parallel-size 2.
  2. SGLang parameters: All parameters supported by the installed SGLang version are available. They must be prefixed with --sglang-; for example, --mem-fraction-static is passed as --sglang-mem-fraction-static.
  3. Relax-specific parameters: See relax/utils/arguments.py.
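
The prefix rule in item 2 is mechanical. The hypothetical helper below is not part of Relax (the real parsing lives in relax/utils/arguments.py and may differ). It sketches how a command line could be split into SGLang and non-SGLang arguments by stripping the --sglang- prefix. A real parser would also need to validate the stripped names against the installed SGLang version.

```python
SGLANG_PREFIX = "--sglang-"


def split_sglang_args(argv: list[str]) -> tuple[list[str], list[str]]:
    """Split argv into (sglang_args, other_args).

    Hypothetical helper for illustration: flags starting with --sglang- are
    rewritten to SGLang's native spelling, and any values that follow a flag
    are routed to the same list as that flag.
    """
    sglang_args: list[str] = []
    other_args: list[str] = []
    target = other_args
    for token in argv:
        if token.startswith("--"):
            if token.startswith(SGLANG_PREFIX):
                target = sglang_args
                token = "--" + token[len(SGLANG_PREFIX):]
            else:
                target = other_args
        target.append(token)
    return sglang_args, other_args


sglang_args, other_args = split_sglang_args(
    ["--sglang-mem-fraction-static", "0.8", "--tensor-model-parallel-size", "2"]
)
print(sglang_args)  # ['--mem-fraction-static', '0.8']
print(other_args)   # ['--tensor-model-parallel-size', '2']
```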

For common configuration usage and examples, see the Quick Start Guide.


Cluster and Resource Configuration

Ray Launch Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --rollout-num-gpus-per-engine | int | 1 | GPUs per SGLang inference engine; equivalent to SGLang's tp_size |
| --num-gpus-per-node | int | 8 | GPUs per node. Set this if using fewer than 8 GPUs per node in colocate mode |
| --resource | json | - | Ray resource configuration in JSON format. Example: '{"actor": [replicas, gpus], "rollout": [replicas, gpus]}' |
| --colocate | flag | False | Whether to colocate inference engines and training Actors on the same GPUs. Automatically sets --offload to True |
| --offload | flag | False | Equivalent to setting both --offload-train and --offload-rollout |
| --offload-train | flag | None | Whether to offload the training Actor to CPU during training. Always True when --colocate is enabled |
| --offload-rollout | flag | None | Whether to offload the Rollout generator to CPU during training. Always True when --colocate is enabled |
| --distributed-backend | str | nccl | Distributed backend |
| --distributed-timeout-minutes | int | 10 | Distributed timeout in minutes |

TransferQueue Data Queue

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --num-data-storage-units | int | 1 | Number of TransferQueue SimpleStorageUnit actors |
| --max-staleness | int | 0 | Maximum staleness for the TransferQueue data system (0 = on-policy) |
| --polling-mode | bool | True | Whether to use polling mode when fetching metadata |
| --num-iters-per-train-update | int | 1 | Number of iterations per global batch in the fully async pipeline |
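
One way to read --max-staleness is as a bound on how many weight updates a sample may lag behind the trainer. The sketch below encodes that reading. The version-counter parameters are hypothetical names, and the exact bookkeeping inside TransferQueue may differ.

```python
def is_fresh_enough(sample_weight_version: int, trainer_version: int, max_staleness: int) -> bool:
    """Assumed admission rule for a sample in the data queue.

    max_staleness=0 admits only samples generated with the current weights
    (on-policy). Larger values tolerate rollouts produced by older weights.
    """
    if max_staleness < 0:
        raise ValueError("max_staleness must be >= 0")
    lag = trainer_version - sample_weight_version
    return 0 <= lag <= max_staleness


print(is_fresh_enough(5, 5, 0))  # True
print(is_fresh_enough(4, 5, 0))  # False
print(is_fresh_enough(4, 5, 1))  # True
```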

Training Backend and Mode

| Parameter | Type | Default | Options | Description |
| --- | --- | --- | --- | --- |
| --train-backend | str | megatron | megatron | Training backend selection |
| --qkv-format | str | thd | thd, bshd | QKV layout for the Megatron backend. Dynamic batching is not supported in bshd mode; --micro-batch-size must be specified |
| --megatron-to-hf-mode | str | raw | raw, bridge | Megatron-to-HF weight conversion method. bridge uses Megatron Bridge for automatic conversion |
| --true-on-policy-mode | flag | False | - | Whether to enable true on-policy mode |
| --fully-async | flag | False | - | Whether to use the fully asynchronous training pipeline |

Checkpoint Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --hf-checkpoint | str | None | HuggingFace model checkpoint path. Used to initialize SGLang and provide the tokenizer. It does not need to contain the latest parameters; it only needs to match the training model's architecture |
| --ref-load | str | None | Reference model checkpoint path. Used as the initial checkpoint for training when --load is not set |
| --ref-ckpt-step | int | None | Reference model checkpoint step |
| --load | str | None | Actor model checkpoint load path. Specify it to resume training |
| --save | str | None | Path to save the model during training |
| --save-interval | int | None | Model save interval in steps |
| --save-hf | str | None | Path to save the HuggingFace-format model for the Megatron backend. The path can include a {rollout_id} placeholder |
| --async-save | flag | False | Asynchronous checkpoint saving |
| --no-save-optim | flag | False | Do not save optimizer state in checkpoints. Reduces checkpoint size but prevents resuming training from that checkpoint |
| --rotate-ckpt | flag | False | Whether to rotate checkpoints. Requires setting --save, --save-interval, and --async-save |
| --max-actor-ckpt-to-keep | int | None | Maximum number of Actor checkpoints to keep |
| --checkpoint-engine-backend | str | nccl | Checkpoint engine backend |
| --critic-load | str | None | Critic model checkpoint path. When None, defaults to --load |
| --critic-save | str | None | Critic model save path |

Data Configuration

Dataset

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --prompt-data | str | None | Training prompt dataset path |
| --input-key | str | input | Key for the input field in the dataset |
| --label-key | str | None | Key for the label field in the dataset |
| --metadata-key | str | metadata | Key for the metadata field in the dataset |
| --tool-key | str | tools | Key for the tools field when applying the Chat Template |
| --apply-chat-template | flag | False | Apply the Chat Template to the input as OpenAI message format |
| --apply-chat-template-kwargs | json | {} | Additional parameters for the Chat Template |
| --system-prompt | str | None | Optional system prompt added before the user input. The final message is `<system_prompt>` + `<dataset_prompt>` |
| --rollout-shuffle | flag | False | Whether to shuffle prompt order during Rollout |
| --rollout-seed | int | 42 | Random seed for Rollout, used for shuffling prompts and random sampling |
| --use-streaming-dataset | flag | False | Use a streaming dataset to save memory |
| --streaming-buffer-size | int | 10000 | Buffer size for the streaming dataset |
| --prefetch-chunk-size | int | 32 | Number of samples dispatched to the thread pool in each prefetch round. Larger values increase throughput but also memory pressure. Only effective when --use-streaming-dataset is set and the dataset contains multimodal data |
| --prefetch-max-cached | int | 256 | Maximum number of preloaded samples kept in the prefetch cache. When the cache is full, the background prefetch thread pauses until consumers free space. Set to 0 to disable prefetching. Only effective when --use-streaming-dataset is set and the dataset contains multimodal data |
| --prefetch-num-workers | int | 1 | Number of parallel worker threads in the prefetch buffer for I/O-bound media decoding (video/image). Set to 1 to serialize all decoding (safest, because FFmpeg is not fully thread-safe). Higher values increase parallelism but may trigger EAGAIN errors on some platforms. Only effective when prefetching is enabled |
| --custom-prompt-path | str | None | Dotted import path to a custom function that transforms the prompt before conversation/multimodal processing. Function signature: def custom_fn(prompt, data: dict) -> prompt. Example: my_package.prompt_utils.add_prefix |
| --data-source-path | str | relax.engine.rollout.data_source.RolloutDataSourceWithBuffer | Rollout data source class path |
| --start-rollout-id | int | None | Starting Rollout step. If not set, Relax attempts to read it from the checkpoint specified by --load |
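
A function passed to --custom-prompt-path receives the raw prompt and the full data record, and it returns the transformed prompt. The sketch below is one possible my_package.prompt_utils.add_prefix. The module, the INSTRUCTION text, and the load_function helper are all hypothetical. The function handles both a plain string prompt and an OpenAI-style message list, and load_function shows a generic importlib-based way to resolve a dotted path to a callable. Relax's own loader may work differently.

```python
import importlib
from typing import Any, Callable

INSTRUCTION = "Please reason step by step.\n\n"  # hypothetical prefix text


def add_prefix(prompt: Any, data: dict) -> Any:
    """Example custom prompt transform, applied before chat-template/multimodal processing."""
    if isinstance(prompt, str):
        return INSTRUCTION + prompt
    # OpenAI-style message list: copy it, then prefix the first user message with string content.
    messages = [dict(m) for m in prompt]
    for m in messages:
        if m.get("role") == "user" and isinstance(m.get("content"), str):
            m["content"] = INSTRUCTION + m["content"]
            break
    return messages


def load_function(dotted_path: str) -> Callable:
    """Resolve 'package.module.func' to a callable (assumed loading mechanism)."""
    module_name, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


print(load_function("os.path.basename")("/data/train.jsonl"))  # train.jsonl
```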

Multimodal Data

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --multimodal-keys | json | None | Multimodal data field mapping. Example: '{"image": "image_key"}' |
| --use-audio-in-video | flag | False | Whether to process audio in video |
| --image-max-token-num | int | None | Maximum token count for image processing. Default is 16384 if not set |
| --image-min-token-num | int | None | Minimum token count for image processing. Default is 4 if not set |
| --video-min-token-num | int | None | Minimum token count for video frame processing. Default is 128 if not set |
| --video-max-token-num | int | None | Maximum token count for video frame processing. Default is 768 if not set |
| --video-fps | float | None | Target FPS for video processing. Default is 2.0 if not set |
| --video-fps-min-frames | int | None | Minimum frames for video processing. Default is 4 if not set |
| --video-fps-max-frames | int | None | Maximum frames for video processing. Default is 768 if not set |
| --image-resize-scale-factor | int | None | Scale factor for image resize dimension alignment. Default uses patch_size * spatial_merge_size (typically 28). Set to 0 to disable alignment |
| --audio-sample-rate | int | None | Sample rate for audio processing. Default is 16000 if not set |
| --frame-factor | int | None | Frame alignment factor. Default is 2 if not set |
| --mm-processor-pool-size | int | 0 | Size of the multimodal processor pool. 0 (default) disables the pool and uses ThreadPoolExecutor. A positive integer creates a ProcessPoolExecutor with that many workers for true parallelism without GIL contention |
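
The image token limits and the scale factor interact through image resizing. The sketch below is modeled on the "smart resize" logic used by Qwen2-VL-style processors, where each visual token covers a factor × factor pixel block (28 × 28 with patch_size 14 and spatial_merge_size 2). It is an assumption that Relax resizes the same way, and the factor = 0 (alignment disabled) case is not modeled.

```python
import math


def smart_resize(height: int, width: int, factor: int = 28,
                 min_tokens: int = 4, max_tokens: int = 16384) -> tuple[int, int]:
    """Round (height, width) to multiples of factor within a token budget."""
    min_pixels = min_tokens * factor * factor
    max_pixels = max_tokens * factor * factor
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    if h * w > max_pixels:
        # Too many tokens: shrink both sides by the same ratio, rounding down.
        beta = math.sqrt((height * width) / max_pixels)
        h = max(factor, math.floor(height / beta / factor) * factor)
        w = max(factor, math.floor(width / beta / factor) * factor)
    elif h * w < min_pixels:
        # Too few tokens: enlarge both sides by the same ratio, rounding up.
        beta = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return h, w


def image_tokens(height: int, width: int, factor: int = 28,
                 min_tokens: int = 4, max_tokens: int = 16384) -> int:
    """Number of visual tokens after resizing: one token per factor x factor block."""
    h, w = smart_resize(height, width, factor, min_tokens, max_tokens)
    return (h // factor) * (w // factor)


print(image_tokens(1000, 1000))  # 1296
print(image_tokens(4000, 3000))  # 15301
```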

Rollout Configuration

Sampling Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --num-rollout | int | None | Total number of Rollout rounds. Set either this or --num-epoch |
| --num-epoch | int | None | Number of training epochs; num_rollout is calculated automatically from the dataset size. Ignored if --num-rollout is also set |
| --rollout-batch-size | int | Required | Number of prompts per Rollout round. Total samples per round = rollout-batch-size * n-samples-per-prompt |
| --n-samples-per-prompt | int | 1 | Number of responses generated per prompt |
| --rollout-temperature | float | 1.0 | Sampling temperature for the inference engine |
| --rollout-top-p | float | 1.0 | Top-p sampling parameter for the inference engine |
| --rollout-top-k | int | -1 | Top-k sampling parameter for the inference engine. -1 disables top-k |
| --rollout-max-response-len | int | None | Maximum response length, equivalent to SGLang's max_tokens |
| --rollout-max-prompt-len | int | None | Maximum prompt length. If set, longer prompts are filtered out during dataset initialization |
| --rollout-max-context-len | int | None | Maximum context length for the inference engine. Should not exceed max_position_embeddings in the HuggingFace model's config.json |
| --rollout-stop | str (list) | None | Stop words for Rollout. Can be one or more strings |
| --rollout-stop-token-ids | int (list) | None | Stop token IDs for Rollout |
| --rollout-skip-special-tokens | flag | False | Whether to skip special tokens in responses |
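
The relationship between --rollout-batch-size, --n-samples-per-prompt, and --num-epoch is simple arithmetic. The sketch below assumes that each Rollout round consumes exactly rollout_batch_size prompts and that a final partial batch in each epoch is dropped. Check relax/utils/arguments.py for the exact rounding Relax uses.

```python
def samples_per_rollout(rollout_batch_size: int, n_samples_per_prompt: int) -> int:
    """Responses generated per Rollout round."""
    return rollout_batch_size * n_samples_per_prompt


def rollouts_for_epochs(num_epoch: int, dataset_size: int, rollout_batch_size: int) -> int:
    """Assumed num_rollout derived from --num-epoch (partial batches dropped)."""
    return num_epoch * (dataset_size // rollout_batch_size)


print(samples_per_rollout(32, 8))            # 256
print(rollouts_for_epochs(2, 10_000, 32))   # 624
```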

Oversampling and Dynamic Filtering

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --over-sampling-batch-size | int | None | Sampling batch granularity. When None, uses rollout-batch-size. Must be >= rollout-batch-size |
| --dynamic-sampling-filter-path | str | None | Dynamic sampling filter function path. Implements DAPO-style filters (e.g., excluding groups whose samples are all correct or all wrong). Example: relax.engine.filters.dynamic_sampling_filters.check_reward_nonzero_std |
| --buffer-filter-path | str | None | Buffer filter function path. Function signature: list[list[Sample]] -> list[list[Sample]] |
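
The filter below illustrates the DAPO-style idea behind check_reward_nonzero_std. A prompt group whose responses all receive the same reward (all correct or all wrong) has zero reward variance, so its group-relative advantages are all zero and it contributes no learning signal; the filter drops it. The Sample dataclass is a hypothetical stand-in for Relax's own Sample type. keep_informative_groups follows the list[list[Sample]] -> list[list[Sample]] signature documented for --buffer-filter-path. The exact signature expected by --dynamic-sampling-filter-path is not shown here.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Sample:
    """Hypothetical stand-in for Relax's Sample type."""
    response: str
    reward: float


def reward_nonzero_std(group: list[Sample]) -> bool:
    """Keep a prompt group only if its rewards are not all identical."""
    return len(group) > 1 and pstdev([s.reward for s in group]) > 0.0


def keep_informative_groups(groups: list[list[Sample]]) -> list[list[Sample]]:
    """Example buffer filter: drop groups with zero reward variance."""
    return [g for g in groups if reward_nonzero_std(g)]


groups = [
    [Sample("a", 1.0), Sample("b", 1.0)],  # all correct -> dropped
    [Sample("c", 1.0), Sample("d", 0.0)],  # mixed -> kept
    [Sample("e", 0.0), Sample("f", 0.0)],  # all wrong -> dropped
]
print(len(keep_informative_groups(groups)))  # 1
```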

Partial Rollout

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --partial-rollout | flag | False | Enable partial Rollout. Incomplete samples are recycled to the data buffer; suitable for long-response scenarios |
| --partial-rollout-max-aborted-count | int | None | Maximum number of times a sample can be aborted. After reaching this threshold, the sample is guaranteed to complete |
| --mask-offpolicy-in-partial-rollout | flag | False | Whether to mask tokens from previous generations in partial Rollout. When set, only tokens generated on-policy participate in training |

Weight Update

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --update-weight-buffer-size | int | 512MB | Buffer size for weight updates, in bytes. Weights are updated in chunks, which is useful for MoE models |
| --update-weights-interval | int | 1 | Weight update interval |
| --keep-old-actor | flag | False | Whether to keep the Rollout model during training |
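
--update-weight-buffer-size bounds how much weight data is sent per chunk. The greedy bucketing pass below illustrates the idea. Tensors are appended to the current bucket until the next one would overflow the buffer, and a single tensor larger than the buffer gets a bucket of its own. This is a sketch of the general technique, not Relax's transfer code; the tensor names and sizes are made up.

```python
MiB = 1024 ** 2


def bucket_by_size(tensors: list[tuple[str, int]], buffer_bytes: int = 512 * MiB) -> list[list[str]]:
    """Greedily group (name, nbytes) pairs into buckets of at most buffer_bytes.

    A tensor that alone exceeds buffer_bytes is placed in a bucket by itself.
    """
    buckets: list[list[str]] = []
    current: list[str] = []
    current_bytes = 0
    for name, nbytes in tensors:
        # Flush the current bucket if adding this tensor would overflow it.
        if current and current_bytes + nbytes > buffer_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets


params = [("a", 300 * MiB), ("b", 300 * MiB), ("c", 100 * MiB), ("d", 600 * MiB)]
print(bucket_by_size(params))  # [['a'], ['b', 'c'], ['d']]
```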

External Inference Engine

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --rollout-external | flag | False | Use an external SGLang instance instead of a framework-launched instance |
| --rollout-external-engine-addrs | str (list) | None | List of external engine addresses and ports |

SGLang Engine Parameters

For more parameters, refer to the official SGLang documentation.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --sglang-mem-fraction-static | float | - | SGLang static memory allocation ratio |
| --sglang-profile | flag | False | Enable torch profiling on SGLang engines during rollout. Profile traces are saved per rollout step |
| --sglang-profile-steps | int (list) | None | List of absolute rollout step IDs (0-indexed) at which to enable SGLang profiling. Takes precedence over --sglang-profile-step-start/end. Example: --sglang-profile-steps 3 10 50 |
| --sglang-profile-step-start | int | None | Start of the rollout step range for SGLang profiling (inclusive, 0-indexed). Used with --sglang-profile-step-end to specify a contiguous range. Ignored if --sglang-profile-steps is set |
| --sglang-profile-step-end | int | None | End of the rollout step range for SGLang profiling (inclusive, 0-indexed). Used with --sglang-profile-step-start to specify a contiguous range. Ignored if --sglang-profile-steps is set. E.g., start=2, end=4 profiles steps 2, 3, and 4 |
| --sglang-profile-output-dir | str | None | Output directory for SGLang profile traces. Defaults to `traces/<tb_experiment_name>/sglang_trace` |
| --sglang-profile-num-steps | int | 3 | Number of SGLang forward steps to profile per rollout. -1 profiles the entire rollout step until stop_profile is called |
| --sglang-profile-activities | str (list) | ["CPU", "GPU"] | Activities to profile (e.g., CPU GPU) |
| --sglang-profile-by-stage | flag | False | Profile each stage (prefill/decode) separately |
| --sglang-profile-with-stack | flag | False | Record call stacks in profile traces |
| --sglang-profile-record-shapes | flag | False | Record tensor shapes in profile traces |
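
The precedence rules for choosing which rollout steps SGLang profiles can be summarized as a small predicate. This is a sketch of the documented behavior, not Relax's code. In particular, treating a half-specified range (only a start or only an end) as disabled is an assumption.

```python
from typing import Optional


def should_profile_sglang(
    rollout_id: int,
    steps: Optional[list[int]] = None,
    step_start: Optional[int] = None,
    step_end: Optional[int] = None,
) -> bool:
    """Decide whether SGLang profiling is enabled for a given rollout step."""
    # --sglang-profile-steps takes precedence over the start/end range.
    if steps:
        return rollout_id in steps
    # Inclusive, 0-indexed range; assumed disabled unless both ends are set.
    if step_start is not None and step_end is not None:
        return step_start <= rollout_id <= step_end
    return False


print([i for i in range(8) if should_profile_sglang(i, step_start=2, step_end=4)])  # [2, 3, 4]
print([i for i in range(60) if should_profile_sglang(i, steps=[3, 10, 50], step_start=2, step_end=4)])  # [3, 10, 50]
```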

Custom Rollout Functions

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --rollout-function-path | str | relax.engine.rollout.sglang_rollout.generate_rollout | Rollout generation function path |
| --custom-generate-function-path | str | None | Custom generate function that replaces the default Rollout generate function. Suitable for multi-turn dialogue, function calling, etc. |
| --rollout-data-postprocess-path | str | None | Rollout data postprocessing function, called after all data (including log_probs) is fetched. Can be used to update the loss mask |
| --custom-rollout-log-function-path | str | None | Custom Rollout logging function |
| --custom-eval-rollout-log-function-path | str | None | Custom evaluation Rollout logging function |

Batch Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --global-batch-size | int | None | Global batch size: the number of samples needed for one parameter update (optimizer.step) |
| --micro-batch-size | int | 1 | Micro batch size. Ignored when --use-dynamic-batch-size is enabled |
| --num-steps-per-rollout | int | None | Training steps per Rollout. Equivalent to setting GBS = rollout_batch_size * n_samples_per_prompt / num_steps_per_rollout |
| --use-dynamic-batch-size | flag | False | Enable dynamic batching. Samples are packed by length so that each micro-batch's total token count approaches the --max-tokens-per-gpu limit |
| --max-tokens-per-gpu | int | None | Maximum tokens per GPU. Must be set when dynamic batching is enabled. When using CP, set it to approximately max_response_len / cp_size |
| --log-probs-max-tokens-per-gpu | int | None | Maximum tokens per GPU when computing log probs. When None, equals --max-tokens-per-gpu |
| --balance-data | flag | False | Use the Karmarkar-Karp algorithm (karmarkar_karp) to balance token counts across data-parallel ranks. Only available in colocate mode; not supported with --fully-async. Note: different responses for the same prompt may be assigned to different training steps |
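
Two of the batch settings above can be illustrated directly. global_batch_size reproduces the --num-steps-per-rollout formula. pack_by_tokens shows one simple way (first-fit decreasing) to pack variable-length samples into micro-batches under a --max-tokens-per-gpu cap. Relax's actual dynamic-batching algorithm may differ, and raising an error for an over-long sample is an assumption of this sketch.

```python
def global_batch_size(rollout_batch_size: int, n_samples_per_prompt: int,
                      num_steps_per_rollout: int) -> int:
    """GBS = rollout_batch_size * n_samples_per_prompt / num_steps_per_rollout."""
    total = rollout_batch_size * n_samples_per_prompt
    if total % num_steps_per_rollout:
        raise ValueError("samples per rollout must divide evenly into training steps")
    return total // num_steps_per_rollout


def pack_by_tokens(lengths: list[int], max_tokens_per_gpu: int) -> list[list[int]]:
    """First-fit-decreasing packing of sample indices into micro-batches under a token cap."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    batches: list[list[int]] = []
    loads: list[int] = []
    for i in order:
        if lengths[i] > max_tokens_per_gpu:
            raise ValueError(f"sample {i} has {lengths[i]} tokens > max_tokens_per_gpu")
        # Place the sample in the first micro-batch that still has room.
        for b, load in enumerate(loads):
            if load + lengths[i] <= max_tokens_per_gpu:
                batches[b].append(i)
                loads[b] += lengths[i]
                break
        else:
            batches.append([i])
            loads.append(lengths[i])
    return batches


print(global_batch_size(32, 8, 4))                        # 64
print(pack_by_tokens([700, 300, 500, 400, 100], 1000))    # [[0, 1], [2, 3, 4]]
```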

Parallelism Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --tensor-model-parallel-size | int | 1 | Tensor parallelism size |
| --pipeline-model-parallel-size | int | 1 | Pipeline parallelism size |
| --sequence-parallel | flag | False | Enable sequence parallelism |
| --context-parallel-size | int | 1 | Context parallelism size |
| --expert-model-parallel-size | int | 1 | Expert parallelism size (for MoE models) |
| --expert-tensor-parallel-size | int | 1 | Expert tensor parallelism size |

Recomputation

The recomputation options are native Megatron parameters. For details, refer to the Megatron documentation.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --recompute-granularity | str | - | Recomputation granularity: full, selective |
| --recompute-method | str | - | Recomputation method: uniform, block |
| --recompute-num-layers | int | - | Number of layers to recompute |

Optimizer Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --lr | float | 1e-6 | Learning rate |
| --optimizer | str | - | Optimizer type (native Megatron parameter) |
| --lr-decay-style | str | - | Learning rate decay style (native Megatron parameter) |
| --weight-decay | float | - | Weight decay (native Megatron parameter) |
| --adam-beta1 | float | - | Adam beta1 (native Megatron parameter) |
| --adam-beta2 | float | - | Adam beta2 (native Megatron parameter) |
| --clip-grad | float | 1.0 | Gradient clipping |
| --seed | int | 1234 | Random seed |
| --optimizer-cpu-offload | flag | - | Enable CPU offload for optimizer state (native Megatron parameter) |
| --overlap-cpu-optimizer-d2h-h2d | flag | - | Overlap CPU optimizer D2H/H2D communication (native Megatron parameter) |
| --use-precision-aware-optimizer | flag | - | Use precision-aware optimizer (native Megatron parameter) |
| --use-distributed-optimizer | flag | - | Shard optimizer state, ZeRO-1 style (native Megatron parameter) |
| --overlap-grad-reduce | flag | - | Overlap backward compute with gradient reduce-scatter (native Megatron parameter) |
| --overlap-param-gather | flag | - | Overlap reduce-scatter with the next step's parameter all-gather; requires --overlap-grad-reduce (native Megatron parameter) |
| --calculate-per-token-loss | flag | False | Calculate loss per token (native Megatron parameter) |

Optimizer Flag Compatibility

| Scenario | --use-distributed-optimizer | --overlap-grad-reduce / --overlap-param-gather |
| --- | --- | --- |
| Text-only dense | | |
| Dense VL, CP = 1 | | |
| Dense VL, CP > 1 | | |
| MoE | | |

Algorithm Configuration

Advantage Estimation

| Parameter | Type | Default | Options | Description |
| --- | --- | --- | --- | --- |
| --advantage-estimator | str | grpo | grpo, gspo, on_policy_distillation, sapo | Advantage estimator. Note: OPD is now independent of the advantage estimator; enable OPD with any estimator by setting --opd-kl-coef > 0 |
| --normalize-advantages | flag | False | - | Whether to normalize advantages |
| --disable-grpo-std-normalization | flag | - | - | Disable GRPO standard deviation normalization (from Dr.GRPO) |
| --disable-rewards-normalization | flag | - | - | Disable reward normalization |
| --disable-compute-advantages-and-returns | flag | - | - | Disable advantage and return computation. Used for SFT or custom loss functions |
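
The effect of --disable-grpo-std-normalization is easiest to see on a single prompt group. The sketch below computes group-relative advantages in the GRPO style: each reward minus the group mean, optionally divided by the group standard deviation. With normalization disabled, as in Dr.GRPO, only the mean is subtracted. Using the population standard deviation and eps = 1e-6 are assumptions; Relax may use a different estimator or epsilon.

```python
from statistics import mean, pstdev


def grpo_advantages(group_rewards: list[float], std_normalize: bool = True,
                    eps: float = 1e-6) -> list[float]:
    """Group-relative advantages for the responses to one prompt."""
    mu = mean(group_rewards)
    centered = [r - mu for r in group_rewards]
    if not std_normalize:
        # Dr.GRPO-style: keep only the mean baseline.
        return centered
    sigma = pstdev(group_rewards)
    return [c / (sigma + eps) for c in centered]


print(grpo_advantages([1.0, 0.0, 1.0, 0.0], std_normalize=False))  # [0.5, -0.5, 0.5, -0.5]
print(grpo_advantages([1.0, 1.0]))                                  # [0.0, 0.0]
```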

Loss Function

| Parameter | Type | Default | Options | Description |
| --- | --- | --- | --- | --- |
| --loss-type | str | policy_loss | policy_loss, custom_loss | Loss type. When set to custom_loss, --custom-loss-function-path must also be set |
| --custom-loss-function-path | str | None | - | Custom loss function path |
| --eps-clip | float | 0.2 | - | PPO clipping range (lower bound) |
| --eps-clip-high | float | None | - | PPO clipping upper bound. When None, equals --eps-clip |
| --eps-clip-c | float | None | - | Dual-clip PPO value lower bound (paper) |
| --value-clip | float | 0.2 | - | Value function clipping range |
| --entropy-coef | float | 0.0 | - | Entropy loss coefficient |

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --kl-coef | float | 0.0 | KL penalty coefficient for reward shaping (applied to the reward signal before advantage calculation). Cannot be non-zero at the same time as --kl-loss-coef |
| --use-kl-loss | flag | False | Whether to use KL loss in GRPO |
| --kl-loss-coef | float | 0.0 | KL penalty coefficient added to the final PPO loss. Cannot be non-zero at the same time as --kl-coef |
| --kl-loss-type | str | k1 | KL loss estimator. Options: k1, k2, k3, low_var_kl |
| --use-unbiased-kl | flag | False | Enable unbiased KL estimation |
| --ref-update-interval | int | None | Reference model update interval in Rollout steps. None means the reference model is never updated |
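
The sketch below shows, for a single token, how --eps-clip, --eps-clip-high, and --eps-clip-c typically combine in a clipped PPO policy loss: asymmetric clipping of the ratio, plus a dual clip that caps the loss for negative advantages. It also gives the k1/k2/k3 KL estimators as commonly defined, with log_ratio = logp - ref_logp. It is an assumption that Relax's --kl-loss-type options use these same definitions. low_var_kl, token aggregation, and Relax's exact implementation are not modeled.

```python
import math
from typing import Optional


def ppo_token_loss(
    logp: float,
    old_logp: float,
    advantage: float,
    eps_clip: float = 0.2,
    eps_clip_high: Optional[float] = None,
    eps_clip_c: Optional[float] = None,
) -> float:
    """Clipped PPO policy loss for one token (to be minimized)."""
    eps_high = eps_clip if eps_clip_high is None else eps_clip_high
    ratio = math.exp(logp - old_logp)
    unclipped = -advantage * ratio
    clipped = -advantage * min(max(ratio, 1.0 - eps_clip), 1.0 + eps_high)
    loss = max(unclipped, clipped)
    # Dual clip: for negative advantages, cap the loss at -advantage * eps_clip_c.
    if eps_clip_c is not None and advantage < 0:
        loss = min(loss, -advantage * eps_clip_c)
    return loss


def kl_estimate(logp: float, ref_logp: float, kind: str = "k1") -> float:
    """Per-token estimators of KL(policy || reference) on a sampled token."""
    log_ratio = logp - ref_logp
    if kind == "k1":
        return log_ratio
    if kind == "k2":
        return 0.5 * log_ratio ** 2
    if kind == "k3":
        # (r - 1) - log r with r = p_ref / p_policy; always >= 0.
        return math.exp(-log_ratio) - 1.0 + log_ratio
    raise ValueError(f"unsupported KL estimator: {kind}")


print(round(ppo_token_loss(0.5, 0.0, 1.0), 4))                   # -1.2
print(round(ppo_token_loss(2.0, 0.0, -1.0, eps_clip_c=3.0), 4))  # 3.0
```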

SAPO Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --sapo-tau-pos | float | 1.0 | SAPO positive advantage temperature |
| --sapo-tau-neg | float | 1.05 | SAPO negative advantage temperature |

Critic Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --num-critic-only-steps | int | 0 | Number of initial steps during which only the Critic is trained |
| --critic-train-only | flag | False | Train only the Critic model |
| --critic-lr | float | None | Critic learning rate. When None, equals --lr |
| --critic-lr-warmup-iters | int | 0 | Number of iterations of linear learning-rate warmup for the Critic model |

Off-Policy Correction

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --use-rollout-logprobs | flag | False | Use Rollout logprobs when computing the importance sampling ratio. When not set, the Actor model's logprobs are used |
| --use-tis | flag | False | Enable Truncated Importance Sampling (TIS) for off-policy correction |
| --tis-clip | float | 2.0 | Upper clipping threshold for the importance sampling ratio |
| --tis-clip-low | float | 0 | Lower clipping threshold for the importance sampling ratio |
| --custom-tis-function-path | str | None | Custom TIS/RS function path |
| --custom-pg-loss-reducer-function-path | str | None | Custom pg_loss reducer function path. When set, pg_loss uses the custom reducer while other metrics use the default sum_of_sample_mean |
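
With --use-tis, each token's policy-gradient term is reweighted by a truncated importance ratio between the training-side and rollout-side probabilities of the sampled token. The sketch below computes those weights, clipped to [--tis-clip-low, --tis-clip]. Which log-probabilities Relax uses for the numerator, and how the weight enters the loss, are assumptions here.

```python
import math


def tis_weights(train_logps: list[float], rollout_logps: list[float],
                tis_clip: float = 2.0, tis_clip_low: float = 0.0) -> list[float]:
    """Truncated importance weights exp(train - rollout), clipped to [low, high]."""
    if len(train_logps) != len(rollout_logps):
        raise ValueError("log-prob sequences must have the same length")
    return [
        min(max(math.exp(t - r), tis_clip_low), tis_clip)
        for t, r in zip(train_logps, rollout_logps)
    ]


weights = tis_weights([0.0, math.log(5.0), math.log(0.5)], [0.0, 0.0, 0.0], tis_clip_low=0.8)
print(weights)  # [1.0, 2.0, 0.8]
```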

Routing Replay and OPSM

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --use-routing-replay | flag | False | Enable Routing Replay (paper) |
| --use-rollout-routing-replay | flag | False | Enable Rollout Routing Replay (paper). Automatically enables --use-routing-replay |
| --use-opsm | flag | False | Enable Off-Policy Sequence Masking (OPSM) |
| --opsm-delta | float | 1e-4 | OPSM threshold |

Other Training Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --reset-optimizer-states | flag | False | Whether to reset optimizer state after each Rollout |
| --use-rollout-entropy | flag | False | Whether to compute entropy when calculating logprobs. Used for special loss masks |
| --get-mismatch-metrics | flag | False | Whether to compute mismatch metrics. Requires setting --custom-tis-function-path |

Parameter Freezing and Selective Training

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --only-train-params-name-list | str (list) | None | List of regex patterns for parameters to train; all other parameters are frozen. Cannot be used together with --freeze-params-name-list. Example: --only-train-params-name-list experts |
| --freeze-params-name-list | str (list) | None | List of regex patterns for parameters to freeze; all other parameters remain trainable. Example: --freeze-params-name-list embedding output_layer |
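
The two freezing options are mutually exclusive views of the same decision. The sketch below maps parameter names to trainable or frozen. The parameter names are made-up Megatron-style examples, and matching a pattern anywhere in the name (re.search rather than a full match) is an assumption about Relax's behavior.

```python
import re
from typing import Optional


def trainable_mask(
    param_names: list[str],
    only_train: Optional[list[str]] = None,
    freeze: Optional[list[str]] = None,
) -> dict[str, bool]:
    """Map each parameter name to True (trainable) or False (frozen)."""
    if only_train and freeze:
        raise ValueError("only_train and freeze are mutually exclusive")

    def matches(name: str, patterns: list[str]) -> bool:
        # Assumption: a pattern may match anywhere in the name (re.search).
        return any(re.search(p, name) for p in patterns)

    if only_train:
        return {n: matches(n, only_train) for n in param_names}
    if freeze:
        return {n: not matches(n, freeze) for n in param_names}
    return {n: True for n in param_names}


names = [
    "embedding.word_embeddings.weight",
    "decoder.layers.0.mlp.experts.linear_fc1.weight",
    "output_layer.weight",
]
print(trainable_mask(names, only_train=["experts"]))
print(trainable_mask(names, freeze=["embedding", "output_layer"]))
```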

Evaluation Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --eval-interval | int | None | Evaluation interval in Rollout rounds |
| --eval-prompt-data | str (list) | None | Evaluation datasets in the format dataset_name /path/to/data.jsonl. Multiple pairs can be specified |
| --eval-config | str | None | OmegaConf YAML/JSON evaluation config file path. When set, overrides --eval-prompt-data |
| --eval-function-path | str | None | Evaluation generation function path. When None, uses --rollout-function-path |
| --skip-eval-before-train | flag | False | Whether to skip evaluation before training |
| --eval-input-key | str | None | Key for the input field in evaluation data. When None, uses --input-key |
| --eval-label-key | str | None | Key for the label field in evaluation data |
| --eval-tool-key | str | None | Key for the tool field in evaluation data |
| --n-samples-per-eval-prompt | int | 1 | Number of samples per evaluation prompt |
| --eval-temperature | float | None | Sampling temperature for evaluation |
| --eval-top-p | float | None | Top-p parameter for evaluation |
| --eval-top-k | int | None | Top-k parameter for evaluation |
| --eval-max-response-len | int | None | Maximum response length for evaluation |
| --eval-max-prompt-len | int | None | Maximum prompt length for evaluation |
| --eval-min-new-tokens | int | None | Minimum number of new tokens generated for evaluation |
| --eval-max-context-len | int | None | Maximum context length for evaluation. When None, equals --rollout-max-context-len |

Reward Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --rm-type | str | None | Built-in reward model type |
| --custom-rm-path | str | None | Custom reward function path. Function signature: def custom_rm(args, sample) -> float |
| --reward-key | str | None | Key used to extract the reward value when the reward function returns a dict |
| --eval-reward-key | str | None | Reward key for evaluation. When None, equals --reward-key |
| --group-rm | flag | False | Whether to compute the reward for the entire group |
| --rm-url | str | None | Remote reward model service URL (for --rm-type remote_rm) |
| --custom-reward-post-process-path | str | None | Custom reward postprocessing function path. Default is GRPO normalization |
| --custom-convert-samples-to-train-data-path | str | None | Custom function to convert samples to training data. Signature: def convert_samples_to_train_data(args, samples) -> dict |
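
A function for --custom-rm-path follows the documented signature def custom_rm(args, sample) -> float. The rule-based example below awards 1.0 when the last \boxed{...} answer in the response matches the label. The sample.response and sample.label attribute names are assumptions about Relax's Sample object, and SimpleNamespace stands in for that object in the usage line.

```python
import re
from types import SimpleNamespace

# Matches \boxed{...} with no nested braces and captures the inner answer.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def custom_rm(args, sample) -> float:
    """Hypothetical rule-based reward: exact match on the last \\boxed{...} answer."""
    label = getattr(sample, "label", None)
    answers = BOXED.findall(sample.response)
    if label is None or not answers:
        return 0.0
    return 1.0 if answers[-1].strip() == str(label).strip() else 0.0


sample = SimpleNamespace(response=r"... so the answer is \boxed{42}.", label="42")
print(custom_rm(None, sample))  # 1.0
```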

GenRM (Generative Reward Model)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --genrm-model-path | str | None | GenRM model path. Enables GenRM when set |
| --genrm-num-gpus | int | 1 | Total GPUs for GenRM |
| --genrm-num-gpus-per-engine | int | 1 | GPUs per GenRM engine instance |
| --genrm-engine-config | json | None | GenRM engine initialization parameters. Example: '{"dp_size": 1, "pp_size": 1, "max_total_tokens": 8192}' |
| --genrm-sampling-config | json | None | GenRM sampling parameters. Available keys: temperature (default 0.1), top_p (default 1.0), top_k (default -1), max_response_len (default 4096) |

On-Policy Distillation (OPD)

| Parameter | Type | Default | Options | Description |
| --- | --- | --- | --- | --- |
| --use-opd | flag | False | - | Enable On-Policy Distillation. --opd-type must also be specified |
| --opd-type | str | None | sglang, megatron | OPD type. sglang: fetch teacher model logprobs from an external SGLang server; megatron: load the teacher model via --opd-teacher-load |
| --opd-kl-coef | float | 1.0 | - | OPD KL penalty coefficient |
| --opd-teacher-load | str | None | - | OPD teacher model checkpoint path. Required when --opd-type=megatron |
| --opd-teacher-ckpt-step | int | None | - | OPD teacher model checkpoint step |

Fault Tolerance Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --use-fault-tolerance | flag | False | Whether to enable fault tolerance during Rollout |
| --use-health-check | flag | False | Whether to enable the global health check system. The Controller's HealthManager monitors all services and triggers an automatic restart on failure |
| --max-global-restart | int | 3 | Maximum number of global restarts allowed; training terminates once this is exceeded. Only effective when --use-health-check is enabled |
| --rollout-health-check-interval | float | 30.0 | Rollout engine health check interval in seconds |
| --rollout-health-check-timeout | float | 30.0 | Rollout engine health check timeout in seconds |
| --rollout-health-check-first-wait | float | 0 | Initial wait before starting health checks, in seconds. Increase this value when using DeepGEMM |

Elastic Scaling Configuration

Autoscaler

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --autoscaler-config | str | None | Path to the autoscaler YAML configuration file. Autoscaling is enabled when set and disabled when not set. Example: --autoscaler-config relax/utils/autoscaler/autoscaler.yaml |

For autoscaler YAML configuration details, see relax/utils/autoscaler/autoscaler.yaml.

Scale-Out Operation Parameters

| Parameter | Type | Default | Options | Description |
| --- | --- | --- | --- | --- |
| --scale-out-timeout | float | 300.0 | - | Timeout for all scale-out operations (engine startup, connect, health check, weight sync) in seconds |
| --scale-out-partial-success-policy | str | rollback_all | rollback_all, keep_partial | Policy for partial success during scale-out. rollback_all reverts all engines on any failure; keep_partial keeps successfully scaled engines |

Scale-In Operation Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --scale-in-drain-timeout | float | 30.0 | Timeout to wait for in-flight requests to drain before force-aborting (seconds) |
| --scale-in-shutdown-timeout | float | 30.0 | Timeout for graceful SGLang engine shutdown; ray.kill is used if exceeded (seconds) |

Logging and Monitoring Configuration

TensorBoard

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --use-tensorboard | flag | False | Enable TensorBoard logging |
| --tb-project-name | str | None | TensorBoard log directory. Defaults to the environment variable TENSORBOARD_DIR |
| --tb-experiment-name | str | None | TensorBoard experiment name |

ClearML

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --use-clearml | flag | False | Enable ClearML logging |

Metrics Service

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --use-metrics-service | flag | False | Enable the centralized metrics collection and reporting service |
| --timeline-dump-dir | str | None | Timeline trace event export directory (Chrome Trace format). Timeline tracing is disabled if not set |

Logging Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --log-passrate | flag | False | Enable pass@n pass rate logging |
| --log-multi-turn | flag | False | Enable multi-turn Rollout information logging |
| --log-correct-samples | flag | False | Log correct samples |
| --log-reward-category | str | None | Log reward category statistics. Specify the key in the reward dict |

Notifications

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --notify-urls | str | None | Apprise notification URL list (comma-separated) |

Router Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --use-slime-router | flag | False | Use SlimeRouter for text-based routing instead of SGLang's token-based routing |
| --slime-router-middleware-paths | str (list) | "" | List of middleware paths |
| --slime-router-timeout | float | None | SlimeRouter HTTP request timeout in seconds |
| --slime-router-max-connections | int | None | SlimeRouter HTTP client maximum connections |
| --slime-router-health-check-failure-threshold | int | 3 | Mark a worker as unhealthy after this many consecutive failures |

Other Training Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --train-env-vars | json | {} | Additional environment variables for the training process |
| --train-memory-margin-bytes | int | 1GB (1024³) | Reserved space for training memory allocation |
| --disable-weights-backuper | flag | - | Disable weight backup to save host memory |
| --custom-model-provider-path | str | None | Custom model provider function path |
| --recompute-loss-function | flag | False | Whether to recompute the loss function to save VRAM |
| --log-probs-chunk-size | int | -1 | Chunk size for computing log probs. Used to save VRAM |
| --allgather-cp | flag | False | - |
| --model-name | str | None | Model name for Megatron-to-HF weight conversion. Inferred from AutoConfig.from_pretrained(hf_checkpoint) if not set |
| --only-load-weight | flag | False | The reference model and actor forward pass load only weights |
| --rlsp-server-port | int | 8234 | RLSP Server HTTP port |
| --custom-config-path | str | None | Custom parameter YAML config file path. Key-value pairs in the file override existing parameters |

Debug & Profiling Parameters

Debug

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --debug-rollout-only | flag | False | Run Rollout only, no training |
| --debug-train-only | flag | False | Run training only, no Rollout |
| --load-debug-rollout-data | str | None | Load debug Rollout data. Automatically enables --debug-train-only when set |
| --load-debug-rollout-data-subsample | float | None | Subsample debug Rollout data to accelerate debugging |
| --save-debug-rollout-data | str | None | Save Rollout data. The path supports a {rollout_id} placeholder |
| --save-debug-train-data | str | None | Save training data. The path supports a {rollout_id} placeholder |
| --dump-details | str | None | Export all training details for post-hoc analysis |
| --check-weight-update-equal | flag | False | Check whether weight updates are equal |
| --enable-cuda-memory-check | flag | False | Enable memory checks around low-level NCCL communication calls. Logs available GPU memory before each collective and attaches memory info to exceptions on failure |

Training Performance Profiling

These parameters control the PyTorch Profiler for training steps. Trace files are saved to `traces/<tb_experiment_name>/train_trace/` by default.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --use-pytorch-profiler | flag | False | Enable PyTorch's built-in profiler to record CUDA kernels, CPU ops, and communication during training (from Megatron) |
| --profile-step-start | int | 10 | Step offset at which to start profiling (inclusive, from Megatron). Counts from 0 since the current training launch, not from the absolute rollout ID; resets on checkpoint resumption |
| --profile-step-end | int | 12 | Step offset at which to stop profiling (inclusive, from Megatron). Same counting semantics as above. E.g., start=10, end=12 profiles steps 10, 11, and 12 (3 steps) |
| --profile-target | str (list) | train_overall | Profiling targets: train_overall, train_actor, train_log_probs |
| --profile-with-stack | flag | False | Record stack information in profiler traces |
| --profile-with-memory | flag | False | Record memory information in profiler traces |
| --profile-with-flops | flag | False | Estimate FLOPs in profiler traces |
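
Because --profile-step-start and --profile-step-end count from the current launch, the profiled window shifts when training resumes from a checkpoint. The sketch below maps the offsets to absolute rollout IDs under the simplifying assumption of one training step per rollout. resume_rollout_id is a hypothetical name for the rollout at which the current launch starts.

```python
def profiled_rollout_ids(resume_rollout_id: int, step_start: int = 10,
                         step_end: int = 12) -> list[int]:
    """Absolute rollout IDs covered by the inclusive [step_start, step_end] offset window."""
    return [resume_rollout_id + offset for offset in range(step_start, step_end + 1)]


print(profiled_rollout_ids(0))    # [10, 11, 12]
print(profiled_rollout_ids(100))  # [110, 111, 112]
```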

GPU Memory Profiling

These parameters control GPU memory snapshot collection for diagnosing memory leaks and OOM issues. Snapshot files can be viewed with PyTorch Memory Viz tools (torch.cuda.memory._viz).

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --record-memory-history | flag | False | Enable CUDA memory allocation history recording (from Megatron). Records call stacks and tensor info for each allocation/deallocation, and automatically dumps a snapshot on OOM |
| --memory-snapshot-path | str | snapshot.pickle | Memory snapshot filename (from Megatron) |
| --memory-snapshot-dir | str | None | Memory snapshot output directory. Defaults to `traces/<tb_experiment_name>/memory_snapshot` |
| --memory-snapshot-num-steps | int | None | Proactively dump a memory snapshot after the specified number of steps (0-indexed; e.g., setting 3 dumps after step 2) |
| --memory-recorder | str | torch | Memory recorder backend: torch (PyTorch built-in), memray (requires pip install memray) |

Network

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| --http-proxy | str | None | HTTP proxy address |
| --use-distributed-post | flag | False | Use distributed POST requests |

Environment Configuration

Relax uses configs/env.yaml to configure runtime environment variables:

```yaml
env_vars:
  TOKENIZERS_PARALLELISM: 'true'
  NCCL_DEBUG: 'WARN'
  CUDA_DEVICE_MAX_CONNECTIONS: '1'
  GLOO_SOCKET_IFNAME: "eth0"
  TP_SOCKET_IFNAME: "eth0"
```

Released under the Apache 2.0 License.