ActorFwd Service API

The ActorFwd service computes actor and reference log-probabilities using a forward-only copy of the policy model. It runs as a Ray Serve deployment with a FastAPI ingress.

Overview

Property	Value
Module	`relax.components.actor_fwd`
Deployment	`@serve.deployment`
Ingress	FastAPI

Purpose

ActorFwd runs a forward-only replica of the policy model to compute log-probabilities for rollout data. It serves three purposes:

KL divergence computation: computes the actor log-probabilities log π(a|s) used in the KL penalty
Reference log-probs: computes log π_ref(a|s) under the reference model
Fully-async mode: receives weight updates from the Actor via NCCL
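
To make the two log-probability roles concrete, here is a minimal sketch of how per-token log-probabilities and a KL penalty term can be derived from raw logits. It is not the code in relax/components/actor_fwd.py: the function names (`log_softmax`, `token_log_probs`, `kl_penalty`) are hypothetical, plain Python lists stand in for model tensors, and the penalty uses the simple per-token difference log π(a|s) − log π_ref(a|s), which may differ from the estimator the service actually applies.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_log_probs(
    logits_per_step: Sequence[Sequence[float]], token_ids: Sequence[int]
) -> List[float]:
    """Return log pi(a_t | s_t) for each sampled token a_t.

    logits_per_step[t] holds the vocabulary logits at step t and
    token_ids[t] is the token that was actually sampled in the rollout.
    """
    return [log_softmax(step)[tok] for step, tok in zip(logits_per_step, token_ids)]


def kl_penalty(actor_lp: Sequence[float], ref_lp: Sequence[float]) -> List[float]:
    """Per-token difference log pi - log pi_ref (an assumed, simple estimator)."""
    return [a - r for a, r in zip(actor_lp, ref_lp)]


if __name__ == "__main__":
    tokens = [2, 0]  # sampled token ids from a rollout
    actor_logits = [[0.1, 0.2, 1.5], [2.0, 0.3, 0.1]]
    ref_logits = [[0.0, 0.0, 1.0], [1.5, 0.5, 0.2]]
    actor_lp = token_log_probs(actor_logits, tokens)
    ref_lp = token_log_probs(ref_logits, tokens)
    print(kl_penalty(actor_lp, ref_lp))
```

Both roles share the same forward pass; only the weights differ (current policy versus reference model), which is why a single forward-only component can serve either.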

Lifecycle

ActorFwd runs a background loop that:

Waits for rollout data to become available
Computes actor or reference log-probabilities (depending on role)
Publishes results back to TransferQueue
In fully-async mode, receives weight updates via the `/recv_weight_fully_async` endpoint
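
The first three steps of this loop can be sketched as a small worker thread. This is an illustration under stated assumptions, not the service's implementation: standard-library `queue.Queue` objects stand in for TransferQueue (whose real API is not shown here), `forward_loop` and `run_once` are hypothetical names, and the weight-update step is left out because it depends on NCCL and the HTTP ingress.

```python
import queue
import threading
from typing import Callable, List


def forward_loop(
    inbox: queue.Queue,
    outbox: queue.Queue,
    compute_log_probs: Callable[[list], list],
    stop: threading.Event,
    poll_s: float = 0.05,
) -> None:
    """Background loop: wait for a batch, compute log-probs, publish results."""
    while not stop.is_set():
        try:
            # Step 1: wait for rollout data (a stand-in for reading TransferQueue).
            batch = inbox.get(timeout=poll_s)
        except queue.Empty:
            continue  # nothing yet; re-check the stop flag and poll again
        # Step 2: compute actor or reference log-probabilities for the batch.
        result = compute_log_probs(batch)
        # Step 3: publish the result back to the output queue.
        outbox.put(result)


def run_once(batches: List[list], compute_log_probs: Callable[[list], list]) -> List[list]:
    """Start the loop, feed it the given batches, and collect one result per batch."""
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(
        target=forward_loop,
        args=(inbox, outbox, compute_log_probs, stop),
        daemon=True,
    )
    worker.start()
    for batch in batches:
        inbox.put(batch)
    results = [outbox.get(timeout=2.0) for _ in batches]
    stop.set()
    worker.join(timeout=2.0)
    return results
```

For example, `run_once([[1, 2], [3]], lambda b: [-float(x) for x in b])` feeds two batches through a placeholder scoring function and returns the results in submission order, since a single worker drains a FIFO queue.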

HTTP Endpoints

Source

Implementation: relax/components/actor_fwd.py
Base class: relax/components/base.py
