| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.backlog-processing.lag-threshold` | 5 min | Duration | Lag threshold that prevents unnecessary scalings while the pending messages responsible for the lag are being processed. |
| `job.autoscaler.catch-up.duration` | 30 min | Duration | The target duration for fully processing any backlog after a scaling operation. Set to 0 to disable backlog-based scaling. |
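
For illustration, a deployment that tolerates a bit more lag but wants faster catch-up after a rescale might set the following (values are made up; these keys go into the job's Flink configuration, e.g. under `spec.flinkConfiguration` of a FlinkDeployment when using the Kubernetes operator):

```yaml
# Ignore lag below 10 minutes when deciding whether to scale.
job.autoscaler.backlog-processing.lag-threshold: 10 min
# After a rescale, size the job so any backlog drains within 15 minutes.
job.autoscaler.catch-up.duration: 15 min
```
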
| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.enabled` | false | Boolean | Enable the job autoscaler module. |
| `job.autoscaler.excluded.periods` | (none) | List<String> | A semicolon-separated list of expressions that define excluded periods during which autoscaling execution is forbidden. Each expression consists of up to two optional subexpressions concatenated with &&: a cron expression in Quartz format (6 or 7 fields) and/or a daily expression. For example, `* * 9-11,14-16 * * ?` excludes 9:00:00am to 11:59:59am and 2:00:00pm to 4:59:59pm every day, and `* * * ? * 2-6` excludes every weekday; see http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html for cron expression usage. In most cases a cron expression is enough, but because cron can only represent whole-hour periods without a minutes-and-seconds suffix, a daily expression of the form startTime-endTime (e.g. `9:30:30-10:50:20`) is also supported. To exclude 9:30:30-10:50:20 on Monday and Thursday, write `9:30:30-10:50:20 && * * * ? * 2,5`. |
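
As a sketch, the following hypothetical schedule forbids autoscaling on Monday and Thursday mornings as well as all weekend. Note the quoting, since the value contains cron wildcards:

```yaml
job.autoscaler.enabled: true
# Two excluded periods, separated by ';': a daily window on Mon/Thu,
# and all of Sunday and Saturday (Quartz day-of-week 1,7).
job.autoscaler.excluded.periods: "9:30:30-10:50:20 && * * * ? * 2,5; * * * ? * 1,7"
```
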
| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.flink.rest-client.timeout` | 10 s | Duration | The timeout when waiting for the Flink REST client to return. |
| `job.autoscaler.history.max.age` | 1 d | Duration | Maximum age for past scaling decisions to retain. |
| `job.autoscaler.history.max.count` | 3 | Integer | Maximum number of past scaling decisions to retain per vertex. |
| `job.autoscaler.memory.gc-pressure.threshold` | 1.0 | Double | Max allowed GC pressure (percentage of time spent garbage collecting) during scaling operations. Autoscaling will be paused if the GC pressure exceeds this limit. |
| `job.autoscaler.memory.heap-usage.threshold` | 1.0 | Double | Max allowed percentage of heap usage during scaling operations. Autoscaling will be paused if the heap usage exceeds this threshold. |
| `job.autoscaler.memory.tuning.enabled` | false | Boolean | If enabled, the initial amount of memory specified for TaskManagers will be reduced or increased according to the observed needs. |
| `job.autoscaler.memory.tuning.maximize-managed-memory` | false | Boolean | If enabled and managed memory is used (e.g. RocksDB turned on), any reduction of heap, network, or metaspace memory will increase the managed memory. |
| `job.autoscaler.memory.tuning.overhead` | 0.2 | Double | Overhead to add to tuning decisions (0-1). This ensures spare capacity and allows the memory to grow beyond the dynamically computed limits, but never beyond the original memory limits. |
| `job.autoscaler.memory.tuning.scale-down-compensation.enabled` | true | Boolean | If this option and memory tuning are both enabled, TaskManager memory will be increased when scaling down. This ensures that after applying memory tuning there is sufficient memory when running with fewer TaskManagers. |
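
A minimal sketch of enabling memory tuning with a safety margin (the values are illustrative, not recommendations):

```yaml
job.autoscaler.memory.tuning.enabled: true
# Keep 20% headroom on top of the dynamically computed memory budget.
job.autoscaler.memory.tuning.overhead: 0.2
# Redirect freed heap/network/metaspace memory to managed memory (RocksDB).
job.autoscaler.memory.tuning.maximize-managed-memory: true
# Pause autoscaling when more than 30% of time is spent garbage collecting.
job.autoscaler.memory.gc-pressure.threshold: 0.3
```
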
| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.metrics.busy-time.aggregator` | MAX | Enum | Metric aggregator to use for busyTime metrics. This affects how the true processing/output rate is computed. Using MAX handles jobs with data skew more robustly, while AVG may provide better stability when the load distribution is known to be even. Possible values: "AVG", "MAX", "MIN". |
| `job.autoscaler.metrics.window` | 15 min | Duration | Scaling metrics aggregation window size. |
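
If the load is known to be evenly distributed, AVG can be chosen over the default, and the aggregation window shortened for quicker reactions (illustrative):

```yaml
job.autoscaler.metrics.busy-time.aggregator: AVG
job.autoscaler.metrics.window: 10 min
```
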
| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.observed-scalability.coefficient-min` | 0.5 | Double | Minimum allowed value for the observed scalability coefficient. Prevents aggressive scaling by clamping low coefficient estimates: if the estimated coefficient falls below this value, it is capped at the configured minimum. |
| `job.autoscaler.observed-scalability.enabled` | false | Boolean | Enables the use of an observed scalability coefficient when computing target parallelism. If enabled, the system estimates the scalability coefficient from historical scaling data instead of assuming perfect linear scaling. This helps account for real-world inefficiencies such as network overhead and coordination costs. |
| `job.autoscaler.observed-scalability.min-observations` | 3 | Integer | The minimum number of historical scaling observations required to estimate the scalability coefficient. If fewer observations are available, the system falls back to assuming linear scaling. Note: to make effective use of a higher minimum observation count, you also need to increase `job.autoscaler.history.max.count`. Avoid setting `job.autoscaler.history.max.count` to a very high value, as the number of retained data points is limited by the size of the state store, particularly when using a Kubernetes-based state store. |
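
A sketch of turning on observed scalability while retaining enough history for the estimate (illustrative values):

```yaml
job.autoscaler.observed-scalability.enabled: true
# Require 5 past scalings before trusting the estimated coefficient.
job.autoscaler.observed-scalability.min-observations: 5
# Retain enough decisions per vertex to actually reach 5 observations.
job.autoscaler.history.max.count: 8
```
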
| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.observed-true-processing-rate.lag-threshold` | 30 s | Duration | Lag threshold for enabling observed true processing rate measurements. |
| `job.autoscaler.observed-true-processing-rate.min-observations` | 2 | Integer | Minimum number of observations used when estimating or switching to the observed true processing rate. |
| `job.autoscaler.observed-true-processing-rate.switch-threshold` | 0.15 | Double | Percentage threshold for switching from the busy-time-based to the observed true processing rate when the measurement is off by at least the configured fraction. For example, 0.15 means we switch to the observed rate if the busy-time-based computation is at least 15% higher during catch-up. |
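
As a sketch, a pipeline with noisy busy-time metrics might switch to observed measurements less eagerly:

```yaml
# Only take observed measurements once lag exceeds one minute.
job.autoscaler.observed-true-processing-rate.lag-threshold: 1 min
# Require 3 observations instead of 2 before switching.
job.autoscaler.observed-true-processing-rate.min-observations: 3
# Switch only if the busy-time-based rate is at least 25% higher.
job.autoscaler.observed-true-processing-rate.switch-threshold: 0.25
```
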
| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.quota.cpu` | (none) | Double | Quota for the CPU count. When scaling would go beyond this number, the scaling is not going to happen. |
| `job.autoscaler.quota.memory` | (none) | MemorySize | Quota for the memory size. When scaling would go beyond this number, the scaling is not going to happen. |
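
To cap the total resources autoscaling may ever request, an illustrative budget might look like this (the memory value uses Flink's MemorySize notation):

```yaml
# Block any scaling that would exceed 32 CPUs or 64 GB in total.
job.autoscaler.quota.cpu: 32.0
job.autoscaler.quota.memory: 64gb
```
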
| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.restart.time` | 5 min | Duration | Expected restart time to be used until the operator can determine it reliably from history. |
| `job.autoscaler.restart.time-tracking.enabled` | false | Boolean | Whether to use the actual observed rescaling restart times instead of the fixed 'job.autoscaler.restart.time' configuration. If set to true, the maximum restart duration over a number of samples will be used. The value of 'job.autoscaler.restart.time-tracking.limit' will act as an upper bound, and the value of 'job.autoscaler.restart.time' will still be used when there are no rescale samples. |
| `job.autoscaler.restart.time-tracking.limit` | 15 min | Duration | Maximum cap for the observed restart time when 'job.autoscaler.restart.time-tracking.enabled' is set to true. |
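
A sketch of moving from the fixed restart estimate to tracked restart times (illustrative durations):

```yaml
job.autoscaler.restart.time-tracking.enabled: true
# Fallback estimate until actual rescale samples are available.
job.autoscaler.restart.time: 3 min
# Upper bound on any observed restart duration.
job.autoscaler.restart.time-tracking.limit: 10 min
```
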
| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.scale-down.interval` | 1 h | Duration | The delay before a scale down is executed. If greater than 0, the scale down is delayed, and multiple scale downs within `scale-down.interval` can be merged into a single one, reducing the number of rescales; reducing the frequency of job restarts improves job availability. If less than or equal to 0, the scale down is executed immediately. |
| `job.autoscaler.scale-down.max-factor` | 0.6 | Double | Max scale-down factor. 1 means no limit on scale down; 0.6 means the job can only be scaled down by at most 60% of the original parallelism. |
| `job.autoscaler.scale-up.max-factor` | 100000.0 | Double | Max scale-up factor. 2.0 means the job can only be scaled up to at most 200% of the current parallelism. |
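
For example, to batch scale-downs hourly and bound how far a single rescale may move the parallelism (illustrative factors):

```yaml
# Merge scale-downs arriving within one hour into a single rescale.
job.autoscaler.scale-down.interval: 1 h
# Shrink by at most 50% of the current parallelism per rescale.
job.autoscaler.scale-down.max-factor: 0.5
# Grow to at most 200% of the current parallelism per rescale.
job.autoscaler.scale-up.max-factor: 2.0
```
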
| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.scaling.effectiveness.detection.enabled` | false | Boolean | Whether to enable detection of ineffective scaling operations and allow the autoscaler to block further scale-ups. |
| `job.autoscaler.scaling.effectiveness.threshold` | 0.1 | Double | Processing-rate increase threshold for detecting ineffective scaling. 0.1 means that if a scaling does not accomplish at least 10% of the desired capacity increase, the action is marked ineffective. |
| `job.autoscaler.scaling.enabled` | true | Boolean | Enable vertex scaling execution by the autoscaler. If disabled, the autoscaler only collects metrics and evaluates the suggested parallelism for each vertex, but does not upgrade the jobs. |
| `job.autoscaler.scaling.event.interval` | 30 min | Duration | Time interval before resending an identical event. |
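
A common way to trial the autoscaler is a metrics-only dry run: the module runs and reports suggested parallelism, but never rescales the job:

```yaml
job.autoscaler.enabled: true
# Evaluate and report parallelism suggestions only; do not upgrade the job.
job.autoscaler.scaling.enabled: false
```
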
| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.scaling.key-group.partitions.adjust.mode` | EVENLY_SPREAD | Enum | How to adjust the parallelism of a source vertex or a vertex whose upstream shuffle is a keyBy. Possible values: "EVENLY_SPREAD": the parallelism adjustment attempts to distribute data evenly across subtasks; it is particularly effective for source vertices that are aware of partition counts and for vertices after a 'keyBy' operation, and aims for the number of key groups or partitions to be divisible by the chosen parallelism, ensuring even data distribution and reducing data skew. "MAXIMIZE_UTILISATION": maximizes resource utilization by attempting to set the parallelism that meets the current consumption-rate requirements; it is not enforced that the number of key groups or partitions is divisible by the parallelism. |
| `job.autoscaler.stabilization.interval` | 5 min | Duration | Stabilization period during which no new scaling will be executed. |
| `job.autoscaler.utilization.max` | (none) | Double | Max vertex utilization. |
| `job.autoscaler.utilization.min` | (none) | Double | Min vertex utilization. |
| `job.autoscaler.utilization.target` | 0.7 | Double | Target vertex utilization. |
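
For instance, a job favoring stability over tight resource usage might lower the target and set explicit bounds (illustrative values):

```yaml
job.autoscaler.utilization.target: 0.6
# Optional lower and upper vertex utilization bounds.
job.autoscaler.utilization.min: 0.3
job.autoscaler.utilization.max: 0.9
```
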
| Key | Default | Type | Description |
| --- | --- | --- | --- |
| `job.autoscaler.vertex.exclude.ids` | (none) | List<String> | A semicolon-separated list of vertex ids in hexstring for which to disable scaling. Caution: for non-sink vertices this will still scale their downstream operators until https://issues.apache.org/jira/browse/FLINK-31215 is implemented. |
| `job.autoscaler.vertex.max-parallelism` | 200 | Integer | The maximum parallelism the autoscaler can use. Note that this limit will be ignored if it is higher than the max parallelism configured in the Flink config or directly on each operator. |
| `job.autoscaler.vertex.min-parallelism` | 1 | Integer | The minimum parallelism the autoscaler can use. |
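
Finally, a sketch of bounding the autoscaler's parallelism range and excluding a single vertex (the hex vertex id below is a placeholder):

```yaml
job.autoscaler.vertex.min-parallelism: 2
job.autoscaler.vertex.max-parallelism: 64
# Placeholder job-vertex id as shown in the Flink UI.
job.autoscaler.vertex.exclude.ids: bc764cd8ddf7a0cff126f51c16239658
```
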