@Shekharrajak Shekharrajak commented Dec 27, 2025

What changes were proposed in this pull request?

This PR adds support for Kafka 4.x Share Groups (KIP-932) in Spark Structured Streaming, enabling queue semantics where multiple consumers can receive records from the same partition concurrently.

Key components:

  • New kafka-share data source format
  • Non-sequential offset tracking via ShareStateBatch
  • Acknowledgment-based delivery (ACCEPT/RELEASE/REJECT)
  • Three exactly-once strategies: idempotent sink, two-phase commit, checkpoint-dedup
  • Recovery support via acquisition lock expiry and checkpointing
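Non-sequential tracking with per-record acknowledgment could be sketched roughly as follows. This is a minimal illustration, not the PR's ShareStateBatch: the names `AckTracker`, `AckType`, `pending`, and `released` are hypothetical, and only the three acknowledgment kinds (ACCEPT/RELEASE/REJECT) come from the list above.

```scala
// Hypothetical sketch: none of these type or method names come from the PR.
sealed trait AckType
case object Accept  extends AckType // record processed successfully
case object Release extends AckType // hand the record back for redelivery
case object Reject  extends AckType // record is unprocessable

// Tracks each acquired offset individually, because a share consumer may
// hold offsets 5, 9 and 12 of a partition without holding 6 to 8.
final case class AckTracker(
    acquired: Set[Long],
    acked: Map[Long, AckType] = Map.empty) {

  // Records an acknowledgment for one offset; fails for offsets never acquired.
  def acknowledge(offset: Long, ack: AckType): AckTracker = {
    require(acquired.contains(offset), s"offset $offset was not acquired")
    copy(acked = acked.updated(offset, ack))
  }

  // Offsets still awaiting an acknowledgment.
  def pending: Set[Long] = acquired -- acked.keySet

  // Offsets explicitly released, which would become eligible for redelivery.
  def released: Set[Long] = acked.collect { case (o, Release) => o }.toSet
}
```

For example, `AckTracker(Set(5L, 9L, 12L)).acknowledge(5L, Accept).acknowledge(12L, Release)` leaves offset 9 pending and offset 12 released, which a single "last committed offset" could not express.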

Why are the changes needed?

Traditional Kafka consumer groups provide pub/sub semantics in which each partition is assigned to exactly one consumer within the group, so the number of partitions caps consumer parallelism. Share Groups (KIP-932) introduce queue semantics, enabling:

  1. Load balancing across consumers without partition limits
  2. Automatic redelivery on failure without manual offset management
  3. Per-record acknowledgment for fine-grained processing control

This is essential for workloads requiring high fan-out consumption patterns.

Does this PR introduce any user-facing change?

Yes. It adds a new kafka-share data source:

spark.readStream
  .format("kafka-share")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("kafka.share.group.id", "my-share-group")
  .option("subscribe", "topic1")
  .load()

New configuration options:

  • kafka.share.group.id (required)
  • kafka.share.acknowledgment.mode (implicit/explicit)
  • kafka.share.exactly.once.strategy (none/idempotent/two-phase-commit/checkpoint-dedup)
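The constraints implied by the option list above can be illustrated with a small validator. This is a sketch under stated assumptions: `ShareOptionCheck` and `validate` are hypothetical names, values are assumed to match exactly (case-sensitive), and the source does not say whether or how the PR validates these options.

```scala
// Hypothetical validator; the option names and allowed values are taken from
// the list above, everything else is an illustrative assumption.
object ShareOptionCheck {
  private val AckModes   = Set("implicit", "explicit")
  private val Strategies = Set("none", "idempotent", "two-phase-commit", "checkpoint-dedup")

  // Returns Left with every problem found, or Right with the options unchanged.
  def validate(opts: Map[String, String]): Either[List[String], Map[String, String]] = {
    // The share group id is the only option the list marks as required.
    val missingGroup =
      if (opts.get("kafka.share.group.id").exists(_.trim.nonEmpty)) Nil
      else List("kafka.share.group.id is required")

    // An absent option is accepted; a present one must be an allowed value.
    def checkEnum(key: String, allowed: Set[String]): List[String] =
      opts.get(key).filterNot(v => allowed.contains(v)).toList
        .map(v => s"$key: unsupported value '$v'")

    val errors = missingGroup ++
      checkEnum("kafka.share.acknowledgment.mode", AckModes) ++
      checkEnum("kafka.share.exactly.once.strategy", Strategies)

    if (errors.isEmpty) Right(opts) else Left(errors)
  }
}
```

Collecting all errors instead of failing on the first one lets a user fix a misconfigured stream in a single pass.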

How was this patch tested?

  • Unit tests for ShareStateBatch, KafkaShareSourceOffset, ShareInFlightRecord
  • Integration tests for fault tolerance and recovery scenarios
  • Tests for checkpoint-based deduplication and two-phase commit coordinator
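To make the checkpoint-based deduplication idea concrete, here is one way the core filtering step might look. It is a sketch only: `RecordId`, `Dedup`, and `filterNew` are hypothetical names, and the assumption that records are identified by topic, partition, and offset is mine, not something the PR description states.

```scala
// Hypothetical sketch of checkpoint-based deduplication; not the PR's code.
final case class RecordId(topic: String, partition: Int, offset: Long)

object Dedup {
  // Keeps only records not seen before (dropping duplicates inside the batch
  // too) and returns the updated set of seen ids, which would be persisted
  // in the checkpoint together with the batch.
  def filterNew(
      seen: Set[RecordId],
      batch: Seq[RecordId]): (Seq[RecordId], Set[RecordId]) = {
    val fresh = batch.distinct.filterNot(seen.contains)
    (fresh, seen ++ fresh)
  }
}
```

A redelivered record whose id is already in the checkpointed set is filtered out instead of being processed twice. A real implementation would also need to bound the size of the seen set, which this sketch does not attempt.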

Was this patch authored or co-authored using generative AI tooling?

Cursor IDE

Updated the section on Idempotent Sink and removed the comparison table with traditional Kafka source.