Task Collaboration Agent - Stage 3: Secure¶

Path: examples/a2a_task_collab_example/stage3_secure

Overview¶

Stage 3 demonstrates production-grade security for multi-agent task collaboration systems. This stage implements comprehensive, layered security controls that meet enterprise requirements.

Security Rating: ✅ 10/10 - PRODUCTION READY

Status: ✅ Suitable for production deployment

Key Learning Focus¶

This stage focuses on comprehensive security implementation and zero-trust architecture for real-world distributed systems.

What You'll Learn¶

Production session management
TLS 1.3 mutual authentication
MFA implementation (TOTP)
Nonce-based replay protection
Token bucket rate limiting
Full RBAC authorization
State encryption
Comprehensive security monitoring

Architecture¶

Task Collaboration Architecture

Client (TLS 1.3)
  ↓
Rate Limiter ✅
  ↓
MFA Authentication ✅
  ↓
Session Manager ✅
  ├─ 256-bit random IDs
  ├─ Full session binding
  ├─ State encryption
  └─ Dual timeouts
  ↓
RBAC Authorization ✅
  ↓
Coordinator (TLS)
  ├─ Nonce validator ✅
  ├─ Message integrity ✅
  └─ Audit logger ✅
  ↓
Worker Agents (mTLS) ✅
  ├─ Certificate auth
  ├─ Input validation
  └─ Secure execution

Components¶

task_coordinator.py: Production coordinator with full security
session_manager.py: Complete session security
auth_manager.py: MFA authentication (TOTP)
rate_limiter.py: Token bucket implementation
nonce_validator.py: Replay protection
rbac_manager.py: Role-based access control
state_encryptor.py: Session state encryption
audit_logger.py: Comprehensive tamper-proof logging
tls_config.py: TLS 1.3 configuration
security_monitor.py: Real-time security monitoring

🛡️ Complete Security Controls¶

1. Cryptographically Random Session IDs¶

import secrets

class SessionManager:
    """
    ✅ Production session management
    """
    def create_session(self, user_id, client_context):
        # 256-bit cryptographically random ID
        session_id = secrets.token_urlsafe(32)

        session_data = {
            'user_id': user_id,
            'created_at': time.time(),
            'last_activity': time.time(),
            'ip_address': client_context['ip'],
            'user_agent': client_context['user_agent'],
            'client_cert_hash': client_context['cert_hash'],
            'mfa_verified': True,
            'permissions': self._get_user_permissions(user_id)
        }

        # Encrypt session state
        encrypted_state = self.encryptor.encrypt(session_data)

        # Store encrypted
        self.sessions[session_id] = encrypted_state

        self.audit.log('session_created', user_id, session_id)

        return session_id

Benefits: - 2^256 possible values (unguessable) - URL-safe encoding - Encrypted storage - Comprehensive binding

2. MFA Authentication (TOTP)¶

import pyotp
import bcrypt

class AuthManager:
    """
    ✅ Two-factor authentication
    """
    def authenticate(self, username, password, mfa_token):
        # Rate limit check
        if not self.rate_limiter.check(username):
            self.audit.log('rate_limit_exceeded', username)
            raise AuthenticationError("Too many attempts")

        # Step 1: Verify password
        if not self._verify_password(username, password):
            self.audit.log('auth_failed_password', username)
            self._increment_failed_attempts(username)
            raise AuthenticationError("Invalid credentials")

        # Step 2: Verify MFA token (TOTP)
        user = self.users[username]
        totp = pyotp.TOTP(user['mfa_secret'])

        if not totp.verify(mfa_token, valid_window=1):
            self.audit.log('auth_failed_mfa', username)
            raise AuthenticationError("Invalid MFA token")

        # Success - reset failed attempts
        self._reset_failed_attempts(username)
        self.audit.log('auth_success', username)

        return user['id']

    def _verify_password(self, username, password):
        """Constant-time password verification"""
        if username not in self.users:
            # Prevent timing attacks
            bcrypt.checkpw(b'dummy', bcrypt.gensalt())
            return False

        user = self.users[username]
        return bcrypt.checkpw(
            password.encode(),
            user['password_hash'].encode()
        )

Benefits: - Two-factor security - Timing attack prevention - Rate limiting integration - Comprehensive audit trail

3. Full Session Binding¶

class SessionValidator:
    """
    ✅ Comprehensive session validation
    """
    def validate_session(self, session_id, request_context):
        # Decrypt session state
        encrypted = self.sessions.get(session_id)
        if not encrypted:
            return False

        try:
            session = self.encryptor.decrypt(encrypted)
        except DecryptionError:
            self.audit.log('session_decrypt_failed', session_id)
            return False

        # Check idle timeout (30 minutes)
        idle_time = time.time() - session['last_activity']
        if idle_time > self.IDLE_TIMEOUT:
            self._invalidate_session(session_id, 'idle_timeout')
            return False

        # Check absolute timeout (8 hours)
        session_age = time.time() - session['created_at']
        if session_age > self.ABSOLUTE_TIMEOUT:
            self._invalidate_session(session_id, 'absolute_timeout')
            return False

        # Enforce binding factors
        if not self._validate_bindings(session, request_context):
            self._invalidate_session(session_id, 'binding_violation')
            self.security_monitor.alert('session_binding_violation', {
                'session_id': session_id,
                'expected': session,
                'actual': request_context
            })
            return False

        # Check MFA verified
        if not session.get('mfa_verified'):
            return False

        # Update last activity
        session['last_activity'] = time.time()
        self.sessions[session_id] = self.encryptor.encrypt(session)

        return True

    def _validate_bindings(self, session, context):
        """Validate all binding factors"""
        # IP address binding
        if session['ip_address'] != context['ip']:
            return False

        # User agent binding
        if session['user_agent'] != context['user_agent']:
            return False

        # Client certificate binding
        if session['client_cert_hash'] != context['cert_hash']:
            return False

        return True

Benefits: - Multiple binding factors - Dual timeouts (idle + absolute) - State encryption - Tamper detection - Security alerting

4. Nonce-Based Replay Protection¶

import hashlib
from collections import deque
from threading import Lock

class NonceValidator:
    """
    ✅ Prevent replay attacks
    """
    def __init__(self, window_size=300):  # 5 minutes
        self.window_size = window_size
        self.seen_nonces = deque(maxlen=10000)
        self.nonce_timestamps = {}
        self.lock = Lock()

    def validate(self, nonce, timestamp):
        """
        Validate nonce is unique and recent
        """
        with self.lock:
            current_time = time.time()

            # Check timestamp is recent
            if abs(current_time - timestamp) > self.window_size:
                return False

            # Check nonce hasn't been seen
            nonce_key = hashlib.sha256(
                f"{nonce}:{timestamp}".encode()
            ).hexdigest()

            if nonce_key in self.seen_nonces:
                # Replay attack detected!
                self.audit.log('replay_attack_detected', {
                    'nonce': nonce,
                    'timestamp': timestamp
                })
                return False

            # Record nonce
            self.seen_nonces.append(nonce_key)
            self.nonce_timestamps[nonce_key] = timestamp

            # Cleanup old nonces
            self._cleanup_old_nonces(current_time)

            return True

    def _cleanup_old_nonces(self, current_time):
        """Remove nonces outside time window"""
        cutoff = current_time - self.window_size

        to_remove = [
            nonce for nonce, ts in self.nonce_timestamps.items()
            if ts < cutoff
        ]

        for nonce in to_remove:
            del self.nonce_timestamps[nonce]

# Usage in message handler
def handle_message(message, signature):
    nonce = message.get('nonce')
    timestamp = message.get('timestamp')

    # Verify nonce (prevent replay)
    if not nonce_validator.validate(nonce, timestamp):
        return error("Invalid or replayed nonce")

    # Verify HMAC signature
    if not verify_hmac(message, signature):
        return error("Invalid signature")

    # Process message (safe from replay)
    return process_message(message)

Benefits: - Prevents replay attacks - Time-window based - Automatic cleanup - Thread-safe - Audit trail

5. Token Bucket Rate Limiting¶

from collections import defaultdict
from threading import Lock

class TokenBucketRateLimiter:
    """
    ✅ Production rate limiting
    """
    def __init__(self, rate=10, burst=20):
        self.rate = rate          # Tokens per second
        self.burst = burst        # Max tokens
        self.buckets = defaultdict(lambda: {
            'tokens': burst,
            'last_update': time.time()
        })
        self.lock = Lock()

    def check_limit(self, identifier, cost=1):
        """Check if request allowed"""
        with self.lock:
            bucket = self.buckets[identifier]
            now = time.time()

            # Add tokens based on elapsed time
            elapsed = now - bucket['last_update']
            bucket['tokens'] = min(
                self.burst,
                bucket['tokens'] + elapsed * self.rate
            )
            bucket['last_update'] = now

            # Check if enough tokens
            if bucket['tokens'] >= cost:
                bucket['tokens'] -= cost
                return True

            # Log rate limit exceeded
            self.audit.log('rate_limit_exceeded', identifier)
            return False

    def get_retry_after(self, identifier, cost=1):
        """Calculate retry-after time"""
        bucket = self.buckets[identifier]
        tokens_needed = cost - bucket['tokens']

        if tokens_needed <= 0:
            return 0

        return tokens_needed / self.rate

# Usage in endpoints
@require_auth
def handle_request(user_id, request):
    # Check rate limit
    if not rate_limiter.check_limit(user_id, cost=5):
        retry_after = rate_limiter.get_retry_after(user_id, cost=5)
        return {
            'error': 'Rate limit exceeded',
            'retry_after': retry_after
        }, 429

    return process_request(request)

Benefits: - Smooth rate limiting - Burst handling - Per-user limits - Retry-After headers - DoS prevention

6. TLS 1.3 with Mutual Authentication¶

import ssl

class TLSConfig:
    """
    ✅ Production TLS configuration
    """
    @staticmethod
    def create_server_context(certfile, keyfile, cafile):
        """Create TLS 1.3 server context"""
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

        # Require TLS 1.3
        context.minimum_version = ssl.TLSVersion.TLSv1_3
        context.maximum_version = ssl.TLSVersion.TLSv1_3

        # Load server certificate
        context.load_cert_chain(certfile, keyfile)

        # Enable mutual TLS (client certs required)
        context.verify_mode = ssl.CERT_REQUIRED
        context.load_verify_locations(cafile)

        # Strong ciphers only
        context.set_ciphers('TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256')

        return context

    @staticmethod
    def create_client_context(certfile, keyfile, cafile):
        """Create TLS 1.3 client context"""
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)

        # Require TLS 1.3
        context.minimum_version = ssl.TLSVersion.TLSv1_3

        # Load client certificate
        context.load_cert_chain(certfile, keyfile)

        # Verify server certificate
        context.verify_mode = ssl.CERT_REQUIRED
        context.load_verify_locations(cafile)

        # Strong ciphers only
        context.set_ciphers('TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256')

        return context

# Usage
ssl_context = TLSConfig.create_server_context(
    certfile='certs/coordinator.crt',
    keyfile='certs/coordinator.key',
    cafile='certs/ca.crt'
)

# Wrap socket with TLS
secure_socket = ssl_context.wrap_socket(
    raw_socket,
    server_side=True
)

Benefits: - Latest TLS version - Mutual authentication - Strong ciphersuites only - Certificate validation - Perfect forward secrecy

7. Full RBAC Authorization¶

from enum import Enum
from typing import Set

class Permission(Enum):
    """Fine-grained permissions"""
    CREATE_PROJECT = "create_project"
    DELETE_PROJECT = "delete_project"
    VIEW_PROJECT = "view_project"
    MODIFY_PROJECT = "modify_project"
    CREATE_TASK = "create_task"
    ASSIGN_TASK = "assign_task"
    COMPLETE_TASK = "complete_task"
    VIEW_AUDIT_LOG = "view_audit_log"
    MANAGE_USERS = "manage_users"

class Role:
    """Role definitions"""
    VIEWER = {Permission.VIEW_PROJECT}
    CONTRIBUTOR = {
        Permission.VIEW_PROJECT,
        Permission.CREATE_TASK,
        Permission.COMPLETE_TASK
    }
    PROJECT_ADMIN = {
        Permission.CREATE_PROJECT,
        Permission.DELETE_PROJECT,
        Permission.VIEW_PROJECT,
        Permission.MODIFY_PROJECT,
        Permission.CREATE_TASK,
        Permission.ASSIGN_TASK,
        Permission.COMPLETE_TASK
    }
    SYSTEM_ADMIN = set(Permission)  # All permissions

class RBACManager:
    """
    ✅ Role-based access control
    """
    def check_permission(self, user_id, permission, resource_id=None):
        """Verify user has required permission"""
        user = self.users[user_id]
        user_permissions = self._get_effective_permissions(user)

        # Check permission exists
        if permission not in user_permissions:
            self.audit.log('permission_denied', {
                'user_id': user_id,
                'permission': permission.value,
                'resource_id': resource_id
            })
            return False

        # Resource-level checks
        if resource_id:
            if not self._check_resource_access(user_id, resource_id):
                self.audit.log('resource_access_denied', {
                    'user_id': user_id,
                    'resource_id': resource_id
                })
                return False

        return True

    def _get_effective_permissions(self, user) -> Set[Permission]:
        """Get all user permissions (role + explicit)"""
        permissions = set()

        # Add role permissions
        for role in user['roles']:
            permissions.update(Role[role].value)

        # Add explicit permissions
        permissions.update(user.get('explicit_permissions', []))

        return permissions

# Usage
@require_auth
def create_project(session_id, project_data):
    session = get_session(session_id)

    # Check permission
    if not rbac.check_permission(
        session['user_id'],
        Permission.CREATE_PROJECT
    ):
        return {'error': 'Access denied'}, 403

    # Create project
    project = create_project_internal(project_data)
    return {'status': 'success', 'project': project}

Benefits: - Fine-grained permissions - Role inheritance - Resource-level access - Audit trail - Scalable model

8. State Encryption¶

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import secrets

class StateEncryptor:
    """
    ✅ AES-256-GCM state encryption
    """
    def __init__(self, master_key):
        self.cipher = AESGCM(master_key)  # 256-bit key

    def encrypt(self, data):
        """Encrypt session/state data"""
        # Serialize data
        plaintext = json.dumps(data).encode()

        # Generate unique nonce (96 bits)
        nonce = secrets.token_bytes(12)

        # Encrypt with authentication
        ciphertext = self.cipher.encrypt(nonce, plaintext, None)

        # Return nonce + ciphertext
        return base64.b64encode(nonce + ciphertext).decode()

    def decrypt(self, encrypted_data):
        """Decrypt and verify state data"""
        try:
            # Decode
            data = base64.b64decode(encrypted_data)

            # Split nonce and ciphertext
            nonce = data[:12]
            ciphertext = data[12:]

            # Decrypt and verify
            plaintext = self.cipher.decrypt(nonce, ciphertext, None)

            # Deserialize
            return json.loads(plaintext.decode())

        except Exception as e:
            # Decryption failure = tampering
            raise DecryptionError("State tampered or corrupted")

Benefits: - Authenticated encryption - Tamper detection - Unique nonces - State confidentiality

9. Comprehensive Audit Logging¶

import hashlib
import json

class TamperProofAuditLogger:
    """
    ✅ Tamper-proof audit logging with chaining
    """
    def __init__(self):
        self.last_hash = None
        self.log_file = 'audit.log'

    def log(self, event_type, user_id, details):
        """Log security event with chain integrity"""
        event = {
            'timestamp': time.time(),
            'event_type': event_type,
            'user_id': user_id,
            'details': details,
            'previous_hash': self.last_hash
        }

        # Calculate hash
        event_json = json.dumps(event, sort_keys=True)
        event_hash = hashlib.sha256(event_json.encode()).hexdigest()
        event['hash'] = event_hash

        # Store event
        with open(self.log_file, 'a') as f:
            f.write(json.dumps(event) + '\n')

        # Update chain
        self.last_hash = event_hash

        return event_hash

    def verify_integrity(self):
        """Verify audit log hasn't been tampered"""
        with open(self.log_file, 'r') as f:
            events = [json.loads(line) for line in f]

        prev_hash = None
        for event in events:
            # Check chain
            if event['previous_hash'] != prev_hash:
                return False, f"Chain broken at {event['timestamp']}"

            # Verify hash
            stored_hash = event.pop('hash')
            calculated = hashlib.sha256(
                json.dumps(event, sort_keys=True).encode()
            ).hexdigest()

            if stored_hash != calculated:
                return False, f"Tampered event at {event['timestamp']}"

            prev_hash = stored_hash

        return True, "Audit log intact"

Benefits: - Tamper detection - Chain integrity - Complete audit trail - Forensic capability

Attack Prevention Matrix¶

Attack Type	Stage 1	Stage 2	Stage 3
Session Guessing	✅ Succeeds	❌ Blocked	❌ Blocked
Session Sniffing	✅ Succeeds	✅ Succeeds	❌ Blocked (TLS)
Identity Spoofing	✅ Succeeds	❌ Blocked	❌ Blocked
Weak Passwords	N/A	⚠️ Possible	❌ Blocked (MFA)
Replay Attack	✅ Succeeds	✅ Succeeds	❌ Blocked (nonce)
Brute Force	✅ Succeeds	✅ Succeeds	❌ Blocked (rate limit)
Session Theft	✅ Succeeds	✅ Succeeds	❌ Blocked (binding)
Privilege Escalation	✅ Succeeds	⚠️ Partial	❌ Blocked (RBAC)
Message Tampering	✅ Succeeds	❌ Blocked	❌ Blocked
State Tampering	✅ Succeeds	⚠️ Possible	❌ Blocked (encryption)
DoS (No Rate Limit)	✅ Succeeds	✅ Succeeds	❌ Blocked
Stale Sessions	✅ Succeeds	⚠️ Reduced	❌ Blocked (dual timeout)
Audit Tampering	✅ Succeeds	⚠️ Possible	❌ Blocked (chaining)

Result: All known attacks blocked ✅

Running the Example¶

Setup¶

cd examples/a2a_task_collab_example/stage3_secure

# Install dependencies
pip install -r requirements.txt

# Generate TLS certificates
./scripts/generate_certs.sh

# Generate master encryption key
python scripts/generate_master_key.py

# Setup users with MFA
python scripts/setup_users.py

# Start coordinator
python server/task_coordinator.py

# In separate terminals:
python server/worker_agent.py --port 8001
python server/worker_agent.py --port 8002
python server/audit_agent.py --port 8003

Configuration¶

# .env file
TLS_CERT_FILE=certs/coordinator.crt
TLS_KEY_FILE=certs/coordinator.key
TLS_CA_FILE=certs/ca.crt
MASTER_ENCRYPTION_KEY=<generated-key>
SESSION_IDLE_TIMEOUT=1800
SESSION_ABSOLUTE_TIMEOUT=28800
RATE_LIMIT_RATE=10
RATE_LIMIT_BURST=20

Try the Protections¶

# All previous attacks now fail
python demos/attack_stage3.py

# Output shows all attacks blocked:
# ❌ Session sniffing: Encrypted (TLS)
# ❌ Replay attack: Nonce validation failed
# ❌ Brute force: Rate limit exceeded
# ❌ Session theft: Binding violation
# ❌ All attacks blocked successfully!

Production Deployment Checklist¶

Security¶

TLS certificates from trusted CA
Strong master encryption key generated
MFA secrets properly provisioned
Rate limits tuned for workload
Session timeouts configured appropriately
All secrets in secure vault (not files)

Monitoring¶

Security monitoring alerts configured
Audit log aggregation setup
Failed auth alerts enabled
Rate limit alerts configured
Session anomaly detection active

Operations¶

Certificate rotation procedure documented
Key rotation schedule established
Backup encryption enabled
Disaster recovery plan tested
Incident response procedures ready

Key Takeaways¶

Comprehensive security is achievable: With systematic approach
Zero-trust architecture works: Verify everything, trust nothing
Defense-in-depth is essential: Multiple layers prevent all attacks
Sessions are complex: Require binding, encryption, timeouts, monitoring
Multi-agent systems need careful design: Each agent is an attack surface
Production-ready requires completeness: Partial solutions fail

Resources¶

Time to Complete: 8-12 hours
Difficulty: ⭐⭐⭐ Advanced
Prerequisites: Stage 1-2 complete, TLS knowledge, cryptography basics

Version: 1.0
Last Updated: January 2026
Status: Production-Ready Implementation