
Smoke Test Design Comparison: Monolithic vs Modular

Research and decision analysis comparing monolithic and modular smoke test designs for optimal CI/CD visibility and debugging experience.

Context

With isolated pytest sessions now working correctly (using the -m smoke marker), we can revisit our smoke test design without the database state pollution issues that originally forced us to adopt a monolithic design.

Current State

File: tests/smoke/test_complete_auth_flow.py

  • Single monolithic test function: test_complete_authentication_flow()
  • 18 sequential steps executed in one test
  • 4 additional independent tests (health check, API docs, invalid login, weak password)
  • Simple fixture setup with just client and caplog
  • All assertions consolidated in one function

Pytest Output:

tests/smoke/test_complete_auth_flow.py::test_complete_authentication_flow PASSED

Desired State

  • Clear identification of which step fails in smoke test flow
  • Granular test progress visibility in CI/CD (GitHub Actions)
  • Ability to run/debug individual test steps
  • Better test discovery for developers
  • Improved coverage reporting showing distinct validations

Constraints

  • Tests must maintain sequential ordering (steps depend on previous steps)
  • Must work in isolated pytest session (separate from main tests)
  • Cannot introduce database state pollution
  • Must preserve caplog-based token extraction pattern
  • Should match original shell script design (scripts/test-api-flows.sh)

Problem Statement

Question: Should we maintain the current monolithic smoke test design or switch back to the original modular design?

The monolithic design was adopted to work around database state pollution issues. Now that pytest sessions are properly isolated, we need to determine which design provides better developer experience, CI/CD visibility, and maintainability.

Why This Matters

  • Debugging efficiency: Failed smoke tests can block deployments; quick identification of failures is critical
  • CI/CD visibility: GitHub Actions should clearly show which authentication steps pass/fail
  • Developer experience: New developers should easily understand the 18-step authentication flow
  • Maintenance: Code should be easy to update as authentication flow evolves

Research Questions

  1. Clarity: Which design makes test failures easier to identify and debug?
  2. Visibility: Which design provides better CI/CD progress visualization?
  3. Maintainability: Which design is easier for developers to understand and modify?
  4. Isolation: Can the modular design work reliably with isolated pytest sessions?
  5. Industry Practice: What do other projects use for sequential smoke test flows?

Options Considered

Option 1: Monolithic Design (Current)

Description: Single test function containing all 18 sequential steps as assertions within one test.

Structure:

def test_complete_authentication_flow(client, caplog):
    """Test complete 18-step authentication journey."""
    # Step 1: Register
    response = client.post("/api/v1/auth/register", json={...})
    assert response.status_code == 200

    # Step 2: Extract token
    token = extract_token_from_caplog(caplog, "verify-email?token=")

    # ... Steps 3-18 ...
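
The extract_token_from_caplog helper referenced above implements the caplog-based token extraction pattern named in the constraints. One plausible shape for it, as a minimal sketch in which the token character set and log format are assumptions:

import re

def extract_token_from_caplog(caplog, url_fragment: str) -> str:
    """Scan captured log records for url_fragment and return the token that follows it."""
    for record in caplog.records:
        match = re.search(re.escape(url_fragment) + r"([\w.-]+)", record.getMessage())
        if match:
            return match.group(1)
    raise AssertionError(f"no captured log record contained {url_fragment!r}")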

Pros:

  • ✅ Clear flow: Easy to see complete user journey in one place
  • ✅ Fewer fixtures: Simple setup with just client and caplog
  • ✅ Consolidated: All assertions in one place
  • ✅ Documentation: Acts as living documentation of complete flow
  • ✅ Simple state management: Uses local variables

Cons:

  • ❌ Single failure point: If step 5 fails, steps 6-18 never run
  • ❌ Unclear test output: Just shows "1 test failed" instead of which step
  • ❌ Hard to debug: Must read through entire function to find failure point
  • ❌ Poor pytest output: Can't see progress through individual steps
  • ❌ No test isolation: Can't run individual steps independently
  • ❌ Coverage reporting: Shows as 1 test instead of 18 distinct validations
  • ❌ CI visibility: GitHub Actions shows "1/1" instead of progress

Complexity: Low

Pytest Output (Failure at Step 10):

tests/smoke/test_complete_auth_flow.py::test_complete_authentication_flow FAILED

The output doesn't indicate which step failed; the developer must read the traceback to find out.

Option 2: Modular Design (Original)

Description: Class-based organization with 18 separate test functions using sequential numbering and shared state fixture.

Structure:

_smoke_test_user_data = {}  # Module-level shared state

@pytest.fixture(scope="function")
def smoke_test_user(client, unique_test_email, test_password, caplog):
    """Initialize smoke test user with shared state."""
    if _smoke_test_user_data:
        return _smoke_test_user_data
    # ... registration and state initialization ...

class TestSmokeCompleteAuthFlow:
    def test_01_user_registration(self, client, smoke_test_user):
        """Test step 1: User registration."""
        # Registration logic

    def test_02_email_verification_token_extracted(self, smoke_test_user):
        """Test step 2: Verification token extraction."""
        # Token extraction logic

    # ... Tests 03-18 ...
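
To make the state threading concrete, here is a hypothetical pair of adjacent steps inside TestSmokeCompleteAuthFlow; the endpoint paths (/api/v1/auth/login, /api/v1/users/me) and step numbers are assumptions, since only /api/v1/auth/register appears above:

    def test_05_login(self, client, smoke_test_user):
        """Step 5: log in and stash the access token for later steps."""
        response = client.post(
            "/api/v1/auth/login",
            json={"email": smoke_test_user["email"], "password": smoke_test_user["password"]},
        )
        assert response.status_code == 200, f"login failed: {response.text}"
        smoke_test_user["access_token"] = response.json()["access_token"]

    def test_06_get_current_user(self, client, smoke_test_user):
        """Step 6: call an authenticated endpoint using the token stashed by step 5."""
        assert "access_token" in smoke_test_user, "step 5 must run (and pass) first"
        response = client.get(
            "/api/v1/users/me",
            headers={"Authorization": f"Bearer {smoke_test_user['access_token']}"},
        )
        assert response.status_code == 200, f"current-user fetch failed: {response.text}"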

Pros:

  • ✅ Clear failure identification: Pytest output shows exactly which step failed
  • ✅ Better test isolation: Can run/debug individual test functions
  • ✅ Granular output: See progress: "Step 1 PASS, Step 2 PASS, Step 3 FAIL"
  • ✅ Coverage reporting: 18 distinct tests in coverage/CI reports
  • ✅ Pytest ecosystem: Can add ordering/dependency plugins (e.g., pytest-dependency) when needed
  • ✅ Debugging: Can run just one test: pytest -k test_07_token_refresh
  • ✅ CI visibility: GitHub Actions shows 18/18 tests (not 1/1)
  • ✅ Test discovery: Other developers can see individual steps
  • ✅ Matches shell script: Original scripts/test-api-flows.sh had 17 separate steps

Cons:

  • ❌ Shared state complexity: Requires module-level dictionary for state
  • ❌ More fixtures: Need unique_test_email, test_password, smoke_test_user
  • ❌ State management: Must carefully manage state across test functions
  • ❌ Order dependency: Tests MUST run in order (numbered 01-18)

Complexity: Medium

Pytest Output (Failure at Step 10):

tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_01_user_registration PASSED
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_02_email_verification_token_extracted PASSED
...
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_09_password_reset_request PASSED
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_10_extract_reset_token FAILED
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_11_verify_reset_token SKIPPED
...

Immediately clear: step 10 failed. Note that pytest does not skip the remaining steps on its own; the SKIPPED results above assume a small conftest.py hook (or a plugin such as pytest-dependency), sketched below. With that in place, the CI output reads as a visual progress bar through the steps.
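
A minimal sketch of such a hook, adapted from the widely used incremental-testing recipe; it assumes the test class is marked @pytest.mark.incremental and that the incremental marker is registered:

# conftest.py
import pytest

def pytest_runtest_makereport(item, call):
    """Remember the first failing step on its parent test class."""
    if "incremental" in item.keywords and call.excinfo is not None:
        item.parent._previousfailed = item

def pytest_runtest_setup(item):
    """Skip any later step in the same class once one step has failed."""
    if "incremental" in item.keywords:
        previousfailed = getattr(item.parent, "_previousfailed", None)
        if previousfailed is not None:
            pytest.skip(f"previous step failed ({previousfailed.name})")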

Historical Context: Why Original Design Failed

Problem: Database State Pollution:

When smoke tests ran in the same pytest session as main tests:

  • Main tests created test@example.com user
  • Smoke tests also tried to use fixed emails
  • SQLAlchemy session cache caused conflicts
  • Cleanup fixtures interfered with each other

Solution: Isolated Pytest Sessions:

Now that smoke tests run with -m smoke in separate pytest session:

  • ✅ Fresh database state
  • ✅ No session cache conflicts
  • ✅ No fixture interference
  • ✅ Complete isolation from main tests

The original problem is SOLVED with isolated sessions.
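
For reference, the isolation hinges only on marker selection: the smoke run invokes pytest -m smoke as its own process, so it shares no in-memory state with the main suite. If the smoke marker is not already registered in pytest.ini or pyproject.toml, a conftest.py hook can register it (a sketch; the project may declare markers elsewhere):

# conftest.py
def pytest_configure(config):
    # Register the marker so `pytest -m smoke` selects smoke tests
    # without emitting PytestUnknownMarkWarning.
    config.addinivalue_line(
        "markers", "smoke: end-to-end smoke tests run in an isolated session"
    )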

Analysis

Comparison Matrix:

| Aspect | Monolithic (Current) | Modular (Original) | Weight |
|---|---|---|---|
| Failure clarity | ❌ Low ("1 test failed") | ✅ High ("step X failed") | Critical |
| CI visibility | ❌ "1/1 passed" | ✅ "18/18 passed" | High |
| Debugging | ❌ Hard (read entire function) | ✅ Easy (run specific step) | High |
| Test isolation | ❌ None | ✅ Per-step | High |
| Code clarity | ✅ Single function | ⚠️ Multiple functions | Medium |
| State management | ✅ Simple (locals) | ⚠️ Shared dict | Medium |
| Pytest output | ❌ Unclear | ✅ Clear | High |
| Coverage reporting | ❌ 1 test | ✅ 18 tests | Medium |
| Run individual step | ❌ No | ✅ Yes (pytest -k test_07) | High |
| Matches shell script | ❌ No | ✅ Yes (17+1 tests) | Low |

Score Summary:

  • Monolithic: 2 ✅, 8 ❌, 0 ⚠️
  • Modular: 8 ✅, 0 ❌, 2 ⚠️

Test Clarity and Debugging:

Monolithic:

  • Failure message: "test_complete_authentication_flow FAILED"
  • Developer must read traceback to find which of 18 steps failed
  • Cannot run individual steps for debugging
  • Must re-run entire 18-step flow to test fixes

Modular:

  • Failure message: "test_10_extract_reset_token FAILED"
  • Immediately clear which step has the issue
  • Can run just that step: pytest -k test_10_extract_reset_token
  • Faster debugging cycle

Winner: Modular (significantly better debugging experience)

CI/CD Visibility:

Monolithic:

GitHub Actions output:

 tests/smoke/test_complete_auth_flow.py::test_complete_authentication_flow (1/1)

Modular:

GitHub Actions output:

 test_01_user_registration (1/18)
 test_02_email_verification_token_extracted (2/18)
...
 test_10_extract_reset_token (10/18)
 test_11_verify_reset_token (11/18)  [SKIPPED]

Visual progress bar shows exactly where failure occurred.

Winner: Modular (much better CI visualization)

Test Isolation:

Monolithic:

  • A single failure (say at step 10) yields no information about the remaining 8 steps
  • Cannot verify whether steps 11-18 would pass or fail
  • No way to test individual steps in isolation

Modular:

  • Each step is a separate test
  • Can run any step independently (with proper fixture state)
  • Failed step doesn't prevent seeing results of subsequent steps (if run individually)
  • Better test coverage granularity

Winner: Modular (proper test isolation)

Code Maintainability:

Monolithic:

  • Single function is easy to read top-to-bottom
  • All logic in one place
  • Easier to see dependencies between steps
  • Simpler for newcomers to understand flow

Modular:

  • Multiple functions require jumping between tests
  • Shared state dictionary adds complexity
  • Requires understanding fixture scoping
  • Better separation of concerns

Winner: Mixed (monolithic for simplicity, modular for separation)

State Management Complexity:

Monolithic:

# Simple local variables
user_email = "test@example.com"
verification_token = extract_token(...)
access_token = response.json()["access_token"]

Modular:

# Module-level shared state
_smoke_test_user_data = {
    "email": "...",
    "verification_token": "...",
    "access_token": "..."
}

Winner: Monolithic (simpler state management)

Industry Research: Real-World Examples:

  • Django: Uses separate test methods for sequential flows (e.g., registration → login → profile)
  • FastAPI documentation: Examples show individual test functions for API flow steps
  • pytest best practices: Recommends granular tests over monolithic tests for better failure identification

Best Practices:

  • Test one thing per test function (even in flows)
  • Use fixtures for shared state management
  • Number tests when order matters (test_01_, test_02_)
  • Prioritize debuggability over code simplicity
  • CI output should show granular progress

Consensus: Industry favors modular design for smoke tests with sequential dependencies.

Decision

Decision: Switch back to the modular design with 18 separate test functions.

Rationale:

The original problem that forced us to adopt the monolithic design (database state pollution between pytest sessions) has been completely solved by running smoke tests in an isolated pytest session using the -m smoke marker.

With this isolation in place, we can now benefit from the modular design's superior debugging experience, CI/CD visibility, and alignment with pytest best practices without any of the previous drawbacks.

Key Factors:

  1. Primary issue is resolved: State pollution was the blocker, now fixed with isolated sessions
  2. Better pytest practices: Modular design aligns with industry standards for sequential test flows
  3. CI/CD visibility: GitHub Actions shows clear 18/18 progress instead of opaque 1/1
  4. Debugging experience: Can run individual steps (pytest -k test_07) for faster debugging
  5. Test discovery: Other developers can immediately see all 18 steps in test file
  6. Matches original design: Shell script (scripts/test-api-flows.sh) had 17 separate steps

Decision Criteria Met:

  • Failure identification: Test output clearly shows which step failed
  • CI visibility: GitHub Actions displays granular progress (18/18)
  • Debugging capability: Individual steps can be run independently
  • Test isolation: Each step is a separate pytest test
  • Maintainability: Better test organization and separation of concerns
  • Industry alignment: Follows pytest and FastAPI best practices

Consequences

Positive Consequences:

  • Improved debugging: Failures immediately identifiable by test name
  • Better CI output: Visual progress through all 18 steps
  • Test coverage clarity: 18 distinct tests in coverage reports
  • Developer experience: Can run specific steps during development
  • Onboarding: New developers can see complete flow at a glance
  • Pytest integration: Works with markers, definition-order collection, and ordering/dependency plugins

Negative Consequences:

  • ⚠️ State management complexity: Requires module-level shared dictionary
  • Mitigation: Well-documented fixture pattern, clear docstrings
  • ⚠️ More fixtures needed: unique_test_email, test_password, smoke_test_user
  • Mitigation: Fixtures improve test reusability and clarity
  • ⚠️ Order dependency: Tests must run sequentially (01-18)
  • Mitigation: Test numbering makes order explicit; pytest collects tests in definition order, so keeping the functions in 01-18 order preserves execution order

Risks:

  • Risk: Shared state dictionary could become hard to manage as tests grow
  • Mitigation: Keep state dictionary minimal, document clearly, consider refactoring if it exceeds 10-15 keys
  • Risk: Order dependency could break if tests run in parallel
  • Mitigation: Smoke tests already run sequentially by design, use pytest-ordering if needed

Implementation

Phase 1: Restore Original Structure:

  • Create comparison branch: feature/smoke-test-modular-design
  • Copy .backup file to temporary location for reference
  • Update pytest markers from @pytest.mark.smoke_test to @pytest.mark.smoke
  • Restore class-based structure (TestSmokeCompleteAuthFlow)
  • Restore 18 separate test functions (test_01_* through test_18_*)
  • Restore smoke_test_user fixture with module-level state
  • Test locally: make test-smoke
  • Verify all 18 tests pass with clear output

Phase 2: Refinements:

  • Add comprehensive Google-style docstrings to all 18 tests
  • Improve smoke_test_user fixture clarity and documentation
  • Add descriptive failure messages to assertions (see the sketch after this list)
  • Verify pytest ordering works correctly (functions defined, and therefore collected, in 01-18 order)
  • Test individual step execution: pytest -k test_07_token_refresh
  • Compare pytest outputs side-by-side (monolithic vs modular)
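
For the descriptive failure messages, the goal is that the CI log alone explains a failure. A hypothetical step 9 showing the pattern; the endpoint path is an assumption:

    def test_09_password_reset_request(self, client, smoke_test_user):
        """Step 9: request a password reset for the registered user."""
        response = client.post(
            "/api/v1/auth/password-reset",
            json={"email": smoke_test_user["email"]},
        )
        assert response.status_code == 200, (
            f"step 9 (password reset request) failed: "
            f"HTTP {response.status_code}, body={response.text!r}"
        )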

Phase 3: Documentation:

  • Update tests/smoke/README.md with modular design explanation
  • Document the 18-step flow with clear descriptions
  • Add troubleshooting section for common failure points
  • Document how to run individual steps for debugging
  • Update WARP.md with smoke test design decision
  • Add CI integration examples (GitHub Actions output)

Migration Strategy:

Transition Approach:

  1. Keep current monolithic design as test_complete_auth_flow.py.monolithic
  2. Implement modular design in test_complete_auth_flow.py
  3. Run both in CI for 1-2 days to compare outputs
  4. If modular design proves superior, delete monolithic backup
  5. If issues arise, can quickly revert to monolithic design

Testing Strategy:

  • Run modular design locally until stable (5-10 successful runs)
  • Deploy to CI and monitor for 2-3 days
  • Compare CI failure identification clarity
  • Get developer feedback on debugging experience

Rollback Plan:

If modular design proves problematic:

  1. Rename current implementation to .backup
  2. Restore monolithic design from .monolithic file
  3. Update pytest markers back to original
  4. Document why modular design didn't work

Rollback Trigger Conditions:

  • Smoke tests become flaky or intermittent
  • State management issues cause test pollution
  • CI/CD pipeline breaks or becomes unreliable

Success Metrics:

How we'll measure success:

  • Metric 1: Failure identification time
      • Target: < 30 seconds to identify which step failed (vs 2-5 minutes with monolithic)
      • Measurement: Developer feedback and CI log review
  • Metric 2: CI output clarity
      • Target: 100% of developers can identify the failed step from GitHub Actions output without reading logs
      • Measurement: Developer survey after 2 weeks
  • Metric 3: Test stability
      • Target: 0 flaky test runs attributed to state management
      • Measurement: Track test failure patterns over 30 days
  • Metric 4: Debug time reduction
      • Target: 50% reduction in time to fix smoke test failures
      • Measurement: Compare average time to resolution before and after the switch

Follow-Up

Future Considerations:

  • Test ordering plugin: Consider pytest-ordering if alphabetical ordering proves insufficient
  • Parallel execution: If smoke tests ever need to run in parallel, will need refactoring
  • State management alternatives: Could explore pytest-dependency for explicit test dependencies (sketched after this list)
  • Additional smoke tests: Extend to provider operations (OAuth flow) when implemented
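
If explicit dependencies ever replace name-based ordering, pytest-dependency makes the chain declarative. A minimal sketch with hypothetical test bodies:

import pytest

class TestSmokeCompleteAuthFlow:
    @pytest.mark.dependency(name="register")
    def test_01_user_registration(self, client, smoke_test_user):
        assert smoke_test_user["email"]

    @pytest.mark.dependency(depends=["register"])
    def test_02_email_verification_token_extracted(self, smoke_test_user):
        # Skipped automatically if "register" failed or has not yet run.
        assert smoke_test_user["verification_token"]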

Review Schedule:

  • First review: 2 weeks after deployment (2025-10-21)
      • Evaluate CI output clarity
      • Gather developer feedback on debugging experience
      • Check for any flaky test patterns
  • Regular review: quarterly
      • Assess whether the design still meets needs as the authentication flow evolves
      • Evaluate whether state management complexity has increased
      • Consider any new pytest features that could improve the design

References

Project Files:

  • Original shell script: scripts/test-api-flows.sh
  • Backup file: tests/smoke/test_complete_auth_flow.py.backup
  • Current monolithic design: tests/smoke/test_complete_auth_flow.py
  • Smoke test README: tests/smoke/README.md

Document Information

Template: research-template.md
Created: 2025-10-07
Last Updated: 2025-10-18