
Smoke Test Design Comparison: Monolithic vs Modular

Research and decision analysis comparing monolithic and modular smoke test designs for optimal CI/CD visibility and debugging experience.

Context

With isolated pytest sessions now working correctly (using the -m smoke marker), we can revisit our smoke test design without the database state pollution issues that originally forced us to adopt a monolithic design.

Current State

File: tests/smoke/test_complete_auth_flow.py

  • Single monolithic test function: test_complete_authentication_flow()
  • 18 sequential steps executed in one test
  • 4 additional independent tests (health check, API docs, invalid login, weak password)
  • Simple fixture setup with just client and caplog
  • All assertions consolidated in one function

Pytest Output:

tests/smoke/test_complete_auth_flow.py::test_complete_authentication_flow PASSED

Desired State

  • Clear identification of which step fails in smoke test flow
  • Granular test progress visibility in CI/CD (GitHub Actions)
  • Ability to run/debug individual test steps
  • Better test discovery for developers
  • Improved coverage reporting showing distinct validations

Constraints

  • Tests must maintain sequential ordering (steps depend on previous steps)
  • Must work in isolated pytest session (separate from main tests)
  • Cannot introduce database state pollution
  • Must preserve caplog-based token extraction pattern
  • Should match original shell script design (scripts/test-api-flows.sh)

Problem Statement

Question: Should we maintain the current monolithic smoke test design or switch back to the original modular design?

The monolithic design was adopted to work around database state pollution issues. Now that pytest sessions are properly isolated, we need to determine which design provides better developer experience, CI/CD visibility, and maintainability.

Why This Matters

  • Debugging efficiency: Failed smoke tests can block deployments; quick identification of failures is critical
  • CI/CD visibility: GitHub Actions should clearly show which authentication steps pass/fail
  • Developer experience: New developers should easily understand the 18-step authentication flow
  • Maintenance: Code should be easy to update as authentication flow evolves

Research Questions

  1. Clarity: Which design makes test failures easier to identify and debug?
  2. Visibility: Which design provides better CI/CD progress visualization?
  3. Maintainability: Which design is easier for developers to understand and modify?
  4. Isolation: Can the modular design work reliably with isolated pytest sessions?
  5. Industry Practice: What do other projects use for sequential smoke test flows?

Options Considered

Option 1: Monolithic Design (Current)

Description: Single test function containing all 18 sequential steps as assertions within one test.

Structure:

def test_complete_authentication_flow(client, caplog):
    """Test complete 18-step authentication journey."""
    # Step 1: Register
    response = client.post("/api/v1/auth/register", json={...})
    assert response.status_code == 200

    # Step 2: Extract token
    token = extract_token_from_caplog(caplog, "verify-email?token=")

    # ... Steps 3-18 ...
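
The extract_token_from_caplog helper referenced above implements the caplog-based token extraction pattern named in the constraints. One plausible shape for it, as a minimal sketch in which the token character set and log format are assumptions:

import re

def extract_token_from_caplog(caplog, url_fragment: str) -> str:
    """Scan captured log records for url_fragment and return the token that follows it."""
    for record in caplog.records:
        match = re.search(re.escape(url_fragment) + r"([\w.-]+)", record.getMessage())
        if match:
            return match.group(1)
    raise AssertionError(f"no captured log record contained {url_fragment!r}")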

Pros:

  • ✅ Clear flow: Easy to see complete user journey in one place
  • ✅ Fewer fixtures: Simple setup with just client and caplog
  • ✅ Consolidated: All assertions in one place
  • ✅ Documentation: Acts as living documentation of complete flow
  • ✅ Simple state management: Uses local variables

Cons:

  • ❌ Single failure point: If step 5 fails, steps 6-18 never run
  • ❌ Unclear test output: Just shows "1 test failed" instead of which step
  • ❌ Hard to debug: Must read through entire function to find failure point
  • ❌ Poor pytest output: Can't see progress through individual steps
  • ❌ No test isolation: Can't run individual steps independently
  • ❌ Coverage reporting: Shows as 1 test instead of 18 distinct validations
  • ❌ CI visibility: GitHub Actions shows "1/1" instead of progress

Complexity: Low

Pytest Output (Failure at Step 10):

tests/smoke/test_complete_auth_flow.py::test_complete_authentication_flow FAILED

The output doesn't indicate which step failed; the developer must read the traceback to find out.

Option 2: Modular Design (Original)

Description: Class-based organization with 18 separate test functions using sequential numbering and shared state fixture.

Structure:

_smoke_test_user_data = {}  # Module-level shared state

@pytest.fixture(scope="function")
def smoke_test_user(client, unique_test_email, test_password, caplog):
    """Initialize smoke test user with shared state."""
    if _smoke_test_user_data:
        return _smoke_test_user_data
    # ... registration and state initialization ...

class TestSmokeCompleteAuthFlow:
    def test_01_user_registration(self, client, smoke_test_user):
        """Test step 1: User registration."""
        # Registration logic

    def test_02_email_verification_token_extracted(self, smoke_test_user):
        """Test step 2: Verification token extraction."""
        # Token extraction logic

    # ... Tests 03-18 ...
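
To make the state threading concrete, here is a hypothetical pair of adjacent steps inside TestSmokeCompleteAuthFlow; the endpoint paths (/api/v1/auth/login, /api/v1/users/me) and step numbers are assumptions, since only /api/v1/auth/register appears above:

    def test_05_login(self, client, smoke_test_user):
        """Step 5: log in and stash the access token for later steps."""
        response = client.post(
            "/api/v1/auth/login",
            json={"email": smoke_test_user["email"], "password": smoke_test_user["password"]},
        )
        assert response.status_code == 200, f"login failed: {response.text}"
        smoke_test_user["access_token"] = response.json()["access_token"]

    def test_06_get_current_user(self, client, smoke_test_user):
        """Step 6: call an authenticated endpoint using the token stashed by step 5."""
        assert "access_token" in smoke_test_user, "step 5 must run (and pass) first"
        response = client.get(
            "/api/v1/users/me",
            headers={"Authorization": f"Bearer {smoke_test_user['access_token']}"},
        )
        assert response.status_code == 200, f"current-user fetch failed: {response.text}"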

Pros:

  • ✅ Clear failure identification: Pytest output shows exactly which step failed
  • ✅ Better test isolation: Can run/debug individual test functions
  • ✅ Granular output: See progress: "Step 1 PASS, Step 2 PASS, Step 3 FAIL"
  • ✅ Coverage reporting: 18 distinct tests in coverage/CI reports
  • ✅ Pytest ecosystem: Can add ordering/dependency plugins (e.g., pytest-dependency) when needed
  • ✅ Debugging: Can run just one test: pytest -k test_07_token_refresh
  • ✅ CI visibility: GitHub Actions shows 18/18 tests (not 1/1)
  • ✅ Test discovery: Other developers can see individual steps
  • ✅ Matches shell script: Original scripts/test-api-flows.sh had 17 separate steps

Cons:

  • ❌ Shared state complexity: Requires module-level dictionary for state
  • ❌ More fixtures: Need unique_test_email, test_password, smoke_test_user
  • ❌ State management: Must carefully manage state across test functions
  • ❌ Order dependency: Tests MUST run in order (numbered 01-18)

Complexity: Medium

Pytest Output (Failure at Step 10):

tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_01_user_registration PASSED
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_02_email_verification_token_extracted PASSED
...
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_09_password_reset_request PASSED
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_10_extract_reset_token FAILED
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_11_verify_reset_token SKIPPED
...

Immediately clear: step 10 failed. Note that pytest does not skip the remaining steps on its own; the SKIPPED results above assume a small conftest.py hook (or a plugin such as pytest-dependency), sketched below. With that in place, the CI output reads as a visual progress bar through the steps.
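
A minimal sketch of such a hook, adapted from the widely used incremental-testing recipe; it assumes the test class is marked @pytest.mark.incremental and that the incremental marker is registered:

# conftest.py
import pytest

def pytest_runtest_makereport(item, call):
    """Remember the first failing step on its parent test class."""
    if "incremental" in item.keywords and call.excinfo is not None:
        item.parent._previousfailed = item

def pytest_runtest_setup(item):
    """Skip any later step in the same class once one step has failed."""
    if "incremental" in item.keywords:
        previousfailed = getattr(item.parent, "_previousfailed", None)
        if previousfailed is not None:
            pytest.skip(f"previous step failed ({previousfailed.name})")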

Historical Context: Why Original Design Failed

Problem: Database State Pollution:

When smoke tests ran in the same pytest session as main tests:

  • Main tests created test@example.com user
  • Smoke tests also tried to use fixed emails
  • SQLAlchemy session cache caused conflicts
  • Cleanup fixtures interfered with each other

Solution: Isolated Pytest Sessions:

Now that smoke tests run with -m smoke in separate pytest session:

  • ✅ Fresh database state
  • ✅ No session cache conflicts
  • ✅ No fixture interference
  • ✅ Complete isolation from main tests

The original problem is SOLVED with isolated sessions.
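
For reference, the isolation hinges only on marker selection: the smoke run invokes pytest -m smoke as its own process, so it shares no in-memory state with the main suite. If the smoke marker is not already registered in pytest.ini or pyproject.toml, a conftest.py hook can register it (a sketch; the project may declare markers elsewhere):

# conftest.py
def pytest_configure(config):
    # Register the marker so `pytest -m smoke` selects smoke tests
    # without emitting PytestUnknownMarkWarning.
    config.addinivalue_line(
        "markers", "smoke: end-to-end smoke tests run in an isolated session"
    )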

Analysis

Comparison Matrix:

| Aspect | Monolithic (Current) | Modular (Original) | Weight |
|---|---|---|---|
| Failure clarity | ❌ Low ("1 test failed") | ✅ High ("step X failed") | Critical |
| CI visibility | ❌ "1/1 passed" | ✅ "18/18 passed" | High |
| Debugging | ❌ Hard (read entire function) | ✅ Easy (run specific step) | High |
| Test isolation | ❌ None | ✅ Per-step | High |
| Code clarity | ✅ Single function | ⚠️ Multiple functions | Medium |
| State management | ✅ Simple (locals) | ⚠️ Shared dict | Medium |
| Pytest output | ❌ Unclear | ✅ Clear | High |
| Coverage reporting | ❌ 1 test | ✅ 18 tests | Medium |
| Run individual step | ❌ No | ✅ Yes (pytest -k test_07) | High |
| Matches shell script | ❌ No | ✅ Yes (17+1 tests) | Low |

Score Summary:

  • Monolithic: 2 ✅, 8 ❌, 0 ⚠️
  • Modular: 8 ✅, 0 ❌, 2 ⚠️

Test Clarity and Debugging:

Monolithic:

  • Failure message: "test_complete_authentication_flow FAILED"
  • Developer must read traceback to find which of 18 steps failed
  • Cannot run individual steps for debugging
  • Must re-run entire 18-step flow to test fixes

Modular:

  • Failure message: "test_10_extract_reset_token FAILED"
  • Immediately clear which step has the issue
  • Can run just that step: pytest -k test_10_extract_reset_token
  • Faster debugging cycle

Winner: Modular (significantly better debugging experience)

CI/CD Visibility:

Monolithic:

GitHub Actions output:

 tests/smoke/test_complete_auth_flow.py::test_complete_authentication_flow (1/1)

Modular:

GitHub Actions output:

 test_01_user_registration (1/18)
 test_02_email_verification_token_extracted (2/18)
...
 test_10_extract_reset_token (10/18)
 test_11_verify_reset_token (11/18)  [SKIPPED]

Visual progress bar shows exactly where failure occurred.

Winner: Modular (much better CI visualization)

Test Isolation:

Monolithic:

  • A single failure (say at step 10) yields no information about the remaining 8 steps
  • Cannot verify whether steps 11-18 would pass or fail
  • No way to test individual steps in isolation

Modular:

  • Each step is a separate test
  • Can run any step independently (with proper fixture state)
  • Failed step doesn't prevent seeing results of subsequent steps (if run individually)
  • Better test coverage granularity

Winner: Modular (proper test isolation)

Code Maintainability:

Monolithic:

  • Single function is easy to read top-to-bottom
  • All logic in one place
  • Easier to see dependencies between steps
  • Simpler for newcomers to understand flow

Modular:

  • Multiple functions require jumping between tests
  • Shared state dictionary adds complexity
  • Requires understanding fixture scoping
  • Better separation of concerns

Winner: Mixed (monolithic for simplicity, modular for separation)

State Management Complexity:

Monolithic:

# Simple local variables
user_email = "test@example.com"
verification_token = extract_token(...)
access_token = response.json()["access_token"]

Modular:

# Module-level shared state
_smoke_test_user_data = {
    "email": "...",
    "verification_token": "...",
    "access_token": "..."
}

Winner: Monolithic (simpler state management)

Industry Research: Real-World Examples:

  • Django: Uses separate test methods for sequential flows (e.g., registration → login → profile)
  • FastAPI documentation: Examples show individual test functions for API flow steps
  • pytest best practices: Recommends granular tests over monolithic tests for better failure identification

Best Practices:

  • Test one thing per test function (even in flows)
  • Use fixtures for shared state management
  • Number tests when order matters (test_01_, test_02_)
  • Prioritize debuggability over code simplicity
  • CI output should show granular progress

Consensus: Industry favors modular design for smoke tests with sequential dependencies.

Decision

Decision: Switch back to the modular design with 18 separate test functions.

Rationale:

The original problem that forced us to adopt the monolithic design (database state pollution between pytest sessions) has been completely solved by running smoke tests in an isolated pytest session using the -m smoke marker.

With this isolation in place, we can now benefit from the modular design's superior debugging experience, CI/CD visibility, and alignment with pytest best practices without any of the previous drawbacks.

Key Factors:

  1. Primary issue is resolved: State pollution was the blocker, now fixed with isolated sessions
  2. Better pytest practices: Modular design aligns with industry standards for sequential test flows
  3. CI/CD visibility: GitHub Actions shows clear 18/18 progress instead of opaque 1/1
  4. Debugging experience: Can run individual steps (pytest -k test_07) for faster debugging
  5. Test discovery: Other developers can immediately see all 18 steps in test file
  6. Matches original design: Shell script (scripts/test-api-flows.sh) had 17 separate steps

Decision Criteria Met:

  • Failure identification: Test output clearly shows which step failed
  • CI visibility: GitHub Actions displays granular progress (18/18)
  • Debugging capability: Individual steps can be run independently
  • Test isolation: Each step is a separate pytest test
  • Maintainability: Better test organization and separation of concerns
  • Industry alignment: Follows pytest and FastAPI best practices

Consequences

Positive Consequences:

  • Improved debugging: Failures immediately identifiable by test name
  • Better CI output: Visual progress through all 18 steps
  • Test coverage clarity: 18 distinct tests in coverage reports
  • Developer experience: Can run specific steps during development
  • Onboarding: New developers can see complete flow at a glance
  • Pytest integration: Works with markers, definition-order collection, and ordering/dependency plugins

Negative Consequences:

  • ⚠️ State management complexity: Requires module-level shared dictionary
  • Mitigation: Well-documented fixture pattern, clear docstrings
  • ⚠️ More fixtures needed: unique_test_email, test_password, smoke_test_user
  • Mitigation: Fixtures improve test reusability and clarity
  • ⚠️ Order dependency: Tests must run sequentially (01-18)
  • Mitigation: Test numbering makes order explicit; pytest collects tests in definition order, so keeping the functions in 01-18 order preserves execution order

Risks:

  • Risk: Shared state dictionary could become hard to manage as tests grow
  • Mitigation: Keep state dictionary minimal, document clearly, consider refactoring if it exceeds 10-15 keys
  • Risk: Order dependency could break if tests run in parallel
  • Mitigation: Smoke tests already run sequentially by design, use pytest-ordering if needed

Implementation

Phase 1: Restore Original Structure:

  • Create comparison branch: feature/smoke-test-modular-design
  • Copy .backup file to temporary location for reference
  • Update pytest markers from @pytest.mark.smoke_test to @pytest.mark.smoke
  • Restore class-based structure (TestSmokeCompleteAuthFlow)
  • Restore 18 separate test functions (test_01_* through test_18_*)
  • Restore smoke_test_user fixture with module-level state
  • Test locally: make test-smoke
  • Verify all 18 tests pass with clear output

Phase 2: Refinements:

  • Add comprehensive Google-style docstrings to all 18 tests
  • Improve smoke_test_user fixture clarity and documentation
  • Add descriptive failure messages to assertions (see the sketch after this list)
  • Verify pytest ordering works correctly (functions defined, and therefore collected, in 01-18 order)
  • Test individual step execution: pytest -k test_07_token_refresh
  • Compare pytest outputs side-by-side (monolithic vs modular)
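
For the descriptive failure messages, the goal is that the CI log alone explains a failure. A hypothetical step 9 showing the pattern; the endpoint path is an assumption:

    def test_09_password_reset_request(self, client, smoke_test_user):
        """Step 9: request a password reset for the registered user."""
        response = client.post(
            "/api/v1/auth/password-reset",
            json={"email": smoke_test_user["email"]},
        )
        assert response.status_code == 200, (
            f"step 9 (password reset request) failed: "
            f"HTTP {response.status_code}, body={response.text!r}"
        )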

Phase 3: Documentation:

  • Update tests/smoke/README.md with modular design explanation
  • Document the 18-step flow with clear descriptions
  • Add troubleshooting section for common failure points
  • Document how to run individual steps for debugging
  • Update WARP.md with smoke test design decision
  • Add CI integration examples (GitHub Actions output)

Migration Strategy:

Transition Approach:

  1. Keep current monolithic design as test_complete_auth_flow.py.monolithic
  2. Implement modular design in test_complete_auth_flow.py
  3. Run both in CI for 1-2 days to compare outputs
  4. If modular design proves superior, delete monolithic backup
  5. If issues arise, can quickly revert to monolithic design

Testing Strategy:

  • Run modular design locally until stable (5-10 successful runs)
  • Deploy to CI and monitor for 2-3 days
  • Compare CI failure identification clarity
  • Get developer feedback on debugging experience

Rollback Plan:

If modular design proves problematic:

  1. Rename current implementation to .backup
  2. Restore monolithic design from .monolithic file
  3. Update pytest markers back to original
  4. Document why modular design didn't work

Rollback Trigger Conditions:

  • Smoke tests become flaky or intermittent
  • State management issues cause test pollution
  • CI/CD pipeline breaks or becomes unreliable

Success Metrics:

How we'll measure success:

  • Metric 1: Failure identification time
      • Target: < 30 seconds to identify which step failed (vs 2-5 minutes with monolithic)
      • Measurement: Developer feedback and CI log review
  • Metric 2: CI output clarity
      • Target: 100% of developers can identify the failed step from GitHub Actions output without reading logs
      • Measurement: Developer survey after 2 weeks
  • Metric 3: Test stability
      • Target: 0 flaky test runs attributed to state management
      • Measurement: Track test failure patterns over 30 days
  • Metric 4: Debug time reduction
      • Target: 50% reduction in time to fix smoke test failures
      • Measurement: Compare average time to resolution before and after the switch

Follow-Up

Future Considerations:

  • Test ordering plugin: Consider pytest-ordering if alphabetical ordering proves insufficient
  • Parallel execution: If smoke tests ever need to run in parallel, will need refactoring
  • State management alternatives: Could explore pytest-dependency for explicit test dependencies (sketched after this list)
  • Additional smoke tests: Extend to provider operations (OAuth flow) when implemented
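
If explicit dependencies ever replace name-based ordering, pytest-dependency makes the chain declarative. A minimal sketch with hypothetical test bodies:

import pytest

class TestSmokeCompleteAuthFlow:
    @pytest.mark.dependency(name="register")
    def test_01_user_registration(self, client, smoke_test_user):
        assert smoke_test_user["email"]

    @pytest.mark.dependency(depends=["register"])
    def test_02_email_verification_token_extracted(self, smoke_test_user):
        # Skipped automatically if "register" failed or has not yet run.
        assert smoke_test_user["verification_token"]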

Review Schedule:

  • First review: 2 weeks after deployment (2025-10-21)
      • Evaluate CI output clarity
      • Gather developer feedback on debugging experience
      • Check for any flaky test patterns
  • Regular review: quarterly
      • Assess whether the design still meets needs as the authentication flow evolves
      • Evaluate whether state management complexity has increased
      • Consider any new pytest features that could improve the design

References

Project Files:

  • Original shell script: scripts/test-api-flows.sh
  • Backup file: tests/smoke/test_complete_auth_flow.py.backup
  • Current monolithic design: tests/smoke/test_complete_auth_flow.py
  • Smoke test README: tests/smoke/README.md

Document Information

Template: research-template.md
Created: 2025-10-07
Last Updated: 2025-10-18