# Smoke Test Design Comparison: Monolithic vs Modular
Research and decision analysis comparing monolithic and modular smoke test designs for optimal CI/CD visibility and debugging experience.
## Context

With isolated pytest sessions now working correctly (using the `-m smoke` marker), we can revisit our smoke test design approach without the database state pollution issues that originally forced us to adopt a monolithic design.
### Current State

File: `tests/smoke/test_complete_auth_flow.py`

- Single monolithic test function: `test_complete_authentication_flow()` - 18 sequential steps executed in one test
- 4 additional independent tests (health check, API docs, invalid login, weak password)
- Simple fixture setup with just `client` and `caplog`
- All assertions consolidated in one function

Pytest output: the entire flow is reported as a single test (plus the 4 independent tests), so there is no visibility into progress through the 18 steps.
### Desired State
- Clear identification of which step fails in smoke test flow
- Granular test progress visibility in CI/CD (GitHub Actions)
- Ability to run/debug individual test steps
- Better test discovery for developers
- Improved coverage reporting showing distinct validations
### Constraints

- Tests must maintain sequential ordering (steps depend on previous steps)
- Must work in an isolated pytest session (separate from the main tests)
- Cannot introduce database state pollution
- Must preserve the caplog-based token extraction pattern (see the sketch after this list)
- Should match the original shell script design (`scripts/test-api-flows.sh`)
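The extraction pattern referenced above works by scanning pytest's captured log records for the link that the email code logs. A minimal sketch of that pattern, assuming the verification link is logged and using an illustrative helper name (the real helper lives in the smoke test module):

```python
def extract_token_from_caplog(caplog, marker: str) -> str:
    """Return the token that follows `marker` in the captured log output.

    Assumes the email-sending code logs the full link, e.g.
    ".../verify-email?token=<token>".
    """
    for record in caplog.records:
        message = record.getMessage()
        if marker in message:
            start = message.index(marker) + len(marker)
            # Token runs from the end of the marker to the next whitespace or quote.
            return message[start:].split()[0].strip("\"'")
    raise AssertionError(f"No log line containing {marker!r} was captured")
```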
## Problem Statement
Question: Should we maintain the current monolithic smoke test design or switch back to the original modular design?
The monolithic design was adopted to work around database state pollution issues. Now that pytest sessions are properly isolated, we need to determine which design provides better developer experience, CI/CD visibility, and maintainability.
### Why This Matters
- Debugging efficiency: Failed smoke tests can block deployments; quick identification of failures is critical
- CI/CD visibility: GitHub Actions should clearly show which authentication steps pass/fail
- Developer experience: New developers should easily understand the 18-step authentication flow
- Maintenance: Code should be easy to update as authentication flow evolves
## Research Questions
- Clarity: Which design makes test failures easier to identify and debug?
- Visibility: Which design provides better CI/CD progress visualization?
- Maintainability: Which design is easier for developers to understand and modify?
- Isolation: Can the modular design work reliably with isolated pytest sessions?
- Industry Practice: What do other projects use for sequential smoke test flows?
## Options Considered

### Option 1: Monolithic Design (Current)
Description: Single test function containing all 18 sequential steps as assertions within one test.

Structure:

```python
def test_complete_authentication_flow(client, caplog):
    """Test complete 18-step authentication journey."""
    # Step 1: Register
    response = client.post("/api/v1/auth/register", json={...})
    assert response.status_code == 200

    # Step 2: Extract the verification token from captured logs
    token = extract_token_from_caplog(caplog, "verify-email?token=")

    # ... Steps 3-18 ...
```
Pros:

- ✅ Clear flow: Easy to see complete user journey in one place
- ✅ Fewer fixtures: Simple setup with just `client` and `caplog`
- ✅ Consolidated: All assertions in one place
- ✅ Documentation: Acts as living documentation of complete flow
- ✅ Simple state management: Uses local variables
Cons:

- ❌ Single failure point: If step 5 fails, steps 6-18 never run
- ❌ Unclear test output: Just shows "1 test failed" instead of which step
- ❌ Hard to debug: Must read through the entire function to find the failure point
- ❌ Poor pytest output: Can't see progress through individual steps
- ❌ No test isolation: Can't run individual steps independently
- ❌ Coverage reporting: Shows as 1 test instead of 18 distinct validations
- ❌ CI visibility: GitHub Actions shows "1/1" instead of step-by-step progress

Complexity: Low

Pytest output (failure at step 10): the run reports only that `test_complete_authentication_flow` FAILED; it does not indicate WHICH step failed, so the traceback must be read to find out.
### Option 2: Modular Design (Original)

Description: Class-based organization with 18 separate test functions using sequential numbering and a shared state fixture.

Structure:

```python
import pytest

_smoke_test_user_data = {}  # Module-level shared state


@pytest.fixture(scope="function")
def smoke_test_user(client, unique_test_email, test_password, caplog):
    """Initialize smoke test user with shared state."""
    if _smoke_test_user_data:
        return _smoke_test_user_data
    # ... registration and state initialization ...


class TestSmokeCompleteAuthFlow:
    def test_01_user_registration(self, client, smoke_test_user):
        """Test step 1: User registration."""
        # Registration logic

    def test_02_email_verification_token_extracted(self, smoke_test_user):
        """Test step 2: Verification token extraction."""
        # Token extraction logic

    # ... Tests 03-18 ...
```
Pros:

- ✅ Clear failure identification: Pytest output shows exactly which step failed
- ✅ Better test isolation: Can run/debug individual test functions
- ✅ Granular output: See progress: "Step 1 PASS, Step 2 PASS, Step 3 FAIL"
- ✅ Coverage reporting: 18 distinct tests in coverage/CI reports
- ✅ Pytest markers: Can use pytest's built-in ordering/dependency features
- ✅ Debugging: Can run just one test: `pytest -k test_07_token_refresh`
- ✅ CI visibility: GitHub Actions shows 18/18 tests (not 1/1)
- ✅ Test discovery: Other developers can see individual steps
- ✅ Matches shell script: Original `scripts/test-api-flows.sh` had 17 separate steps
Cons:

- ❌ Shared state complexity: Requires a module-level dictionary for state
- ❌ More fixtures: Needs `unique_test_email`, `test_password`, `smoke_test_user`
- ❌ State management: Must carefully manage state across test functions
- ❌ Order dependency: Tests MUST run in order (numbered 01-18)

Complexity: Medium

Pytest output (failure at step 10):
```text
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_01_user_registration PASSED
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_02_email_verification_token_extracted PASSED
...
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_09_password_reset_request PASSED
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_10_extract_reset_token FAILED
tests/smoke/test_complete_auth_flow.py::TestSmokeCompleteAuthFlow::test_11_verify_reset_token SKIPPED
...
```
Immediately clear: Step 10 failed. Remaining steps automatically skipped after failure. CI output shows visual progress bar through steps.
### Historical Context: Why Original Design Failed

Problem: Database State Pollution

When smoke tests ran in the same pytest session as the main tests:

- Main tests created a `test@example.com` user
- Smoke tests also tried to use fixed emails
- The SQLAlchemy session cache caused conflicts
- Cleanup fixtures interfered with each other

Solution: Isolated Pytest Sessions

Now that smoke tests run with `-m smoke` in a separate pytest session:
- ✅ Fresh database state
- ✅ No session cache conflicts
- ✅ No fixture interference
- ✅ Complete isolation from main tests
The original problem is SOLVED with isolated sessions.
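For reference, a minimal sketch of how this isolation can be wired up, assuming the `smoke` marker is registered from `conftest.py` (the project may register it in its pytest configuration file instead):

```python
# conftest.py (illustrative)
def pytest_configure(config):
    # Register the marker so `-m smoke` selects only the smoke tests.
    config.addinivalue_line(
        "markers", "smoke: end-to-end smoke tests run in an isolated pytest session"
    )


# The isolated session is then invoked separately from the main suite, e.g.:
#   pytest -m smoke tests/smoke/ -v
```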
## Analysis

**Comparison Matrix:**

| Aspect | Monolithic (Current) | Modular (Original) | Weight |
|---|---|---|---|
| Failure clarity | ❌ Low (1 test failed) | ✅ High (step X failed) | Critical |
| CI visibility | ❌ "1/1 passed" | ✅ "18/18 passed" | High |
| Debugging | ❌ Hard (read entire function) | ✅ Easy (run specific step) | High |
| Test isolation | ❌ None | ✅ Per-step | High |
| Code clarity | ✅ Single function | ⚠️ Multiple functions | Medium |
| State management | ✅ Simple (locals) | ⚠️ Shared dict | Medium |
| Pytest output | ❌ Unclear | ✅ Clear | High |
| Coverage reporting | ❌ 1 test | ✅ 18 tests | Medium |
| Run individual step | ❌ No | ✅ Yes (`pytest -k test_07`) | High |
| Matches shell script | ❌ No | ✅ Yes (17+1 tests) | Low |
Score Summary:
- Monolithic: 2 ✅, 8 ❌, 0 ⚠️
- Modular: 8 ✅, 0 ❌, 2 ⚠️
Test Clarity and Debugging:
Monolithic:
- Failure message: "test_complete_authentication_flow FAILED"
- Developer must read traceback to find which of 18 steps failed
- Cannot run individual steps for debugging
- Must re-run entire 18-step flow to test fixes
Modular:
- Failure message: "test_10_extract_reset_token FAILED"
- Immediately clear which step has the issue
- Can run just that step: `pytest -k test_10_extract_reset_token`
- Faster debugging cycle
Winner: Modular (significantly better debugging experience)
CI/CD Visibility:

Monolithic:

GitHub Actions output shows a single check ("1/1 passed" or "1/1 failed") with no indication of progress through the 18 steps.

Modular:

GitHub Actions output:

```text
✓ test_01_user_registration (1/18)
✓ test_02_email_verification_token_extracted (2/18)
...
✗ test_10_extract_reset_token (10/18)
⊘ test_11_verify_reset_token (11/18) [SKIPPED]
```
Visual progress bar shows exactly where failure occurred.
Winner: Modular (much better CI visualization)
Test Isolation:

Monolithic:

- A single test failure gives no information about the remaining steps (e.g., a failure at step 5 says nothing about steps 6-18)
- Cannot verify if steps 11-18 would pass or fail
- No way to test individual steps in isolation
Modular:
- Each step is a separate test
- Can run any step independently (with proper fixture state)
- Failed step doesn't prevent seeing results of subsequent steps (if run individually)
- Better test coverage granularity
Winner: Modular (proper test isolation)
Code Maintainability:
Monolithic:
- Single function is easy to read top-to-bottom
- All logic in one place
- Easier to see dependencies between steps
- Simpler for newcomers to understand flow
Modular:
- Multiple functions require jumping between tests
- Shared state dictionary adds complexity
- Requires understanding fixture scoping
- Better separation of concerns
Winner: Mixed (monolithic for simplicity, modular for separation)
State Management Complexity:

Monolithic:

```python
# Simple local variables
user_email = "test@example.com"
verification_token = extract_token(...)
access_token = response.json()["access_token"]
```

Modular:

```python
# Module-level shared state
_smoke_test_user_data = {
    "email": "...",
    "verification_token": "...",
    "access_token": "...",
}
```
Winner: Monolithic (simpler state management)
Industry Research: Real-World Examples:
- Django: Uses separate test methods for sequential flows (e.g., registration → login → profile)
- FastAPI documentation: Examples show individual test functions for API flow steps
- pytest best practices: Recommends granular tests over monolithic tests for better failure identification
Best Practices:

- Test one thing per test function (even in flows)
- Use fixtures for shared state management
- Number tests when order matters (`test_01_`, `test_02_`)
- Prioritize debuggability over code simplicity
- CI output should show granular progress

Consensus: Industry favors the modular design for smoke tests with sequential dependencies.
## Decision

Decision: Switch back to the modular design with 18 separate test functions.

Rationale:

The original problem that forced us to adopt the monolithic design (database state pollution when smoke tests and main tests shared a pytest session) has been completely solved by running smoke tests in an isolated pytest session using the `-m smoke` marker.

With this isolation in place, we can now benefit from the modular design's superior debugging experience, CI/CD visibility, and alignment with pytest best practices, without any of the previous drawbacks.

Key Factors:

- Primary issue is resolved: State pollution was the blocker, now fixed with isolated sessions
- Better pytest practices: Modular design aligns with industry standards for sequential test flows
- CI/CD visibility: GitHub Actions shows clear 18/18 progress instead of an opaque 1/1
- Debugging experience: Can run individual steps (`pytest -k test_07`) for faster debugging
- Test discovery: Other developers can immediately see all 18 steps in the test file
- Matches original design: Shell script (`scripts/test-api-flows.sh`) had 17 separate steps
Decision Criteria Met:
- ✅ Failure identification: Test output clearly shows which step failed
- ✅ CI visibility: GitHub Actions displays granular progress (18/18)
- ✅ Debugging capability: Individual steps can be run independently
- ✅ Test isolation: Each step is a separate pytest test
- ✅ Maintainability: Better test organization and separation of concerns
- ✅ Industry alignment: Follows pytest and FastAPI best practices
## Consequences

Positive Consequences:
- ✅ Improved debugging: Failures immediately identifiable by test name
- ✅ Better CI output: Visual progress through all 18 steps
- ✅ Test coverage clarity: 18 distinct tests in coverage reports
- ✅ Developer experience: Can run specific steps during development
- ✅ Onboarding: New developers can see complete flow at a glance
- ✅ Pytest integration: Works with pytest's built-in features (markers, ordering)
Negative Consequences:

- ⚠️ State management complexity: Requires a module-level shared dictionary
  - Mitigation: Well-documented fixture pattern, clear docstrings
- ⚠️ More fixtures needed: `unique_test_email`, `test_password`, `smoke_test_user`
  - Mitigation: Fixtures improve test reusability and clarity
- ⚠️ Order dependency: Tests must run sequentially (01-18)
  - Mitigation: Test numbering makes the intended order explicit, and defining the tests in that order matches pytest's default collection order
Risks:

- Risk: Shared state dictionary could become hard to manage as tests grow
  - Mitigation: Keep the state dictionary minimal, document it clearly, and consider refactoring if it exceeds 10-15 keys (see the sketch below)
- Risk: Order dependency could break if tests run in parallel
  - Mitigation: Smoke tests already run sequentially by design; use pytest-ordering if needed
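As one possible shape for that refactoring, a hypothetical sketch (not part of the current implementation) replaces the raw dictionary with a small dataclass so the available keys stay explicit:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokeTestState:
    """Illustrative container for the state shared across the 18 steps."""

    email: str = ""
    password: str = ""
    verification_token: Optional[str] = None
    access_token: Optional[str] = None
    refresh_token: Optional[str] = None
    reset_token: Optional[str] = None


# Module-level instance, playing the same role as _smoke_test_user_data.
_smoke_test_state = SmokeTestState()
```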
## Implementation

Phase 1: Restore Original Structure:

- Create comparison branch: `feature/smoke-test-modular-design`
- Copy the `.backup` file to a temporary location for reference
- Update pytest markers from `@pytest.mark.smoke_test` to `@pytest.mark.smoke`
- Restore the class-based structure (`TestSmokeCompleteAuthFlow`)
- Restore the 18 separate test functions (`test_01_*` through `test_18_*`)
- Restore the `smoke_test_user` fixture with module-level state (see the sketch after this list)
- Test locally: `make test-smoke`
- Verify all 18 tests pass with clear output
Phase 2: Refinements:

- Add comprehensive Google-style docstrings to all 18 tests
- Improve the `smoke_test_user` fixture's clarity and documentation
- Add descriptive failure messages to assertions (see the example after this list)
- Verify that collection order matches the 01-18 numbering
- Test individual step execution: `pytest -k test_07_token_refresh`
- Compare pytest outputs side-by-side (monolithic vs modular)
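An illustration of the intended docstring and failure-message style; the endpoint path and payload here are assumptions for the example, not the final implementation:

```python
def test_07_token_refresh(self, client, smoke_test_user):
    """Step 7: Refresh the access token.

    Args:
        client: FastAPI test client fixture.
        smoke_test_user: Shared state dict populated by earlier steps.
    """
    response = client.post(
        "/api/v1/auth/refresh",  # assumed endpoint path, shown for illustration
        json={"refresh_token": smoke_test_user.get("refresh_token")},
    )
    assert response.status_code == 200, (
        f"Step 7 (token refresh) failed: {response.status_code} {response.text}"
    )
    smoke_test_user["access_token"] = response.json()["access_token"]
```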
Phase 3: Documentation:

- Update `tests/smoke/README.md` with the modular design explanation
- Document the 18-step flow with clear descriptions
- Add troubleshooting section for common failure points
- Document how to run individual steps for debugging
- Update WARP.md with smoke test design decision
- Add CI integration examples (GitHub Actions output)
Migration Strategy:
Transition Approach:
- Keep the current monolithic design as `test_complete_auth_flow.py.monolithic`
- Implement the modular design in `test_complete_auth_flow.py`
- Run both in CI for 1-2 days to compare outputs
- If modular design proves superior, delete monolithic backup
- If issues arise, can quickly revert to monolithic design
Testing Strategy:
- Run modular design locally until stable (5-10 successful runs)
- Deploy to CI and monitor for 2-3 days
- Compare CI failure identification clarity
- Get developer feedback on debugging experience
Rollback Plan:
If modular design proves problematic:
- Rename the current implementation to `.backup`
- Restore the monolithic design from the `.monolithic` file
- Update pytest markers back to the original
- Document why modular design didn't work
Rollback Trigger Conditions:
- Smoke tests become flaky or intermittent
- State management issues cause test pollution
- CI/CD pipeline breaks or becomes unreliable
Success Metrics:
How we'll measure success:
- Metric 1: Failure identification time
  - Target: < 30 seconds to identify which step failed (vs 2-5 minutes with monolithic)
  - Measurement: Developer feedback and CI log review
- Metric 2: CI output clarity
  - Target: 100% of developers can identify the failed step from GitHub Actions output without reading logs
  - Measurement: Developer survey after 2 weeks
- Metric 3: Test stability
  - Target: 0 flaky test runs attributed to state management
  - Measurement: Track test failure patterns over 30 days
- Metric 4: Debug time reduction
  - Target: 50% reduction in time to fix smoke test failures
  - Measurement: Compare average time to resolution before/after the switch
## Follow-Up

Future Considerations:

- Test ordering plugin: Consider `pytest-ordering` if relying on the numbered collection order proves insufficient
- Parallel execution: If smoke tests ever need to run in parallel, the design will need refactoring
- State management alternatives: Could explore `pytest-dependency` for explicit test dependencies (see the sketch after this list)
- Additional smoke tests: Extend to provider operations (OAuth flow) when implemented
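A minimal sketch of what `pytest-dependency` usage could look like for these tests (illustrative; the plugin is not currently part of the project and would need to be installed):

```python
import pytest


class TestSmokeCompleteAuthFlow:
    @pytest.mark.dependency(name="register")
    def test_01_user_registration(self, client, smoke_test_user):
        """Step 1: User registration."""
        ...

    @pytest.mark.dependency(name="verify", depends=["register"])
    def test_03_email_verification(self, client, smoke_test_user):
        """Step 3: Skipped automatically by the plugin if registration failed."""
        ...
```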
Review Schedule:
- First review: 2 weeks after deployment (2025-10-21)
  - Evaluate CI output clarity
  - Gather developer feedback on the debugging experience
  - Check for any flaky test patterns
- Regular review: Quarterly
  - Assess if the design still meets needs as the authentication flow evolves
  - Evaluate if state management complexity has increased
  - Consider any new pytest features that could improve the design
## References
Project Files:
- Original shell script: `scripts/test-api-flows.sh`
- Backup file: `tests/smoke/test_complete_auth_flow.py.backup`
- Current monolithic design: `tests/smoke/test_complete_auth_flow.py`
- Smoke test README: `tests/smoke/README.md`
Related Documentation:
- Smoke Test Caplog Solution - Token extraction implementation
- Smoke Test CI Debugging - Session isolation fixes
- Testing Strategy - Overall testing approach
External Resources:
- pytest best practices - Test organization
- FastAPI testing guide - API testing patterns
- Django testing docs - Sequential test flows
## Document Information

- Template: `research-template.md`
- Created: 2025-10-07
- Last Updated: 2025-10-18