Quality Assurance

Building a Test Automation Strategy for EdTech Platforms: From Manual QA to CI/CD

12 Min Read

How to transition from manual testing to automated test pipelines for educational software, covering the EdTech test pyramid, cross-browser testing, parallel execution, and CI/CD integration from 5+ years automating testing for Lexia Learning.

The Automation Challenge

When Lexia Learning approached us in 2018, their release cycle was constrained by manual QA. Every release required a 3-day regression testing sprint with 4 QA engineers manually executing 600+ test cases across 8 browser/device combinations. They wanted to ship weekly; manual testing made that impossible.

The goal: automate 80% of regression testing, reduce test execution from 3 days to under 2 hours, and maintain (or improve) defect detection rates. The path wasn't straightforward: test automation projects fail more often than they succeed, usually because teams focus on tools rather than strategy.

This post covers the test automation strategy we built for Lexia (and have since deployed for Cengage, Triumph Learning, and Meteor Education). It's not about Selenium vs. Playwright; it's about the architectural decisions that make automation sustainable over years, not just months.

The EdTech Test Pyramid

The classic test pyramid (lots of unit tests, fewer integration tests, minimal UI tests) doesn't map cleanly to educational software. EdTech products are integration-heavy: their value comes from connecting content, assessments, analytics, and LMS platforms, not from isolated component logic.

Our EdTech test pyramid has four layers, not three. Layer 1 (Base): API contract tests that validate integration points: LTI launches, SCORM communication, grade passback, analytics event payloads. These run in milliseconds, catch breaking changes in external integrations immediately, and form 60% of our automated test count.

Layer 2: Component integration tests that validate learning workflows end-to-end at the business logic layer without rendering UI: assignment creation → student completion → grade calculation → LMS passback. These tests exercise the full stack but skip the slowest part (browser rendering). They represent 25% of test count and catch logic bugs reliably.

Layer 3: Critical path UI tests covering the 20% of user workflows that represent 80% of usage: login → navigate to course → start assessment → submit → see results. We test these in actual browsers across device/browser combinations. These are 10% of test count but critical for release confidence.

Layer 4: Visual regression tests on key pages: lesson landing pages, assessment questions, results screens, teacher dashboards. These catch CSS regressions and content rendering issues that functional tests miss. We use Percy for pixel-perfect comparison against approved baselines. These represent 5% of test count.

test-pyramid-implementation.py
# Example test pyramid distribution for EdTech platform
# (lti_params, scorm_api, and the domain objects used below are assumed
# to come from the suite's shared fixtures)
import pytest
import requests
from playwright.sync_api import sync_playwright

# Layer 1: API Contract Tests (60% of tests, ~5 min execution)
def test_lti_launch_parameters():
    response = requests.post('https://platform.test/lti/launch', data=lti_params)
    assert response.status_code == 200
    assert 'user_id' in response.json()
    assert 'context_id' in response.json()

def test_scorm_grade_tracking():
    scorm_api.set_score(85)
    assert scorm_api.get_value('cmi.core.score.raw') == '85'

# Layer 2: Integration Tests (25%, ~15 min execution)
def test_assignment_workflow():
    assignment = Assignment.create(title="Quiz 1", points=100)
    submission = student.submit_assignment(assignment.id, answers)
    grade = grader.calculate_grade(submission.id)
    assert grade.score == 85
    assert lms_api.grade_posted(assignment.id, student.id)

# Layer 3: UI Critical Path (10%, ~30 min execution)
@pytest.mark.ui
def test_student_takes_quiz():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()

        page.goto('https://platform.test/login')
        page.fill('[name="email"]', 'student@test.com')
        page.fill('[name="password"]', 'password')
        page.click('button[type="submit"]')

        page.click('text=Course 101')
        page.click('text=Quiz 1')
        page.click('text=Start Quiz')

        page.click('[data-question="1"] [data-answer="B"]')
        page.click('[data-question="2"] [data-answer="A"]')
        page.click('button:has-text("Submit")')

        assert page.locator('.score').text_content() == '100%'
        browser.close()

# Layer 4: Visual Regression (5%, ~10 min execution)
@pytest.mark.visual
def test_quiz_question_rendering(page, percy_snapshot):
    page.goto('https://platform.test/quiz/1/question/5')
    percy_snapshot(page, 'Quiz Question - Multiple Choice')

Cross-Browser Testing Priorities

The K-12 device landscape is fundamentally different from consumer tech. Chromebooks dominate (over 50% market share in U.S. K-12), followed by iPads. In higher education, the mix is more diverse, but Chrome is still 70%+ of student traffic.

This reality shapes our browser testing matrix:

- Priority 1: Chrome on Windows and Chrome on ChromeOS (we treat these as separate test targets because ChromeOS has unique quirks with file uploads, clipboard access, and fullscreen).
- Priority 2: Safari on iPad (test both portrait and landscape orientations).
- Priority 3: Firefox on Windows.
- Priority 4: Edge.

Internet Explorer is finally dead in education, though you occasionally see IE11 requirements from legacy contracts.

We don't test every feature in every browser. API integration tests and business logic tests run in one browser (Chrome headless) for speed. UI critical path tests run in all Priority 1 and 2 browsers (Chrome Windows, Chrome ChromeOS, Safari iPad). Visual regression tests run in Chrome only (pixel-perfect comparison across browsers is impractical and rarely finds real bugs).
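One way to encode this matrix so it drives test selection is a small mapping from test layer to browser targets. This is an illustrative sketch; the target names and the `browsers_for` helper are ours, not part of any vendor API.

```python
# Illustrative browser matrix: API/business-logic tests run once in
# headless Chrome for speed; UI critical-path tests fan out across the
# Priority 1 and 2 targets; visual regression stays in Chrome only.
BROWSER_MATRIX = {
    "api": ["chrome-headless"],
    "ui": ["chrome-windows", "chrome-chromeos", "safari-ipad"],
    "visual": ["chrome-windows"],
}

def browsers_for(layer):
    """Return the browser targets a given test layer should run against.

    Unknown layers fall back to headless Chrome, the cheapest target.
    """
    return BROWSER_MATRIX.get(layer, ["chrome-headless"])
```

In practice this feeds a parametrized decorator such as `@pytest.mark.parametrize("browser_target", browsers_for("ui"))` on each critical-path test, so adding a device target is a one-line change to the matrix.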

Device testing: we maintain a physical device lab with actual Chromebooks (low-end and mid-range), iPads (9th gen and iPad Pro), and Android tablets. Cloud device testing (BrowserStack, Sauce Labs) supplements for breadth, but physical devices catch touch interaction bugs and performance issues that emulators miss.

💡Pro-Tip: Test With Real Student Devices

The Chrome DevTools device emulator is useful for quick checks but doesn't accurately simulate real Chromebook performance (especially the low-end $200 models used in many schools). Invest in a few actual student devices; you'll catch performance bottlenecks and interaction bugs that never appear on developer laptops.

Parallel Test Execution

A test suite that takes 90 minutes sequentially is useless for continuous delivery. Parallel execution is not optional; it's the foundation of fast feedback loops that let developers ship confidently multiple times per week.

We run UI tests in parallel across a Selenium Grid with 16 nodes: 4 VMs, each running 4 Chrome instances. Each test gets an isolated session with a fresh database state via test fixtures that create and tear down data per test. API and integration tests run in parallel on a separate Jenkins agent pool; they're CPU-bound (not I/O-bound like UI tests), so they scale differently.
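The per-test data isolation can be sketched as a context manager that namespaces each record with a unique suffix, so parallel workers never collide. The `FakeDB` class below is a stand-in for the platform's real data layer, included only so the sketch is self-contained.

```python
import uuid
from contextlib import contextmanager

class FakeDB:
    """Stand-in for the platform's data layer (illustrative only)."""
    def __init__(self):
        self.students = {}

    def create_student(self, email):
        sid = uuid.uuid4().hex
        self.students[sid] = {"id": sid, "email": email}
        return self.students[sid]

    def delete_student(self, sid):
        self.students.pop(sid, None)

@contextmanager
def isolated_student(db):
    """Create a throwaway student for one test, then clean it up.

    The unique suffix keeps parallel workers from colliding on the
    same record; the finally block guarantees teardown even on failure.
    """
    run_id = uuid.uuid4().hex[:8]
    student = db.create_student(email=f"student-{run_id}@test.local")
    try:
        yield student
    finally:
        db.delete_student(student["id"])
```

Wrapped as a pytest fixture, the same pattern gives every test its own student, assignment, and course rows regardless of which worker picks it up.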

The test execution flow: on every pull request, we run a smoke suite (120 tests, 15 minutes) covering critical paths only. On merge to develop, we run the full regression suite (820 tests, parallelized across 16 nodes, completes in 90 minutes). Nightly, we run extended tests including visual regression and cross-browser validation.

Parallelization challenges: managing test data (each parallel test needs isolated data), handling flaky tests (retries can mask real issues), and debugging failures (when tests run in parallel, logs get interleaved). We use test run UUIDs to correlate logs, screenshots on failure, and video recordings for UI tests to make debugging tractable.
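One lightweight way to implement the run-ID correlation is a logging filter that stamps every record with a per-process ID, so interleaved output from parallel workers can be grouped back together afterwards. This is a sketch of the idea, not our exact conftest code.

```python
import logging
import uuid

# One ID per test process; with pytest-xdist each worker gets its own,
# so every log line can be traced back to the run that emitted it.
RUN_ID = uuid.uuid4().hex[:12]

class RunIDFilter(logging.Filter):
    """Stamp every log record with the current test-run ID."""
    def filter(self, record):
        record.run_id = RUN_ID
        return True  # never drop records, only annotate them

def configure_test_logging():
    """Attach the filter and a format that prefixes the run ID."""
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("[run=%(run_id)s] %(levelname)s %(message)s"))
    handler.addFilter(RunIDFilter())
    logger = logging.getLogger("suite")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

The same `RUN_ID` goes into screenshot and video filenames on failure, which is what makes a red build debuggable: grep the ID, and the logs, screenshot, and recording for that one test line up.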

pytest-parallel-config.ini
# pytest.ini - Parallel execution configuration
# (parallelism comes from pytest-xdist, retries from pytest-rerunfailures;
# both plugins are configured via addopts flags, not their own ini sections)
[pytest]
markers =
    smoke: Smoke tests (fast, run on every PR)
    regression: Full regression (run on merge to develop)
    visual: Visual regression tests (run nightly)
    slow: Slow tests (run in parallel only)

addopts =
    -n auto
    --dist loadgroup
    --maxfail=10
    --tb=short
    --strict-markers
    --reruns 1
    --reruns-delay 5

# pytest-xdist option: directories watched in --looponfail mode
looponfailroots = tests

CI/CD Pipeline Integration

Test automation only delivers value when integrated into CI/CD pipelines that developers actually use. If test results take hours to arrive or are ignored because of noise, the automation investment is wasted.

Our CI/CD test strategy:

- On PR: run smoke tests (120 tests, 15 min). Block merge if any fail. Include a code coverage report (require 80%+ coverage for new code).
- On merge to develop: run the full regression suite (820 tests, 90 min). If failures occur, create a rollback deploy and alert the team.
- On deploy to staging: run smoke tests in the staging environment (validates the deployment succeeded).
- Nightly: run the extended suite including cross-browser, visual regression, and performance tests.

Critical insight: developer trust in automated tests is fragile. One week of flaky tests that fail randomly and get ignored, and developers stop believing test results. We enforce strict policies: any test that fails twice without code changes gets quarantined immediately. Tests quarantined for more than 1 week get deleted. Flaky tests are treated as P1 bugs and fixed within 48 hours.
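The quarantine policy can be expressed as a small decision function over a registry of quarantined tests. The test IDs and dates below are illustrative, not real entries from our suite.

```python
from datetime import date, timedelta

# Illustrative quarantine registry: test id -> date it was quarantined.
QUARANTINE = {
    "tests/ui/test_quiz.py::test_timer_display": date(2024, 3, 1),
}
MAX_QUARANTINE_DAYS = 7  # per the policy above: delete after one week

def quarantine_status(test_id, today):
    """Return 'run', 'skip', or 'delete' for a given test.

    Quarantined tests are skipped so they stop eroding trust in the
    build; anything quarantined past the deadline is flagged for removal.
    """
    quarantined_on = QUARANTINE.get(test_id)
    if quarantined_on is None:
        return "run"
    if today - quarantined_on > timedelta(days=MAX_QUARANTINE_DAYS):
        return "delete"
    return "skip"
```

A conftest hook can then translate "skip" into a pytest skip marker automatically, so quarantining a test is a one-line registry change rather than an edit to the test file.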

Maintaining Test Suites

Test automation is not 'set it and forget it'; test suites require ongoing maintenance. As features change, tests break. As the application grows, test execution time grows. As team members leave, test knowledge gets lost.

Our maintenance strategy:

- Page Object Model: encapsulate UI locators and interactions in page object classes. When the UI changes, update one page object instead of 50 tests.
- Test data factories: use factory functions to create test data programmatically rather than hardcoded fixtures, making tests resilient to schema changes.
- Regular test cleanup: every quarter, review test suite metrics (execution time, failure rates, code coverage) and delete or consolidate redundant tests.
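A minimal page object for the quiz flow from the earlier example might look like this. The selectors are the same hypothetical ones used above; the class wraps them so tests never touch raw locators.

```python
class QuizPage:
    """Page object for the quiz UI (illustrative selectors).

    Tests depend on this class's methods; when the markup changes,
    only the selectors here need updating, not every test.
    """
    def __init__(self, page):
        self.page = page  # a Playwright-style page with goto/click/locator

    def open(self, quiz_id):
        self.page.goto(f"https://platform.test/quiz/{quiz_id}")
        return self

    def answer(self, question, choice):
        self.page.click(f'[data-question="{question}"] [data-answer="{choice}"]')
        return self

    def submit(self):
        self.page.click('button:has-text("Submit")')
        return self.page.locator(".score").text_content()
```

Returning `self` from each step lets tests read as the workflow itself: `QuizPage(page).open(1).answer(1, "B").submit()`.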

We measure test suite health with three metrics:

- Flake rate: percentage of tests that fail without code changes (target <1%).
- Test execution time trend: track over time; investigate if it keeps growing.
- Test coverage vs. defect detection: are we catching bugs before users do? If not, we have a coverage gap.
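The flake-rate metric can be computed from repeated runs of the same commit: a test that both passed and failed with no code change in between is flaky by definition. This helper is a sketch under that assumption; the input shape is ours, not a pytest API.

```python
def flake_rate(results):
    """Percentage of tests that are flaky.

    results maps test_id -> list of pass/fail booleans from repeated
    runs of the SAME commit. A test whose outcomes disagree across
    those runs failed without a code change, i.e. it is flaky.
    """
    if not results:
        return 0.0
    flaky = sum(1 for outcomes in results.values() if len(set(outcomes)) > 1)
    return 100.0 * flaky / len(results)
```

Fed nightly from the CI result database, this gives a single number to track against the <1% target; a test that fails in every run is a real failure, not a flake, and is deliberately excluded.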

Conclusion

Test automation done well is a force multiplier. Lexia Learning now ships updates twice per week (up from monthly), with higher confidence and fewer production defects than in the manual testing era. The keys to sustainable automation: match your test pyramid to your architecture, parallelize aggressively, integrate into CI/CD, don't tolerate flaky tests, and remember that the goal is faster feedback, not 100% automation at any cost.

After building test automation frameworks for Lexia, Cengage, Triumph Learning, and Meteor Education, we've refined our approach to what works: prioritize API and integration tests over UI tests, test in real student browsers and devices, and maintain ruthless quality standards for test reliability. If you're facing the transition from manual to automated testing or need to scale your existing automation, we've solved these problems and would be happy to share detailed strategies.

#TestAutomation #QA #EdTech #CI/CD #Selenium

