TL;DR
- We added a dedicated search layer alongside our existing database for a 2M-device IoT platform.
- Dashboard load times dropped from 1 min 33 sec to under 1 second — 94× faster.
- Our database remained the system of record. No application rewrite. No data migration risk.
Slow dashboards are a silent tax on every team that uses them. A 90-second load time doesn’t just annoy users, it trains them to stop checking, stop trusting the data, and start making decisions without it.
What Was Broken and Why It Mattered
Our platform dashboards showed device status, regional breakdowns, alert summaries, and filter panels — the kind of operational view that teams rely on throughout the day.
With 2 million devices and growing, those dashboards became unusable:
- Loads took 1–2 minutes on common queries, longer on complex filters
- Auto-refresh had to be disabled — the database couldn’t keep up with repeated queries
- Database CPU spiked every time someone opened a dashboard, slowing down other operations
- Users stopped trusting the data because they assumed slow meant stale
Throwing larger servers at the problem bought days, not months. The real issue was architectural: we were asking our transactional database to do something it was never designed for.
The Numbers
We ran identical queries against both systems with 2 million device records. These are production query models, measured end-to-end — not lab benchmarks.
DASHBOARD SUMMARY METRICS
The aggregate counts and status breakdowns shown at the top of every dashboard.
| SCENARIO | MYSQL | ELASTICSEARCH | IMPROVEMENT |
|---|---|---|---|
| All data (no filter) | 1 min 33 sec | 0.99 sec | 94× faster |
| With filters | 40.3 sec | 0.95 sec | 42× faster |
| With search | 55.6 sec | 0.96 sec | 58× faster |
FILTER PANEL AGGREGATIONS
Multi-dimensional breakdowns by region, firmware version, operational state, and customer tag.
| SCENARIO | MYSQL | ELASTICSEARCH | IMPROVEMENT |
|---|---|---|---|
| All data (no filter) | 59.5 sec | 0.97 sec | 61× faster |
| With filters applied | 46.2 sec | 0.98 sec | 47× faster |
| With text search | 32.6 sec | 1.02 sec | 32× faster |
DEVICE LISTINGS WITH SORTING
Fetching and sorting paginated lists across the full 2M-device dataset.
| SCENARIO | MYSQL | ELASTICSEARCH | IMPROVEMENT |
|---|---|---|---|
| 25rows, no filter | 30 sec | 1.19 sec | 25× faster |
| 250 rows, no filter | 36.4 sec | 3.5 sec | 10`× faster |
| Sort by operational_state | 3 min 7 sec | 3.21 sec | 58× faster |
Why the Original Setup Was the Wrong Tool
Our existing database (MySQL) is excellent at what it was designed for. Dashboard queries are a fundamentally different workload.
| WHAT DASHBOARDS NEED | WHAT A TRANSACTIONAL DB DOES WELL |
|---|---|
| Aggregate counts across millions of rows | Look up individual records by ID |
| Filter by any combination of fields | Join across normalized tables |
| Sort by arbitrary columns | Maintain ACID transactions |
| Refresh every 30 seconds | Enforce data consistency |
What We Built
Rather than replacing our database, we added a dedicated read layer alongside it. Each system does what it’s good at.
MYSQL — TRANSACTIONAL WRITES
- All device creates, updates, and deletes
- Configuration and ownership records
- Audit records and compliance data
- Individual device detail pages (fast single-record lookups by ID)
ELASTICSEARCH — ANALYTICAL READS
- Dashboard summary metrics
- Filter panel aggregations
- Paginated listings with sorting
- Any query pattern that scans the full dataset
HOW THEY STAY IN SYNC
Every write still goes to MySQL first — it remains the single source of truth. A lightweight sync job runs every 30 seconds, pushing only changed records into the search layer. Dashboard reads go to Elasticsearch; detail pages still hit MySQL directly to ensure real-time accuracy.
When This Change Is Worth Considering
These are the signals we’d look for, based on our experience:
| SIGNAL | THRESHOLD TO WATCH |
|---|---|
| Page load time | Consistently above 30 seconds |
| Database CPU during use | Sustained above 70% |
| Growing filter complexity | New filter dimensions added monthly |
| Corpus size | Approaching 500k+ searchable records |
| User requests | Faceted counts, real-time aggregations, geo queries |
If two or more of these apply, a dedicated search layer is worth a serious look.
When It’s Not the Right Move
This architecture adds real operational complexity. It’s not always the right answer:
- Under 100k records with simple queries — a properly indexed relational database handles this fine
- Dashboards show pre-computed summaries that refresh every few minutes, caching may be sufficient
- Your team can’t operate a distributed search cluster — managed services reduce this burden, but it’s still a new system to learn
- You need real-time data consistency — Elasticsearch is eventually consistent; for financial or compliance views it may not work
The Trade-offs We Accepted
| TRADE-OFF | HOW WE HANDLED IT |
|---|---|
| Eventual consistency | 30-second lag is fine for dashboards; detail pages still read MySQL live |
| No joins | Denormalized region, customer, and status fields into each document |
| Operational overhead | Used managed AWS OpenSearch to minimize infrastructure burden |
| Sync complexity | Idempotent upserts keyed on device ID — safe to replay, easy to recover |
Results
| METRIC | BEFORE | AFTER |
|---|---|---|
| Dashboard load (unfiltered) | 1 min 33 sec | < 1 sec |
| Filter panel aggregations | 60 sec | < 1 sec |
| User complaints about speed | Frequent | Rare |
| Dashboard auto-refresh | Disabled (too slow) | Every 30 seconds |
We went from disabling auto-refresh because the database couldn’t keep up, to live dashboards refreshing every 30 seconds. That’s the difference between a reporting tool and an operational one.
How to Approach This Migration
- Measure before you move. Capture actual query times at the application layer — P95 and P99, not just averages. Averages obscure the worst user experiences.
- Find the expensive patterns. COUNT queries with GROUP BY across large tables, sorts on columns that can’t be fully indexed, and multi-table joins that force large intermediate scans.
- Build a shadow index first. Sync data into Elasticsearch without routing any traffic to it. Validate correctness before changing anything user-facing.
- Dark-launch a small slice. Route 5–10% of dashboard traffic to Elasticsearch while keeping MySQL as the fallback. Monitor latency, error rates, and result correctness.
- Keep the fallback alive. Don’t cut over entirely until Elasticsearch has handled at least a week of clean production traffic.
- Define your freshness requirement early. Our 30-second lag worked. If you need tighter consistency, consider CDC tools — Debezium is worth evaluating — rather than polling-based sync.
The Core Principle
MySQL and Elasticsearch aren’t competing products — they solve different problems. The mistake is forcing one to do the other’s job.
| KEEP IN TRANSACTIONAL DB | MOVE TO SEARCH LAYER |
|---|---|
| All writes and updates | Aggregations across large datasets |
| Record-level lookups by ID | Multi-dimensional filtering and faceted counts |
| Anything requiring strict data consistency | Full-text search |
| The authoritative source of truth | Any query scanning millions of rows repeatedly |
For our platform, choosing the right tool for each workload reduced our worst dashboard load times by 94×. The 30-second sync delay was a small price for dashboards that actually work.