Data Architecture for real-time customer growth
SocialHub runs CRM and CDP workloads on one governed customer data foundation: SQL keeps business operations reliable, Kafka and Flink move and compute live signals, and StarRocks makes profiles, segments and BI fast enough for activation.
API ingestion, consent checks and field mapping.
Views, clicks, search, cart and conversion events.
Transactional writes remain reliable in SQL.
Authenticate, validate, isolate bad payloads and split state from event streams.
CRM truth
Binlog / WAL
Buffer, replay, fan-out and absorb traffic spikes without coupling producers to consumers.
Clean, dedupe, join, identify, compute labels, test segments and fire journey triggers.
Profiles, tag wide tables, event detail, cohorts, funnels, attribution and BI.
Same customer model for sales, marketing and operations.
Actions can trigger as soon as the signal is computed.
Separate the transaction path from the intelligence path.
CRM data is stateful: who the customer is, which sales owner is assigned, what order was placed, and whether a contract or membership changed. CDP data is eventful: what the customer viewed, clicked, searched, ignored and responded to. The architecture keeps those two workloads separate where they need different guarantees, then unifies them into one real-time customer model for analysis and action.
Three ways data enters the customer model.
CRM state data
CRM apps -> SQL -> CDC -> Kafka -> Flink -> StarRocks
Orders, members, contracts, tickets and sales activity stay transactionally correct in SQL first, then move downstream as audited change events.
CDP behavior data
Web/App SDK -> Event Gateway -> Kafka -> Flink -> StarRocks / Activation
Page views, clicks, searches, add-to-cart, login and conversion signals are validated at the edge and computed as a live behavioral stream.
Third-party systems
API Layer -> Data Router -> SQL or Kafka -> Flink -> Customer intelligence
ERP, commerce, service, ads, email and messaging platforms are routed by business meaning: state into CRM truth, events into the real-time bus.
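The routing rule above (state into CRM truth, events into the real-time bus, bad payloads isolated) can be sketched in a few lines of Python. This is an illustration only: the `Router` class, the record kinds, the required fields and the in-memory lists standing in for SQL, Kafka and a dead-letter store are all assumptions, not SocialHub's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical record kinds; the real taxonomy is not given in the source.
STATE_KINDS = {"order", "member", "contract", "ticket"}
EVENT_KINDS = {"page_view", "click", "search", "add_to_cart", "login", "conversion"}
REQUIRED_FIELDS = ("kind", "customer_id", "ts")


@dataclass
class Router:
    # In-memory stand-ins for the SQL write path, the Kafka topic
    # and the store that isolates bad payloads.
    sql_writes: list = field(default_factory=list)
    kafka_events: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)

    def route(self, payload: dict) -> str:
        """Validate a payload and send it down exactly one path."""
        missing = [name for name in REQUIRED_FIELDS if name not in payload]
        if missing:
            self.dead_letter.append({"payload": payload, "error": f"missing {missing}"})
            return "dead_letter"

        kind = payload["kind"]
        if kind in STATE_KINDS:
            # Stateful business records go to the transactional store first.
            self.sql_writes.append(payload)
            return "sql"
        if kind in EVENT_KINDS:
            # Behavioral signals go straight to the event bus.
            self.kafka_events.append(payload)
            return "kafka"

        # Unknown kinds are isolated rather than guessed at.
        self.dead_letter.append({"payload": payload, "error": f"unknown kind {kind!r}"})
        return "dead_letter"
```

Calling `Router().route({"kind": "order", "customer_id": "c1", "ts": 1})` returns `"sql"`, while a click with the same fields returns `"kafka"`. Keeping rejected payloads in a separate store means a malformed third-party record never blocks the valid traffic behind it.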
Each layer has one job.
The stack is intentionally layered so operational reliability, streaming computation, analytics performance and activation can scale independently.
API layer
Auth, rate limits, validation, field mapping and system-specific error handling.
SQL database
The CRM transaction center for customer, lead, order, contract and membership records.
CDC
Captures inserts, updates and deletes from SQL as incremental change streams.
Kafka
The real-time data bus for buffering, replay, fan-out and decoupled consumers.
Flink
The live computation and decision layer: clean, dedupe, join, merge identities, tag, segment and trigger.
StarRocks
The high-performance OLAP and profile service layer for 360 profiles, funnels, BI and audience selection.
Activation
Email, SMS, WeChat, sales alerts, webhooks, ad sync and journey orchestration.
Apps
CRM workbench, CDP audience tools, dashboards and AI agents consume the same governed customer model.
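Because the CDC layer emits inserts, updates and deletes and the Kafka layer allows replay, any downstream consumer has to tolerate seeing the same change twice or out of order. A minimal sketch of an idempotent apply step follows. The change-event shape (`op`, `key`, `lsn`, `after`) is a hypothetical simplification and not the format of any particular CDC tool.

```python
def apply_change(table: dict, change: dict) -> None:
    """Apply one CDC change event to an in-memory copy of a SQL table.

    `table` maps primary key -> {"lsn": int, "row": dict or None}.
    `lsn` is an assumed monotonically increasing log position.
    """
    key = change["key"]
    current = table.get(key)

    # Skip anything at or before the position already applied,
    # so replays and late duplicates do not overwrite newer state.
    if current is not None and change["lsn"] <= current["lsn"]:
        return

    op = change["op"]
    if op in ("insert", "update"):
        table[key] = {"lsn": change["lsn"], "row": change["after"]}
    elif op == "delete":
        # Keep a tombstone (row=None) so an older replayed insert
        # cannot resurrect the deleted record.
        table[key] = {"lsn": change["lsn"], "row": None}
    else:
        raise ValueError(f"unknown op: {op!r}")
```

For example, applying an insert at position 10, an update at position 12 and then a late update at position 11 leaves the row in its position-12 state. The tombstone is a design choice for this sketch; a real sink would typically expire tombstones after a retention period.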
Flink is more than ETL.
The stream layer is where data becomes a decision. It cleans and joins events, merges identities, recognizes sessions, computes rolling behavior windows, updates live labels and decides whether a journey or sales alert should fire.
price-page visit -> no form submit within 10 min -> high-intent unresolved label -> sales alert + nurture journey
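The rule above is a per-customer timeout: a price-page visit starts a 10-minute clock, a form submit cancels it, and an expired clock produces the label and its actions. The plain-Python sketch below mimics the keyed state and timer such a stream job would hold. It does not use the Flink API, it assumes events arrive in timestamp order, and the class, event names and label strings are hypothetical.

```python
TIMEOUT_SECONDS = 10 * 60  # the 10-minute window from the rule


class HighIntentDetector:
    def __init__(self):
        # customer_id -> timestamp by which a form submit must arrive
        self.deadlines = {}

    def advance_time(self, now: int) -> list:
        """Fire every timer whose deadline is strictly before `now`."""
        expired = [c for c, deadline in self.deadlines.items() if deadline < now]
        fired = []
        for customer_id in expired:
            del self.deadlines[customer_id]
            fired.append({
                "customer_id": customer_id,
                "label": "high_intent_unresolved",
                "actions": ["sales_alert", "nurture_journey"],
            })
        return fired

    def on_event(self, customer_id: str, kind: str, ts: int) -> list:
        """Process one event and return any triggers that became due."""
        # Expire old timers first, so a late submit cannot cancel
        # a timer that has already run out.
        fired = self.advance_time(ts)
        if kind == "price_page_view":
            # The first visit starts the clock; repeat visits do not extend it.
            self.deadlines.setdefault(customer_id, ts + TIMEOUT_SECONDS)
        elif kind == "form_submit":
            self.deadlines.pop(customer_id, None)
        return fired
```

If customer `a` views the price page at second 0 and submits the form at second 300, nothing fires for `a`. If customer `b` views it at second 60 and never submits, `advance_time(661)` returns one trigger for `b`. Whether a repeat visit should restart the clock is a product decision; this sketch keeps the original deadline.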
What this unlocks
- Unified customer 360 across CRM records, behavior events, transactions, service history and channel responses.
- Real-time labels and segments built from both state changes and behavioral windows.
- Journey triggers that react to intent signals without waiting for batch jobs.
- BI and operating dashboards that refresh from the same semantic customer foundation.
- AI-ready features for intent scoring, churn risk, next-best-action and content generation.
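As an illustration of labels built from both state changes and behavioral windows, the sketch below combines a rolling 30-minute view count with a membership flag that would arrive via the CRM path. The window length, threshold, label names and the `WindowLabeler` class are assumptions chosen for the example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 30 * 60  # hypothetical rolling window
MIN_VIEWS = 3             # hypothetical engagement threshold


class WindowLabeler:
    def __init__(self):
        self.views = defaultdict(deque)  # customer_id -> recent view timestamps
        self.is_member = {}              # customer_id -> latest CRM membership state

    def on_state_change(self, customer_id: str, is_member: bool) -> None:
        """Record a CRM state change (for example, from the CDC stream)."""
        self.is_member[customer_id] = is_member

    def on_view(self, customer_id: str, ts: int) -> None:
        """Record a behavioral event and evict timestamps outside the window."""
        q = self.views[customer_id]
        q.append(ts)
        while q and q[0] <= ts - WINDOW_SECONDS:
            q.popleft()

    def labels(self, customer_id: str, now: int) -> set:
        """Compute the current labels from window contents plus CRM state."""
        recent = sum(1 for t in self.views.get(customer_id, ()) if t > now - WINDOW_SECONDS)
        result = set()
        if recent >= MIN_VIEWS:
            result.add("engaged_browser")
            # The behavioral signal only becomes a prospect label
            # when CRM state says the customer is not yet a member.
            if not self.is_member.get(customer_id, False):
                result.add("membership_prospect")
        return result
```

Three views within the window mark a non-member as both `engaged_browser` and `membership_prospect`. Once a membership state change arrives, the same behavior yields only `engaged_browser`, which is the point of computing labels from both data paths together.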
Built to become an AI-ready customer growth system.
Complete, real-time and structured customer data is the prerequisite for useful AI. The same foundation that powers CRM, CDP, BI and activation can also support intent scoring, churn prediction, next-best-action, audience recommendations and AI-generated campaign operations.
Related: platform engine and web tracking SDK.