Celonis Technical Interview Prep

🔬

Process Mining Fundamentals

Core Concept

What is Process Mining? A data science technique that reconstructs, visualizes, and analyzes how business processes actually execute — by mining the digital event footprints left in enterprise IT systems (ERP, CRM, SCM, etc.). Unlike process modeling (how you think processes work), process mining reveals ground truth from real data.

📥 Event Log

The raw input. A structured dataset with three minimum fields: Case ID (unique business object identifier), Activity (what happened), and Timestamp (when it happened). Optionally enriched with attributes like user, cost, location.

🔍 Discovery

The most foundational technique. Takes an event log and automatically generates a process model (Petri net, BPMN, DFG) showing all actual execution paths, frequencies, and durations — without any prior model definition.

✅ Conformance Checking

Compares the actual process (from event log) against an expected/normative process model. Identifies deviations — missing activities, wrong order, unauthorized paths — and quantifies fitness and precision metrics.

⚡ Enhancement

Takes an existing process model and enriches it with real performance data from the event log — adding timing, bottleneck analysis, resource utilization, and KPI overlays to improve or extend the model.

🔄 Process Variant

A unique sequence of activities observed for a subset of cases. E.g., "PO → GR → Invoice → Payment" vs. "PO → Invoice → GR → Payment" are two distinct variants. High variant count = high process complexity.

🕸️ Directly-Follows Graph (DFG)

The most common process model representation. Nodes = activities, edges = "activity A was directly followed by B." Edge weights show frequency and performance metrics. Foundation of Celonis's process visualization.

🌐

Object-Centric Process Mining (OCPM)

Advanced

Dimension	Traditional Case-Centric	Object-Centric (OCPM)
Event Log Structure	One case ID per event (forced flattening)	Events linked to multiple objects simultaneously
Business Object	Single object type (e.g., only "Order")	Multiple types (Order + Invoice + Delivery in parallel)
Key Problem	Convergence/divergence artifacts ("spaghetti")	Naturally handles many-to-many relationships
Use Case Fit	Simple, linear, single-object processes	O2C, P2P, complex supply chains with parallel flows
AI Accuracy	Biased by flattened/duplicated events	Higher-accuracy ML on true object relationships

Key Insight: In Order-to-Cash, one Sales Order can have 3 deliveries and 4 invoices. Traditional mining forces you to pick ONE object as the "case" — distorting the process view. OCPM lets all objects co-exist, revealing the true complexity of cross-object dependencies and bottlenecks.

🏗️

EMS Platform Architecture

🟢 Layer 3 — Execution & Application Layer (Surface)

Where users interact and where value is realized. Includes:
Celonis Studio Process Copilots Business Miner Action Flows Solution Suites Annotation Builder Partner Apps
Low-code builder for custom analyses and apps. Action Flows trigger automated fixes in source systems. Process Copilots push recommendations to Slack/Teams.

🟠 Layer 2 — Process Intelligence Graph (Core Brain)

The heart of the platform — a dynamic digital twin of enterprise operations. Combines raw system data with business context:
KPIs & Benchmarks Industry Standards Business Rules Enterprise Architecture Object Relationships
System-agnostic (SAP + Oracle + Salesforce + custom simultaneously). Supports OCPM. Feeds the AI/ML layer with operational context. Living structure — updates in real time.

🔵 Layer 1 — Data Core (Foundation)

High-performance data infrastructure for ingesting and querying enterprise event data at scale.
SaolaDB (in-memory columnar) 100+ pre-built connectors Continuous extraction Real-time CDC Billions of events
The proprietary SaolaDB is a column-store in-memory database purpose-built for process mining queries — not a generic RDBMS. Handles SAP change data capture at enterprise scale with incremental extraction of only modified records.

⌨️

Process Query Language (PQL)

Must Know

PQL vs. SQL: PQL is a domain-specific language layered on top of Celonis's in-memory engine. It adds 180+ operators specifically for process analytics — things SQL cannot natively express: case-level aggregations, process-flow filters, activity sequence detection, conformance metrics, and ML operators.

PROCESS_ORDER()

Returns the position/order of an activity within a case's execution path. Used to filter cases where an activity occurred before/after another.

CALC_THROUGHPUT()

Calculates time between activities (or first/last activity) for each case. Supports TOTAL, AVG, MIN, MAX aggregations. Critical for cycle time analysis.

PROCESS_EQUALS()

Returns cases where the actual execution path exactly matches a specified sequence. Used for conformance KPIs and "happy path" rate calculations.

REWORK_COUNT()

Counts the number of times an activity repeats within a case. Identifies rework loops — a major source of cost inefficiency in O2C, P2P, and ITSM processes.

RUNNING_SUM()

Computes cumulative totals over ordered case events. Useful for aging analyses, open item tracking, and working capital calculations.

KU() / KM()

KU = "Knowledge Unique" — returns the distinct lookup value for a case. KM = "Knowledge Multiple" — returns comma-separated values when a case has multiple. Handle many-to-one and many-to-many relationships.

PQL — Cycle Time by Variant (Example)

-- Calculate average throughput time grouped by process variant
SELECT
    VARIANT("ACTIVITIES"."ACTIVITY") AS "Process Variant",
    AVG(CALC_THROUGHPUT(
        CASE_START, CASE_END,
        HOURS
    )) AS "Avg Cycle Time (hrs)",
    COUNT(DISTINCT "ACTIVITIES"."CASE_ID") AS "Case Count"
WHERE
    PROCESS_EQUALS(
        "Create PO" -> "Goods Receipt" -> "Invoice Received"
    ) = 0  -- Exclude exact happy-path cases
ORDER BY "Avg Cycle Time (hrs)" DESC
    

PQL — Rework Rate KPI (Example)

-- % of cases with at least one "Change PO" rework activity
SELECT
    SUM(CASE
        WHEN REWORK_COUNT(
            "ACTIVITIES"."ACTIVITY",
            'Change PO'
        ) > 0 THEN 1
        ELSE 0
    END) * 100.0 /
    COUNT(DISTINCT "ACTIVITIES"."CASE_ID")
    AS "PO Rework Rate (%)"
    

🔌

SAP Integration & Data Extraction

Component	Description
Continuous Extractor	On-premise middleware that monitors SAP tables for changes via Change Data Capture (CDC). Extracts only delta records — not full table dumps. Outputs CSV files for transport.
RFC Module	Remote Function Call module installed in SAP. Extracts data from SAP ABAP tables via background jobs. Works without direct DB access.
Key SAP Tables	`EKKO/EKPO` (PO headers/items), `EKBE` (PO history/GR events), `RBKP/RSEG` (invoices), `BKPF/BSEG` (accounting docs), `AUFK` (orders)
Event Construction	SAP doesn't natively store "process events." Celonis constructs them by joining change document tables (`CDHDR/CDPOS`) with master data. Each document status change becomes an event.
Transport Options	Direct DB connection · SAP Message Server · PI/PO mediation · Cloud connector. Enterprise setups use dedicated SAP Basis team coordination.
Non-SAP Sources	100+ pre-built connectors: Oracle EBS, Salesforce, ServiceNow, Workday, SAP S/4HANA, SAP ECC. Custom connectors via REST API and JDBC drivers.

Interview Tip: Understand that Celonis is not a traditional ETL tool. Its extractor is purpose-built to reconstruct process events from operational tables — often requiring domain knowledge (like knowing EKBE contains GR/IR events in P2P) rather than just table-level extraction.

🛠️

Data Engineering Topics

Event Log Construction

Given raw transactional tables (no explicit event log), reconstruct Case ID / Activity / Timestamp triplets. Requires SQL joins, UNION ALL across multiple event sources, and timestamp normalization.

Data Modeling in Celonis

Celonis data models use an Activity Table (events) as the spine, linked via foreign keys to Case Table (master data) and Activity Attribute Tables. Star-schema style, optimized for SaolaDB columnar queries.

SQL for Process Analytics

Advanced SQL skills critical: window functions (LAG/LEAD, ROW_NUMBER, RANK), self-joins for activity sequencing, DATEDIFF for cycle times, CTEs for multi-step event reconstruction.

Data Quality & Handling

Real-world event logs have missing timestamps, duplicate events, out-of-order records, and NULL case IDs. Interview may test knowledge of how to detect, filter, or impute these issues.

Performance Optimization

SaolaDB is in-memory columnar — avoid row-by-row operations. Pushdown filtering to extraction layer, minimize cross-joins, use aggregate tables for large datasets. Columnar storage = fast aggregations on individual fields.

Real-Time vs. Batch

Celonis supports both: batch extraction (periodic full/delta loads) and real-time continuous extraction with near-real-time process monitoring. Architecture choice depends on SLA requirements and source system load tolerance.

SQL — Reconstruct P2P Event Log from SAP Tables

-- Reconstruct a Procure-to-Pay event log from SAP tables
WITH po_events AS (
    -- PO Creation from EKKO
    SELECT
        EBELN AS case_id,
        'Create PO' AS activity,
        AEDAT AS event_time,
        ERNAM AS user_id
    FROM EKKO

    UNION ALL

    -- Goods Receipt from EKBE (movement type 101)
    SELECT
        EBELN AS case_id,
        'Goods Receipt' AS activity,
        BUDAT AS event_time,
        USNAM AS user_id
    FROM EKBE
    WHERE BEWTP = 'E' AND VGABE = '1'  -- GR events

    UNION ALL

    -- Invoice Receipt from RBKP
    SELECT
        e.EBELN AS case_id,
        'Invoice Received' AS activity,
        r.BUDAT AS event_time,
        r.USNAM AS user_id
    FROM RBKP r
    JOIN RSEG s ON r.BELNR = s.BELNR
    JOIN EKBE e ON s.EBELN = e.EBELN
)
SELECT case_id, activity, event_time, user_id
FROM po_events
WHERE event_time IS NOT NULL
ORDER BY case_id, event_time;
      

🤖

AI & ML in the Celonis Platform

Process Copilots (GA)

LLM-powered next-best-action recommendations delivered directly in Slack and Microsoft Teams. Grounded in Process Intelligence Graph data — not hallucinations. Each recommendation is traceable to a specific process event.

AgentC (MCP Server)

Model Context Protocol server that exposes Celonis Process Intelligence Graph to any AI agent (Claude, GPT, etc.). Enables AI agents to answer questions grounded in real operational data — e.g., "Why is PO-12345 late?"

Annotation Builder

No-code GenAI tool for business experts to encode domain rules into recommendation models. Combines human domain knowledge with ML — addressing the "last mile" problem of AI adoption without data scientists.

ML Operators in PQL

Native ML functions integrated into PQL — anomaly detection, clustering, predictive scoring. Run directly in the query engine without requiring external ML pipelines. E.g., predict payment delay risk per invoice.

Natural Language → PQL

LLM-powered translation of plain English questions into valid PQL queries (Beta). E.g., "Show me all cases with late GR in the last 30 days" → generates PQL automatically. Democratizes process analysis.

"No AI Without PI" Thesis

AI agents without process context produce generic, often wrong outputs. The Process Intelligence Graph provides the enterprise operational context layer that makes AI actionable — grounding LLMs in real business state.

📋

Core Process Use Cases

Order-to-Cash (O2C)

Order creation → Credit check → Delivery → Goods issue → Invoice → Payment receipt. Key KPIs: Order fulfillment time, on-time delivery rate, invoice accuracy, DSO (Days Sales Outstanding).

Procure-to-Pay (P2P)

PR → PO → GR → Invoice → Payment. Key KPIs: PO cycle time, early payment discount capture, 3-way match rate, duplicate invoice rate, DPO (Days Payable Outstanding).

IT Service Management (ITSM)

Incident → Triage → Assign → Resolve → Close. Key KPIs: MTTR, SLA breach rate, reopened tickets, escalation rate, and first contact resolution.

Accounts Receivable / Payable

Invoice processing, dispute management, dunning, cash application. Key KPIs: Days past due, collection effectiveness index, automated match rate, working capital impact.

💬

Technical Interview Questions & Answers

Practice

ALGO Explain how process discovery algorithms work. What is the difference between a DFG and a Petri net?

Directly-Follows Graph (DFG): The simplest model — nodes are activities, directed edges represent "A was directly followed by B." Edge weights show frequency or average time. Fast to compute, easy to understand, but doesn't capture concurrency or exclusive choices explicitly.

Petri Net: A formal mathematical model with places (circles) and transitions (rectangles) connected by arcs. Can express concurrency (AND-splits/joins) and exclusive choices (XOR). More expressive but harder to read. Discovery algorithms like the Alpha Miner, Inductive Miner, and Heuristic Miner produce Petri nets from event logs with varying noise tolerance and structural guarantees.

SQL Given a table of transaction events, how would you calculate the average time between "PO Created" and "Invoice Paid" per vendor?

Use window functions with conditional aggregation or self-joins:

WITH events AS (SELECT case_id, vendor_id, activity, event_time FROM activities)
Join or use CASE WHEN activity = 'PO Created' THEN event_time END with MIN() and MAX() over case partitions. Then: AVG(DATEDIFF(day, po_time, payment_time)) grouped by vendor. In Celonis PQL this would be AVG(CALC_THROUGHPUT(CASE_START, CASE_END, DAYS)) with a dimension filter on vendor.

ARCH Why does Celonis use a custom in-memory database (SaolaDB) rather than a standard RDBMS like PostgreSQL?

Process mining queries have very specific access patterns that differ from OLTP workloads: (1) Most queries aggregate over single columns (activities, timestamps) across billions of rows — columnar storage is dramatically faster than row storage for this. (2) Process-specific operations (variant detection, conformance checking, sequence analysis) require custom operators that don't exist in SQL. (3) In-memory storage eliminates disk I/O latency for interactive analytics. (4) The data model (event log) is append-only and analytical — optimized for reads, not transactional writes. A general-purpose RDBMS would be 10–100x slower and would require complex workarounds.

DATA What is "spaghetti" in process mining and how does Object-Centric Process Mining help?

"Spaghetti" refers to process models that are visually incomprehensible — hundreds of crossing edges, activities connected in all directions — because real-world processes have many variants. It often occurs in O2C/P2P when you force multi-object processes into a single case ID. For example, if one Sales Order has 3 deliveries, flattening to case = Sales Order creates artificial event duplication (each delivery event appears 3 times), generating spurious paths. OCPM solves this by allowing events to belong to multiple objects simultaneously — the model stays interpretable because each object type has its own natural flow, and cross-object dependencies are shown explicitly rather than as tangled single-case paths.

DESIGN How would you design a system to detect payment fraud using process mining?

Step 1 — Event Log: Extract AP payment process events (Invoice Received → Approved → Paid) with attributes: vendor, amount, approver, timestamp, payment method.
Step 2 — Conformance: Define the "normal" payment flow. Use conformance checking to flag cases that deviate (e.g., payment without GR, skipped approval step).
Step 3 — Anomaly Detection: Apply PQL ML operators to score each case by deviation from statistical norms — unusual payment amount, speed, vendor-approver pair, time-of-day.
Step 4 — Execution: Action Flows auto-route high-risk cases to a fraud review queue. Process Copilot pushes alerts to the AP team in real time. This is the full Celonis loop: Detect → Decide → Act.

PQL What is the difference between KU() and KM() in PQL?

Both are "Knowledge" functions for handling multi-value relationships in the data model.
KU() (Knowledge Unique): Used when each case maps to exactly ONE value (1:1 or many-to-1). E.g., each PO has one vendor → KU("EKKO"."LIFNR"). Returns a single scalar per case.
KM() (Knowledge Multiple): Used when a case can have multiple values (1:many or many:many). E.g., one PO with multiple line items across different cost centers → KM("EKPO"."KOSTL"). Returns a comma-separated string. If you use KU() on a many-valued column, it will pick an arbitrary value and silently lose data — a common bug.

📝

Typical Celonis Technical Interview Process

1️⃣

Take-Home Technical Assessment (1 week)

PostgreSQL-focused. Typically involves reconstructing an event log from raw transactional tables, computing KPIs with window functions, and writing a data pipeline. Quality of SQL and documentation matters.

2️⃣

Technical Interview — Assignment Presentation

Walk through your take-home solution. Defend your design choices. Expect deep questions: "Why did you use a window function here vs. a subquery?", "How would this scale to 1B rows?"

3️⃣

System Design & Architecture Discussion

Design a data pipeline or analytics system. Questions focus on scalability, data modeling for process mining, handling real-world data quality issues, and integration patterns.

4️⃣

Behavioral + Technical Cross-Over

Celonis interviewers blend product/business context with technical depth. You may be asked: "How would you explain OCPM to a CFO?" — testing both technical understanding and communication skills.

✅

Technical Preparation Checklist

Pro Tip: Celonis Academy (academy.celonis.com) has free courses on process mining fundamentals and PQL basics. Even completing the "Process Mining Fundamentals" certification signals genuine interest and is a strong conversation starter in interviews.