SQL Crash Course for Fraud Analysts
Essential SQL skills for investigating fraud cases: learn to query transaction data, analyze patterns, and gather evidence
By Benjamin, Fraud Attacks · Updated
SQL is how you ask questions from data. Keep it simple: select rows, filter them, sort them, and count what matters. The examples below are clear and copy-paste friendly.
Why SQL matters: SQL is the core tool for fraud analysts. When you're investigating a case, you need to pull transaction histories, find patterns across accounts, and answer questions like "how many times has this device been used?" or "what other accounts share this email domain?" SQL lets you ask those questions directly. It's also the foundation for data science work, so the skills you learn here transfer far beyond fraud investigation.
A note on SQL dialects: This tutorial uses PostgreSQL / DuckDB syntax. The exercises in this academy run on DuckDB-WASM in your browser, and DuckDB closely tracks PostgreSQL for the syntax we cover here (date functions, EXTRACT, CTEs, window functions). The core concepts (SELECT, WHERE, JOIN, GROUP BY) work the same across all major databases, so the skills you learn here transfer directly to any SQL environment.
Table of Contents
- What Is SQL?
- Database Basics
- Your First SQL Query
- Comparison Operators
- Combining Conditions
- Pattern Matching
- Working with Dates
- Counting & Summing
- GROUP BY
- JOINs
- CTEs (Common Table Expressions)
- ANSI SQL Cheat Sheet
What Is SQL? (In Plain English)
SQL stands for "Structured Query Language" - but don't let that scare you! Think of it as a way to ask questions about data stored in a computer.
Imagine you have a giant filing cabinet with millions of customer records, transaction receipts, and account information. SQL is like having a super-smart assistant who can instantly find exactly what you're looking for.
Database Basics: Think of It Like Excel (But Way Bigger)
What's a Database?
A database is like a collection of Excel spreadsheets, but designed to handle millions of rows without crashing your computer.
What's a Table?
A table is just like one Excel spreadsheet. It has:
- Rows = individual records (like one customer or one transaction)
- Columns = different pieces of information (like name, amount, date)
Example: A "transactions" table might look like this:
transaction_id | customer_id | amount | merchant | date
1001 | 12345 | 50.00 | Amazon | 2024-01-15
1002 | 12345 | 500.00 | Best Buy | 2024-01-15
1003 | 67890 | 25.00 | Starbucks | 2024-01-15
Think of each row as a receipt - it tells you who spent money, how much, where, and when.
Your First SQL Query: Asking Simple Questions
The SELECT Statement (Like Asking "Show Me...")
Basic query:
SELECT * FROM transactions;*= "everything" (all columns)FROM transactions= "from the transactions table";= end of query
Specific columns:
SELECT amount, merchant FROM transactions;The WHERE Clause (Like Adding "But Only...")
Filter by amount:
SELECT * FROM transactions WHERE amount > 100;Filter by merchant:
SELECT * FROM transactions WHERE merchant = 'Amazon';Note: Use single quotes around text values like 'Amazon'.
Comparison Operators (Ways to Compare Things)
Think of these like math class, but for data:
Equals and Not Equals:
=means "equals exactly" →amount = 100(finds transactions of exactly $100)!=means "not equal to" →merchant != 'Amazon'(finds everything except Amazon)
Greater Than and Less Than:
>means "greater than" →amount > 100(finds transactions over $100)<means "less than" →amount < 100(finds transactions under $100)>=means "greater than or equal" →amount >= 100(finds $100 and above)<=means "less than or equal" →amount <= 100(finds $100 and below)
Real examples:
-- Find expensive transactions
SELECT * FROM transactions WHERE amount > 1000;
-- Find cheap transactions
SELECT * FROM transactions WHERE amount < 10;
-- Find everything except Amazon
SELECT * FROM transactions WHERE merchant != 'Amazon';Combining Conditions (AND & OR)
AND (Both Things Must Be True)
In regular English: "Show me transactions over $500 AND from Best Buy"
In SQL:
SELECT * FROM transactions
WHERE amount > 500
AND merchant = 'Best Buy';Think of AND like this: Both conditions must be true, like needing both a driver's license AND insurance to drive.
OR (Either Thing Can Be True)
In regular English: "Show me transactions from Amazon OR Best Buy"
In SQL:
SELECT * FROM transactions
WHERE merchant = 'Amazon'
OR merchant = 'Best Buy';Think of OR like this: Either condition works, like accepting either cash OR credit card.
Using Parentheses for Complex Logic
In regular English: "Show me transactions from customer 12345 where they spent over $1000 OR bought something from a casino"
In SQL:
SELECT * FROM transactions
WHERE customer_id = 12345
AND (amount > 1000 OR merchant LIKE '%Casino%');Why parentheses matter:
-- Wrong (without parentheses) - this finds ALL casino transactions from anyone!
SELECT * FROM transactions
WHERE customer_id = 12345
AND amount > 1000
OR merchant LIKE '%Casino%';
-- Right (with parentheses) - this finds only customer 12345's big purchases OR casino visits
SELECT * FROM transactions
WHERE customer_id = 12345
AND (amount > 1000 OR merchant LIKE '%Casino%');Think of parentheses like math - they control what gets grouped together first.
Pattern Matching with LIKE (Finding Similar Things)
What Is LIKE?
LIKE is for finding things that are similar but not exactly the same.
Wildcards (Special Characters)
%= "any letters/numbers here" (like a joker in cards)_= "exactly one letter/number here"
Examples:
-- Find anything with "Shell" in the name
SELECT * FROM transactions WHERE merchant LIKE '%Shell%';
-- This finds: "Shell Gas", "Shell Station", "Bob's Shell Stop"
-- Find websites (things ending in .com)
SELECT * FROM transactions WHERE merchant LIKE '%.com%';
-- This finds: "Amazon.com", "PayPal.com", etc.
-- Find anything starting with "ATM"
SELECT * FROM transactions WHERE merchant LIKE 'ATM%';
-- This finds: "ATM Withdrawal", "ATM Cash", etc.Working with Dates
Date Functions
DuckDB (and PostgreSQL) date functions:
Get today's date:
SELECT CURRENT_DATE;Find transactions from today:
SELECT * FROM transactions
WHERE CAST(transaction_date AS DATE) = CURRENT_DATE;Find transactions from the last 7 days:
SELECT * FROM transactions
WHERE transaction_date >= CURRENT_DATE - INTERVAL '7 days';Find transactions from a specific date:
SELECT * FROM transactions
WHERE CAST(transaction_date AS DATE) = '2024-01-15';Time-Based Fraud Patterns
Weekend transactions (often suspicious):
SELECT * FROM transactions
WHERE EXTRACT(DOW FROM transaction_date) IN (0, 6);
-- 0 = Sunday, 6 = SaturdayLate-night transactions:
SELECT * FROM transactions
WHERE EXTRACT(HOUR FROM transaction_date) >= 22
OR EXTRACT(HOUR FROM transaction_date) <= 6;
-- After 10 PM or before 6 AMCounting and Adding Things Up
COUNT (How Many?)
In regular English: "How many transactions did customer 12345 make?"
In SQL:
SELECT COUNT(*) FROM transactions WHERE customer_id = 12345;SUM (Add Them Up)
In regular English: "What's the total amount customer 12345 spent?"
In SQL:
SELECT SUM(amount) FROM transactions WHERE customer_id = 12345;AVG (Average)
In regular English: "What's the average transaction amount for customer 12345?"
In SQL:
SELECT AVG(amount) FROM transactions WHERE customer_id = 12345;MIN and MAX (Smallest and Largest)
-- Smallest transaction
SELECT MIN(amount) FROM transactions WHERE customer_id = 12345;
-- Largest transaction
SELECT MAX(amount) FROM transactions WHERE customer_id = 12345;GROUP BY (Organizing Your Results)
What Does GROUP BY Do?
GROUP BY is like sorting your clothes into piles - all the shirts together, all the pants together, etc.
Example: "How many transactions did each customer make?"
SELECT customer_id, COUNT(*)
FROM transactions
GROUP BY customer_id;This creates one row per customer showing their ID and how many transactions they made.
Another example: "How much did each customer spend in total?"
SELECT customer_id, SUM(amount) as total_spent
FROM transactions
GROUP BY customer_id;HAVING (Filtering Groups)
HAVING is like WHERE, but for groups.
Example: "Show me customers who made more than 10 transactions"
SELECT customer_id, COUNT(*) as transaction_count
FROM transactions
GROUP BY customer_id
HAVING COUNT(*) > 10;Think of it this way:
- WHERE filters individual rows (before grouping)
- HAVING filters groups (after grouping)
JOINs (Combining Information from Different Tables)
Why Do We Need JOINs?
Think of it like having information in different notebooks:
- Notebook 1: Customer names and addresses
- Notebook 2: Their purchases and spending
To answer "Who bought what?" you need info from both notebooks. That's what JOIN does!
JOINs Explained with Friends and Phone Numbers
You're planning a party and have two pieces of paper:
Paper 1: Friends and Their Addresses (Table A)
Name | Address
--------|------------------
Alice | 123 Oak Street
Bob | 456 Pine Avenue
Charlie | 789 Elm Drive
Paper 2: Phone Numbers I Have (Table B)
Name | Phone Number
--------|-------------
Alice | 555-1111
Charlie | 555-3333
David | 555-4444
The problem: You want to call your friends to give them directions to the party, but you need BOTH their address (so you know who to invite) AND their phone number (so you can call them).
INNER JOIN = "Only friends I can actually call"
Question: Which friends can you call to invite?
INNER JOIN Result:
Name | Address | Phone Number
--------|------------------|-------------
Alice | 123 Oak Street | 555-1111 ✓
Charlie | 789 Elm Drive | 555-3333 ✓
Missing:
- Bob (in Table A but not in Table B - no phone number)
- David (in Table B but not in Table A - not a friend)
LEFT JOIN = "All my friends, whether I can call them or not"
Question: Show me all my friends and their numbers (if I have them)
LEFT JOIN Result:
Name | Address | Phone Number
--------|------------------|-------------
Alice | 123 Oak Street | 555-1111 ✓
Bob | 456 Pine Avenue | NULL ✓
Charlie | 789 Elm Drive | 555-3333 ✓
The difference: LEFT JOIN includes everyone from your friends table (Table A), even Bob who has no phone number.
The Super Simple Rule
- INNER JOIN = Only matches that exist in BOTH lists
- LEFT JOIN = Everything from the LEFT list + matches from the right list
Now with Real Data
CUSTOMERS Table:
customer_id | Name
------------|------
12345 | John
67890 | Mary
11111 | Bob
TRANSACTIONS Table:
transaction_id | customer_id | Amount
---------------|-------------|--------
1001 | 12345 | $50
1002 | 67890 | $100
1003 | 99999 | $25
INNER JOIN = customers who actually bought something:
INNER JOIN Result:
Name | Amount
-----|-------
John | $50
Mary | $100
LEFT JOIN = all customers, with purchases if they made any:
LEFT JOIN Result:
Name | Amount
-----|-------
John | $50
Mary | $100
Bob | NULL
The SQL Code
INNER JOIN (only customers who bought something):
SELECT name, amount
FROM customers
INNER JOIN transactions ON customers.customer_id = transactions.customer_id;LEFT JOIN (all customers, with purchases if they made any):
SELECT name, amount
FROM customers
LEFT JOIN transactions ON customers.customer_id = transactions.customer_id;That's it! JOINs are just matching lists together.
Visual Summary
INNER JOIN = Only the matches
Friends: [Alice, Bob, Charlie]
Phone #s: [Alice, Charlie, David]
Result: [Alice, Charlie] ← Only people in BOTH lists
LEFT JOIN = All from left list + matches
Friends: [Alice, Bob, Charlie] ← Keep ALL of these
Phone #s: [Alice, Charlie, David]
Result: [Alice+phone, Bob+no phone, Charlie+phone]
Key insight: INNER JOIN is stricter (must be in both), LEFT JOIN is more inclusive (keep everything from the left).
CTEs (Common Table Expressions)
What Is a CTE?
CTE stands for "Common Table Expression" - think of it as creating a temporary, named result set that you can reference within your query. It's like saving a calculation to a variable before using it.
Why use CTEs?
- Readability: Break complex queries into logical, named steps
- Reusability: Reference the same result multiple times without repeating code
- Debugging: Test each part of your query independently
Basic CTE Syntax
WITH cte_name AS (
-- Your query goes here
SELECT column1, column2
FROM some_table
WHERE condition
)
SELECT * FROM cte_name;Key points:
- Start with
WITH - Give your CTE a meaningful name
- Put the subquery in parentheses after
AS - Use the CTE name like a regular table in your main query
Simple Example: High-Value Customers
Without CTE (hard to read):
SELECT customer_id, total_spent
FROM (
SELECT customer_id, SUM(amount) as total_spent
FROM transactions
GROUP BY customer_id
) subquery
WHERE total_spent > 10000;With CTE (much clearer):
WITH customer_totals AS (
SELECT customer_id, SUM(amount) as total_spent
FROM transactions
GROUP BY customer_id
)
SELECT customer_id, total_spent
FROM customer_totals
WHERE total_spent > 10000;The story: First, calculate each customer's total spending and save it as "customer_totals". Then, filter to show only those who spent over $10,000.
Multiple CTEs (Chaining Steps)
You can define multiple CTEs separated by commas:
WITH
-- Step 1: Calculate daily totals per customer
daily_totals AS (
SELECT
customer_id,
CAST(transaction_date AS DATE) as tx_date,
SUM(amount) as daily_total
FROM transactions
GROUP BY customer_id, CAST(transaction_date AS DATE)
),
-- Step 2: Find customers with unusually high daily spending
high_spenders AS (
SELECT customer_id, tx_date, daily_total
FROM daily_totals
WHERE daily_total > 5000
)
-- Step 3: Join with customer info
SELECT c.first_name, c.last_name, h.tx_date, h.daily_total
FROM high_spenders h
JOIN customers c ON h.customer_id = c.customer_id
ORDER BY h.daily_total DESC;Think of it like cooking:
- First, prep the ingredients (calculate daily totals)
- Then, filter to the good stuff (find high spenders)
- Finally, plate it nicely (add customer names)
Fraud Detection with CTEs
Example: Find customers whose spending suddenly spiked
WITH
-- Calculate each customer's average transaction
customer_averages AS (
SELECT
customer_id,
AVG(amount) as avg_amount,
COUNT(*) as transaction_count
FROM transactions
WHERE transaction_date < CURRENT_DATE - INTERVAL '7 days'
GROUP BY customer_id
),
-- Find recent transactions that are 5x their average
suspicious_transactions AS (
SELECT
t.transaction_id,
t.customer_id,
t.amount,
t.merchant,
t.transaction_date,
ca.avg_amount
FROM transactions t
JOIN customer_averages ca ON t.customer_id = ca.customer_id
WHERE t.transaction_date >= CURRENT_DATE - INTERVAL '7 days'
AND t.amount > ca.avg_amount * 5
)
SELECT * FROM suspicious_transactions
ORDER BY amount DESC;What this does:
- Calculate each customer's historical average spending
- Find recent transactions that are 5x higher than their average
- These could indicate stolen cards or account takeover
Recursive CTEs (Advanced)
Recursive CTEs can reference themselves - useful for hierarchical data or sequences. This is an advanced topic, but here's a simple example:
-- Generate a sequence of dates for the last 7 days
WITH RECURSIVE date_series AS (
SELECT CURRENT_DATE as report_date
UNION ALL
SELECT report_date - INTERVAL '1 day'
FROM date_series
WHERE report_date > CURRENT_DATE - INTERVAL '7 days'
)
SELECT report_date FROM date_series;Note: Recursive CTEs require the RECURSIVE keyword in DuckDB (and PostgreSQL).
CTE Best Practices
- Name CTEs clearly - use descriptive names like
high_risk_transactionsnotcte1 - One logical step per CTE - each CTE should do one thing well
- Comment complex CTEs - explain what each step does
- Don't overuse - simple queries don't need CTEs
When to Use CTEs vs Subqueries
| Use CTEs When... | Use Subqueries When... |
|---|---|
| Query has multiple logical steps | Query is simple and short |
| You need to reference the same result multiple times | You only use the result once |
| Readability is important | Performance is critical (sometimes) |
| Debugging complex logic | The subquery is just a filter |
Pro tip: In DuckDB (and PostgreSQL), CTEs are typically optimized well, so choose based on readability. If you ever need to force the database to materialize the CTE (compute it once), DuckDB supports MATERIALIZED:
WITH high_value AS MATERIALIZED (
SELECT * FROM transactions WHERE amount > 1000
)
SELECT * FROM high_value;Practical Fraud Investigation Examples
Example 1: Customer Spending Patterns
Goal: Show me all customers who spent more than $5,000 today with their contact info
SELECT
c.first_name,
c.last_name,
c.email,
SUM(t.amount) as total_spent
FROM customers c
INNER JOIN transactions t ON c.customer_id = t.customer_id
WHERE CAST(t.transaction_date AS DATE) = CURRENT_DATE
GROUP BY c.customer_id, c.first_name, c.last_name, c.email
HAVING SUM(t.amount) > 5000;Step-by-Step Breakdown
Think of this like giving instructions to a detective:
Step 1: Find the right filing cabinets
FROM customers c- Go to the customers filing cabinetINNER JOIN transactions t ON c.customer_id = t.customer_id- Connect it to the transactions cabinet using customer ID as the link
Step 2: Filter to today's cases only
WHERE CAST(t.transaction_date AS DATE) = CURRENT_DATE- Only look at transactions that happened today
Step 3: Group related evidence together
GROUP BY c.customer_id, c.first_name, c.last_name, c.email- Put all of each customer's transactions in separate piles
Step 4: Calculate totals for each pile
SUM(t.amount) as total_spent- Add up all the transaction amounts in each customer's pile
Step 5: Only show the big spenders
HAVING SUM(t.amount) > 5000- Only keep the piles where the total is over $5,000
Step 6: Decide what information to show
SELECT c.first_name, c.last_name, c.email, SUM(t.amount) as total_spent- Show the customer's name, email, and their total spending
The story: Go find all customers and their transactions, but only look at today's transactions. Group each customer's transactions together, add up how much each customer spent, then only show me customers who spent more than $5,000 today, along with their contact info.
Why This Matters for Fraud
High spending in one day could indicate a stolen card being used quickly.
Example 2: Velocity Fraud Detection
Goal: Find customers who made more than 5 transactions in the last hour
SELECT
customer_id,
COUNT(*) as transaction_count,
MIN(transaction_date) as first_transaction,
MAX(transaction_date) as last_transaction
FROM transactions
WHERE transaction_date >= NOW() - INTERVAL '1 hour'
GROUP BY customer_id
HAVING COUNT(*) > 5;Step-by-Step Breakdown
Think of this like tracking suspicious activity:
Step 1: Find the transaction records
FROM transactions- Go to the transactions table
Step 2: Filter to recent activity
WHERE transaction_date >= NOW() - INTERVAL '1 hour'- Only look at transactions from the last hourNOW() - INTERVAL '1 hour'means "1 hour ago from right now"
Step 3: Group by customer
GROUP BY customer_id- Put all transactions for each customer in separate piles
Step 4: Count transactions and find time range
COUNT(*) as transaction_count- Count how many transactions in each pileMIN(transaction_date)- Find the earliest transaction timeMAX(transaction_date)- Find the latest transaction time
Step 5: Only show rapid-fire customers
HAVING COUNT(*) > 5- Only keep customers who made more than 5 transactions
Step 6: Show the results
SELECT customer_id, COUNT(*), MIN(transaction_date), MAX(transaction_date)- Display customer ID, count, and time range
The story: Look at the last hour of transactions, group them by customer, count how many each customer made, and show me only customers who made more than 5 transactions.
Why This Matters for Fraud
Multiple rapid transactions could indicate card testing or account takeover.
Example 3: Geographic Impossibility
Goal: Find transactions that happened at different merchants within 2 hours (potentially impossible travel)
SELECT
t1.customer_id,
t1.merchant as first_location,
t1.transaction_date as first_time,
t2.merchant as second_location,
t2.transaction_date as second_time
FROM transactions t1
INNER JOIN transactions t2 ON t1.customer_id = t2.customer_id
WHERE t1.merchant != t2.merchant
AND t2.transaction_date > t1.transaction_date
AND EXTRACT(EPOCH FROM (t2.transaction_date - t1.transaction_date)) / 3600 < 2;Step-by-Step Breakdown
Think of this like comparing locations and times:
Step 1: Set up the comparison
FROM transactions t1- Start with the transactions table (call it "t1" for "first transaction")INNER JOIN transactions t2 ON t1.customer_id = t2.customer_id- Join it to itself (call it "t2" for "second transaction")- This lets us compare each customer's transactions to their other transactions
Step 2: Filter for suspicious patterns
WHERE t1.merchant != t2.merchant- The two transactions must be at different placesAND t2.transaction_date > t1.transaction_date- The second transaction happened after the firstAND EXTRACT(EPOCH FROM ...) / 3600 < 2- They happened less than 2 hours apart
Step 3: Show the evidence
SELECT t1.customer_id, t1.merchant, t1.transaction_date, t2.merchant, t2.transaction_date- Show customer ID, first location and time, second location and time
The story: Compare every transaction to every other transaction for the same customer. Show me cases where someone was at two different places within 2 hours.
The Date Math Explained
EXTRACT(EPOCH FROM ...)converts time intervals to seconds- Subtract timestamps to get the interval
- Divide by 3600 to convert seconds to hours
< 2means less than 2 hours
Real-world example: If someone bought gas in Miami at 2:00 PM and coffee in Seattle at 3:30 PM, that's physically impossible!
Why This Matters for Fraud
If someone bought coffee in New York at 2 PM and gas in California at 3 PM, that's physically impossible!
Note: This query is quite advanced - don't worry if it seems complicated. The important thing is understanding the concept.
Finding Round Dollar Amounts (Often Fraud)
SELECT * FROM transactions
WHERE amount IN (100, 200, 500, 1000, 2000, 5000);Why this works:
- Fraudsters often test cards with round numbers like $100, $500
- Real purchases are usually messy amounts like $47.83, $123.45
IN (100, 200, 500...)means "show me transactions with ANY of these exact amounts"
What to look for: Multiple customers with the same round amounts on the same day.
Finding Card Testing Patterns
-- Small test transactions followed by large ones
SELECT
customer_id,
COUNT(*) as small_transactions,
MAX(amount) as largest_amount
FROM transactions
WHERE CAST(transaction_date AS DATE) = CURRENT_DATE
GROUP BY customer_id
HAVING COUNT(CASE WHEN amount < 10 THEN 1 END) > 0
AND MAX(amount) > 500;What this does: Finds customers who made both small AND large purchases today
Step 1: COUNT(CASE WHEN amount < 10 THEN 1 END) > 0 - They made at least one purchase under $10
CASE WHEN amount < 10 THEN 1 ENDcounts only transactions under $10> 0means "at least one small transaction"
Step 2: MAX(amount) > 500 - Their biggest purchase was over $500
Why this is suspicious: Fraudsters often test stolen cards with small amounts first, then make big purchases if the card works.
Real-world example: Someone makes three $1 purchases, then immediately buys a $2,000 laptop.
Finding Money Transfer Services
SELECT * FROM transactions
WHERE merchant LIKE '%Western Union%'
OR merchant LIKE '%MoneyGram%'
OR merchant LIKE '%Money Transfer%';What this does:
LIKE '%Western Union%'finds any merchant name containing "Western Union"- The
%symbols mean "any text before or after" ORmeans it will find transactions at ANY of these types of places
Why money transfers are risky:
- Hard to trace once money is sent
- Common way for fraudsters to move stolen money
- Often used in romance scams and business email compromise
Real-world example: "Western Union #1234", "Local MoneyGram Station", "Quick Money Transfer LLC"
Common Beginner Mistakes (And How to Avoid Them)
1. Forgetting Quotes Around Text
Wrong:
SELECT * FROM transactions WHERE merchant = Amazon;Right:
SELECT * FROM transactions WHERE merchant = 'Amazon';Remember: Numbers don't need quotes, text does!
2. Mixing Up WHERE and HAVING
Use WHERE for filtering individual rows:
SELECT * FROM transactions WHERE amount > 100;Use HAVING for filtering groups:
SELECT customer_id, COUNT(*)
FROM transactions
GROUP BY customer_id
HAVING COUNT(*) > 5;3. Forgetting the Semicolon
Every SQL statement ends with a semicolon (;)
SELECT * FROM transactions;4. Not Understanding JOIN Requirements
Remember: You need an ON clause to tell SQL how to match tables:
-- Wrong (missing ON clause)
SELECT * FROM customers INNER JOIN transactions;
-- Right
SELECT * FROM customers
INNER JOIN transactions ON customers.customer_id = transactions.customer_id;Practice Exercises (Start Simple!)
Exercise 1: Basic Filtering
Find all transactions over $1,000.
Click to see solution
SELECT * FROM transactions WHERE amount > 1000;Exercise 2: Pattern Matching
Find all transactions at gas stations (merchants containing "Shell", "Exxon", or "BP").
Click to see solution
SELECT * FROM transactions
WHERE merchant LIKE '%Shell%'
OR merchant LIKE '%Exxon%'
OR merchant LIKE '%BP%';Exercise 3: Counting
How many transactions happened today?
Click to see solution
SELECT COUNT(*) FROM transactions
WHERE CAST(transaction_date AS DATE) = CURRENT_DATE;Exercise 4: Grouping
Show how much each customer spent in total, but only customers who spent more than $1,000.
Click to see solution
SELECT customer_id, SUM(amount) as total_spent
FROM transactions
GROUP BY customer_id
HAVING SUM(amount) > 1000;Building Your Confidence
Start Small
- Practice SELECT and WHERE first - get comfortable with basic filtering
- Add one new concept at a time - don't try to learn JOINs and GROUP BY simultaneously
- Use real examples - practice with actual fraud scenarios from your work
- Make mistakes - they're how you learn! SQL won't break if you write bad queries
Common Workflow for Fraud Investigation
Step-by-step process:
- Start with basic SELECT to preview the dataset and confirm column names.
- Add WHERE filters to isolate customers, date ranges, or high-risk amounts.
- Aggregate with GROUP BY to count events or sum exposure by customer, device, or merchant.
- JOIN related tables (e.g.,
customers,devices) to enrich with demographic or device intel. - Apply HAVING to filter aggregated results (e.g.,
HAVING COUNT(*) > 5). - ORDER BY and LIMIT to sort by risk score or amount and keep result sets manageable.
ANSI SQL Cheat Sheet
Click to expand 🌐
| Clause | Purpose | Minimal Example |
|---|---|---|
SELECT | Choose columns | SELECT name, amount |
FROM | Choose table | FROM transactions |
WHERE | Filter rows | WHERE amount > 100 |
GROUP BY | Aggregate rows | GROUP BY merchant |
HAVING | Filter aggregates | HAVING COUNT(*) > 10 |
ORDER BY | Sort output | ORDER BY amount DESC |
LIMIT | Cap rows returned | LIMIT 100 |
JOIN | Combine tables | JOIN customers USING(customer_id) |
WITH (CTE) | Named subquery | WITH cte AS (SELECT...) SELECT * FROM cte |
INSERT | Add rows | INSERT INTO table(col) VALUES(val); |
UPDATE | Modify rows | UPDATE table SET col=val WHERE id=1; |
DELETE | Remove rows | DELETE FROM table WHERE id=1; |
Test Your Knowledge
Ready to test what you've learned? Take the quiz to reinforce your understanding.
Continue learning
- Fraud BasicsFraud 101: What Is Fraud?Absolute basics for someone who has never looked at fraud: what is fraud, how is it different from other crimes, and why does it matter
- Fraud BasicsCommon Fraud Types Every Analyst Should KnowThe most frequent fraud types you will encounter as a fraud analyst: identity theft, payment fraud, account takeover, and business fraud
- Fraud BasicsIntro to Criminal InfrastructureUnderstanding the underground fraud economy: dark web markets, criminal tools, and how fraud operations are organized
- More from Money Movement & Transaction FraudPayment Systems 101: How Money Really MovesEssential foundation for understanding how ACH, wire transfers, card payments, and digital payments actually work - and why criminals target them
- More from Account TakeoverATO FundamentalsEssential foundation every fraud professional needs to know about account takeover attacks
- More from Social EngineeringSocial Engineering FundamentalsThe psychology of manipulation and how attackers exploit human trust