Skip to main content
Learning Center
Fraud BasicsSQL Crash Course for Fraud Analysts

SQL Crash Course for Fraud Analysts

Essential SQL skills for investigating fraud cases: learn to query transaction data, analyze patterns, and gather evidence

By Benjamin, Fraud Attacks · Updated

SQL is how you ask questions from data. Keep it simple: select rows, filter them, sort them, and count what matters. The examples below are clear and copy-paste friendly.

Why SQL matters: SQL is the core tool for fraud analysts. When you're investigating a case, you need to pull transaction histories, find patterns across accounts, and answer questions like "how many times has this device been used?" or "what other accounts share this email domain?" SQL lets you ask those questions directly. It's also the foundation for data science work, so the skills you learn here transfer far beyond fraud investigation.

A note on SQL dialects: This tutorial uses PostgreSQL / DuckDB syntax. The exercises in this academy run on DuckDB-WASM in your browser, and DuckDB closely tracks PostgreSQL for the syntax we cover here (date functions, EXTRACT, CTEs, window functions). The core concepts (SELECT, WHERE, JOIN, GROUP BY) work the same across all major databases, so the skills you learn here transfer directly to any SQL environment.

Table of Contents

  1. What Is SQL?
  2. Database Basics
  3. Your First SQL Query
  4. Comparison Operators
  5. Combining Conditions
  6. Pattern Matching
  7. Working with Dates
  8. Counting & Summing
  9. GROUP BY
  10. JOINs
  11. CTEs (Common Table Expressions)
  12. ANSI SQL Cheat Sheet

What Is SQL? (In Plain English)

SQL stands for "Structured Query Language" - but don't let that scare you! Think of it as a way to ask questions about data stored in a computer.

Imagine you have a giant filing cabinet with millions of customer records, transaction receipts, and account information. SQL is like having a super-smart assistant who can instantly find exactly what you're looking for.


Database Basics: Think of It Like Excel (But Way Bigger)

What's a Database?

A database is like a collection of Excel spreadsheets, but designed to handle millions of rows without crashing your computer.

What's a Table?

A table is just like one Excel spreadsheet. It has:

  • Rows = individual records (like one customer or one transaction)
  • Columns = different pieces of information (like name, amount, date)

Example: A "transactions" table might look like this:

transaction_id | customer_id | amount | merchant  | date      
1001          | 12345       | 50.00  | Amazon    | 2024-01-15
1002          | 12345       | 500.00 | Best Buy  | 2024-01-15  
1003          | 67890       | 25.00  | Starbucks | 2024-01-15

Think of each row as a receipt - it tells you who spent money, how much, where, and when.


Your First SQL Query: Asking Simple Questions

The SELECT Statement (Like Asking "Show Me...")

Basic query:

SELECT * FROM transactions;
  • * = "everything" (all columns)
  • FROM transactions = "from the transactions table"
  • ; = end of query

Specific columns:

SELECT amount, merchant FROM transactions;

The WHERE Clause (Like Adding "But Only...")

Filter by amount:

SELECT * FROM transactions WHERE amount > 100;

Filter by merchant:

SELECT * FROM transactions WHERE merchant = 'Amazon';

Note: Use single quotes around text values like 'Amazon'.


Comparison Operators (Ways to Compare Things)

Think of these like math class, but for data:

Equals and Not Equals:

  • = means "equals exactly" → amount = 100 (finds transactions of exactly $100)
  • != means "not equal to" → merchant != 'Amazon' (finds everything except Amazon)

Greater Than and Less Than:

  • > means "greater than" → amount > 100 (finds transactions over $100)
  • < means "less than" → amount < 100 (finds transactions under $100)
  • >= means "greater than or equal" → amount >= 100 (finds $100 and above)
  • <= means "less than or equal" → amount <= 100 (finds $100 and below)

Real examples:

-- Find expensive transactions SELECT * FROM transactions WHERE amount > 1000; -- Find cheap transactions SELECT * FROM transactions WHERE amount < 10; -- Find everything except Amazon SELECT * FROM transactions WHERE merchant != 'Amazon';

Combining Conditions (AND & OR)

AND (Both Things Must Be True)

In regular English: "Show me transactions over $500 AND from Best Buy"

In SQL:

SELECT * FROM transactions WHERE amount > 500 AND merchant = 'Best Buy';

Think of AND like this: Both conditions must be true, like needing both a driver's license AND insurance to drive.

OR (Either Thing Can Be True)

In regular English: "Show me transactions from Amazon OR Best Buy"

In SQL:

SELECT * FROM transactions WHERE merchant = 'Amazon' OR merchant = 'Best Buy';

Think of OR like this: Either condition works, like accepting either cash OR credit card.

Using Parentheses for Complex Logic

In regular English: "Show me transactions from customer 12345 where they spent over $1000 OR bought something from a casino"

In SQL:

SELECT * FROM transactions WHERE customer_id = 12345 AND (amount > 1000 OR merchant LIKE '%Casino%');

Why parentheses matter:

-- Wrong (without parentheses) - this finds ALL casino transactions from anyone! SELECT * FROM transactions WHERE customer_id = 12345 AND amount > 1000 OR merchant LIKE '%Casino%'; -- Right (with parentheses) - this finds only customer 12345's big purchases OR casino visits SELECT * FROM transactions WHERE customer_id = 12345 AND (amount > 1000 OR merchant LIKE '%Casino%');

Think of parentheses like math - they control what gets grouped together first.


Pattern Matching with LIKE (Finding Similar Things)

What Is LIKE?

LIKE is for finding things that are similar but not exactly the same.

Wildcards (Special Characters)

  • % = "any letters/numbers here" (like a joker in cards)
  • _ = "exactly one letter/number here"

Examples:

-- Find anything with "Shell" in the name SELECT * FROM transactions WHERE merchant LIKE '%Shell%'; -- This finds: "Shell Gas", "Shell Station", "Bob's Shell Stop" -- Find websites (things ending in .com) SELECT * FROM transactions WHERE merchant LIKE '%.com%'; -- This finds: "Amazon.com", "PayPal.com", etc. -- Find anything starting with "ATM" SELECT * FROM transactions WHERE merchant LIKE 'ATM%'; -- This finds: "ATM Withdrawal", "ATM Cash", etc.

Working with Dates

Date Functions

DuckDB (and PostgreSQL) date functions:

Get today's date:

SELECT CURRENT_DATE;

Find transactions from today:

SELECT * FROM transactions WHERE CAST(transaction_date AS DATE) = CURRENT_DATE;

Find transactions from the last 7 days:

SELECT * FROM transactions WHERE transaction_date >= CURRENT_DATE - INTERVAL '7 days';

Find transactions from a specific date:

SELECT * FROM transactions WHERE CAST(transaction_date AS DATE) = '2024-01-15';

Time-Based Fraud Patterns

Weekend transactions (often suspicious):

SELECT * FROM transactions WHERE EXTRACT(DOW FROM transaction_date) IN (0, 6); -- 0 = Sunday, 6 = Saturday

Late-night transactions:

SELECT * FROM transactions WHERE EXTRACT(HOUR FROM transaction_date) >= 22 OR EXTRACT(HOUR FROM transaction_date) <= 6; -- After 10 PM or before 6 AM

Counting and Adding Things Up

COUNT (How Many?)

In regular English: "How many transactions did customer 12345 make?"

In SQL:

SELECT COUNT(*) FROM transactions WHERE customer_id = 12345;

SUM (Add Them Up)

In regular English: "What's the total amount customer 12345 spent?"

In SQL:

SELECT SUM(amount) FROM transactions WHERE customer_id = 12345;

AVG (Average)

In regular English: "What's the average transaction amount for customer 12345?"

In SQL:

SELECT AVG(amount) FROM transactions WHERE customer_id = 12345;

MIN and MAX (Smallest and Largest)

-- Smallest transaction SELECT MIN(amount) FROM transactions WHERE customer_id = 12345; -- Largest transaction SELECT MAX(amount) FROM transactions WHERE customer_id = 12345;

GROUP BY (Organizing Your Results)

What Does GROUP BY Do?

GROUP BY is like sorting your clothes into piles - all the shirts together, all the pants together, etc.

Example: "How many transactions did each customer make?"

SELECT customer_id, COUNT(*) FROM transactions GROUP BY customer_id;

This creates one row per customer showing their ID and how many transactions they made.

Another example: "How much did each customer spend in total?"

SELECT customer_id, SUM(amount) as total_spent FROM transactions GROUP BY customer_id;

HAVING (Filtering Groups)

HAVING is like WHERE, but for groups.

Example: "Show me customers who made more than 10 transactions"

SELECT customer_id, COUNT(*) as transaction_count FROM transactions GROUP BY customer_id HAVING COUNT(*) > 10;

Think of it this way:

  • WHERE filters individual rows (before grouping)
  • HAVING filters groups (after grouping)

JOINs (Combining Information from Different Tables)

Why Do We Need JOINs?

Think of it like having information in different notebooks:

  • Notebook 1: Customer names and addresses
  • Notebook 2: Their purchases and spending

To answer "Who bought what?" you need info from both notebooks. That's what JOIN does!

JOINs Explained with Friends and Phone Numbers

You're planning a party and have two pieces of paper:

Paper 1: Friends and Their Addresses (Table A)

Name    | Address
--------|------------------
Alice   | 123 Oak Street
Bob     | 456 Pine Avenue  
Charlie | 789 Elm Drive

Paper 2: Phone Numbers I Have (Table B)

Name    | Phone Number
--------|-------------
Alice   | 555-1111
Charlie | 555-3333
David   | 555-4444

The problem: You want to call your friends to give them directions to the party, but you need BOTH their address (so you know who to invite) AND their phone number (so you can call them).

INNER JOIN = "Only friends I can actually call"

Question: Which friends can you call to invite?

INNER JOIN Result:

Name    | Address          | Phone Number
--------|------------------|-------------
Alice   | 123 Oak Street   | 555-1111  ✓ 
Charlie | 789 Elm Drive    | 555-3333  ✓ 

Missing:

  • Bob (in Table A but not in Table B - no phone number)
  • David (in Table B but not in Table A - not a friend)

LEFT JOIN = "All my friends, whether I can call them or not"

Question: Show me all my friends and their numbers (if I have them)

LEFT JOIN Result:

Name    | Address          | Phone Number
--------|------------------|-------------
Alice   | 123 Oak Street   | 555-1111  ✓ 
Bob     | 456 Pine Avenue  | NULL       ✓ 
Charlie | 789 Elm Drive    | 555-3333  ✓ 

The difference: LEFT JOIN includes everyone from your friends table (Table A), even Bob who has no phone number.

The Super Simple Rule

  • INNER JOIN = Only matches that exist in BOTH lists
  • LEFT JOIN = Everything from the LEFT list + matches from the right list

Now with Real Data

CUSTOMERS Table:

customer_id | Name
------------|------
12345       | John
67890       | Mary
11111       | Bob

TRANSACTIONS Table:

transaction_id | customer_id | Amount
---------------|-------------|--------
1001          | 12345       | $50
1002          | 67890       | $100
1003          | 99999       | $25

INNER JOIN = customers who actually bought something:

INNER JOIN Result:

Name | Amount
-----|-------
John | $50
Mary | $100

LEFT JOIN = all customers, with purchases if they made any:

LEFT JOIN Result:

Name | Amount
-----|-------
John | $50
Mary | $100  
Bob  | NULL

The SQL Code

INNER JOIN (only customers who bought something):

SELECT name, amount FROM customers INNER JOIN transactions ON customers.customer_id = transactions.customer_id;

LEFT JOIN (all customers, with purchases if they made any):

SELECT name, amount FROM customers LEFT JOIN transactions ON customers.customer_id = transactions.customer_id;

That's it! JOINs are just matching lists together.

Visual Summary

INNER JOIN = Only the matches
Friends:     [Alice, Bob, Charlie]
Phone #s:    [Alice, Charlie, David]
Result:      [Alice, Charlie]  ← Only people in BOTH lists

LEFT JOIN = All from left list + matches
Friends:     [Alice, Bob, Charlie]  ← Keep ALL of these
Phone #s:    [Alice, Charlie, David]
Result:      [Alice+phone, Bob+no phone, Charlie+phone]

Key insight: INNER JOIN is stricter (must be in both), LEFT JOIN is more inclusive (keep everything from the left).


CTEs (Common Table Expressions)

What Is a CTE?

CTE stands for "Common Table Expression" - think of it as creating a temporary, named result set that you can reference within your query. It's like saving a calculation to a variable before using it.

Why use CTEs?

  • Readability: Break complex queries into logical, named steps
  • Reusability: Reference the same result multiple times without repeating code
  • Debugging: Test each part of your query independently

Basic CTE Syntax

WITH cte_name AS ( -- Your query goes here SELECT column1, column2 FROM some_table WHERE condition ) SELECT * FROM cte_name;

Key points:

  • Start with WITH
  • Give your CTE a meaningful name
  • Put the subquery in parentheses after AS
  • Use the CTE name like a regular table in your main query

Simple Example: High-Value Customers

Without CTE (hard to read):

SELECT customer_id, total_spent FROM ( SELECT customer_id, SUM(amount) as total_spent FROM transactions GROUP BY customer_id ) subquery WHERE total_spent > 10000;

With CTE (much clearer):

WITH customer_totals AS ( SELECT customer_id, SUM(amount) as total_spent FROM transactions GROUP BY customer_id ) SELECT customer_id, total_spent FROM customer_totals WHERE total_spent > 10000;

The story: First, calculate each customer's total spending and save it as "customer_totals". Then, filter to show only those who spent over $10,000.

Multiple CTEs (Chaining Steps)

You can define multiple CTEs separated by commas:

WITH -- Step 1: Calculate daily totals per customer daily_totals AS ( SELECT customer_id, CAST(transaction_date AS DATE) as tx_date, SUM(amount) as daily_total FROM transactions GROUP BY customer_id, CAST(transaction_date AS DATE) ), -- Step 2: Find customers with unusually high daily spending high_spenders AS ( SELECT customer_id, tx_date, daily_total FROM daily_totals WHERE daily_total > 5000 ) -- Step 3: Join with customer info SELECT c.first_name, c.last_name, h.tx_date, h.daily_total FROM high_spenders h JOIN customers c ON h.customer_id = c.customer_id ORDER BY h.daily_total DESC;

Think of it like cooking:

  1. First, prep the ingredients (calculate daily totals)
  2. Then, filter to the good stuff (find high spenders)
  3. Finally, plate it nicely (add customer names)

Fraud Detection with CTEs

Example: Find customers whose spending suddenly spiked

WITH -- Calculate each customer's average transaction customer_averages AS ( SELECT customer_id, AVG(amount) as avg_amount, COUNT(*) as transaction_count FROM transactions WHERE transaction_date < CURRENT_DATE - INTERVAL '7 days' GROUP BY customer_id ), -- Find recent transactions that are 5x their average suspicious_transactions AS ( SELECT t.transaction_id, t.customer_id, t.amount, t.merchant, t.transaction_date, ca.avg_amount FROM transactions t JOIN customer_averages ca ON t.customer_id = ca.customer_id WHERE t.transaction_date >= CURRENT_DATE - INTERVAL '7 days' AND t.amount > ca.avg_amount * 5 ) SELECT * FROM suspicious_transactions ORDER BY amount DESC;

What this does:

  1. Calculate each customer's historical average spending
  2. Find recent transactions that are 5x higher than their average
  3. These could indicate stolen cards or account takeover

Recursive CTEs (Advanced)

Recursive CTEs can reference themselves - useful for hierarchical data or sequences. This is an advanced topic, but here's a simple example:

-- Generate a sequence of dates for the last 7 days WITH RECURSIVE date_series AS ( SELECT CURRENT_DATE as report_date UNION ALL SELECT report_date - INTERVAL '1 day' FROM date_series WHERE report_date > CURRENT_DATE - INTERVAL '7 days' ) SELECT report_date FROM date_series;

Note: Recursive CTEs require the RECURSIVE keyword in DuckDB (and PostgreSQL).

CTE Best Practices

  1. Name CTEs clearly - use descriptive names like high_risk_transactions not cte1
  2. One logical step per CTE - each CTE should do one thing well
  3. Comment complex CTEs - explain what each step does
  4. Don't overuse - simple queries don't need CTEs

When to Use CTEs vs Subqueries

Use CTEs When...Use Subqueries When...
Query has multiple logical stepsQuery is simple and short
You need to reference the same result multiple timesYou only use the result once
Readability is importantPerformance is critical (sometimes)
Debugging complex logicThe subquery is just a filter

Pro tip: In DuckDB (and PostgreSQL), CTEs are typically optimized well, so choose based on readability. If you ever need to force the database to materialize the CTE (compute it once), DuckDB supports MATERIALIZED:

WITH high_value AS MATERIALIZED ( SELECT * FROM transactions WHERE amount > 1000 ) SELECT * FROM high_value;

Practical Fraud Investigation Examples


Example 1: Customer Spending Patterns

Goal: Show me all customers who spent more than $5,000 today with their contact info

SELECT c.first_name, c.last_name, c.email, SUM(t.amount) as total_spent FROM customers c INNER JOIN transactions t ON c.customer_id = t.customer_id WHERE CAST(t.transaction_date AS DATE) = CURRENT_DATE GROUP BY c.customer_id, c.first_name, c.last_name, c.email HAVING SUM(t.amount) > 5000;

Step-by-Step Breakdown

Think of this like giving instructions to a detective:

Step 1: Find the right filing cabinets

  • FROM customers c - Go to the customers filing cabinet
  • INNER JOIN transactions t ON c.customer_id = t.customer_id - Connect it to the transactions cabinet using customer ID as the link

Step 2: Filter to today's cases only

  • WHERE CAST(t.transaction_date AS DATE) = CURRENT_DATE - Only look at transactions that happened today

Step 3: Group related evidence together

  • GROUP BY c.customer_id, c.first_name, c.last_name, c.email - Put all of each customer's transactions in separate piles

Step 4: Calculate totals for each pile

  • SUM(t.amount) as total_spent - Add up all the transaction amounts in each customer's pile

Step 5: Only show the big spenders

  • HAVING SUM(t.amount) > 5000 - Only keep the piles where the total is over $5,000

Step 6: Decide what information to show

  • SELECT c.first_name, c.last_name, c.email, SUM(t.amount) as total_spent - Show the customer's name, email, and their total spending

The story: Go find all customers and their transactions, but only look at today's transactions. Group each customer's transactions together, add up how much each customer spent, then only show me customers who spent more than $5,000 today, along with their contact info.

Why This Matters for Fraud

High spending in one day could indicate a stolen card being used quickly.


Example 2: Velocity Fraud Detection

Goal: Find customers who made more than 5 transactions in the last hour

SELECT customer_id, COUNT(*) as transaction_count, MIN(transaction_date) as first_transaction, MAX(transaction_date) as last_transaction FROM transactions WHERE transaction_date >= NOW() - INTERVAL '1 hour' GROUP BY customer_id HAVING COUNT(*) > 5;

Step-by-Step Breakdown

Think of this like tracking suspicious activity:

Step 1: Find the transaction records

  • FROM transactions - Go to the transactions table

Step 2: Filter to recent activity

  • WHERE transaction_date >= NOW() - INTERVAL '1 hour' - Only look at transactions from the last hour
  • NOW() - INTERVAL '1 hour' means "1 hour ago from right now"

Step 3: Group by customer

  • GROUP BY customer_id - Put all transactions for each customer in separate piles

Step 4: Count transactions and find time range

  • COUNT(*) as transaction_count - Count how many transactions in each pile
  • MIN(transaction_date) - Find the earliest transaction time
  • MAX(transaction_date) - Find the latest transaction time

Step 5: Only show rapid-fire customers

  • HAVING COUNT(*) > 5 - Only keep customers who made more than 5 transactions

Step 6: Show the results

  • SELECT customer_id, COUNT(*), MIN(transaction_date), MAX(transaction_date) - Display customer ID, count, and time range

The story: Look at the last hour of transactions, group them by customer, count how many each customer made, and show me only customers who made more than 5 transactions.

Why This Matters for Fraud

Multiple rapid transactions could indicate card testing or account takeover.


Example 3: Geographic Impossibility

Goal: Find transactions that happened at different merchants within 2 hours (potentially impossible travel)

SELECT t1.customer_id, t1.merchant as first_location, t1.transaction_date as first_time, t2.merchant as second_location, t2.transaction_date as second_time FROM transactions t1 INNER JOIN transactions t2 ON t1.customer_id = t2.customer_id WHERE t1.merchant != t2.merchant AND t2.transaction_date > t1.transaction_date AND EXTRACT(EPOCH FROM (t2.transaction_date - t1.transaction_date)) / 3600 < 2;

Step-by-Step Breakdown

Think of this like comparing locations and times:

Step 1: Set up the comparison

  • FROM transactions t1 - Start with the transactions table (call it "t1" for "first transaction")
  • INNER JOIN transactions t2 ON t1.customer_id = t2.customer_id - Join it to itself (call it "t2" for "second transaction")
  • This lets us compare each customer's transactions to their other transactions

Step 2: Filter for suspicious patterns

  • WHERE t1.merchant != t2.merchant - The two transactions must be at different places
  • AND t2.transaction_date > t1.transaction_date - The second transaction happened after the first
  • AND EXTRACT(EPOCH FROM ...) / 3600 < 2 - They happened less than 2 hours apart

Step 3: Show the evidence

  • SELECT t1.customer_id, t1.merchant, t1.transaction_date, t2.merchant, t2.transaction_date - Show customer ID, first location and time, second location and time

The story: Compare every transaction to every other transaction for the same customer. Show me cases where someone was at two different places within 2 hours.

The Date Math Explained

  • EXTRACT(EPOCH FROM ...) converts time intervals to seconds
  • Subtract timestamps to get the interval
  • Divide by 3600 to convert seconds to hours
  • < 2 means less than 2 hours

Real-world example: If someone bought gas in Miami at 2:00 PM and coffee in Seattle at 3:30 PM, that's physically impossible!

Why This Matters for Fraud

If someone bought coffee in New York at 2 PM and gas in California at 3 PM, that's physically impossible!

Note: This query is quite advanced - don't worry if it seems complicated. The important thing is understanding the concept.


Finding Round Dollar Amounts (Often Fraud)

SELECT * FROM transactions WHERE amount IN (100, 200, 500, 1000, 2000, 5000);

Why this works:

  • Fraudsters often test cards with round numbers like $100, $500
  • Real purchases are usually messy amounts like $47.83, $123.45
  • IN (100, 200, 500...) means "show me transactions with ANY of these exact amounts"

What to look for: Multiple customers with the same round amounts on the same day.

Finding Card Testing Patterns

-- Small test transactions followed by large ones SELECT customer_id, COUNT(*) as small_transactions, MAX(amount) as largest_amount FROM transactions WHERE CAST(transaction_date AS DATE) = CURRENT_DATE GROUP BY customer_id HAVING COUNT(CASE WHEN amount < 10 THEN 1 END) > 0 AND MAX(amount) > 500;

What this does: Finds customers who made both small AND large purchases today

Step 1: COUNT(CASE WHEN amount < 10 THEN 1 END) > 0 - They made at least one purchase under $10

  • CASE WHEN amount < 10 THEN 1 END counts only transactions under $10
  • > 0 means "at least one small transaction"

Step 2: MAX(amount) > 500 - Their biggest purchase was over $500

Why this is suspicious: Fraudsters often test stolen cards with small amounts first, then make big purchases if the card works.

Real-world example: Someone makes three $1 purchases, then immediately buys a $2,000 laptop.

Finding Money Transfer Services

SELECT * FROM transactions WHERE merchant LIKE '%Western Union%' OR merchant LIKE '%MoneyGram%' OR merchant LIKE '%Money Transfer%';

What this does:

  • LIKE '%Western Union%' finds any merchant name containing "Western Union"
  • The % symbols mean "any text before or after"
  • OR means it will find transactions at ANY of these types of places

Why money transfers are risky:

  • Hard to trace once money is sent
  • Common way for fraudsters to move stolen money
  • Often used in romance scams and business email compromise

Real-world example: "Western Union #1234", "Local MoneyGram Station", "Quick Money Transfer LLC"


Common Beginner Mistakes (And How to Avoid Them)

1. Forgetting Quotes Around Text

Wrong:

SELECT * FROM transactions WHERE merchant = Amazon;

Right:

SELECT * FROM transactions WHERE merchant = 'Amazon';

Remember: Numbers don't need quotes, text does!

2. Mixing Up WHERE and HAVING

Use WHERE for filtering individual rows:

SELECT * FROM transactions WHERE amount > 100;

Use HAVING for filtering groups:

SELECT customer_id, COUNT(*) FROM transactions GROUP BY customer_id HAVING COUNT(*) > 5;

3. Forgetting the Semicolon

Every SQL statement ends with a semicolon (;)

SELECT * FROM transactions;

4. Not Understanding JOIN Requirements

Remember: You need an ON clause to tell SQL how to match tables:

-- Wrong (missing ON clause) SELECT * FROM customers INNER JOIN transactions; -- Right SELECT * FROM customers INNER JOIN transactions ON customers.customer_id = transactions.customer_id;

Practice Exercises (Start Simple!)

Exercise 1: Basic Filtering

Find all transactions over $1,000.

Click to see solution
SELECT * FROM transactions WHERE amount > 1000;

Exercise 2: Pattern Matching

Find all transactions at gas stations (merchants containing "Shell", "Exxon", or "BP").

Click to see solution
SELECT * FROM transactions WHERE merchant LIKE '%Shell%' OR merchant LIKE '%Exxon%' OR merchant LIKE '%BP%';

Exercise 3: Counting

How many transactions happened today?

Click to see solution
SELECT COUNT(*) FROM transactions WHERE CAST(transaction_date AS DATE) = CURRENT_DATE;

Exercise 4: Grouping

Show how much each customer spent in total, but only customers who spent more than $1,000.

Click to see solution
SELECT customer_id, SUM(amount) as total_spent FROM transactions GROUP BY customer_id HAVING SUM(amount) > 1000;

Building Your Confidence

Start Small

  1. Practice SELECT and WHERE first - get comfortable with basic filtering
  2. Add one new concept at a time - don't try to learn JOINs and GROUP BY simultaneously
  3. Use real examples - practice with actual fraud scenarios from your work
  4. Make mistakes - they're how you learn! SQL won't break if you write bad queries

Common Workflow for Fraud Investigation

Step-by-step process:

  1. Start with basic SELECT to preview the dataset and confirm column names.
  2. Add WHERE filters to isolate customers, date ranges, or high-risk amounts.
  3. Aggregate with GROUP BY to count events or sum exposure by customer, device, or merchant.
  4. JOIN related tables (e.g., customers, devices) to enrich with demographic or device intel.
  5. Apply HAVING to filter aggregated results (e.g., HAVING COUNT(*) > 5).
  6. ORDER BY and LIMIT to sort by risk score or amount and keep result sets manageable.

ANSI SQL Cheat Sheet

Click to expand 🌐
ClausePurposeMinimal Example
SELECTChoose columnsSELECT name, amount
FROMChoose tableFROM transactions
WHEREFilter rowsWHERE amount > 100
GROUP BYAggregate rowsGROUP BY merchant
HAVINGFilter aggregatesHAVING COUNT(*) > 10
ORDER BYSort outputORDER BY amount DESC
LIMITCap rows returnedLIMIT 100
JOINCombine tablesJOIN customers USING(customer_id)
WITH (CTE)Named subqueryWITH cte AS (SELECT...) SELECT * FROM cte
INSERTAdd rowsINSERT INTO table(col) VALUES(val);
UPDATEModify rowsUPDATE table SET col=val WHERE id=1;
DELETERemove rowsDELETE FROM table WHERE id=1;

Test Your Knowledge

Ready to test what you've learned? Take the quiz to reinforce your understanding.