Deep Dive into AI-Assisted Development (Optional)
2025-10-12
Optional Advanced Material
This document provides in-depth technical knowledge about how LLMs work and advanced techniques for AI-assisted development.
Prerequisites: Complete the LLM Quick Start session and Practice 1-2 first.
When to read: When you’re comfortable with basic AI-assisted development and want to understand the underlying technology and advanced patterns.
At its core, an LLM is a probabilistic sequence model that predicts the next token:
P(token_n | token_1, token_2, ..., token_{n-1})
Example in code generation:
// Input: "function add(a, b) { return a + "
// Most likely next token: "b"
// Less likely: ";", "1", "0"
// Very unlikely: "elephant"
The model assigns probabilities to all possible next tokens and samples from this distribution.
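A minimal sketch of that sampling step in JavaScript (the token list and probabilities below are invented for illustration; real vocabularies contain tens of thousands of tokens):

// Toy next-token distribution for the prefix "return a + " (made-up numbers).
const nextTokenProbs = {
  'b': 0.92,
  ';': 0.04,
  '1': 0.02,
  '0': 0.01,
  'elephant': 0.00001,
};

// Sample one token according to its probability.
function sampleNextToken(probs) {
  const r = Math.random();
  let cumulative = 0;
  for (const [token, p] of Object.entries(probs)) {
    cumulative += p;
    if (r < cumulative) return token;
  }
  return Object.keys(probs)[0]; // fallback if probabilities don't sum to exactly 1
}

console.log(sampleNextToken(nextTokenProbs)); // almost always "b"

Low-probability tokens remain possible, just rare, which is one reason the same prompt can yield different completions.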
Tokens are the fundamental units that LLMs process:
Examples:
"Hello, world!" → ["Hello", ",", " world", "!"] (4 tokens)
"function calculateTotal()" → ["function", " calculate", "Total", "(", ")"] (5 tokens)
Context Windows are measured in tokens:
| Model | Context Window | Approximate Pages |
|---|---|---|
| GPT-3.5 | 4,096 tokens | ~3 pages |
| GPT-4 | 8,192 tokens | ~6 pages |
| GPT-4 Turbo | 128,000 tokens | ~96 pages |
| Claude 3 | 200,000 tokens | ~150 pages |
| Gemini 1.5 Pro | 1,000,000 tokens | ~750 pages |
Practical impact:
- Longer context = more information the model can “remember”
- Token counts drive API pricing
- Token limits cap how much code you can provide as context
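A rough rule of thumb is that one token corresponds to about four characters of English text or code. The sketch below uses that heuristic (the 4-characters-per-token ratio and the ./server.js path are assumptions; real tokenizers vary by model) to estimate whether a file fits in a given context window:

const fs = require('fs');

// Rough heuristic: ~4 characters per token. Real tokenizers vary by model.
const APPROX_CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

function fitsInContext(filePath, contextWindowTokens) {
  const source = fs.readFileSync(filePath, 'utf8');
  const estimated = estimateTokens(source);
  console.log(`${filePath}: ~${estimated} tokens (limit: ${contextWindowTokens})`);
  return estimated <= contextWindowTokens;
}

// Example: will this file fit in an 8,192-token window alongside your prompt?
fitsInContext('./server.js', 8192);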
LLMs use a transformer architecture built around self-attention, a mechanism that lets every token attend to every other token in the context.
Simplified example:
Input: "The cat sat on the mat"
Self-attention learns:
- "cat" relates strongly to "sat" (subject-verb)
- "sat" relates to "mat" (verb-location)
- "on" connects "sat" and "mat" (preposition relationship)
Self-attention allows the model to capture long-range dependencies, resolve references across the whole input, and weigh which earlier tokens matter most for each prediction.
Pre-training: the model learns from massive datasets of text and code.
Objective: Predict the next token given previous context
Result: General understanding of language and code patterns
Fine-tuning: the model is then trained on smaller, curated datasets.
Objective: Make the model helpful for specific tasks
Result: Better instruction-following and task-specific performance
Reinforcement Learning from Human Feedback:
Objective: Align model behavior with human preferences
Result: Safer, more helpful, more truthful outputs
| Model | Provider | Strengths | Context | Best For |
|---|---|---|---|---|
| GPT-5 | OpenAI | Enhanced reasoning, creativity | 128K | Complex tasks, long context |
| GPT-4.1 | OpenAI | Improved performance, efficiency | 128K | General tasks, long context |
| Claude 4 | Anthropic | Improved understanding, safety | 200K | Conversational AI, long context |
| Gemini 1.5 Pro | Google | Massive context, multimodal | 1M | Document analysis, video |
| Codex | OpenAI | Code-specialized | 8K | Code generation, completion |
| Llama 3 | Meta | Open-source, versatile | 32K | Research, customization |
| Code Llama | Meta | Open-source, code-focused | 16K | Local deployment, privacy |
GitHub Copilot builds on OpenAI models and augments them with context gathered from your open files and surrounding code.
The Problem
LLMs sometimes “jump to conclusions” without showing reasoning steps.
Example:
Prompt: "Write a function to validate an email address"
Output: [generates regex without explanation]
Prompt with CoT:
Write a function to validate an email address.
Think step-by-step:
1. What are the rules for valid email addresses?
2. What edge cases should we handle?
3. What's the best approach (regex, parsing, library)?
4. How do we test it?
Then provide the implementation with comments explaining each part.
Result: More thoughtful, better-documented code with reasoning
Provide 2-3 examples of the pattern you want:
Example: Generating test cases
Given this function, generate test cases:
Example 1:
Function: function add(a, b) { return a + b; }
Tests:
- add(2, 3) should return 5
- add(-1, 1) should return 0
- add(0, 0) should return 0
Example 2:
Function: function isEven(n) { return n % 2 === 0; }
Tests:
- isEven(2) should return true
- isEven(3) should return false
- isEven(0) should return true
Now generate tests for:
Function: function findMax(arr) { return Math.max(...arr); }
Result: The model follows the established pattern
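In code, a few-shot prompt is often just assembled from a list of example pairs. A minimal sketch (the example data mirrors the prompt above; the helper name is arbitrary):

// Build a few-shot prompt from example pairs, then append the new case.
const examples = [
  {
    fn: 'function add(a, b) { return a + b; }',
    tests: '- add(2, 3) should return 5\n- add(-1, 1) should return 0\n- add(0, 0) should return 0',
  },
  {
    fn: 'function isEven(n) { return n % 2 === 0; }',
    tests: '- isEven(2) should return true\n- isEven(3) should return false\n- isEven(0) should return true',
  },
];

function buildFewShotPrompt(newFunction) {
  const shots = examples
    .map((ex, i) => `Example ${i + 1}:\nFunction: ${ex.fn}\nTests:\n${ex.tests}`)
    .join('\n\n');
  return `Given this function, generate test cases:\n\n${shots}\n\nNow generate tests for:\nFunction: ${newFunction}`;
}

console.log(buildFewShotPrompt('function findMax(arr) { return Math.max(...arr); }'));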
Instead of one massive prompt, chain multiple prompts:
Task: Create a full authentication system
Chain:
1. "Design the database schema for users and sessions"
2. "Generate the registration endpoint using that schema"
3. "Generate the login endpoint that issues a JWT"
4. "Write the authentication middleware"
5. "Generate tests for each endpoint"
Benefits:
- Each step is small enough to review carefully
- Errors are caught before they compound in later steps
- The model's context stays focused on one sub-problem at a time
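A minimal sketch of running such a chain programmatically, assuming an OpenAI-style chat completions endpoint, Node 18+ (for the built-in fetch), and an API key in OPENAI_API_KEY; the model name is a placeholder, so adapt all of this to your provider:

// Minimal prompt chain: the output of step 1 becomes context for step 2.
async function ask(prompt) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini', // placeholder model name
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

async function buildAuthSkeleton() {
  // Step 1: design first...
  const schema = await ask('Design a minimal SQL schema for users and sessions.');
  // Step 2: ...then generate code that uses the design from step 1.
  const endpoint = await ask(
    `Given this schema:\n${schema}\n\nWrite an Express registration endpoint using async/await.`
  );
  console.log(endpoint);
}

buildAuthSkeleton();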
Basic Prompt:
"Explain React hooks"
Role-Enhanced Prompt:
"You are an experienced React developer teaching L3 university students.
Explain React hooks in a way that builds on their existing JavaScript knowledge.
Use practical examples and avoid jargon."
Result: More appropriate tone, level, and examples
System Prompt (set by the application):
- Defines overall behavior and constraints
- Usually hidden from the user
- Example: “You are GitHub Copilot, a coding assistant…”
User Prompt (your input):
- Specific request or context
- What you type in the chat or editor
Some tools let you set custom system prompts:
Example for a project:
System Prompt:
"You are a full-stack developer working on an e-commerce review aggregator.
The stack is: React, Node.js, Express, MySQL, Tailwind CSS.
Follow these conventions:
- Use async/await, not callbacks
- Use functional React components with hooks
- Follow REST API best practices
- Write accessible HTML with ARIA labels
- Use parameterized SQL queries to prevent injection"
User Prompt:
"Create a review card component"
Result: AI knows your project context and conventions automatically
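In API terms, the system prompt and the user prompt are simply messages with different roles. A minimal sketch, under the same OpenAI-style endpoint and API-key assumptions as the chaining example above:

// The system message carries project context once; the user message is the actual request.
async function createReviewCard() {
  const messages = [
    {
      role: 'system',
      content:
        'You are a full-stack developer working on an e-commerce review aggregator. ' +
        'Stack: React, Node.js, Express, MySQL, Tailwind CSS. ' +
        'Use async/await, functional React components with hooks, and parameterized SQL queries.',
    },
    { role: 'user', content: 'Create a review card component' },
  ];

  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: 'gpt-4o-mini', messages }), // placeholder model name
  });
  console.log((await res.json()).choices[0].message.content);
}

createReviewCard();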
LLMs have knowledge cutoffs and can’t access:
- Your private codebase
- Libraries or APIs released after training
- Your company’s internal documentation and conventions
How it works:
1. Your documents and code are split into chunks and indexed (typically as embeddings)
2. When you ask a question, the most relevant chunks are retrieved
3. The retrieved chunks are inserted into the prompt
4. The LLM answers using that retrieved context
Example workflow:
User asks: "How do we handle authentication in our app?"
System:
1. Searches codebase for "authentication" files
2. Retrieves: auth.js, login.js, middleware/auth.js
3. Builds prompt: "Given this code: [paste files], answer: How do we handle authentication?"
4. LLM generates answer based on YOUR actual code
What Are Embeddings?
Embeddings convert text/code into vectors (arrays of numbers) that capture semantic meaning:
"authentication" → [0.23, -0.45, 0.67, ...] (1536 dimensions)
"login system" → [0.21, -0.43, 0.69, ...] (similar vector!)
"banana recipe" → [0.89, 0.12, -0.34, ...] (very different vector)
You can find semantically similar code even with different wording:
Search query: “error handling”
Matches:
- “try-catch blocks”
- “exception management”
- “graceful failure recovery”
Even if exact words don’t appear!
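A minimal sketch of the similarity search itself, assuming the embedding vectors already exist (the 3-dimensional vectors below are invented; real embedding models return hundreds or thousands of dimensions):

// Toy semantic search over pre-computed embeddings.
const snippets = [
  { text: 'try-catch blocks around the fetch call', vector: [0.9, 0.1, 0.2] },
  { text: 'exception management in the API layer', vector: [0.85, 0.15, 0.25] },
  { text: 'banana bread recipe', vector: [0.05, 0.9, 0.8] },
];
const queryVector = [0.88, 0.12, 0.22]; // pretend embedding of "error handling"

function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = v => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Rank snippets by similarity to the query, most similar first.
const ranked = snippets
  .map(s => ({ ...s, score: cosineSimilarity(queryVector, s.vector) }))
  .sort((a, b) => b.score - a.score);

ranked.forEach(s => console.log(`${s.score.toFixed(2)}  ${s.text}`));
// The try-catch and exception snippets outrank the recipe, even though
// none of them contains the words "error handling".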
When asking AI about your code, include or open the relevant files so the tool can retrieve them as context.
Low temperature (0.0 - 0.3): Deterministic, focused
Prompt: "Complete: function add(a, b) {"
Temperature 0.1: "return a + b; }" (always)
Medium temperature (0.5 - 0.7): Balanced
Temperature 0.5:
- "return a + b; }" (common)
- "return Number(a) + Number(b); }" (occasionally)
- "const sum = a + b; return sum; }" (rarely)
High temperature (0.8 - 1.0): Creative, diverse
Temperature 0.9:
- "return a + b; }"
- "return [a, b].reduce((x, y) => x + y);"
- "if (typeof a !== 'number') throw new Error()..."
- Etc. (very diverse outputs)
| Temperature | Use Case | Example |
|---|---|---|
| 0.0 - 0.2 | Code completion, precise tasks | “Convert this SQL to a Sequelize query” |
| 0.3 - 0.5 | General coding, explanations | “Explain this algorithm” |
| 0.6 - 0.8 | Creative solutions, brainstorming | “Suggest 5 ways to improve this UI” |
| 0.9 - 1.0 | Highly creative tasks | “Generate unique product names” |
Default for coding: Usually 0.2 - 0.4
How it works: instead of considering ALL possible tokens, the sampler considers only the smallest set of top-ranked tokens whose combined probability reaches p:
Top-p = 0.9 means:
- Rank all tokens by probability
- Consider only tokens that sum to 90% probability
- Sample from this subset
Result: Excludes very unlikely tokens while allowing some variety
Temperature: Changes the probability distribution shape
Top-p: Truncates the distribution
Best practice: Adjust one OR the other, not both aggressively
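The sketch below shows both knobs acting on the same toy distribution (the completions and their raw scores are invented): temperature rescales the distribution before sampling, while top-p keeps only the most probable tokens whose combined probability reaches p and renormalizes.

// Toy raw scores (logits) for three candidate completions (made-up numbers).
const logits = {
  'return a + b;': 3.0,
  'return Number(a) + Number(b);': 1.5,
  'throw new Error();': 0.5,
};

// Temperature reshapes the distribution: low T sharpens it, high T flattens it.
function applyTemperature(logits, temperature) {
  const scaled = Object.entries(logits).map(([tok, l]) => [tok, Math.exp(l / temperature)]);
  const total = scaled.reduce((sum, [, v]) => sum + v, 0);
  return scaled.map(([tok, v]) => [tok, v / total]); // [token, probability] pairs
}

// Top-p keeps only the most probable tokens whose cumulative probability reaches p.
function applyTopP(probs, p) {
  const sorted = [...probs].sort((a, b) => b[1] - a[1]);
  const kept = [];
  let cumulative = 0;
  for (const [tok, prob] of sorted) {
    kept.push([tok, prob]);
    cumulative += prob;
    if (cumulative >= p) break;
  }
  const total = kept.reduce((sum, [, v]) => sum + v, 0);
  return kept.map(([tok, v]) => [tok, v / total]); // renormalize over the kept tokens
}

console.log(applyTemperature(logits, 0.2));                 // nearly all mass on the top completion
console.log(applyTopP(applyTemperature(logits, 1.0), 0.9)); // the unlikely tail is dropped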
Step 1: Functionality Review
Prompt:
"Review this code for correctness and edge cases:
[paste code]
Check for:
- Logic errors
- Edge cases (null, empty, invalid input)
- Off-by-one errors
- Race conditions (if async)"
Step 2: Security Review
Prompt:
"Security audit of this code:
[paste code]
Check for:
- SQL injection vulnerabilities
- XSS vulnerabilities
- Authentication/authorization issues
- Sensitive data exposure
- Input validation gaps"
Step 3: Performance Review
Prompt:
"Analyze performance of this code:
[paste code]
Identify:
- Big O complexity issues
- Unnecessary loops or operations
- Memory leaks (especially React/DOM)
- Database query optimization opportunities"
Don’t ask for: “Refactor this entire file”
Do ask for: Specific, testable improvements
Example progression:
Step 1: Extract magic numbers
Prompt: "Replace magic numbers with named constants"
Before: if (age > 18)
After: const LEGAL_AGE = 18; if (age > LEGAL_AGE)
Step 2: Extract functions
Prompt: "Extract this block into a well-named function"
Step 3: Improve naming
Prompt: "Suggest more descriptive variable names"
Step 4: Add error handling
Prompt: "Add proper error handling with try-catch"
Each step is small, testable, and safe!
Technique: Test-Driven Prompt Design
Prompt:
"I'm writing a function to [describe functionality].
1. First, generate a comprehensive list of test cases covering:
- Happy path scenarios
- Edge cases
- Error conditions
- Boundary values
2. Then write the test code using Jest
3. Finally, implement the function to pass all tests
Function signature: [provide signature]"
Result: Tests are written BEFORE implementation (TDD approach)
Prompt:
"Generate tests for a product search function:
function searchProducts(products, query) {
// Returns products matching query in name or description
}
Include tests for:
- Case-insensitive matching
- Partial matches
- Empty query
- Empty product list
- Special characters
- Multiple word queries
- No matches found
"
Output:
describe('searchProducts', () => {
  const products = [
    { name: 'iPhone 13', description: 'Latest Apple phone' },
    { name: 'Samsung Galaxy', description: 'Android flagship' }
  ];

  test('finds products with case-insensitive match', () => {
    expect(searchProducts(products, 'iphone')).toHaveLength(1);
  });

  test('handles empty query by returning all products', () => {
    expect(searchProducts(products, '')).toEqual(products);
  });

  // ... more tests
});
Prompt:
"Add comprehensive JSDoc comments to this function:
[paste function]
Include:
- Description of what the function does
- @param tags with types and descriptions
- @returns tag with type and description
- @throws tag for any errors
- @example showing usage"
Example output:
/**
* Fetches product reviews from multiple sources and aggregates ratings
*
* @param {string} productId - Unique identifier for the product
* @param {Array<string>} sources - Array of source names (e.g., ['amazon', 'bestbuy'])
* @returns {Promise<Object>} Aggregated review data with average ratings
* @throws {Error} If productId is invalid or sources array is empty
*
* @example
* const reviews = await fetchReviews('prod-123', ['amazon', 'walmart']);
* console.log(reviews.averageRating); // 4.2
*/
async function fetchReviews(productId, sources) {
// Implementation...
}
Step 1: Describe the problem clearly
"I'm trying to [expected behavior] but instead [actual behavior].
Here's the relevant code:
[paste code]
Here's the error message:
[paste error]
What's causing this and how do I fix it?"
Step 2: Provide context if needed
"The function is called from [context].
It receives data in this format: [example]
I've verified that [what you've already checked]"
Step 3: Iterate based on suggestions
"I tried [suggestion] but now [new issue].
Here's the updated code: [paste]"
Prompt:
"My React component isn't re-rendering when state changes.
Code:
function ReviewList() {
  const [reviews, setReviews] = useState([]);

  async function fetchReviews() {
    const data = await fetch('/api/reviews').then(r => r.json());
    reviews = data; // Bug is here!
  }

  return <div>{reviews.map(r => <ReviewCard key={r.id} review={r} />)}</div>;
}
Error: Component doesn't update after fetchReviews() is called.
What's wrong?"
AI Response:
"The issue is on line 6: you're directly mutating the `reviews` variable
instead of using the `setReviews` setter function. React relies on state
setters to trigger re-renders.
Fix:
- reviews = data;
+ setReviews(data);
Explanation: React doesn't detect direct assignments to state variables..."
Agents connect an LLM to external capabilities (retrieval, calculators, code execution, APIs). They can reduce hallucination and staleness and automate multi-step tasks — but they introduce new risks, so design guardrails up front.
Agents amplify productivity — treat them like instrumentation: design for auditability, testability, and clear human oversight.
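A minimal skeleton of that loop, with the model call stubbed out (the tool names, the stub, and the overall shape are illustrative assumptions rather than any particular framework's API): the model either answers directly or requests a tool, the tool runs, and its result is appended to the conversation. Note the two guardrails: a hard step limit and an audit log of every tool call.

// Skeleton of an agent loop. The model call is stubbed; a real implementation
// would call your provider's API and parse its tool-call response format.
const tools = {
  // Hypothetical tools the agent is allowed to use.
  searchCodebase: async query => `Files mentioning "${query}": auth.js, middleware/auth.js`,
  runTests: async () => 'All tests passed',
};

// Stub: a real version would send `messages` to an LLM and return
// either { answer } or { tool, args }.
async function callModel(messages) {
  const last = messages[messages.length - 1];
  if (last.role === 'user') return { tool: 'searchCodebase', args: 'authentication' };
  return { answer: `Based on the tool result: ${last.content}` };
}

async function runAgent(userQuestion) {
  const messages = [{ role: 'user', content: userQuestion }];
  for (let step = 0; step < 5; step++) {                      // guardrail: hard step limit
    const decision = await callModel(messages);
    if (decision.answer) return decision.answer;              // model answered directly
    console.log(`[audit] ${decision.tool}(${decision.args})`); // guardrail: auditable tool log
    const result = await tools[decision.tool](decision.args);
    messages.push({ role: 'tool', content: result });
  }
  throw new Error('Agent exceeded step limit');
}

runAgent('How do we handle authentication in our app?').then(console.log);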
What happens: AI confidently generates false information
Example:
User: "How do I use the React useProductReviews hook?"
AI: "Sure! Here's how:
import { useProductReviews } from 'react';
const { reviews, loading } = useProductReviews(productId);"
Problem: No such built-in hook exists!
Defense:
- Verify function/library names in official docs
- Test code before committing
- Cross-reference with authoritative sources
What happens: AI suggests deprecated or old approaches
Example:
AI: "Use componentDidMount for API calls in React"
Problem: Hooks (useEffect) are the modern approach
Defense:
- Check publication dates of techniques
- Prefer official documentation
- Ask “Is this the current best practice in 2025?”
What happens: AI presents uncertain information as fact
Example:
User: "What's the best database for this use case?"
AI: "PostgreSQL is definitely the best choice."
Problem: Many factors determine "best" choice
Defense:
- Ask for trade-offs: “What are pros and cons of different options?”
- Request alternatives: “What are 3 options and when to use each?”
- Make your own decisions based on requirements
The Issue: LLMs are trained on copyrighted code
Your Responsibility:
- Understand your code’s provenance
- Check if generated code includes copyrighted snippets
- Add proper licenses to your projects
- Don’t claim AI-generated code as entirely your own work
Best Practice: treat generated code like code from an unknown contributor: review it, check its license implications, and attribute it where required.
For Students:
Allowed ✅:
- Using AI for boilerplate code
- Getting explanations of concepts
- Debugging assistance
- Code suggestions as learning tools
Not Allowed ❌:
- Submitting AI-generated code without understanding it
- Using AI for exams without permission
- Claiming AI work as entirely your own
- Bypassing learning objectives
Always:
- Disclose AI usage if required
- Demonstrate understanding when asked
- Use AI to learn, not to avoid learning
The Issue: LLMs can reflect biases from training data
Examples:
- Gender assumptions in user models
- Cultural assumptions in UI design
- Accessibility oversights
- English-centric examples
Mitigation:
- Review generated code for assumptions
- Test with diverse users
- Explicitly ask for inclusive design
- Add bias checks to your prompts
Never include in prompts:
- API keys, passwords, tokens
- Customer data or PII
- Proprietary algorithms
- Confidential business logic
- Production database credentials
Example of safe prompt:
✅ "Write a function to authenticate users with JWT tokens"
❌ "Write auth using this secret key: sk_live_abc123xyz..."
Know your tools:
| Tool | Data Retention | Training on Your Code |
|---|---|---|
| GitHub Copilot | Not used for training (opt-in only) | No |
| ChatGPT Free | Conversations may train models | Yes (opt-out available) |
| ChatGPT Plus | Can disable training | Your choice |
| Claude | Not used for training | No |
| On-premise models | Data stays local | You control |
Best Practice: check your organization’s policy and each tool’s data-handling settings before pasting any code into a prompt.
AI should handle: the ~80% that is boilerplate and routine tasks
You should handle: the ~20% that is critical thinking, architecture, and business logic
Ideal workflow:
1. You design the solution architecture
2. AI generates boilerplate and structure
3. You review and modify generated code
4. AI helps debug issues
5. You write tests and verify functionality
6. AI generates documentation
7. You review everything before committing
Don’t let AI prevent learning: if you can’t explain a piece of generated code, don’t commit it; re-implement key pieces yourself until you can.
Remember: Today’s junior developer who learns with AI becomes tomorrow’s senior developer who uses AI effectively.
E. Bruno - Advanced LLM Concepts for Developers