Deep Dive into AI-Assisted Development (Optional)
2025-10-12
Optional Advanced Material
This document provides in-depth technical knowledge about how LLMs work and advanced techniques for AI-assisted development.
Prerequisites: Complete the LLM Quick Start session and Practice 1-2 first.
When to read: When you’re comfortable with basic AI-assisted development and want to understand the underlying technology and advanced patterns.
At its core, an LLM is a probabilistic sequence model that predicts the next token:
P(token_n | token_1, token_2, ..., token_{n-1})
Example in code generation:
// Input: "function add(a, b) { return a + "
// Most likely next token: "b"
// Less likely: ";", "1", "0"
// Very unlikely: "elephant"
The model assigns probabilities to all possible next tokens and samples from this distribution.
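A minimal sketch of that sampling step in JavaScript (the token list and probabilities below are invented for illustration; real vocabularies contain tens of thousands of tokens):

// Toy next-token distribution for the prefix "return a + " (made-up numbers).
const nextTokenProbs = {
  'b': 0.92,
  ';': 0.04,
  '1': 0.02,
  '0': 0.01,
  'elephant': 0.00001,
};

// Sample one token according to its probability.
function sampleNextToken(probs) {
  const r = Math.random();
  let cumulative = 0;
  for (const [token, p] of Object.entries(probs)) {
    cumulative += p;
    if (r < cumulative) return token;
  }
  return Object.keys(probs)[0]; // fallback if probabilities don't sum to exactly 1
}

console.log(sampleNextToken(nextTokenProbs)); // almost always "b"

Low-probability tokens remain possible, just rare, which is one reason the same prompt can yield different completions.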
Tokens are the fundamental units that LLMs process:
Examples:
"Hello, world!" → ["Hello", ",", " world", "!"] (4 tokens)
"function calculateTotal()" → ["function", " calculate", "Total", "(", ")"] (5 tokens)
Context Windows are measured in tokens:
| Model | Context Window | Approximate Pages |
|---|---|---|
| GPT-3.5 | 4,096 tokens | ~3 pages |
| GPT-4 | 8,192 tokens | ~6 pages |
| GPT-4 Turbo | 128,000 tokens | ~96 pages |
| Claude 3 | 200,000 tokens | ~150 pages |
| Gemini 1.5 Pro | 1,000,000 tokens | ~750 pages |
Practical impact:
- Longer context = more information the model can “remember”
- Token counts drive API pricing
- Token limits cap how much code you can provide as context
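A rough rule of thumb is that one token corresponds to about four characters of English text or code. The sketch below uses that heuristic (the 4-characters-per-token ratio and the ./server.js path are assumptions; real tokenizers vary by model) to estimate whether a file fits in a given context window:

const fs = require('fs');

// Rough heuristic: ~4 characters per token. Real tokenizers vary by model.
const APPROX_CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

function fitsInContext(filePath, contextWindowTokens) {
  const source = fs.readFileSync(filePath, 'utf8');
  const estimated = estimateTokens(source);
  console.log(`${filePath}: ~${estimated} tokens (limit: ${contextWindowTokens})`);
  return estimated <= contextWindowTokens;
}

// Example: will this file fit in an 8,192-token window alongside your prompt?
fitsInContext('./server.js', 8192);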
LLMs use a transformer architecture built around self-attention, a mechanism that lets every token attend to every other token in the context.
Simplified example:
Input: "The cat sat on the mat"
Self-attention learns:
- "cat" relates strongly to "sat" (subject-verb)
- "sat" relates to "mat" (verb-location)
- "on" connects "sat" and "mat" (preposition relationship)
Self-attention allows the model to capture long-range dependencies, resolve references across the whole input, and weigh which earlier tokens matter most for each prediction.
Pre-training: the model learns from massive datasets of text and code.
Objective: Predict the next token given previous context
Result: General understanding of language and code patterns
Fine-tuning: the model is then trained on smaller, curated datasets.
Objective: Make the model helpful for specific tasks
Result: Better instruction-following and task-specific performance
Reinforcement Learning from Human Feedback:
Objective: Align model behavior with human preferences
Result: Safer, more helpful, more truthful outputs
| Model | Provider | Strengths | Context | Best For |
|---|---|---|---|---|
| GPT-5 | OpenAI | Enhanced reasoning, creativity | 128K | Complex tasks, long context |
| GPT-4.1 | OpenAI | Improved performance, efficiency | 128K | General tasks, long context |
| Claude 4 | Anthropic | Improved understanding, safety | 200K | Conversational AI, long context |
| Gemini 1.5 Pro | Google | Massive context, multimodal | 1M | Document analysis, video |
| Codex | OpenAI | Code-specialized | 8K | Code generation, completion |
| Llama 3 | Meta | Open-source, versatile | 32K | Research, customization |
| Code Llama | Meta | Open-source, code-focused | 16K | Local deployment, privacy |
GitHub Copilot builds on OpenAI models and augments them with context gathered from your open files and surrounding code.
The Problem
LLMs sometimes “jump to conclusions” without showing reasoning steps.
Example:
Prompt: "Write a function to validate an email address"
Output: [generates regex without explanation]
Prompt with CoT:
Write a function to validate an email address.
Think step-by-step:
1. What are the rules for valid email addresses?
2. What edge cases should we handle?
3. What's the best approach (regex, parsing, library)?
4. How do we test it?
Then provide the implementation with comments explaining each part.
Result: More thoughtful, better-documented code with reasoning
Provide 2-3 examples of the pattern you want:
Example: Generating test cases
Given this function, generate test cases:
Example 1:
Function: function add(a, b) { return a + b; }
Tests:
- add(2, 3) should return 5
- add(-1, 1) should return 0
- add(0, 0) should return 0
Example 2:
Function: function isEven(n) { return n % 2 === 0; }
Tests:
- isEven(2) should return true
- isEven(3) should return false
- isEven(0) should return true
Now generate tests for:
Function: function findMax(arr) { return Math.max(...arr); }
Result: The model follows the established pattern
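In code, a few-shot prompt is often just assembled from a list of example pairs. A minimal sketch (the example data mirrors the prompt above; the helper name is arbitrary):

// Build a few-shot prompt from example pairs, then append the new case.
const examples = [
  {
    fn: 'function add(a, b) { return a + b; }',
    tests: '- add(2, 3) should return 5\n- add(-1, 1) should return 0\n- add(0, 0) should return 0',
  },
  {
    fn: 'function isEven(n) { return n % 2 === 0; }',
    tests: '- isEven(2) should return true\n- isEven(3) should return false\n- isEven(0) should return true',
  },
];

function buildFewShotPrompt(newFunction) {
  const shots = examples
    .map((ex, i) => `Example ${i + 1}:\nFunction: ${ex.fn}\nTests:\n${ex.tests}`)
    .join('\n\n');
  return `Given this function, generate test cases:\n\n${shots}\n\nNow generate tests for:\nFunction: ${newFunction}`;
}

console.log(buildFewShotPrompt('function findMax(arr) { return Math.max(...arr); }'));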
Instead of one massive prompt, chain multiple prompts:
Task: Create a full authentication system
Chain:
1. "Design the database schema for users and sessions"
2. "Generate the registration endpoint using that schema"
3. "Generate the login endpoint that issues a JWT"
4. "Write the authentication middleware"
5. "Generate tests for each endpoint"
Benefits:
- Each step is small enough to review carefully
- Errors are caught before they compound in later steps
- The model's context stays focused on one sub-problem at a time
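A minimal sketch of running such a chain programmatically, assuming an OpenAI-style chat completions endpoint, Node 18+ (for the built-in fetch), and an API key in OPENAI_API_KEY; the model name is a placeholder, so adapt all of this to your provider:

// Minimal prompt chain: the output of step 1 becomes context for step 2.
async function ask(prompt) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini', // placeholder model name
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

async function buildAuthSkeleton() {
  // Step 1: design first...
  const schema = await ask('Design a minimal SQL schema for users and sessions.');
  // Step 2: ...then generate code that uses the design from step 1.
  const endpoint = await ask(
    `Given this schema:\n${schema}\n\nWrite an Express registration endpoint using async/await.`
  );
  console.log(endpoint);
}

buildAuthSkeleton();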
Basic Prompt:
"Explain React hooks"
Role-Enhanced Prompt:
"You are an experienced React developer teaching L3 university students.
Explain React hooks in a way that builds on their existing JavaScript knowledge.
Use practical examples and avoid jargon."
Result: More appropriate tone, level, and examples
System Prompt (set by the application):
- Defines overall behavior and constraints
- Usually hidden from the user
- Example: “You are GitHub Copilot, a coding assistant…”
User Prompt (your input):
- Specific request or context
- What you type in the chat or editor
Some tools let you set custom system prompts:
Example for a project:
System Prompt:
"You are a full-stack developer working on an e-commerce review aggregator.
The stack is: React, Node.js, Express, MySQL, Tailwind CSS.
Follow these conventions:
- Use async/await, not callbacks
- Use functional React components with hooks
- Follow REST API best practices
- Write accessible HTML with ARIA labels
- Use parameterized SQL queries to prevent injection"
User Prompt:
"Create a review card component"
Result: AI knows your project context and conventions automatically
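In API terms, the system prompt and the user prompt are simply messages with different roles. A minimal sketch, under the same OpenAI-style endpoint and API-key assumptions as the chaining example above:

// The system message carries project context once; the user message is the actual request.
async function createReviewCard() {
  const messages = [
    {
      role: 'system',
      content:
        'You are a full-stack developer working on an e-commerce review aggregator. ' +
        'Stack: React, Node.js, Express, MySQL, Tailwind CSS. ' +
        'Use async/await, functional React components with hooks, and parameterized SQL queries.',
    },
    { role: 'user', content: 'Create a review card component' },
  ];

  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({ model: 'gpt-4o-mini', messages }), // placeholder model name
  });
  console.log((await res.json()).choices[0].message.content);
}

createReviewCard();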
LLMs have knowledge cutoffs and can’t access:
- Your private codebase
- Libraries or APIs released after training
- Your company’s internal documentation and conventions
How it works:
1. Your documents and code are split into chunks and indexed (typically as embeddings)
2. When you ask a question, the most relevant chunks are retrieved
3. The retrieved chunks are inserted into the prompt
4. The LLM answers using that retrieved context
Example workflow:
User asks: "How do we handle authentication in our app?"
System:
1. Searches codebase for "authentication" files
2. Retrieves: auth.js, login.js, middleware/auth.js
3. Builds prompt: "Given this code: [paste files], answer: How do we handle authentication?"
4. LLM generates answer based on YOUR actual code
What Are Embeddings?
Embeddings convert text/code into vectors (arrays of numbers) that capture semantic meaning:
"authentication" → [0.23, -0.45, 0.67, ...] (1536 dimensions)
"login system" → [0.21, -0.43, 0.69, ...] (similar vector!)
"banana recipe" → [0.89, 0.12, -0.34, ...] (very different vector)
You can find semantically similar code even with different wording:
Search query: “error handling”
Matches:
- “try-catch blocks”
- “exception management”
- “graceful failure recovery”
Even if exact words don’t appear!
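A minimal sketch of the similarity search itself, assuming the embedding vectors already exist (the 3-dimensional vectors below are invented; real embedding models return hundreds or thousands of dimensions):

// Toy semantic search over pre-computed embeddings.
const snippets = [
  { text: 'try-catch blocks around the fetch call', vector: [0.9, 0.1, 0.2] },
  { text: 'exception management in the API layer', vector: [0.85, 0.15, 0.25] },
  { text: 'banana bread recipe', vector: [0.05, 0.9, 0.8] },
];
const queryVector = [0.88, 0.12, 0.22]; // pretend embedding of "error handling"

function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = v => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Rank snippets by similarity to the query, most similar first.
const ranked = snippets
  .map(s => ({ ...s, score: cosineSimilarity(queryVector, s.vector) }))
  .sort((a, b) => b.score - a.score);

ranked.forEach(s => console.log(`${s.score.toFixed(2)}  ${s.text}`));
// The try-catch and exception snippets outrank the recipe, even though
// none of them contains the words "error handling".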
When asking AI about your code, include or open the relevant files so the tool can retrieve them as context.
Low temperature (0.0 - 0.3): Deterministic, focused
Prompt: "Complete: function add(a, b) {"
Temperature 0.1: "return a + b; }" (always)
Medium temperature (0.5 - 0.7): Balanced
Temperature 0.5:
- "return a + b; }" (common)
- "return Number(a) + Number(b); }" (occasionally)
- "const sum = a + b; return sum; }" (rarely)
High temperature (0.8 - 1.0): Creative, diverse
Temperature 0.9:
- "return a + b; }"
- "return [a, b].reduce((x, y) => x + y);"
- "if (typeof a !== 'number') throw new Error()..."
- Etc. (very diverse outputs)
| Temperature | Use Case | Example |
|---|---|---|
| 0.0 - 0.2 | Code completion, precise tasks | “Convert this SQL to a Sequelize query” |
| 0.3 - 0.5 | General coding, explanations | “Explain this algorithm” |
| 0.6 - 0.8 | Creative solutions, brainstorming | “Suggest 5 ways to improve this UI” |
| 0.9 - 1.0 | Highly creative tasks | “Generate unique product names” |
Default for coding: Usually 0.2 - 0.4
How it works: instead of considering ALL possible tokens, the sampler considers only the smallest set of top-ranked tokens whose combined probability reaches p:
Top-p = 0.9 means:
- Rank all tokens by probability
- Consider only tokens that sum to 90% probability
- Sample from this subset
Result: Excludes very unlikely tokens while allowing some variety
Temperature: Changes the probability distribution shape
Top-p: Truncates the distribution
Best practice: Adjust one OR the other, not both aggressively
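The sketch below shows both knobs acting on the same toy distribution (the completions and their raw scores are invented): temperature rescales the distribution before sampling, while top-p keeps only the most probable tokens whose combined probability reaches p and renormalizes.

// Toy raw scores (logits) for three candidate completions (made-up numbers).
const logits = {
  'return a + b;': 3.0,
  'return Number(a) + Number(b);': 1.5,
  'throw new Error();': 0.5,
};

// Temperature reshapes the distribution: low T sharpens it, high T flattens it.
function applyTemperature(logits, temperature) {
  const scaled = Object.entries(logits).map(([tok, l]) => [tok, Math.exp(l / temperature)]);
  const total = scaled.reduce((sum, [, v]) => sum + v, 0);
  return scaled.map(([tok, v]) => [tok, v / total]); // [token, probability] pairs
}

// Top-p keeps only the most probable tokens whose cumulative probability reaches p.
function applyTopP(probs, p) {
  const sorted = [...probs].sort((a, b) => b[1] - a[1]);
  const kept = [];
  let cumulative = 0;
  for (const [tok, prob] of sorted) {
    kept.push([tok, prob]);
    cumulative += prob;
    if (cumulative >= p) break;
  }
  const total = kept.reduce((sum, [, v]) => sum + v, 0);
  return kept.map(([tok, v]) => [tok, v / total]); // renormalize over the kept tokens
}

console.log(applyTemperature(logits, 0.2));                 // nearly all mass on the top completion
console.log(applyTopP(applyTemperature(logits, 1.0), 0.9)); // the unlikely tail is dropped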
Step 1: Functionality Review
Prompt:
"Review this code for correctness and edge cases:
[paste code]
Check for:
- Logic errors
- Edge cases (null, empty, invalid input)
- Off-by-one errors
- Race conditions (if async)"
Step 2: Security Review
Prompt:
"Security audit of this code:
[paste code]
Check for:
- SQL injection vulnerabilities
- XSS vulnerabilities
- Authentication/authorization issues
- Sensitive data exposure
- Input validation gaps"
Step 3: Performance Review
Prompt:
"Analyze performance of this code:
[paste code]
Identify:
- Big O complexity issues
- Unnecessary loops or operations
- Memory leaks (especially React/DOM)
- Database query optimization opportunities"
Don’t ask for: “Refactor this entire file”
Do ask for: Specific, testable improvements
Example progression:
Step 1: Extract magic numbers
Prompt: "Replace magic numbers with named constants"
Before: if (age > 18)
After: const LEGAL_AGE = 18; if (age > LEGAL_AGE)
Step 2: Extract functions
Prompt: "Extract this block into a well-named function"
Step 3: Improve naming
Prompt: "Suggest more descriptive variable names"
Step 4: Add error handling
Prompt: "Add proper error handling with try-catch"
Each step is small, testable, and safe!
Technique: Test-Driven Prompt Design
Prompt:
"I'm writing a function to [describe functionality].
1. First, generate a comprehensive list of test cases covering:
- Happy path scenarios
- Edge cases
- Error conditions
- Boundary values
2. Then write the test code using Jest
3. Finally, implement the function to pass all tests
Function signature: [provide signature]"
Result: Tests are written BEFORE implementation (TDD approach)
Prompt:
"Generate tests for a product search function:
function searchProducts(products, query) {
// Returns products matching query in name or description
}
Include tests for:
- Case-insensitive matching
- Partial matches
- Empty query
- Empty product list
- Special characters
- Multiple word queries
- No matches found
"
Output:
describe('searchProducts', () => {
  const products = [
    { name: 'iPhone 13', description: 'Latest Apple phone' },
    { name: 'Samsung Galaxy', description: 'Android flagship' }
  ];

  test('finds products with case-insensitive match', () => {
    expect(searchProducts(products, 'iphone')).toHaveLength(1);
  });

  test('handles empty query by returning all products', () => {
    expect(searchProducts(products, '')).toEqual(products);
  });

  // ... more tests
});
Prompt:
"Add comprehensive JSDoc comments to this function:
[paste function]
Include:
- Description of what the function does
- @param tags with types and descriptions
- @returns tag with type and description
- @throws tag for any errors
- @example showing usage"
Example output:
/**
* Fetches product reviews from multiple sources and aggregates ratings
*
* @param {string} productId - Unique identifier for the product
* @param {Array<string>} sources - Array of source names (e.g., ['amazon', 'bestbuy'])
* @returns {Promise<Object>} Aggregated review data with average ratings
* @throws {Error} If productId is invalid or sources array is empty
*
* @example
* const reviews = await fetchReviews('prod-123', ['amazon', 'walmart']);
* console.log(reviews.averageRating); // 4.2
*/
async function fetchReviews(productId, sources) {
// Implementation...
}
Step 1: Describe the problem clearly
"I'm trying to [expected behavior] but instead [actual behavior].
Here's the relevant code:
[paste code]
Here's the error message:
[paste error]
What's causing this and how do I fix it?"
Step 2: Provide context if needed
"The function is called from [context].
It receives data in this format: [example]
I've verified that [what you've already checked]"
Step 3: Iterate based on suggestions
"I tried [suggestion] but now [new issue].
Here's the updated code: [paste]"
Prompt:
"My React component isn't re-rendering when state changes.
Code:
function ReviewList() {
  const [reviews, setReviews] = useState([]);

  async function fetchReviews() {
    const data = await fetch('/api/reviews').then(r => r.json());
    reviews = data; // Bug is here!
  }

  return <div>{reviews.map(r => <ReviewCard key={r.id} review={r} />)}</div>;
}
Error: Component doesn't update after fetchReviews() is called.
What's wrong?"
AI Response:
"The issue is on line 6: you're directly mutating the `reviews` variable
instead of using the `setReviews` setter function. React relies on state
setters to trigger re-renders.
Fix:
- reviews = data;
+ setReviews(data);
Explanation: React doesn't detect direct assignments to state variables..."
Agents connect an LLM to external capabilities (retrieval, calculators, code execution, APIs). They can reduce hallucination and staleness and automate multi-step tasks — but they introduce new risks, so design guardrails up front.
Agents amplify productivity — treat them like instrumentation: design for auditability, testability, and clear human oversight.
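A minimal skeleton of that loop, with the model call stubbed out (the tool names, the stub, and the overall shape are illustrative assumptions rather than any particular framework's API): the model either answers directly or requests a tool, the tool runs, and its result is appended to the conversation. Note the two guardrails: a hard step limit and an audit log of every tool call.

// Skeleton of an agent loop. The model call is stubbed; a real implementation
// would call your provider's API and parse its tool-call response format.
const tools = {
  // Hypothetical tools the agent is allowed to use.
  searchCodebase: async query => `Files mentioning "${query}": auth.js, middleware/auth.js`,
  runTests: async () => 'All tests passed',
};

// Stub: a real version would send `messages` to an LLM and return
// either { answer } or { tool, args }.
async function callModel(messages) {
  const last = messages[messages.length - 1];
  if (last.role === 'user') return { tool: 'searchCodebase', args: 'authentication' };
  return { answer: `Based on the tool result: ${last.content}` };
}

async function runAgent(userQuestion) {
  const messages = [{ role: 'user', content: userQuestion }];
  for (let step = 0; step < 5; step++) {                      // guardrail: hard step limit
    const decision = await callModel(messages);
    if (decision.answer) return decision.answer;              // model answered directly
    console.log(`[audit] ${decision.tool}(${decision.args})`); // guardrail: auditable tool log
    const result = await tools[decision.tool](decision.args);
    messages.push({ role: 'tool', content: result });
  }
  throw new Error('Agent exceeded step limit');
}

runAgent('How do we handle authentication in our app?').then(console.log);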
What happens: AI confidently generates false information
Example:
User: "How do I use the React useProductReviews hook?"
AI: "Sure! Here's how:
import { useProductReviews } from 'react';
const { reviews, loading } = useProductReviews(productId);"
Problem: No such built-in hook exists!
Defense:
- Verify function/library names in official docs
- Test code before committing
- Cross-reference with authoritative sources
What happens: AI suggests deprecated or old approaches
Example:
AI: "Use componentDidMount for API calls in React"
Problem: Hooks (useEffect) are the modern approach
Defense:
- Check publication dates of techniques
- Prefer official documentation
- Ask “Is this the current best practice in 2025?”
What happens: AI presents uncertain information as fact
Example:
User: "What's the best database for this use case?"
AI: "PostgreSQL is definitely the best choice."
Problem: Many factors determine "best" choice
Defense:
- Ask for trade-offs: “What are pros and cons of different options?”
- Request alternatives: “What are 3 options and when to use each?”
- Make your own decisions based on requirements
The Issue: LLMs are trained on copyrighted code
Your Responsibility:
- Understand your code’s provenance
- Check if generated code includes copyrighted snippets
- Add proper licenses to your projects
- Don’t claim AI-generated code as entirely your own work
Best Practice: treat generated code like code from an unknown contributor: review it, check its license implications, and attribute it where required.
For Students:
Allowed ✅:
- Using AI for boilerplate code
- Getting explanations of concepts
- Debugging assistance
- Code suggestions as learning tools
Not Allowed ❌:
- Submitting AI-generated code without understanding it
- Using AI for exams without permission
- Claiming AI work as entirely your own
- Bypassing learning objectives
Always:
- Disclose AI usage if required
- Demonstrate understanding when asked
- Use AI to learn, not to avoid learning
The Issue: LLMs can reflect biases from training data
Examples:
- Gender assumptions in user models
- Cultural assumptions in UI design
- Accessibility oversights
- English-centric examples
Mitigation:
- Review generated code for assumptions
- Test with diverse users
- Explicitly ask for inclusive design
- Add bias checks to your prompts
Never include in prompts:
- API keys, passwords, tokens
- Customer data or PII
- Proprietary algorithms
- Confidential business logic
- Production database credentials
Example of safe prompt:
✅ "Write a function to authenticate users with JWT tokens"
❌ "Write auth using this secret key: sk_live_abc123xyz..."
Know your tools:
| Tool | Data Retention | Training on Your Code |
|---|---|---|
| GitHub Copilot | Not used for training (opt-in only) | No |
| ChatGPT Free | Conversations may train models | Yes (opt-out available) |
| ChatGPT Plus | Can disable training | Your choice |
| Claude | Not used for training | No |
| On-premise models | Data stays local | You control |
Best Practice: check your organization’s policy and each tool’s data-handling settings before pasting any code into a prompt.
AI should handle: the ~80% that is boilerplate and routine tasks
You should handle: the ~20% that is critical thinking, architecture, and business logic
Ideal workflow:
1. You design the solution architecture
2. AI generates boilerplate and structure
3. You review and modify generated code
4. AI helps debug issues
5. You write tests and verify functionality
6. AI generates documentation
7. You review everything before committing
Don’t let AI prevent learning: if you can’t explain a piece of generated code, don’t commit it; re-implement key pieces yourself until you can.
Remember: Today’s junior developer who learns with AI becomes tomorrow’s senior developer who uses AI effectively.
E. Bruno - Advanced LLM Concepts for Developers