NLP Tutorial

Prompt Engineering & RAG

Prompt design, in-context learning, retrieval-augmented generation, and LLM workflows.

Prompt Engineering

Speaking to Shifting Landscapes

Prompt engineering is the deliberate design of a model's input to guide its behavior without updating its weights.

Level 1: The 'CO-STAR' Framework

Effective prompts usually include these elements:

C (Context): Background info.
O (Objective): The specific task.
S (Style): Writing style (e.g., academic, journalistic).
T (Tone): Emotional quality.
A (Audience): Who is this for?
R (Response): Format (JSON, Table).
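
As a minimal sketch, the six elements above can be assembled into a single prompt string. The class name, the section labels, and the sample values are illustrative assumptions, not part of any standard CO-STAR library.

```python
from dataclasses import dataclass


@dataclass
class CoStarPrompt:
    """Holds the six CO-STAR fields (hypothetical helper class)."""
    context: str
    objective: str
    style: str
    tone: str
    audience: str
    response: str

    def render(self) -> str:
        # Emit one labeled section per field, in CO-STAR order.
        sections = [
            ("CONTEXT", self.context),
            ("OBJECTIVE", self.objective),
            ("STYLE", self.style),
            ("TONE", self.tone),
            ("AUDIENCE", self.audience),
            ("RESPONSE", self.response),
        ]
        return "\n\n".join(f"# {name}\n{text}" for name, text in sections)


# Sample values invented for illustration.
prompt = CoStarPrompt(
    context="Our company sells a note-taking app.",
    objective="Write a release announcement for offline mode.",
    style="Concise product blog post.",
    tone="Friendly and upbeat.",
    audience="Existing users who are not technical.",
    response="Plain text, under 150 words.",
).render()
print(prompt)
```

Keeping each element in its own field makes it easy to change one (say, the audience) while holding the rest of the prompt fixed.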

Level 2: Reasoning Chains

Chain-of-Thought (CoT) prompting asks the model to "think step by step." The model then writes out its intermediate reasoning as part of the output before committing to a final answer, which tends to improve accuracy on multi-step problems such as arithmetic and logic.

Level 3: Programmatic Prompting (DSPy)

In advanced NLP engineering, hand-tuning prompt strings gives way to programmatic approaches. Frameworks like DSPy automatically "compile" prompts by optimizing them against evaluation metrics, so prompts are treated like code rather than "vibe-based" text.
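
The core idea of metric-driven prompt selection can be illustrated without any framework. The sketch below is a toy and does not use DSPy's API: the dev set, the candidate templates, and `fake_model` (an offline stand-in for an LLM call) are all invented for illustration. Real optimizers search a much larger space of instructions and examples.

```python
# Tiny labeled dev set: (question, expected answer). Invented for illustration.
DEV_SET = [("2 + 3 * 4", "14"), ("10 - 2 * 3", "4")]
ANSWER_KEY = dict(DEV_SET)

CANDIDATE_TEMPLATES = [
    "Answer: {question}",
    "Think step by step, then answer: {question}",
]


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call so the sketch runs offline.

    It only answers correctly when the prompt asks for step-by-step reasoning.
    """
    question = prompt.rsplit(": ", 1)[1]
    if "step by step" in prompt.lower():
        return ANSWER_KEY[question]
    return "unsure"


def exact_match(prediction: str, expected: str) -> float:
    """Evaluation metric: 1.0 for an exact string match, else 0.0."""
    return 1.0 if prediction.strip() == expected else 0.0


def score(template: str) -> float:
    """Average metric over the dev set for one prompt template."""
    results = [
        exact_match(fake_model(template.format(question=q)), expected)
        for q, expected in DEV_SET
    ]
    return sum(results) / len(results)


# "Compile" step in miniature: keep whichever template scores best.
best_template = max(CANDIDATE_TEMPLATES, key=score)
print(best_template)
```

The point of the design is that the prompt is chosen by a measurable score on held-out examples rather than by intuition.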

Chain-of-Thought Example

PROMPT:
"Roger has 5 tennis balls. He buys 2 more cans of tennis balls. 
Each can has 3 tennis balls. How many tennis balls does he have now?
Think through this step-by-step before giving the final answer."

EXPECTED OUTPUT:
"1. Roger starts with 5 balls.
2. He buys 2 cans, each with 3 balls, so 2 * 3 = 6 new balls.
3. 5 + 6 = 11.
Final Answer: 11"
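
In code, a CoT prompt like the one above usually needs two helpers: one that appends the step-by-step instruction and one that pulls the final answer out of the model's reasoning. The following is a minimal sketch; the function names are hypothetical, and it assumes the model ends its reply with a "Final Answer:" line as in the example.

```python
import re
from typing import Optional

COT_SUFFIX = "Think through this step-by-step before giving the final answer."


def build_cot_prompt(question: str) -> str:
    """Append the step-by-step instruction to a question."""
    return f"{question.strip()}\n{COT_SUFFIX}"


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after 'Final Answer:', or None if the marker is missing."""
    match = re.search(r"Final Answer:\s*(.+)", completion)
    return match.group(1).strip() if match else None


# The expected output from the example above, used here as a sample completion.
completion = (
    "1. Roger starts with 5 balls.\n"
    "2. He buys 2 cans, each with 3 balls, so 2 * 3 = 6 new balls.\n"
    "3. 5 + 6 = 11.\n"
    "Final Answer: 11"
)
print(extract_final_answer(completion))
```

Returning `None` when the marker is absent lets the caller retry or flag the response instead of silently using the reasoning text as the answer.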

Retrieval-Augmented Generation

Grounding AI in Reality

RAG reduces "hallucinations" by supplying the model with external facts (from PDFs, databases, or the web) before it generates an answer. It does not eliminate them, since the model can still misread or ignore the retrieved text.

Level 1: The Architecture

User Question: "What is our leave policy?"
Retrieval: Search a vector database for relevant "Leave Policy" text.
Augmentation: Add that text into the prompt.
Generation: The LLM answers using the provided text.

Level 2: Vector Databases

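RAG relies on embeddings: each text chunk is converted into a numeric vector, and chunks with similar meaning end up with nearby vectors. These vectors are stored in vector databases such as Pinecone or ChromaDB, which support "semantic search": finding text by meaning rather than by exact keywords.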
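
To make "nearby vectors" concrete, here is a small sketch of semantic search using cosine similarity. The three-dimensional vectors are hand-made for illustration only; a real system would obtain them from an embedding model and would typically delegate the search to a vector database.

```python
import math

# Hand-made 3-dimensional "embeddings", invented for illustration.
DOCUMENTS = {
    "Employees receive 20 days of paid vacation.": [0.9, 0.1, 0.0],
    "Reset your password from the login page.": [0.0, 0.2, 0.9],
    "The cafeteria opens at 8 am.": [0.1, 0.9, 0.1],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, k=1):
    """Return the k document texts whose vectors are most similar to the query."""
    ranked = sorted(
        DOCUMENTS,
        key=lambda text: cosine_similarity(query_vector, DOCUMENTS[text]),
        reverse=True,
    )
    return ranked[:k]


# Pretend embedding of "How much time off do I get?", which shares no
# keywords with the vacation sentence but points in a similar direction.
query_vector = [0.8, 0.2, 0.1]
print(search(query_vector, k=1))
```

The query matches the vacation sentence despite having no words in common with it, which is the behavior keyword search cannot provide.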

Level 3: Evaluation (RAGAS)

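A RAG pipeline has several stages that can fail independently, so it is evaluated with metrics such as Faithfulness (is every claim in the answer supported by the retrieved text?) and Relevance (was the retrieved text actually useful for the question?). RAGAS is one framework that computes metrics of this kind.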
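
As a rough illustration of what a faithfulness score measures, the sketch below counts the fraction of answer sentences whose words all appear in the retrieved context. This is a crude lexical proxy written for this tutorial, not how RAGAS computes the metric (RAGAS relies on an LLM to judge individual claims), and it will wrongly penalize faithful paraphrases.

```python
import re


def _words(text):
    """Lowercased set of alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_faithfulness(answer, context):
    """Fraction of answer sentences whose words all occur in the context."""
    context_words = _words(context)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    supported = sum(1 for s in sentences if _words(s) <= context_words)
    return supported / len(sentences)


# Sample data invented for illustration.
context = "Employees receive 20 days of paid vacation per year."
answer = "Employees receive 20 days of paid vacation. Unused days are paid out in cash."
print(lexical_faithfulness(answer, context))  # 0.5: the second sentence is unsupported
```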

Simplified RAG Logic

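# Pseudo-code for a RAG system (vector_db and llm are placeholder objects)
query = "How do I reset my password?"
docs = vector_db.search(query, k=3)  # Retrieve the top 3 relevant text chunks

# Join the chunks (assumed to be strings) so the prompt holds readable text,
# not the printed form of a Python list
context = "\n\n".join(docs)
prompt = f"Use only these documents to answer:\n{context}\n\nQuestion: {query}"
response = llm.generate(prompt)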
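
The pseudo-code above leaves `vector_db` and `llm` undefined. The sketch below fills them in with toy stand-ins so the retrieve, augment, generate flow can be run end to end. `ToyVectorDB` ranks chunks by shared words rather than real embeddings, and `StubLLM` returns a canned reply instead of calling a model; both classes and the sample chunks are hypothetical.

```python
import re


def tokenize(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


class ToyVectorDB:
    """Hypothetical in-memory store; scores chunks by word overlap."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def search(self, query, k=3):
        query_words = tokenize(query)
        ranked = sorted(
            self.chunks,
            key=lambda chunk: len(query_words & tokenize(chunk)),
            reverse=True,
        )
        return ranked[:k]


class StubLLM:
    """Hypothetical stand-in for a model client; makes no network call."""

    def generate(self, prompt):
        return f"[stub reply to a {len(prompt)}-character prompt]"


def answer(query, vector_db, llm, k=3):
    docs = vector_db.search(query, k=k)                   # Retrieval
    context = "\n".join(f"- {doc}" for doc in docs)       # Augmentation
    prompt = f"Use these documents to answer:\n{context}\n\nQuestion: {query}"
    return prompt, llm.generate(prompt)                   # Generation


# Sample chunks invented for illustration.
vector_db = ToyVectorDB([
    "To reset your password, open Settings and choose Reset Password.",
    "Invoices are emailed on the first day of each month.",
    "Contact support if you cannot reset your password by email.",
    "Our office is closed on public holidays.",
])
prompt, response = answer("How do I reset my password?", vector_db, StubLLM(), k=2)
print(prompt)
print(response)
```

Swapping `ToyVectorDB` for a real vector store and `StubLLM` for a real model client would leave the `answer` function unchanged, as long as they expose the same two methods.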