Prompt Engineering & RAG
Prompt design, in-context learning, retrieval-augmented generation, and LLM workflows.
Prompt Engineering
Speaking to Shifting Landscapes
Prompt engineering is the strategic construction of input to guide model behavior without weight updates.
Level 1 — The 'CO-STAR' Framework
Effective prompts usually include these elements:
- C (Context): Background info.
- O (Objective): The specific task.
- S (Style): Writing style (e.g., academic, conversational).
- T (Tone): Emotional quality (e.g., formal, humorous).
- A (Audience): Who is this for?
- R (Response): Output format (e.g., JSON, table).
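The six elements above can be assembled mechanically. The following is a minimal sketch, assuming a simple "# SECTION" layout for the rendered prompt; the class name `CoStarPrompt`, its `render` method, and the example field values are hypothetical choices for illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class CoStarPrompt:
    """Hypothetical container for the six CO-STAR elements."""
    context: str
    objective: str
    style: str
    tone: str
    audience: str
    response: str

    def render(self) -> str:
        # Emit the sections in CO-STAR order, each under its own header.
        sections = [
            ("CONTEXT", self.context),
            ("OBJECTIVE", self.objective),
            ("STYLE", self.style),
            ("TONE", self.tone),
            ("AUDIENCE", self.audience),
            ("RESPONSE", self.response),
        ]
        return "\n\n".join(f"# {name}\n{text}" for name, text in sections)


prompt = CoStarPrompt(
    context="Our company is launching a note-taking app next month.",
    objective="Write a product announcement.",
    style="Concise marketing copy.",
    tone="Enthusiastic but professional.",
    audience="Existing newsletter subscribers.",
    response="Plain text, under 120 words.",
).render()
print(prompt)
```

Keeping the elements as named fields makes it easy to vary one of them (say, the audience) while holding the rest of the prompt constant.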
Level 2 — Reasoning Chains
Chain-of-Thought (CoT) prompting asks the model to "think step by step." The model then writes out its intermediate reasoning as output tokens before committing to a final answer, which tends to improve accuracy on multi-step problems such as arithmetic.
Level 3 — Programmatic Prompting (DSPy)
In advanced NLP engineering, hand-tuning prompt wording gives way to programmatic optimization. Frameworks like DSPy automatically "compile" prompts against evaluation metrics, treating prompts like code instead of "vibe-based" text.
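The idea of choosing prompts by metric rather than by feel can be illustrated without any framework. The sketch below is a toy and does not use DSPy's API: `select_best_template`, `exact_match`, and `stub_model` are hypothetical names, and `stub_model` is a hard-coded stand-in for a real LLM call so that the example runs offline.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, expected answer)


def exact_match(prediction: str, expected: str) -> float:
    """Metric: 1.0 if the prediction equals the expected answer, else 0.0."""
    return 1.0 if prediction.strip() == expected.strip() else 0.0


def select_best_template(
    templates: List[str],
    devset: List[Example],
    model: Callable[[str], str],
    metric: Callable[[str, str], float] = exact_match,
) -> Tuple[float, str]:
    """Score every template on the dev set and return (best score, best template)."""
    scored = []
    for template in templates:
        total = sum(
            metric(model(template.format(question=q)), expected)
            for q, expected in devset
        )
        scored.append((total / len(devset), template))
    return max(scored, key=lambda pair: pair[0])


def stub_model(prompt: str) -> str:
    """ASSUMPTION: fake model that only answers correctly when asked to reason."""
    answers = {"2 + 3": "5", "4 * 6": "24"}
    if "step-by-step" not in prompt:
        return "unsure"
    for fragment, answer in answers.items():
        if fragment in prompt:
            return answer
    return "unsure"


templates = [
    "Answer: {question}",
    "Think step-by-step, then answer: {question}",
]
devset = [("What is 2 + 3?", "5"), ("What is 4 * 6?", "24")]

best_score, best_template = select_best_template(templates, devset, stub_model)
print(best_score, best_template)
```

A real optimizer searches a much larger space (instructions, few-shot examples) and calls an actual model, but the loop of generate, score, and keep the best is the same shape.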
PROMPT (Chain-of-Thought example):
"Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
Think through this step-by-step before giving the final answer."
EXPECTED OUTPUT:
"1. Roger starts with 5 balls.
2. He buys 2 cans, each with 3 balls, so 2 * 3 = 6 new balls.
3. 5 + 6 = 11.
Final Answer: 11"
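In code, a Chain-of-Thought prompt is usually paired with a small parser that separates the final answer from the reasoning. The following is a minimal sketch, assuming the model ends its output with a "Final Answer:" line as in the example above; `build_cot_prompt` and `extract_final_answer` are hypothetical helper names, and `model_output` is the expected output copied from the example rather than a real model response.

```python
import re
from typing import Optional

COT_SUFFIX = "Think through this step-by-step before giving the final answer."


def build_cot_prompt(question: str) -> str:
    """Append the step-by-step instruction to a plain question."""
    return f"{question.strip()}\n{COT_SUFFIX}"


def extract_final_answer(output: str) -> Optional[str]:
    """Return the text after the last 'Final Answer:' marker, or None if absent."""
    matches = re.findall(r"Final Answer:\s*(.+)", output)
    return matches[-1].strip() if matches else None


question = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)
prompt = build_cot_prompt(question)

# Stand-in for a model response, copied from the expected output above.
model_output = (
    "1. Roger starts with 5 balls.\n"
    "2. He buys 2 cans, each with 3 balls, so 2 * 3 = 6 new balls.\n"
    "3. 5 + 6 = 11.\n"
    "Final Answer: 11"
)
print(extract_final_answer(model_output))
```

Returning None when the marker is missing lets the caller retry or flag the response instead of silently using the reasoning text as the answer.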
Retrieval-Augmented Generation
Grounding AI in Reality
RAG reduces "hallucinations" by providing models with external facts (from PDFs, databases, or the web) before they generate an answer.
Level 1 — The Architecture
- User Question: "What is our leave policy?"
- Retrieval: Search a vector database for relevant "Leave Policy" text.
- Augmentation: Add that text into the prompt.
- Generation: The LLM answers using the provided text.
Level 2 — Vector Databases
RAG relies on embeddings: each chunk of text is converted into a numeric vector that captures its meaning. We store these vectors in databases like Pinecone or ChromaDB, which allow "semantic search": finding things by meaning rather than keywords.
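The mechanics of vector search can be shown with a toy in-memory store. This sketch is an assumption-heavy stand-in: `embed` uses word counts instead of a trained embedding model, so it matches on shared words rather than true meaning, and `ToyVectorStore` is a hypothetical class, not the Pinecone or ChromaDB API. What carries over to real systems is the pattern of embedding the query and ranking stored vectors by cosine similarity.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: a sparse word-count vector (real systems use a trained model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store that ranks documents by cosine similarity."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, k: int = 3) -> List[str]:
        q = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(q, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("Employees receive 20 days of paid leave per year.")
store.add("To reset your password, open Settings and choose Reset Password.")
store.add("The cafeteria is open from 8am to 3pm.")

print(store.search("How do I reset my password?", k=1))
```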
Level 3 — Evaluation (RAGAS)
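RAG pipelines have several stages that can fail independently, so we evaluate them with metrics like Faithfulness (is every claim in the answer supported by the retrieved text?) and Relevance (was the retrieved document actually useful for the question?).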
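To make the two metrics concrete, here is a rough word-overlap approximation. It is only a sketch: evaluation frameworks such as RAGAS typically rely on an LLM to judge support and relevance, whereas `toy_faithfulness` and `toy_context_relevance` below are hypothetical functions that compare word sets, so they will miss paraphrases.

```python
import re
from typing import Set


def _tokens(text: str) -> Set[str]:
    """Lowercased word set of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_faithfulness(answer: str, context: str) -> float:
    """Fraction of answer sentences whose words all appear in the context."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    context_tokens = _tokens(context)
    supported = sum(1 for s in sentences if _tokens(s) <= context_tokens)
    return supported / len(sentences)


def toy_context_relevance(question: str, context: str) -> float:
    """Fraction of question words that also appear in the retrieved context."""
    question_tokens = _tokens(question)
    if not question_tokens:
        return 0.0
    return len(question_tokens & _tokens(context)) / len(question_tokens)


context = "Employees receive 20 days of paid leave per year."
answer = "Employees receive 20 days of paid leave. Leave can be carried over forever."
question = "How many days of paid leave do employees receive?"

# The second answer sentence is not backed by the context, so it lowers the score.
print(toy_faithfulness(answer, context))
print(toy_context_relevance(question, context))
```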
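# Pseudo-code for a RAG system (vector_db and llm are placeholder objects)
query = "How do I reset my password?"
docs = vector_db.search(query, k=3)  # Retrieve the top 3 relevant text chunks (assumed to be strings)
context = "\n\n".join(docs)  # Join the chunks so the prompt gets readable text, not a list repr
prompt = f"Use these documents to answer:\n{context}\n\nQuestion: {query}"
response = llm.generate(prompt)  # The LLM answers using the provided text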