Python Regular Expressions Short Answers

What are regular expressions in Python?

Regular expressions (regex) are sequences of characters that define search patterns for text. In Python, they are implemented through the re module. They are used for pattern matching, searching, and text manipulation.

How to import and use the re module?

Import using import re. Then use functions like re.search(), re.findall(), re.match(), re.sub(), etc. Example: re.search(r'\d+', 'abc123') finds digits.

What is the difference between search() and match()?

match() checks for a match only at the beginning of the string. search() checks for a match anywhere in the string. Example: re.match('c', 'abc') returns None, re.search('c', 'abc') finds 'c'.
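
A minimal sketch of the difference, using the same inputs as the example above (the variable names are illustrative):

```python
import re

text = "abc"

# match() only succeeds if the pattern matches starting at index 0.
at_start = re.match(r"c", text)

# search() scans forward and returns the first match anywhere in the string.
anywhere = re.search(r"c", text)

print(at_start)          # None
print(anywhere.group())  # c
print(anywhere.span())   # (2, 3)
```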

What does findall() function do?

findall() returns all non-overlapping matches of a pattern in a string as a list. Example: re.findall(r'\d+', 'a1 b22 c333') returns ['1', '22', '333'].

What does sub() function do?

sub() replaces occurrences of a pattern with a replacement string. Syntax: re.sub(pattern, repl, string). Example: re.sub(r'\d+', '#', 'a1 b2') returns 'a# b#'.

What are common regex special characters?

. - Matches any character except newline
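\d - Matches any digit ([0-9]; for str patterns also other Unicode decimal digits unless re.ASCII is set)
\w - Matches word character ([a-zA-Z0-9_]; for str patterns also Unicode letters and digits unless re.ASCII is set)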
\s - Matches whitespace
^ - Matches start of string
$ - Matches end of string (or just before a trailing newline)
[] - Character class
| - OR operator

What are regex quantifiers?

* - 0 or more occurrences
+ - 1 or more occurrences
? - 0 or 1 occurrence
{n} - Exactly n occurrences
{n,} - n or more occurrences
{n,m} - Between n and m occurrences
Example: a{2,4} matches 'aa', 'aaa', or 'aaaa'.

What is a raw string (r'') in regex?

Raw strings (r'pattern') treat backslashes as literal characters, not escape sequences. Essential for regex to avoid double escaping. Example: r'\d+' instead of '\\d+'.
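
A short sketch of why this matters, using \b as the example (it means backspace in a normal string but a word boundary to the regex engine). The variable names are illustrative:

```python
import re

plain = "\bcat\b"   # normal string: each \b becomes one backspace character
raw = r"\bcat\b"    # raw string: backslash and 'b' stay as two characters

print(len(plain))   # 5
print(len(raw))     # 7

raw_hits = re.findall(raw, "a cat")      # word-boundary pattern
plain_hits = re.findall(plain, "a cat")  # looks for literal backspaces

print(raw_hits)     # ['cat']
print(plain_hits)   # []
```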

What are character classes in regex?

Character classes match one character from a set. [abc] matches a, b, or c. [a-z] matches any lowercase letter. [^abc] matches anything except a, b, or c. Example: [aeiou] matches vowels.

What are groups in regex?

Groups () capture parts of a match so you can extract specific portions. Example: r'(\d{3})-(\d{2})' captures the three-digit and two-digit parts separately. Access them with match.group(1) and match.group(2).
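
As a quick illustration, here is the pattern from the example applied to a made-up input string:

```python
import re

m = re.search(r"(\d{3})-(\d{2})", "code 555-12 ok")

whole = m.group(0)   # the entire match
first = m.group(1)   # text captured by the first pair of parentheses
second = m.group(2)  # text captured by the second pair
both = m.groups()    # all captured groups as a tuple

print(whole, first, second)  # 555-12 555 12
print(both)                  # ('555', '12')
```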

What is the split() function in regex?

re.split() splits a string by occurrences of a pattern. Example: re.split(r'\s+', 'a b c') returns ['a', 'b', 'c']. More powerful than string's split() as it uses patterns.
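
A small sketch comparing re.split() with str.split() on irregular separators (the input strings are made up for illustration):

```python
import re

messy = "a  b\tc"

by_pattern = re.split(r"\s+", messy)  # any run of whitespace is one delimiter
by_space = messy.split(" ")           # splits on every single space only

print(by_pattern)  # ['a', 'b', 'c']
print(by_space)    # ['a', '', 'b\tc']

# A pattern can also accept several delimiters with optional padding.
fields = re.split(r"\s*[,;]\s*", "x , y;z")
print(fields)      # ['x', 'y', 'z']
```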

What are lookahead and lookbehind assertions?

Lookahead: (?=...) - matches if ... follows
Negative lookahead: (?!...) - matches if ... doesn't follow
Lookbehind: (?<=...) - matches if ... precedes
Negative lookbehind: (?<!...) - matches if ... doesn't precede
Example: r'\d(?=px)' matches digit only if followed by 'px'.
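
The four assertions can be sketched side by side as follows. The sample strings are invented, and the negative forms exclude digits in the assertion so the engine cannot satisfy it by backtracking into the middle of a number:

```python
import re

sizes = "12px 34em 56px"
prices = "cost $40, qty 7"

# Lookahead: digits that are followed by 'px' (the 'px' is not consumed).
followed_by_px = re.findall(r"\d+(?=px)", sizes)

# Negative lookahead: digits not followed by another digit or by 'px'.
not_px = re.findall(r"\d+(?!\d|px)", sizes)

# Lookbehind: digits preceded by a dollar sign.
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: digits not preceded by '$' or another digit.
not_after_dollar = re.findall(r"(?<![$\d])\d+", prices)

print(followed_by_px)    # ['12', '56']
print(not_px)            # ['34']
print(after_dollar)      # ['40']
print(not_after_dollar)  # ['7']
```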

What are flags in regex?

Flags modify regex behavior:
re.IGNORECASE or re.I - Case-insensitive matching
re.MULTILINE or re.M - ^ and $ match start/end of line
re.DOTALL or re.S - . matches newline
Example: re.search('python', 'PYTHON', re.I) matches case-insensitively.
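
A minimal sketch of the three flags on one made-up multi-line string, including how to combine flags with the | operator:

```python
import re

text = "first line\nSecond line\nthird"

# Without MULTILINE, ^ anchors only at the start of the whole string.
start_only = re.findall(r"^\w+", text)
each_line = re.findall(r"^\w+", text, re.MULTILINE)

# Without DOTALL, '.' refuses to match the newline between the lines.
no_dotall = re.search(r"line.Second", text)
with_dotall = re.search(r"line.Second", text, re.DOTALL)

# Flags combine with bitwise OR.
combined = re.findall(r"^second", text, re.IGNORECASE | re.MULTILINE)

print(start_only)           # ['first']
print(each_line)            # ['first', 'Second', 'third']
print(no_dotall)            # None
print(with_dotall.group())  # line, a newline, then Second
print(combined)             # ['Second']
```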

What is compile() function?

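re.compile() pre-compiles a regex pattern into a pattern object for reuse. This saves repeated lookup work when the same pattern is used many times, although the re module also caches recently compiled patterns, so the gain is often modest. Example: pattern = re.compile(r'\d+'), then pattern.findall(text).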
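
A brief sketch of the reuse pattern, with a hypothetical NUMBER constant and sample lines:

```python
import re

# Compile once, typically at module level, and reuse the pattern object.
NUMBER = re.compile(r"\d+")

lines = ["a1 b22", "no digits", "c333"]
found = [NUMBER.findall(line) for line in lines]

print(found)  # [['1', '22'], [], ['333']]
```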

How to extract email addresses using regex?

Pattern: r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'. Use with findall(): re.findall(pattern, text). Captures common email formats like user@example.com.
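
A hedged sketch of extraction with findall(). The final class is written as [A-Za-z] because a | inside square brackets is a literal pipe rather than alternation. The sample text is invented, and the pattern covers common addresses only, not every valid one:

```python
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

text = "Contact user@example.com or admin@mail.example.org, not bad@address."
emails = EMAIL.findall(text)

print(emails)  # ['user@example.com', 'admin@mail.example.org']
```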

How to validate phone numbers with regex?

Example pattern: r'\(\d{3}\) \d{3}-\d{4}' matches (123) 456-7890. More flexible: r'(\+\d{1,3}[-\.\s]?)?\(?\d{3}\)?[-\.\s]?\d{3}[-\.\s]?\d{4}' matches various formats.
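
A minimal validation sketch, with the literal parentheses escaped as \( and \) and fullmatch() used so the whole string must conform. The names STRICT, FLEXIBLE and the sample numbers are illustrative:

```python
import re

STRICT = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
FLEXIBLE = re.compile(
    r"(\+\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}"
)

samples = ["(123) 456-7890", "123-456-7890", "+1 123.456.7890", "12-3456"]

strict_ok = [bool(STRICT.fullmatch(s)) for s in samples]
flexible_ok = [bool(FLEXIBLE.fullmatch(s)) for s in samples]

print(strict_ok)    # [True, False, False, False]
print(flexible_ok)  # [True, True, True, False]
```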

What are non-greedy quantifiers?

Non-greedy (lazy) quantifiers match as little as possible. Add ? after the quantifier: *?, +?, ??, {n,m}?. Example: r'<.*?>' stops at the first '>', so it matches a single HTML tag, while r'<.*>' runs to the last '>' on the line.
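
The greedy and lazy forms can be compared on a small made-up snippet:

```python
import re

html = "<b>bold</b>"

greedy = re.findall(r"<.*>", html)   # .* grabs as much as it can
lazy = re.findall(r"<.*?>", html)    # .*? stops at the first '>'

print(greedy)  # ['<b>bold</b>']
print(lazy)    # ['<b>', '</b>']
```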

How to match word boundaries?

Use \b for a word boundary. It matches the position between a word character and a non-word character (or the start or end of the string). Example: r'\bcat\b' matches 'cat' but not 'catalog' or 'concatenate'.
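
A short sketch contrasting the bounded and unbounded pattern on an invented sentence:

```python
import re

text = "cat catalog concatenate cat."

whole_words = re.findall(r"\bcat\b", text)  # only standalone 'cat'
any_cat = re.findall(r"cat", text)          # also inside longer words

print(whole_words)   # ['cat', 'cat']
print(len(any_cat))  # 4
```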

What is the finditer() function?

finditer() returns an iterator yielding match objects for all non-overlapping matches. Useful for processing matches one by one. Example: for match in re.finditer(r'\d+', text): print(match.group()).
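
Because each item is a match object, positions are available as well as text. A minimal sketch on a made-up string:

```python
import re

text = "a1 b22 c333"

# Collect the matched text together with its start and end offsets.
hits = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(hits)  # [('1', 1, 2), ('22', 4, 6), ('333', 8, 11)]
```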

What are common regex pitfalls?

1. Not using raw strings for patterns
2. Greedy matching when non-greedy needed
3. Forgetting to escape special characters
4. Not handling edge cases
5. Performance issues with complex patterns
6. Unicode issues with \w and \b
Always test with various inputs and consider performance for large texts.

re.match vs re.search — practical difference?

match tries only at the start of the string; search scans forward. Interviewers catch candidates who assume match means "find substring."

How does re.MULTILINE change ^ and $?

With re.MULTILINE, ^ and $ also match at the start and end of each line. Without it, ^ matches only at the start of the string and $ only at the end (or just before a trailing newline).

Does . match newlines by default?

No — enable re.DOTALL if dot must span lines; multiline HTML/logs commonly trip this.

What is catastrophic backtracking?

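Nested or overlapping quantifiers, such as (a+)+, let the engine split the same text in exponentially many ways, and a failing match tries them all. Mitigate by simplifying the pattern so its alternatives cannot match the same text, by refactoring toward possessive-style matching, or by using a real parser for nested languages.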
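
A cautious sketch of the problem shape and one possible rewrite. The risky pattern is only run on a deliberately short input so the example stays fast; the names and inputs are illustrative, and actual timings are not shown because they vary by machine:

```python
import re

risky = re.compile(r"(a+)+$")  # nested quantifiers: a run of 'a' splits many ways
safer = re.compile(r"a+$")     # accepts the same strings via match(), one way to split

short_subject = "a" * 12 + "b"   # kept short on purpose for the risky pattern
long_subject = "a" * 5000 + "b"  # only given to the simplified pattern

risky_result = risky.match(short_subject)  # fails after trying many splits
safer_short = safer.match(short_subject)   # fails quickly
safer_long = safer.match(long_subject)     # still fails quickly on long input

# Both patterns agree on input that should be accepted.
both_accept = bool(risky.match("aaa")) and bool(safer.match("aaa"))

print(risky_result, safer_short, safer_long)  # None None None
print(both_accept)                            # True
```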

Why does findall return tuples sometimes?

If the pattern contains one capturing group, findall returns a list of that group's text; with two or more groups, each result is a tuple of the groups. Use (?:...) non-capturing groups when you want whole-match strings.
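
A small sketch showing how the number of capturing groups changes what findall() returns (the date strings are made up):

```python
import re

text = "2024-01-15 and 2025-12-31"

no_groups = re.findall(r"\d{4}-\d{2}-\d{2}", text)
one_group = re.findall(r"(\d{4})-\d{2}-\d{2}", text)
three_groups = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)
non_capturing = re.findall(r"(?:\d{4})-(?:\d{2})-(?:\d{2})", text)

print(no_groups)      # ['2024-01-15', '2025-12-31']
print(one_group)      # ['2024', '2025']
print(three_groups)   # [('2024', '01', '15'), ('2025', '12', '31')]
print(non_capturing)  # ['2024-01-15', '2025-12-31']
```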

Purpose of (?:...)?

Groups a subpattern (for a quantifier or alternation) without capturing it, which avoids polluting findall results and keeps group numbering stable.

Are \w and \b ASCII-only?

No, not for str patterns. By default \w and \b use Unicode semantics, so letters beyond ASCII match \w; pass re.ASCII when you need strict ASCII word rules.
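
A minimal sketch of the difference on a str pattern. The accented letters are written as escapes so the example does not depend on file encoding:

```python
import re

text = "na\u00efve caf\u00e9 42"  # 'naive' and 'cafe' with accented letters

unicode_words = re.findall(r"\w+", text)
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)  # accented letters count as word characters
print(ascii_words)    # ['na', 've', 'caf', '42']
```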

Variable-length lookbehind in Python’s re?

Lookbehind patterns must be fixed width — arbitrary-length lookbehind fails at compile time; alternative engines differ.
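
A short sketch of the restriction and one common workaround, a capturing group in place of the lookbehind. The currency strings are invented:

```python
import re

# Fixed-width lookbehind compiles and works.
fixed = re.search(r"(?<=USD )\d+", "USD 15 EUR 20")

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=USD\s+)\d+")
    variable_ok = True
except re.error:
    variable_ok = False

# Workaround: match the prefix normally and capture the part you want.
captured = re.search(r"USD\s+(\d+)", "USD   15").group(1)

print(fixed.group())  # 15
print(variable_ok)    # False
print(captured)       # 15
```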

When is fullmatch preferred?

Validating that the entire string conforms (IDs, tokens) — avoids accidentally accepting partial substring matches.
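
A minimal sketch of how the three entry points differ when validating a four-digit ID (the inputs are hypothetical):

```python
import re

pattern = r"\d{4}"

found_inside = bool(re.search(pattern, "id-1234-x"))   # substring is enough
prefix_only = bool(re.match(pattern, "1234abc"))       # trailing junk accepted
whole_bad = bool(re.fullmatch(pattern, "1234abc"))     # rejected
whole_good = bool(re.fullmatch(pattern, "1234"))       # accepted

# '$' also matches just before a trailing newline; fullmatch() does not allow it.
dollar_newline = bool(re.match(r"\d{4}$", "1234\n"))
full_newline = bool(re.fullmatch(pattern, "1234\n"))

print(found_inside, prefix_only, whole_bad, whole_good)  # True True False True
print(dollar_newline, full_newline)                      # True False
```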

What does inline (?i) do?

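Enables case-insensitive matching. A global (?i) belongs at the start of the pattern and applies to the whole expression; the scoped form (?i:...) limits it to one group. Handy, but it can hurt readability if overused.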
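
A brief sketch of the global form at the start of a pattern and the scoped form (?i:...), using made-up inputs:

```python
import re

# Global inline flag: the whole pattern ignores case.
global_flag = bool(re.search(r"(?i)python", "PYTHON"))

# Scoped inline flag: only 'py' ignores case, 'thon' stays case-sensitive.
scoped_mixed = bool(re.search(r"(?i:py)thon", "PYthon"))
scoped_upper = bool(re.search(r"(?i:py)thon", "PYTHON"))

print(global_flag)   # True
print(scoped_mixed)  # True
print(scoped_upper)  # False
```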

How does re.VERBOSE change parsing?

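Whitespace outside character classes is ignored and # starts a comment, so you can spread a pattern across lines and document it. Escape literal spaces and # characters where you need them (for example, write a literal space as [ ]).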
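
A hedged sketch of a documented pattern under re.VERBOSE. The PHONE pattern and its comments are illustrative, not a complete phone-number grammar:

```python
import re

PHONE = re.compile(r"""
    \(? (\d{3}) \)?   # area code, parentheses optional
    [-.\s]?           # optional separator
    (\d{3})           # prefix
    [-.\s]?           # optional separator
    (\d{4})           # line number
""", re.VERBOSE)

parts = PHONE.fullmatch("(123) 456-7890").groups()
print(parts)  # ('123', '456', '7890')

# Unescaped spaces vanish from the pattern; escaped ones are literal.
space_ignored = bool(re.compile(r"a b", re.VERBOSE).fullmatch("ab"))
space_literal = bool(re.compile(r"a\ b", re.VERBOSE).fullmatch("a b"))
print(space_ignored, space_literal)  # True True
```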

Trap: zero-length matches in manual slicing loops?

Engines can emit empty matches — advance indices carefully or you risk infinite loops when consuming input manually.
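
One way to guard a manual scanning loop is sketched below. The helper name scan_numbers is hypothetical, and the pattern \d* is chosen precisely because it can match the empty string at any position:

```python
import re

def scan_numbers(text):
    """Collect digit runs with a manual loop that tolerates empty matches."""
    pattern = re.compile(r"\d*")  # always matches, possibly with zero length
    pos, found = 0, []
    while pos < len(text):
        m = pattern.match(text, pos)
        if m.end() > pos:
            # Non-empty match: record it and jump past it.
            found.append(m.group())
            pos = m.end()
        else:
            # Empty match: step forward by one or the loop never ends.
            pos += 1
    return found

numbers = scan_numbers("a12b345")
print(numbers)  # ['12', '345']
```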

sub with a callable receives what?

A match object per occurrence — enables conditional replacements beyond literal strings.
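
A minimal sketch with a hypothetical callback named double that rewrites each number it is given:

```python
import re

def double(match):
    # The callback receives a match object and must return the replacement text.
    return str(int(match.group()) * 2)

result = re.sub(r"\d+", double, "a1 b22")
print(result)  # a2 b44
```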

Why use named groups (?P<name>...)?

groupdict() yields readable keys — scales better than remembering numeric indexes in complex parsers.
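
For illustration, a date pattern with three named groups (the group names are this sketch's own choice):

```python
import re

DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = DATE.match("2024-01-15")
fields = m.groupdict()   # all named groups as a dict
year = m.group("year")   # a single group by name

print(fields)  # {'year': '2024', 'month': '01', 'day': '15'}
print(year)    # 2024
```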

re.split with capturing groups — surprise?

Delimiters captured by groups appear in the resulting list — differs from simple string split.
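
A quick sketch of the effect, with a non-capturing group shown as the way to group without keeping delimiters:

```python
import re

text = "a,b,c"

plain = re.split(r",", text)
with_group = re.split(r"(,)", text)       # captured delimiters are kept
non_capturing = re.split(r"(?:,)", text)  # grouped but not kept

print(plain)          # ['a', 'b', 'c']
print(with_group)     # ['a', ',', 'b', ',', 'c']
print(non_capturing)  # ['a', 'b', 'c']
```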

Overlapping matches like finding every position in “aaa”?

Standard matching is non-overlapping — overlapping scans need lookahead tricks or shifting indices yourself.
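
The lookahead trick can be sketched as follows: the lookahead itself consumes nothing, so the scan advances one position at a time, and the capturing group inside it reports the text.

```python
import re

text = "aaa"

non_overlapping = re.findall(r"aa", text)
overlapping = re.findall(r"(?=(aa))", text)
starts = [m.start() for m in re.finditer(r"(?=aa)", text)]

print(non_overlapping)  # ['aa']
print(overlapping)      # ['aa', 'aa']
print(starts)           # [0, 1]
```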

Is re.compile safe to reuse across threads?

Yes — pattern objects are immutable; compiling once saves work in hot loops (still measure if micro-optimizing).

Parse HTML with regex — acceptable?

Classic interview trap: regex can extract snippets, but nested or malformed HTML needs a parser (html.parser, lxml). The same applies to strictly validating email addresses.

Raw strings — still footguns?

A raw string cannot end with a lone backslash — you still balance quotes and escapes where the lexer demands.

finditer vs scanner / overlapping scan performance?

finditer streams matches — good for large texts; for many passes reuse compiled patterns and avoid re-scanning unchanged slices unnecessarily.
