Python Regular Expressions
Python Regular Expressions Interview Questions
What are regular expressions in Python?
Regular expressions (regex) are sequences of characters that define search patterns for text. In Python, they are implemented through the re module. They are used for pattern matching, searching, and text manipulation.
How to import and use the re module?
Import the module with import re, then call functions such as re.search(), re.findall(), re.match(), and re.sub(). Example: re.search(r'\d+', 'abc123') returns a match object for '123'; it returns None when nothing matches.
What is the difference between search() and match()?
match() checks for a match only at the beginning of the string. search() checks for a match anywhere in the string. Example: re.match('c', 'abc') returns None, re.search('c', 'abc') finds 'c'.
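A minimal sketch of the difference, using the same 'abc' string as above. re.fullmatch() is not mentioned in the answer; it is included here only as a related standard-library function that requires the whole string to match.

```python
import re

text = "abc"

at_start = re.match(r"c", text)      # None: 'c' is not at index 0
anywhere = re.search(r"c", text)     # match object; 'c' is found at index 2
whole = re.fullmatch(r"abc", text)   # matches only if the entire string fits

# Both functions return None on failure, so check before using the result.
found_at = anywhere.start() if anywhere else -1
```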
What does findall() function do?
findall() returns all non-overlapping matches of a pattern in a string as a list. Example: re.findall(r'\d+', 'a1 b22 c333') returns ['1', '22', '333'].
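One detail worth seeing in code: when the pattern contains capturing groups, findall() returns the group contents rather than the whole match. The sketch below reuses the sample string from the answer; the variable names are only illustrative.

```python
import re

text = "a1 b22 c333"

# No groups: each list item is the full match.
numbers = re.findall(r"\d+", text)           # ['1', '22', '333']

# One group: each list item is that group's text.
letters = re.findall(r"([a-z])\d+", text)    # ['a', 'b', 'c']

# Several groups: each list item is a tuple of the groups.
pairs = re.findall(r"([a-z])(\d+)", text)    # [('a', '1'), ('b', '22'), ('c', '333')]
```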
What does sub() function do?
sub() replaces occurrences of a pattern with a replacement string. Syntax: re.sub(pattern, repl, string). Example: re.sub(r'\d+', '#', 'a1 b2') returns 'a# b#'.
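A short sketch of a few sub() variations: the count argument, backreferences in the replacement, and a function as the replacement. The sample strings other than 'a1 b2' are made up for illustration.

```python
import re

masked = re.sub(r"\d+", "#", "a1 b2")                # 'a# b#'

# count limits how many matches are replaced.
first_only = re.sub(r"\d+", "#", "a1 b2", count=1)   # 'a# b2'

# \1 and \2 in the replacement refer to captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")   # 'host@user'

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1 b2")   # 'a2 b4'
```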
What are common regex special characters?
. - Matches any character except newline
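\d - Matches any digit ([0-9]; in str patterns also other Unicode decimal digits unless re.ASCII is set)
\w - Matches a word character ([a-zA-Z0-9_]; in str patterns also Unicode letters and digits unless re.ASCII is set)
\s - Matches whitespace (space, tab, newline, etc.)
^ - Matches start of string
$ - Matches end of string (or just before a trailing newline)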
[] - Character class
| - OR operator
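The special characters listed above can be tried out in a few one-liners. This is only an illustrative sketch; the sample strings are invented.

```python
import re

words = re.findall(r"\w+", "hi there_1!")         # ['hi', 'there_1']
spaces = re.findall(r"\s", "a b\tc")              # [' ', '\t']

# '.' matches any character except a newline, so 'a\nc' is skipped.
any_char = re.findall(r"a.c", "abc a-c a\nc")     # ['abc', 'a-c']

# ^ and $ anchor the pattern to the start and end of the string.
anchored = re.search(r"^abc$", "abc") is not None       # True
not_anchored = re.search(r"^abc$", "xabc") is not None  # False

either = re.findall(r"cat|dog", "cat dog bird")   # ['cat', 'dog']
```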
What are regex quantifiers?
* - 0 or more occurrences
+ - 1 or more occurrences
? - 0 or 1 occurrence
{n} - Exactly n occurrences
{n,} - n or more occurrences
{n,m} - Between n and m occurrences
Example: a{2,4} matches 'aa', 'aaa', or 'aaaa'.
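The quantifiers above can be checked with a small sketch. The helper fully_matches is a hypothetical name introduced here for illustration; it wraps re.fullmatch() so that the whole string must fit the pattern.

```python
import re

def fully_matches(pattern, s):
    """Return True if the entire string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

# a{2,4}: between two and four 'a' characters.
two_to_four = [fully_matches(r"a{2,4}", s) for s in ("a", "aa", "aaaa", "aaaaa")]
# [False, True, True, False]

star = re.findall(r"ab*", "a ab abbb")             # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")             # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")  # ['color', 'colour']
exactly = fully_matches(r"\d{4}", "2024")          # True
```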
What is a raw string (r'') in regex?
Raw strings (r'pattern') treat backslashes as literal characters, not escape sequences. Essential for regex to avoid double escaping. Example: r'\d+' instead of '\\d+'.
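A brief sketch of why raw strings matter. In a normal string literal, '\b' is the backspace character, so a word-boundary pattern written without the r prefix silently stops working. The sample text is made up.

```python
import re

# The raw string and the double-escaped string are the same pattern.
same_pattern = r"\d+" == "\\d+"      # True

raw_len = len(r"\n")      # 2: a backslash followed by 'n'
cooked_len = len("\n")    # 1: a single newline character

with_raw = re.findall(r"\bcat\b", "a cat sat")     # ['cat']
without_raw = re.findall("\bcat\b", "a cat sat")   # []: '\b' became a backspace
```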
What are character classes in regex?
Character classes match one character from a set. [abc] matches a, b, or c. [a-z] matches any lowercase letter. [^abc] matches anything except a, b, or c. Example: [aeiou] matches vowels.
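The three forms of character class described above, sketched on small invented inputs:

```python
import re

vowels = re.findall(r"[aeiou]", "regex")        # ['e', 'e']
non_vowels = re.findall(r"[^aeiou]", "regex")   # ['r', 'g', 'x']
a_to_c = re.findall(r"[a-c]", "abcd")           # ['a', 'b', 'c']
```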
What are groups in regex?
Groups, written with parentheses (), capture parts of a match so that specific portions can be extracted. Example: r'(\d{3})-(\d{2})' captures the three-digit and two-digit parts separately. Access them with match.group(1) and match.group(2); match.group(0) is the whole match.
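A small sketch of group access using the pattern from the answer. The named-group variant and the names prefix and suffix are additions for illustration, not part of the original example.

```python
import re

m = re.search(r"(\d{3})-(\d{2})", "id 123-45")
whole = m.group(0)                        # '123-45'
first, second = m.group(1), m.group(2)    # '123', '45'
both = m.groups()                         # ('123', '45')

# Named groups (?P<name>...) make the extraction self-documenting.
named = re.search(r"(?P<prefix>\d{3})-(?P<suffix>\d{2})", "id 123-45")
as_dict = named.groupdict()               # {'prefix': '123', 'suffix': '45'}
```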
What is the split() function in regex?
re.split() splits a string by occurrences of a pattern. Example: re.split(r'\s+', 'a b c') returns ['a', 'b', 'c']. More powerful than string's split() as it uses patterns.
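A sketch of re.split() beyond the basic case: several delimiters at once, the maxsplit argument, and the effect of a capturing group in the pattern. The input strings are illustrative.

```python
import re

by_space = re.split(r"\s+", "a b c")                  # ['a', 'b', 'c']

# Split on a comma or semicolon, plus any spaces that follow it.
by_punct = re.split(r"[,;]\s*", "a, b;c")             # ['a', 'b', 'c']

# maxsplit limits the number of splits.
limited = re.split(r"\s+", "a b c", maxsplit=1)       # ['a', 'b c']

# A capturing group keeps the delimiters in the result.
with_delims = re.split(r"(,)", "a,b")                 # ['a', ',', 'b']
```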
What are lookahead and lookbehind assertions?
Lookahead: (?=...) - matches if ... follows
Negative lookahead: (?!...) - matches if ... doesn't follow
Lookbehind: (?<=...) - matches if ... precedes
Negative lookbehind: (?<!...) - matches if ... doesn't precede
Example: r'\d(?=px)' matches digit only if followed by 'px'.
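The four assertions can be sketched as follows. Lookarounds check the surrounding text without consuming it, so only the part outside the assertion is returned. The sample strings are invented for illustration.

```python
import re

# Lookahead: a digit only if 'px' follows it.
before_px = re.findall(r"\d(?=px)", "5px 6em 12px")       # ['5', '2']

# Negative lookahead: 'foo' only if 'bar' does not follow.
not_bar_positions = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]
# [7]: only the 'foo' in 'foobaz'

# Lookbehind: digits only if a '$' precedes them.
after_dollar = re.findall(r"(?<=\$)\d+", "cost $40, 30 units")    # ['40']

# Negative lookbehind: a number that does not directly follow a '$'.
no_dollar = re.findall(r"(?<!\$)\b\d+", "cost $40, 30 units")     # ['30']
```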
What are flags in regex?
Flags modify regex behavior:
re.IGNORECASE or re.I - Case-insensitive matching
re.MULTILINE or re.M - ^ and $ match start/end of line
re.DOTALL or re.S - . matches newline
Example: re.search('python', 'PYTHON', re.I) matches case-insensitively.
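A sketch showing each flag's effect next to the default behavior, and how flags can be combined with the | operator. The inputs are made up.

```python
import re

case_insensitive = re.search(r"python", "PYTHON", re.I) is not None   # True

# Without re.M, ^ matches only at the very start of the string.
first_default = re.findall(r"^\w+", "one\ntwo")          # ['one']
first_multi = re.findall(r"^\w+", "one\ntwo", re.M)      # ['one', 'two']

# Without re.S, '.' does not match a newline.
dot_default = re.search(r"a.b", "a\nb")                  # None
dot_all = re.search(r"a.b", "a\nb", re.S)                # match object

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^x.y$", "X\nY", re.I | re.S)     # ['X\nY']
```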
What is compile() function?
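re.compile() compiles a regex pattern into a pattern object that can be reused. This helps when the same pattern is applied many times, for example inside a loop; the re module also caches recently used patterns internally, so the speedup is often modest and the main gain is cleaner code. Example: pattern = re.compile(r'\d+'), then pattern.findall(text).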
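A minimal sketch of reusing one compiled pattern across several inputs. The name NUMBER and the sample lines are assumptions made for illustration.

```python
import re

# Compile once, then call the pattern object's methods directly.
NUMBER = re.compile(r"\d+")

lines = ["a1 b22", "no digits", "c333"]

found = [NUMBER.findall(line) for line in lines]
# [['1', '22'], [], ['333']]

has_number = [NUMBER.search(line) is not None for line in lines]
# [True, False, True]

masked = NUMBER.sub("#", "a1 b22")    # 'a# b#'
```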
How to extract email addresses using regex?
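Pattern: r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'. Use it with findall(): re.findall(pattern, text). It captures common email formats like user@example.com, but it is a practical approximation and does not cover every valid address. Note that the top-level domain class is [A-Za-z]; writing [A-Z|a-z] would also accept a literal '|' character.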
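A sketch of extracting addresses from free text. It assumes the top-level domain class is written as [A-Za-z] (a '|' inside square brackets would be matched literally), and the sample text is invented.

```python
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

text = "Contact user@example.com or admin@mail.example.org. Not an email: user@host"

emails = EMAIL.findall(text)
# ['user@example.com', 'admin@mail.example.org']
```

The trailing full stop after the second address is left out because the pattern must end on a run of letters followed by a word boundary.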
How to validate phone numbers with regex?
Example pattern: r'\(\d{3}\) \d{3}-\d{4}' matches (123) 456-7890. More flexible: r'(\+\d{1,3}[-\.\s]?)?\(?\d{3}\)?[-\.\s]?\d{3}[-\.\s]?\d{4}' matches various formats.
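For validation, as opposed to searching, the whole input should fit the pattern. The sketch below uses re.fullmatch() for that purpose with the two patterns from the answer; the function name is_phone and the test numbers are hypothetical.

```python
import re

STRICT = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
FLEXIBLE = re.compile(r"(\+\d{1,3}[-\.\s]?)?\(?\d{3}\)?[-\.\s]?\d{3}[-\.\s]?\d{4}")

def is_phone(value, pattern=FLEXIBLE):
    """Return True only if the entire value matches the pattern."""
    return pattern.fullmatch(value) is not None

strict_ok = is_phone("(123) 456-7890", STRICT)     # True
strict_bad = is_phone("123-456-7890", STRICT)      # False
flexible_results = [
    is_phone(v) for v in ("123-456-7890", "+1 123.456.7890", "(123) 456-7890", "12345")
]
# [True, True, True, False]
```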
What are non-greedy quantifiers?
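Non-greedy (lazy) quantifiers match as little as possible. Add ? after a quantifier: *?, +?, ??, {n,m}?. Example: on '<b>text</b>', r'<.*?>' matches each tag separately ('<b>', then '</b>'), while the greedy r'<.*>' matches the whole string from the first '<' to the last '>'.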
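The difference is easiest to see side by side. A minimal sketch on an invented HTML fragment:

```python
import re

html = "<b>bold</b>"

# Greedy: .* runs to the last '>' in the string.
greedy = re.findall(r"<.*>", html)     # ['<b>bold</b>']

# Lazy: .*? stops at the first '>' it can.
lazy = re.findall(r"<.*?>", html)      # ['<b>', '</b>']
```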
How to match word boundaries?
Use \b for a word boundary. It matches the position between a word character (\w) and a non-word character, or at the start or end of the string next to a word character. Example: r'\bcat\b' matches 'cat' but not the 'cat' inside 'catalog' or 'concatenate'.
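A quick sketch comparing the pattern with and without boundaries, on a made-up sentence:

```python
import re

text = "cat catalog concatenate cat."

whole_words = re.findall(r"\bcat\b", text)   # ['cat', 'cat']
anywhere = re.findall(r"cat", text)          # ['cat', 'cat', 'cat', 'cat']
```

The full stop after the last 'cat' is a non-word character, so a boundary still exists there.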
What is the finditer() function?
finditer() returns an iterator yielding match objects for all non-overlapping matches. Useful for processing matches one by one. Example: for match in re.finditer(r'\d+', text): print(match.group()).
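Unlike findall(), finditer() gives access to each match object, so positions are available as well as text. A minimal sketch, with an illustrative input string:

```python
import re

text = "a1 b22 c333"

# Each item is a match object with group(), start(), and end().
matches = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
# [('1', 1, 2), ('22', 4, 6), ('333', 8, 11)]
```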
What are common regex pitfalls?
1. Not using raw strings for patterns
2. Greedy matching when non-greedy needed
3. Forgetting to escape special characters
4. Not handling edge cases
5. Performance issues with complex patterns
6. Unicode issues with \w and \b
Always test with various inputs and consider performance for large texts.
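Two of the pitfalls above, sketched in code: forgetting to escape special characters (item 3) and Unicode behavior of \w (item 6). re.escape() and re.ASCII are standard-library features; the inputs are invented.

```python
import re

user_input = "1+1"

# Unescaped, '+' is a quantifier, so the pattern looks for '11' and fails.
unescaped = re.search(user_input, "what is 1+1?")             # None

# re.escape() makes the text match literally.
escaped = re.search(re.escape(user_input), "what is 1+1?")    # matches '1+1'

# In str patterns \w is Unicode-aware by default; re.ASCII restricts it.
unicode_words = re.findall(r"\w+", "café")                    # ['café']
ascii_words = re.findall(r"\w+", "café", re.ASCII)            # ['caf']
```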
Note: Regular expressions are powerful but can be complex. Use raw strings (r'pattern') for regex patterns. Pre-compile patterns with re.compile() for better performance when reusing. Test regex patterns thoroughly with various inputs. Consider using simpler string methods if regex is overkill for the task.