C++ File Operations Interview Questions
C++ Files and File Operations
Basic Level (15 Questions)
What are streams in C++ and how are they related to file operations?
In C++, a stream is an abstraction that represents a flow of data between a source and a destination. For file operations, streams provide a uniform interface for reading from and writing to files.
Types of streams in C++:
- Input streams: For reading data (istream, ifstream)
- Output streams: For writing data (ostream, ofstream)
- Input/Output streams: For both reading and writing (iostream, fstream)
- ios (base class)
- istream (input) ← ifstream (file input)
- ostream (output) ← ofstream (file output)
- iostream (input/output) ← fstream (file input/output)
- File streams (ifstream, ofstream, fstream) are specialized stream classes for files
- They inherit stream properties and add file-specific functionality
- Same operators (<<, >>) and functions work for both console and file I/O
- Buffering and formatting capabilities are inherited from stream base classes
What are the main file stream classes in C++ and their purposes?
C++ provides three main file stream classes in the <fstream> header:
| Class | Purpose | Inherits From | Typical Use |
|---|---|---|---|
| ifstream | Input File Stream | istream | Reading from files |
| ofstream | Output File Stream | ostream | Writing to files |
| fstream | File Stream | iostream | Both reading and writing |
Key characteristics:
- ifstream: Opens files for reading by default
- ofstream: Opens files for writing by default and creates the file if it doesn't exist
- fstream: Can be configured for both reading and writing
- All support various file opening modes (ios::in, ios::out, etc.)
- Provide automatic resource management (RAII)
- Support both sequential and random access operations
- filebuf: Low-level file buffer class
- wifstream, wofstream, wfstream: Wide character versions
What are the different file opening modes in C++ and when to use them?
File opening modes are specified using constants from the ios class and control how files are opened.
| Mode | Description | Default For |
|---|---|---|
| ios::in | Open for reading | ifstream |
| ios::out | Open for writing (truncates an existing file when used alone) | ofstream |
| ios::app | Append mode (write at end) | - |
| ios::ate | Seek to end when opening | - |
| ios::trunc | Truncate file if exists | ofstream (implied with ios::out) |
| ios::binary | Binary mode (no text translation) | - |
| ios::nocreate | Open fails if file doesn't exist (pre-standard extension) | - |
| ios::noreplace | Open fails if file exists (standard since C++23) | - |
Common mode combinations:
- ios::in | ios::out: Open for both reading and writing
- ios::out | ios::app: Append to file
- ios::out | ios::trunc: Write, truncate if exists (default for ofstream)
- ios::in | ios::binary: Read binary file
- ios::out | ios::binary: Write binary file
- ios::in | ios::out | ios::binary: Read/write binary
- Modes are combined using bitwise OR (|) operator
- ios::trunc is implied when ios::out is used without ios::in or ios::app; ios::ate only moves the initial position and does not prevent truncation
- Binary mode prevents newline translation (important on Windows)
- ios::nocreate is a pre-standard extension and is not portable; ios::noreplace was implementation-specific until it was standardized in C++23
- Always check if file opened successfully after specifying modes
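The difference between truncating and appending can be sketched with three small helpers. The function names (overwriteFile, appendToFile, readWholeFile) are hypothetical, and the error handling is deliberately minimal.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Replaces the file's contents. std::ios::trunc is spelled out here,
// although std::ios::out on its own has the same effect.
bool overwriteFile(const std::string& path, const std::string& text) {
    std::ofstream file(path, std::ios::out | std::ios::trunc);
    if (!file.is_open()) return false;
    file << text;
    file.flush();
    return static_cast<bool>(file);
}

// Adds text at the end: std::ios::app keeps the existing contents.
bool appendToFile(const std::string& path, const std::string& text) {
    std::ofstream file(path, std::ios::out | std::ios::app);
    if (!file.is_open()) return false;
    file << text;
    file.flush();
    return static_cast<bool>(file);
}

// Reads the whole file in binary mode so that no newline translation occurs.
// Returns an empty string if the file cannot be opened.
std::string readWholeFile(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}
```

Calling overwriteFile and then appendToFile on the same path leaves both pieces of text in the file; a second overwriteFile discards them.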
How do you open and close files in C++ using different methods?
Methods to open files:
1. Constructor with filename and mode:
- ifstream inputFile("data.txt", ios::in);
- ofstream outputFile("output.txt", ios::out | ios::app);
- fstream ioFile("file.txt", ios::in | ios::out);
- ifstream inputFile;
- inputFile.open("data.txt", ios::in);
- ifstream inputFile;
- const char* filename = "data.txt";
- inputFile.open(filename);
- File closes automatically when stream object goes out of scope
- Destructor calls close() automatically
- inputFile.close();
- Useful when:
- Need to reuse same stream object for different file
- Want to free file handle before end of scope
- Need to ensure file is written immediately (flush and close)
- if (inputFile.is_open()) { // File opened successfully }
- if (!inputFile) { // Failed to open }
- if (inputFile.fail()) { // Open failed }
- Always check after opening attempt
- Prefer RAII (automatic closing) when possible
- Use explicit close() when reusing stream objects
- Always check if file opened successfully
- Handle errors appropriately (throw exceptions or return error codes)
- Consider using std::filesystem::path (C++17) for filenames
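The opening, checking, and closing steps above can be combined in one sketch that reuses a single stream object for several files. The helper name firstLines is hypothetical.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the first line of every file that can be
// opened, reusing one stream object for all of them.
std::vector<std::string> firstLines(const std::vector<std::string>& paths) {
    std::vector<std::string> result;
    std::ifstream input;                 // default-constructed, not yet open
    for (const std::string& path : paths) {
        input.open(path);                // open() member function
        if (!input.is_open()) {
            input.clear();               // drop the failbit set by the failed open()
            continue;                    // skip files that cannot be opened
        }
        std::string line;
        if (std::getline(input, line)) {
            result.push_back(line);
        }
        input.close();                   // explicit close so the object can be reused
        input.clear();                   // drop any eofbit/failbit left by reading
    }
    return result;
}                                        // the destructor closes a still-open file
```

Explicit close() is only needed here because the same object is reopened; a stream that is used for one file can rely on its destructor.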
What is the difference between text mode and binary mode file operations?
| Aspect | Text Mode (Default) | Binary Mode (ios::binary) |
|---|---|---|
| Newline handling | Platform-specific translation | No translation (raw bytes) |
| End-of-file | May have special character | No special EOF character |
| Character encoding | May apply locale settings | Raw bytes, no encoding |
| Use case | Human-readable text files | Images, executables, structured data |
| Portability | Better for text across platforms | Consistent across platforms |
| File size | May differ from actual bytes written | Exactly matches bytes written |
| Read/write operations | Formatted I/O (>>, <<) | Unformatted I/O (read(), write()) |
- On Windows: "\n" becomes "\r\n" when writing, "\r\n" becomes "\n" when reading
- On Unix/Linux: No translation (but still considered text mode)
- May handle EOF character (Ctrl+Z on Windows) specially
- Automatic buffer flushing for newlines in some implementations
- No character translation occurs
- read() and write() work with exact byte counts
- File position indicators work with byte offsets
- Essential for non-text data (structs, images, etc.)
- Use text mode for: Configuration files, log files, CSV, XML, JSON
- Use binary mode for: Database files, image formats, executables, serialized objects
- Always use binary mode: When writing/reading C++ objects directly, when exact byte representation matters
- Consider text mode: When files need to be human-readable/editable
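One way to see the newline translation is to write the same two lines in each mode and then inspect the raw bytes. This is a minimal sketch; writeTwoLines and rawBytes are hypothetical helper names.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Writes two lines; `binary` selects std::ios::binary (no newline translation).
void writeTwoLines(const std::string& path, bool binary) {
    std::ios::openmode mode = std::ios::out | std::ios::trunc;
    if (binary) {
        mode |= std::ios::binary;
    }
    std::ofstream file(path, mode);
    file << "one\n" << "two\n";
}

// Returns the raw bytes of a file, always read in binary mode.
std::vector<unsigned char> rawBytes(const std::string& path) {
    std::ifstream file(path, std::ios::in | std::ios::binary);
    std::vector<unsigned char> bytes;
    char c;
    while (file.get(c)) {                // get() does not skip whitespace
        bytes.push_back(static_cast<unsigned char>(c));
    }
    return bytes;
}
```

In binary mode the file holds exactly the 8 bytes that were written. In text mode the result is platform-dependent: typically 10 bytes on Windows (each "\n" stored as "\r\n") and 8 bytes on Unix-like systems.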
What are the different methods for reading from files in C++?
Formatted input methods (text mode):
- operator>>: Formatted extraction (stops at whitespace)
- getline(): Read line including spaces
- get(): Read single character
- get(char_array, size): Read characters into array
- ignore(): Skip/discard characters
- read(): Read raw bytes
- get(): Read single byte
- readsome(): Read available bytes without blocking
- Reads until whitespace
- Good for space/tab separated data
- Skips leading whitespace
- Reads entire line including spaces
- Default delimiter is newline
- Can specify custom delimiter
- Common for text files, CSV, log files
- Reads single character
- Doesn't skip whitespace
- Useful for parsing or low-level reading
- Reads specified number of bytes
- Essential for binary files
- Used with structures or raw data
- Check eof() after reading attempts
- Check fail() for formatting errors
- Check bad() for serious errors
- Use good() for general state check
- clear() to reset error flags
- while (file >> variable) { } - for formatted input
- while (getline(file, line)) { } - for lines
- while (file.get(ch)) { } - for characters
- Don't use eof() in loop condition (reads one extra time)
What are the different methods for writing to files in C++?
Formatted output methods (text mode):
- operator<<: Formatted insertion
- put(): Write single character
- write() with strings: Write string data
- flush(): Push buffered output to the operating system (does not by itself guarantee the data has reached the disk)
- write(): Write raw bytes
- put(): Write single byte
- Uses stream manipulators for formatting
- Automatically converts types to text
- Supports width, precision, fill characters
- Good for human-readable output
- Writes single character/byte
- Useful for writing raw characters
- Can write non-printable characters
- Writes specified number of bytes
- Takes pointer to data and size
- Essential for binary files
- Used with structures or raw data
- file << string_variable;
- file.write(string_data.c_str(), string_data.size());
- C++ streams have no puts() member; write C-strings with file << c_string, or use fputs() from <cstdio> with a FILE*
- Output is typically buffered for performance
- flush(): Forces buffer to be written
- endl: Newline + flush
- unitbuf: Unbuffered mode
- Automatic flush on:
- Program normal termination
- Buffer full
- close() operation
- Check fail() for formatting errors
- Check bad() for serious errors (disk full, etc.)
- Check is_open() before writing
- Consider exceptions for error handling
- Flush important data immediately (logs, critical info)
- Use '\n' instead of endl when flush isn't needed
- Check disk space for large writes
- Handle write errors gracefully
- Consider atomic writes for critical data
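Formatted and unformatted output can be contrasted in a short sketch. The helper names writeRow and writeRaw are hypothetical, and the column width and precision are arbitrary choices for the example.

```cpp
#include <iomanip>
#include <ostream>
#include <string>

// Formatted output: a left-aligned name padded to 10 columns, followed by
// the value with two decimal places.
void writeRow(std::ostream& out, const std::string& name, double value) {
    out << std::left << std::setw(10) << name
        << std::fixed << std::setprecision(2) << value
        << '\n';                         // '\n' does not flush; std::endl would
}

// Unformatted output: writes exactly data.size() bytes, including any
// embedded '\0' characters.
void writeRaw(std::ostream& out, const std::string& data) {
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
}
```

Note that std::setw applies only to the next inserted item, while std::left, std::fixed and std::setprecision stay in effect on the stream until changed.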
How do you perform random access file operations in C++?
Random access allows reading/writing at any position in the file, not just sequentially.
File position indicators:
- tellg(): Get current read position (ifstream)
- tellp(): Get current write position (ofstream)
- seekg(): Set read position (ifstream)
- seekp(): Set write position (ofstream)
- ios::beg: Beginning of file
- ios::cur: Current position
- ios::end: End of file
- file.seekg(offset, direction); // Set position relative to direction
- file.seekg(offset, ios::beg); // From beginning
- file.seekg(offset, ios::cur); // From current position
- file.seekg(offset, ios::end); // From end
- file.seekg(position); // Absolute position, e.g. a value returned by tellg()
- file.seekg(100, ios::beg); // Go to byte 100
- file.read(buffer, size); // Read from there
- file.seekp(0, ios::end); // Go to end
- file.write(data, size); // Append data
- std::streampos position = file.tellp(); // Remember position (long may be too narrow for large files)
- // ... later ...
- file.seekp(position); // Return to position
- file.write(updated_data, size); // Overwrite
- file.seekg(-N, ios::end); // N bytes from end
- file.read(buffer, N); // Read last N bytes
- Random access works best in binary mode
- Text mode position may be affected by newline translation
- Positions are byte offsets in binary mode
- Check stream state after seek operations
- seekg()/seekp() return the stream (can be checked)
- Invalid seeks set failbit
- Always verify position was set correctly
- Database files (indexed access)
- Large files (seek to specific section)
- File format parsers (skip headers, read specific sections)
- In-place updates of records
- Memory-mapped file alternatives
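The seek-then-read and seek-then-overwrite patterns above can be sketched for a file of fixed-size records. This example assumes 4-byte std::int32_t records stored in the machine's native byte order; the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Writes all values as consecutive fixed-size (4-byte) records.
bool writeRecords(const std::string& path, const std::vector<std::int32_t>& values) {
    std::ofstream file(path, std::ios::binary | std::ios::trunc);
    if (!file) return false;
    file.write(reinterpret_cast<const char*>(values.data()),
               static_cast<std::streamsize>(values.size() * sizeof(std::int32_t)));
    return static_cast<bool>(file);
}

// Reads record `index` by seeking to index * record size.
bool readRecord(const std::string& path, std::size_t index, std::int32_t& out) {
    std::ifstream file(path, std::ios::binary);
    if (!file) return false;
    file.seekg(static_cast<std::streamoff>(index * sizeof(std::int32_t)), std::ios::beg);
    file.read(reinterpret_cast<char*>(&out), sizeof(out));
    return file.gcount() == static_cast<std::streamsize>(sizeof(out));
}

// Overwrites record `index` in place; in | out opens without truncating.
bool updateRecord(const std::string& path, std::size_t index, std::int32_t value) {
    std::fstream file(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!file) return false;
    file.seekp(static_cast<std::streamoff>(index * sizeof(std::int32_t)), std::ios::beg);
    file.write(reinterpret_cast<const char*>(&value), sizeof(value));
    return static_cast<bool>(file);
}
```

readRecord checks gcount() rather than trusting the seek, because seeking past the end of a file does not necessarily fail; the short read is what reveals it.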
What is the difference between sequential and random access file operations?
| Aspect | Sequential Access | Random Access |
|---|---|---|
| Access pattern | Start to end, in order | Any position, any order |
| Position control | Limited (read/write sequentially) | Full control (seekg(), seekp()) |
| Performance | Good for linear processing | Good for indexed/direct access |
| Complexity | Simpler to implement | More complex, requires position management |
| Use cases | Log files, streaming data, backups | Databases, indexed files, in-place updates |
| File structure | Often simple, linear | May have indexes, headers, fixed records |
| Error recovery | Easier (continue from last position) | More difficult (must track positions) |
| Examples | Reading CSV, parsing logs, copying files | Database records, image formats, archives |
- Files are processed from beginning to end
- Natural for streaming operations
- No need to track positions
- Good for pipes, network streams, tape drives
- Efficient for large files that are processed once
- Can jump to any position using seek operations
- Requires knowledge of file structure
- Essential for database-like operations
- Good for files with fixed-size records
- Enables in-place updates without rewriting entire file
- Processing log files line by line
- Reading configuration files
- Streaming data (audio/video)
- Backup/restore operations
- Simple data import/export
- Database record retrieval
- Image format processing (read headers, then data)
- Archive files (zip, tar)
- Memory-mapped file operations
- Files with index tables
- Many applications use both approaches
- Example: Read index sequentially, then random access to data
- Example: Sequential write, random read (logs)
- Modern file systems support both efficiently
How do you handle errors and exceptions in file operations?
Error states in file streams:
- good(): No errors, operations possible
- eof(): End of file reached
- fail(): Logical error (formatting, type mismatch)
- bad(): Physical error (disk full, hardware error)
- rdstate(): Returns current state flags
- clear(): Reset error flags
- File not found (opening for reading)
- Permission denied
- Disk full (writing)
- Invalid path
- Network timeout (network drives)
- Format errors (reading wrong data type)
- Unexpected EOF
- Check is_open() after opening
- Check stream state after operations
- Use fail(), bad(), eof() as needed
- Example: if (!file) { handle error; }
- Enable exceptions: file.exceptions(ios::failbit | ios::badbit)
- Catch std::ios_base::failure exceptions
- Provides more detailed error information
- Example: try { file.open("data.txt"); } catch (const ios_base::failure& e) { }
- Use exceptions for critical errors
- Use manual checking for recoverable errors
- Always clean up resources in case of errors
- Retry: For transient errors (network, locked files)
- Alternative path: Use backup file or default location
- Partial recovery: Read what you can, skip errors
- Log and continue: Record error, continue with other files
- Abort: For critical errors, stop processing
- Always check if file opened successfully
- Handle errors close to where they occur
- Provide meaningful error messages
- Clean up resources (close files) even on error
- Consider using RAII for automatic cleanup
- Log errors for debugging
- Validate file content, not just existence
- Check available disk space before large writes
- No-throw guarantee: Operation never throws
- Strong guarantee: Either succeeds or has no effect
- Basic guarantee: No resource leaks, valid state
- File operations typically provide basic guarantee
- Can implement stronger guarantees with careful design
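The two styles, manual state checks and stream exceptions, can be sketched side by side. The function names are hypothetical. The second function reads through rdbuf() on purpose: with failbit exceptions enabled, a getline loop would throw when it reaches the end of the file.

```cpp
#include <fstream>
#include <ios>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Style 1: manual state checks. Returns std::nullopt on failure.
std::optional<std::vector<std::string>> readLinesChecked(const std::string& path) {
    std::ifstream file(path);
    if (!file.is_open()) {
        return std::nullopt;             // e.g. missing file or permission denied
    }
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(file, line)) {
        lines.push_back(line);
    }
    if (file.bad()) {
        return std::nullopt;             // eofbit/failbit at end of input are expected
    }
    return lines;
}

// Style 2: the stream throws std::ios_base::failure on failbit or badbit.
std::string readFileOrThrow(const std::string& path) {
    std::ifstream file;
    file.exceptions(std::ios::failbit | std::ios::badbit);
    file.open(path, std::ios::binary);   // throws if the file cannot be opened
    std::ostringstream contents;
    contents << file.rdbuf();            // copies the stream buffer in one step
    return contents.str();
}
```

A caller of readFileOrThrow would wrap the call in try/catch for std::ios_base::failure, while a caller of readLinesChecked tests the returned optional.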
How do you read and write binary files with structures in C++?
Reading and writing structures directly to binary files requires careful handling.
Basic approach:
- Open file in binary mode (ios::binary)
- Use read() and write() methods
- Pass address of structure and its size
- Handle padding and alignment issues
- ofstream file("data.bin", ios::binary | ios::out);
- MyStruct data = { ... };
- file.write(reinterpret_cast<const char*>(&data), sizeof(MyStruct));
- ifstream file("data.bin", ios::binary | ios::in);
- MyStruct data;
- file.read(reinterpret_cast<char*>(&data), sizeof(MyStruct));
- Compilers may add padding bytes for alignment
- sizeof() includes padding
- Padding bytes may contain garbage values
- Use #pragma pack or compiler attributes to control packing
- Different architectures have different:
- Byte order (endianness)
- Structure padding
- Size of basic types
- Floating point representation
- Files written on one system may not read correctly on another
- Structure layout may change between versions
- Need backward/forward compatibility
- Consider including version field in structure
- Never write pointers directly (point to invalid memory when read)
- Instead, write the data the pointer points to
- Or use serialization for complex data structures
- Use fixed-size types (int32_t, uint64_t, etc.) from <cstdint>
- Mark structures as packed if portability needed
- Include magic numbers and version fields
- Write/read arrays of structures carefully
- Handle endianness conversion if needed
- Consider checksums for data integrity
- Document binary format thoroughly
- Serialization libraries (Boost.Serialization, cereal, protobuf)
- JSON/XML for text-based serialization
- Database systems for complex data
- Memory-mapped files for large datasets
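Several of the recommendations above (fixed-size types, a magic number, a version field, no pointers) can be combined in a small sketch. SensorRecord, its field names, and the magic value are assumptions made for this example, and the file is written in the machine's native byte order, so it is not portable across endianness.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <type_traits>

// Hypothetical on-disk record: fixed-width fields, no pointers.
struct SensorRecord {
    std::uint32_t magic;      // format marker
    std::uint32_t version;    // lets readers reject layouts they do not know
    std::int64_t  timestamp;
    double        value;
};

static_assert(std::is_trivially_copyable<SensorRecord>::value,
              "raw read()/write() is only valid for trivially copyable types");

constexpr std::uint32_t kMagic = 0x53454E53;   // "SENS" in ASCII, an arbitrary choice
constexpr std::uint32_t kVersion = 1;

bool saveRecord(const std::string& path, const SensorRecord& record) {
    std::ofstream file(path, std::ios::binary | std::ios::trunc);
    if (!file) return false;
    file.write(reinterpret_cast<const char*>(&record), sizeof(record));
    file.flush();
    return static_cast<bool>(file);
}

// Returns false on a short read or when magic/version do not match.
bool loadRecord(const std::string& path, SensorRecord& record) {
    std::ifstream file(path, std::ios::binary);
    if (!file) return false;
    file.read(reinterpret_cast<char*>(&record), sizeof(record));
    if (file.gcount() != static_cast<std::streamsize>(sizeof(record))) return false;
    return record.magic == kMagic && record.version == kVersion;
}
```

For files that must move between machines, each field would instead be written individually in a documented byte order, or a serialization library would be used.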
What are file buffers and how do they affect file operations?
File buffers are memory areas used to temporarily store data being read from or written to files.
Purpose of buffering:
- Reduce number of system calls (improve performance)
- Batch small reads/writes into larger operations
- Allow asynchronous I/O operations
- Smooth out speed differences between memory and disk
- Full buffering: Buffer flushed when full (default for files)
- Line buffering: Buffer flushed on newline (common for terminals)
- No buffering: Immediate write (unbuffered)
- Streams maintain internal buffers
- Buffer size is implementation-defined
- Can access buffer via rdbuf() method
- Can supply custom buffer storage with rdbuf()->pubsetbuf() (the effect is implementation-defined; call it before opening the file)
- flush(): Flush output buffer
- endl: Newline + flush
- unitbuf: Set unit buffer mode (flush after each output)
- Automatic flush on:
- Buffer full
- File close
- Program termination
- Input from cin after cout
- Buffered data may be lost if program crashes
- Solution: flush() critical data
- Or use unbuffered mode for critical writes
- Larger buffers = better performance but more memory
- Frequent flushing = safer but slower
- Need to balance based on use case
- Multiple processes/threads may see stale data
- Solution: flush() and/or file locking
- Or use memory-mapped files with synchronization
- rdbuf(): Get/set stream buffer
- pubsetbuf(): Set buffer for filebuf
- sync(): Synchronize buffer with file
- Let system manage buffers for most cases
- Use flush() for critical data (logs, transactions)
- Use '\n' instead of endl when flush isn't needed
- Consider buffer size for performance-critical applications
- Be aware of buffering when debugging I/O issues
- Flush after every output operation (std::unitbuf) for real-time logging; std::nounitbuf switches this back off
- Custom stream buffers for specialized behavior
- Memory-mapped files for very large files
- Double buffering for continuous data streams
- Circular buffers for producer-consumer patterns
How do you work with temporary files in C++?
Temporary files are used for intermediate storage that doesn't need to persist.
Traditional C++ approaches:
1. Manual temporary file creation:
- Generate unique filename (using timestamp, PID, random numbers)
- Create file with ofstream
- Delete manually when done
- Risk of name collisions if not careful
- FILE* tmpfile(void);
- Creates temporary file that auto-deletes on close
- Returns FILE* (C style, not C++ streams)
- Binary mode, read/write access
- Automatic cleanup (deleted when closed or program ends)
- char* tmpnam(char* s);
- Generates unique filename
- Deprecated due to security issues (TOCTOU race conditions)
- Not recommended for new code
- std::filesystem::temp_directory_path(): Get temp directory
- Can generate unique paths with UUID or random names
- More control and better security
- Requires manual cleanup
- GetTempPath(), GetTempFileName() APIs
- More control over temporary file creation
- mkstemp(): Creates temporary file with unique name
- Returns file descriptor, more secure
- tmpnam() and tempnam() exist but have security issues
- Always clean up temporary files
- Use RAII to ensure cleanup even on exceptions
- Store in appropriate temp directory (not current dir)
- Set appropriate permissions (not world-readable)
- Consider security implications
- Handle full temp directory gracefully
- Avoid predictable temporary file names
- Set restrictive file permissions
- Consider encrypted temp files for sensitive data
- Use secure deletion when appropriate
- Beware of symlink attacks
- Memory-mapped files for large data
- In-memory buffers (std::vector, std::string)
- Pipes for inter-process communication
- Database temporary tables
- RAM disks for performance-critical temp data
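The C++17 approach with RAII cleanup can be sketched as a small wrapper class. ScopedTempFile and the "scoped_tmp_" name prefix are hypothetical. Unlike mkstemp(), this sketch does not create the file atomically, so it is not suitable where an attacker could race for the name.

```cpp
#include <filesystem>
#include <fstream>
#include <random>
#include <string>
#include <system_error>

// Creates an empty file with a random name in the system temp directory
// and removes it again in the destructor.
class ScopedTempFile {
public:
    ScopedTempFile() {
        std::random_device rd;
        std::mt19937_64 gen(rd());
        const std::string name = "scoped_tmp_" + std::to_string(gen()) + ".tmp";
        path_ = std::filesystem::temp_directory_path() / name;
        std::ofstream create(path_);         // creates the (empty) file
    }

    ~ScopedTempFile() {
        std::error_code ec;                  // error_code overload does not throw
        std::filesystem::remove(path_, ec);
    }

    ScopedTempFile(const ScopedTempFile&) = delete;
    ScopedTempFile& operator=(const ScopedTempFile&) = delete;

    const std::filesystem::path& path() const { return path_; }

private:
    std::filesystem::path path_;
};
```

Because removal happens in the destructor, the file is also cleaned up when an exception unwinds the scope that owns the object.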
What is the <filesystem> library (C++17) and how does it help with file operations?
The <filesystem> library (C++17) provides modern, portable facilities for filesystem operations.
Key components:
- std::filesystem::path: Represents filesystem paths
- std::filesystem::directory_iterator: Iterate directory contents
- std::filesystem::file_status: File type and permissions
- std::filesystem::space_info: Disk space information
| Aspect | Traditional C++ | C++17 Filesystem |
|---|---|---|
| Path handling | Manual string manipulation | Path objects with decomposition and normalization functions |
| Portability | Platform-specific code needed | Portable across platforms |
| Error handling | errno, manual checking | Exceptions or error codes |
| Directory operations | Platform-specific APIs | Standardized interface |
| File information | stat() or platform APIs | Uniform interface |
| Symlink handling | Platform-specific | Built-in support |
Common filesystem operations:
1. Path manipulation:
- Construct paths with / operator
- Get filename, extension, parent path
- Check if path is absolute or relative
- Normalize paths (remove ., .., duplicate separators)
- exists(): Check if file/directory exists
- file_size(): Get file size
- last_write_time(): Get modification time
- permissions(): Get/set file permissions
- is_regular_file(), is_directory(): Check file type
- create_directory(), create_directories(): Create directories
- remove(), remove_all(): Delete files/directories
- current_path(): Get/set current directory
- directory_iterator: Iterate directory contents
- recursive_directory_iterator: Recursive iteration
- copy(): Copy files/directories
- rename(): Move/rename files
- equivalent(): Check if two paths refer to same file
- hard_link_count(): Count hard links
- space(): Get disk space information
- Returns capacity, free space, available space
- Two error handling modes:
- Throw exceptions (default)
- Return error_code (overload with ec parameter)
- Can check specific error conditions
- filesystem_error exception provides detailed info
- Can construct fstream with filesystem::path
- Path can be converted to string for compatibility
- Better error checking before opening files
- Use filesystem::path instead of strings for paths
- Handle errors appropriately (check return values)
- Use noexcept versions when appropriate
- Consider performance for frequent operations
- Be aware of symlink behavior (follow or not)
How do you handle file permissions and attributes in C++?
Traditional C++ approaches:
1. Using open() flags (low-level):
- When using fopen() or open() (C style)
- Can specify permissions as octal numbers
- Example: fopen("file.txt", "w"); // Default permissions
- Platform-specific behavior
- Windows: _chmod(), SetFileAttributes()
- Unix/Linux: chmod(), fchmod(), stat()
- Require conditional compilation
- std::filesystem::perms enumeration
- Includes owner, group, others permissions
- Read, write, execute bits
- Special bits (setuid, setgid, sticky)
- auto p = status(path).permissions();
- Check a single bit: if ((p & perms::owner_read) != perms::none) { }
- Check that a group of bits is clear: (p & perms::others_all) == perms::none
- permissions(path, new_perms);
- permissions(path, new_perms, perm_options);
- perm_options: replace, add, remove
- perms new_perms = status(path).permissions() & ~(perms::owner_write | perms::group_write | perms::others_write);
- permissions(path, new_perms);
- permissions(path, perms::owner_exec | perms::group_exec, perm_options::add);
- permissions(path, perms::owner_read | perms::owner_write | perms::group_read);
- last_write_time(): Get/set modification time
- Platform may also support creation and access times
- Uses std::filesystem::file_time_type
- is_regular_file(), is_directory(), is_symlink()
- is_block_file(), is_character_file()
- is_fifo(), is_socket()
- status(path).type() returns a file_type enumeration
- hard_link_count(): Number of hard links
- equivalent(): Check if paths refer to same file
- Hidden, system, archive, compressed attributes
- Need Windows API or third-party libraries
- Extended attributes (xattr)
- Access control lists (ACLs)
- Need platform-specific APIs
- Check permissions before sensitive operations
- Be careful with setuid/setgid bits
- Consider umask when creating files
- Least privilege principle: Grant minimum necessary permissions
- Use C++17 filesystem when available
- Check permissions before file operations
- Set appropriate permissions for new files
- Handle permission errors gracefully
- Consider security implications of file permissions
- Document permission requirements
Tricky Level (10 Questions)
What are the performance considerations for file operations in C++?
Factors affecting file I/O performance:
1. Buffer size:
- Larger buffers reduce system calls
- Optimal size depends on hardware and filesystem
- Typical sizes: 4KB to 64KB
- Can be tuned with pubsetbuf()
- Sequential access faster than random access
- Large contiguous reads/writes faster than small ones
- Read-ahead and write-behind caching help sequential access
- Large files may be fragmented
- Fragmentation hurts sequential performance
- Solid-state drives (SSDs) less affected by fragmentation
- Different filesystems have different performance characteristics
- Journaling filesystems add overhead
- Network filesystems have latency issues
- SSD vs HDD (SSD much faster for random access)
- Disk cache/RAM size
- RAID configurations
- Network speed for network drives
- Use appropriate buffer size
- Flush strategically (not too often)
- Consider memory-mapped files for large files
- Read/write large blocks instead of small pieces
- Use vectors/arrays for batch processing
- Avoid frequent open/close operations
- Structure files for sequential access when possible
- Use indexes for random access in large files
- Consider caching frequently accessed data
- Overlap I/O with computation
- Use threads or async I/O APIs
- Platform-specific async I/O (Windows OVERLAPPED, Linux AIO)
- Map file directly to memory address space
- OS handles paging and caching
- Excellent for random access patterns
- Simplifies code (access like memory)
- Measure actual performance on target system
- Consider both throughput (MB/s) and IOPS
- Test with realistic data sizes and patterns
- Account for filesystem cache effects
- Too many small reads/writes
- Frequent file open/close operations
- Not checking if optimizations are effective
- Ignoring hardware limitations
- Not considering filesystem characteristics
- System monitoring tools (iostat, perfmon)
- Profiling tools (perf, VTune)
- Benchmarking libraries (Google Benchmark)
- Custom timing with high-resolution clocks
- Profile before optimizing
- Optimize for the common case
- Consider both average and worst-case performance
- Test on target hardware
- Document performance assumptions and requirements
How do you handle large files (more than 4GB) in C++?
Challenges with large files:
- 32-bit file offsets may overflow (2GB or 4GB limits)
- Memory limitations for loading entire file
- Performance issues with naive approaches
- Filesystem limitations (maximum file size)
- Use types that support large offsets (streamoff, streampos)
- On 32-bit POSIX builds, define _FILE_OFFSET_BITS=64 (or the platform's equivalent) when compiling
- On Windows: Use _fseeki64, _ftelli64 or 64-bit file streams
- Check sizeof(streamoff) to ensure it's 8 bytes (64-bit)
- Use seekg()/seekp() with 64-bit offsets
- Check that tellg()/tellp() return 64-bit positions
- Test with files > 4GB to verify
- Read/write file in manageable chunks
- Process each chunk independently
- Maintain state between chunks if needed
- Example: Process 64MB chunks of a 10GB file
- Map portions of file to memory as needed
- OS handles paging of file data
- Efficient for random access patterns
- Platform-specific APIs or boost::iostreams::mapped_file
- Process file as stream, don't load entire file
- Read, process, discard, repeat
- Minimal memory usage
- Requires sequential processing algorithm
- Split file into sections
- Process sections in parallel
- Combine results
- Challenges: Synchronization, merging results
- Use CreateFile with FILE_FLAG_NO_BUFFERING for very large files
- Consider overlapped (asynchronous) I/O
- Check for 32-bit vs 64-bit application limitations
- Use open() with O_LARGEFILE flag (for 32-bit systems)
- 64-bit systems typically have native large file support
- Consider direct I/O (O_DIRECT) to bypass cache for very large files
- FAT32: 4GB maximum file size
- NTFS: 16EB (theoretical), 256TB (practical)
- ext4: 16TB maximum file size with 4KB blocks (volumes up to 1EB)
- Check filesystem limitations
- Always use 64-bit file operations
- Test with files > 4GB during development
- Handle out-of-disk-space errors gracefully
- Consider progress reporting for long operations
- Implement checkpoint/restart capability
- Use appropriate data structures (don't load entire file)
- Validate file size before processing
- Memory-mapped file libraries (boost, mmap)
- File chunking utilities
- Progress bar libraries
- Compression libraries for storage efficiency
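Chunked processing can be sketched as a loop that never holds more than one chunk in memory. The function name sumBytesChunked is hypothetical, and a byte sum stands in for whatever per-chunk work the real application does. The static_assert is a compile-time version of the sizeof(streamoff) check mentioned above.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

static_assert(sizeof(std::streamoff) >= 8,
              "stream offsets must be 64-bit to address files larger than 4GB");

// Sums all bytes of a file while holding at most `chunkSize` bytes in memory.
// Returns false if the file cannot be opened or a low-level read error occurs.
bool sumBytesChunked(const std::string& path, std::size_t chunkSize, std::uint64_t& sum) {
    if (chunkSize == 0) return false;
    std::ifstream file(path, std::ios::binary);
    if (!file) return false;

    std::vector<char> chunk(chunkSize);
    sum = 0;
    while (file) {
        file.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const std::streamsize got = file.gcount();   // short on the final chunk
        for (std::streamsize i = 0; i < got; ++i) {
            sum += static_cast<unsigned char>(chunk[static_cast<std::size_t>(i)]);
        }
    }
    return !file.bad();
}
```

The loop processes gcount() bytes even when read() reports failure, because the last chunk of a file is usually shorter than the buffer.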
How do you work with CSV and other delimited text files in C++?
CSV file characteristics:
- Comma-separated values (other delimiters possible: tab, pipe, semicolon)
- Text format, human-readable
- Rows separated by newlines
- Fields may be quoted (especially if containing delimiters or newlines)
- Header row often present
- Use getline() with delimiter
- Split each line on delimiter
- Works for CSV without quoted fields or embedded delimiters
- Example: while (getline(file, line)) { parse line; }
- Handle quoted fields, escaped quotes
- Track whether inside quotes
- Handle line continuations within quoted fields
- More complex but handles real-world CSV
- Read line with getline()
- Create stringstream from line
- Use getline() with delimiter on stringstream
- Convert strings to appropriate types
- Fast-cpp-csv-parser: Header-only, fast
- CSVparser: Simple C++ CSV parser
- Boost.Tokenizer: General tokenization
- Using regular expressions (slow for large files)
- Use << operator with comma separators
- Quote fields that contain delimiters or quotes
- Escape quotes by doubling them
- Write newline at end of each row
- Check each field for special characters
- Quote fields containing: delimiter, newline, quote
- Escape quotes within quoted fields
- Consider locale for number formatting
- "John, Doe",25 → Two fields: "John, Doe" and "25"
- Need to parse quotes properly
- "He said ""Hello""",30 → Field: He said "Hello"
- Quotes are escaped by doubling
- Quoted fields can contain newlines
- Need to continue reading until closing quote
- a,,c → Three fields: "a", "", "c"
- Consecutive delimiters indicate empty field
- First line may contain column names
- Skip or parse headers as needed
- Map column names to indices
- CSV parsing can be CPU-intensive
- Avoid excessive string copying
- Consider memory-mapped files for large CSV
- Batch processing for performance
- Parallel parsing if format allows
- Always handle quoting and escaping
- Validate data types after parsing
- Handle encoding issues (UTF-8, ASCII, etc.)
- Provide good error messages for malformed CSV
- Consider using established libraries for complex cases
- Document CSV format expectations
- Test with edge cases (empty files, huge fields, etc.)
- JSON: More structured, better for nested data
- XML: Verbose but standardized
- Binary formats: Faster, smaller, but not human-readable
- Database: For complex querying needs
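The quote-tracking approach and the quoting rules for output can be sketched as a small state machine. The function names parseCsvLine and escapeCsvField are hypothetical. The parser handles one record that is already in memory, so a caller reading with getline() would still need to join physical lines when a quoted field contains a newline.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits one CSV record into fields, honoring double-quoted fields and
// doubled quotes ("") inside them.
std::vector<std::string> parseCsvLine(const std::string& line, char delim = ',') {
    std::vector<std::string> fields;
    std::string field;
    bool inQuotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    field += '"';            // doubled quote becomes a literal quote
                    ++i;
                } else {
                    inQuotes = false;        // closing quote
                }
            } else {
                field += c;                  // delimiters are plain text here
            }
        } else if (c == '"') {
            inQuotes = true;                 // opening quote
        } else if (c == delim) {
            fields.push_back(field);         // end of field (possibly empty)
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);                 // last field has no trailing delimiter
    return fields;
}

// Quotes a field for output when it contains the delimiter, a quote, or a newline.
std::string escapeCsvField(const std::string& field, char delim = ',') {
    const bool needsQuotes = field.find(delim) != std::string::npos ||
                             field.find('"') != std::string::npos ||
                             field.find('\n') != std::string::npos;
    if (!needsQuotes) return field;
    std::string quoted = "\"";
    for (char c : field) {
        if (c == '"') quoted += '"';         // escape by doubling
        quoted += c;
    }
    quoted += '"';
    return quoted;
}
```

An empty input line yields a single empty field in this sketch; whether that should count as a record is a format decision left to the caller.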
What are the security considerations for file operations in C++?
Common security vulnerabilities:
1. Path traversal:
- Attackers use ../ to access files outside intended directory
- Solution: Validate and sanitize paths, use absolute paths with bounds checking
- Example: Prevent ../../../etc/passwd access
- Attackers create symlinks to sensitive files
- Solution: Check file type, use O_NOFOLLOW or equivalent
- Create files with O_EXCL and proper permissions
- Time-of-check to time-of-use vulnerabilities
- Solution: Atomic operations, file descriptors instead of pathnames
- Example: Check file exists, then open (attacker changes symlink in between)
- Reading into fixed-size buffers without bounds checking
- Solution: Use std::string, vectors, or bounded functions
- Example: gets() vs fgets() with size limit
- Error messages revealing sensitive information
- Solution: Generic error messages in production
- Log details separately for debugging
- Files created with too permissive access
- Solution: Set appropriate umask, use restrictive permissions
- Principle of least privilege
- Validate all file paths from untrusted sources
- Whitelist allowed characters and patterns
- Canonicalize paths before use
- Use secure temporary file creation functions
- Set restrictive permissions (0600)
- Unlink immediately after creation (Unix)
- Write to temporary file, then rename atomically
- Prevents partial writes from being seen
- Example: write to file.tmp, then rename to file.txt
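The write-then-rename pattern can be sketched as follows. The name writeFileAtomically and the ".tmp" suffix are assumptions made for illustration. std::filesystem::rename replaces an existing target; the replacement is atomic on POSIX filesystems when both paths are on the same filesystem, and other platforms may behave differently. The sketch flushes the stream but does not force data to disk.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Write `content` to a temporary sibling file, then rename it over `target`.
// Readers see either the old file or the complete new one, never a partial write.
bool writeFileAtomically(const fs::path& target, const std::string& content) {
    fs::path tmp = target;
    tmp += ".tmp";  // hypothetical naming scheme: file.txt -> file.txt.tmp

    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << content;
        out.flush();
        if (!out) {
            out.close();
            std::error_code ignore;
            fs::remove(tmp, ignore);  // do not leave a broken temp file behind
            return false;
        }
    }  // stream is closed here, before the rename

    std::error_code ec;
    fs::rename(tmp, target, ec);
    if (ec) {
        std::error_code ignore;
        fs::remove(tmp, ignore);
        return false;
    }
    return true;
}
```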
- Limit file sizes, number of open files
- Handle out-of-disk-space gracefully
- Set timeouts for file operations
- Encrypt sensitive files at rest
- Use proven encryption libraries
- Secure key management
Windows:
- Access control lists (ACLs)
- Mandatory Integrity Control (MIC)
- Encrypting File System (EFS)
Linux:
- File access control lists (FACLs)
- SELinux/AppArmor mandatory access control
- Capabilities for fine-grained privileges
- Use the principle of least privilege
- Validate all inputs (paths, filenames, data)
- Handle errors securely (don't leak information)
- Keep software updated (patches for vulnerabilities)
- Use secure defaults (restrictive permissions)
- Audit file access and changes
- Follow secure coding guidelines (CERT, OWASP)
- Fuzz testing with malformed inputs
- Static analysis for common vulnerabilities
- Penetration testing
- Code reviews focusing on security
- Static analysis tools (Coverity, Clang Analyzer)
- Fuzzing tools (AFL, libFuzzer)
- Security scanners
- Platform security APIs
What are the best practices for file operations in modern C++?
General best practices:
1. Use RAII (Resource Acquisition Is Initialization):
- Stream objects automatically close files when they go out of scope
- Prevents resource leaks even with exceptions
- Example: { ofstream file("data.txt"); ... } // auto-closes
- Always verify file opened successfully
- Check stream state after operations
- Handle errors gracefully with appropriate feedback
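The RAII and error-checking points above can be combined in a short sketch. The function names writeLines and readLines are hypothetical. Neither function calls close(); each stream closes its file when it goes out of scope.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Write each line followed by '\n'. Returns false if opening or writing failed.
bool writeLines(const std::filesystem::path& path,
                const std::vector<std::string>& lines) {
    std::ofstream out(path);
    if (!out.is_open()) return false;  // verify the open succeeded
    for (const auto& line : lines) out << line << '\n';
    out.flush();
    return out.good();                 // check stream state after writing
}  // destructor closes the file, even if an exception propagates

// Append all lines of the file to `lines`. Returns false on open or read error.
bool readLines(const std::filesystem::path& path,
               std::vector<std::string>& lines) {
    std::ifstream in(path);
    if (!in) return false;
    std::string line;
    while (std::getline(in, line)) lines.push_back(line);
    return in.eof();                   // the loop should end because input ran out
}
```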
- Specify binary mode when needed
- Use append mode for logging
- Consider performance implications of different modes
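Append mode for logging might look like the sketch below. The name appendLogLine is hypothetical. With std::ios::app every write goes to the end of the file, so existing log entries are preserved.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Append one line to a log file, creating the file if it does not exist.
bool appendLogLine(const std::filesystem::path& logPath,
                   const std::string& message) {
    std::ofstream log(logPath, std::ios::out | std::ios::app);
    if (!log) return false;
    log << message << '\n';
    return static_cast<bool>(log.flush());
}
```

Opening the file for every line keeps the sketch simple; a long-running logger would normally keep one stream open.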
- Portable path manipulation
- Better directory operations
- Filesystem queries (exists, file_size, etc.)
- Avoid unnecessary string copies
- Use std::string_view for read-only string parameters
- Use std::span (C++20) or gsl::span for array views
- Use unique_ptr or shared_ptr for file handles if not using RAII streams
- Custom deleters for cleanup
- Let streams manage buffers by default
- Customize buffer size for performance-critical applications
- Flush strategically (not too often)
- Efficient random access
- Simpler code (access like memory)
- OS handles caching and paging
- Overlap I/O with computation
- Use threads or async/await patterns
- File paths from users
- File contents being parsed
- Sizes and limits
- Principle of least privilege
- Secure defaults
- Consider umask when creating files
- Encrypt if necessary
- Secure deletion when appropriate
- Audit access to sensitive files
- Descriptive variable and function names
- Clear error messages
- Documented assumptions and limitations
- Separate file I/O from business logic
- Use interfaces for testability
- Consider using the Visitor pattern for complex file processing
- Unit tests for file operations
- Test with various file sizes and types
- Test error conditions and edge cases
- Integration tests with real filesystems
- <fstream> for basic file I/O
- <filesystem> for advanced operations (C++17)
- Avoid platform-specific code when possible
- Path separators (/ vs \)
- Line endings (\n vs \r\n)
- File permissions models
- Use conditional compilation only when necessary
- Use network byte order or specify endianness
- Document binary format thoroughly
- Provide conversion utilities if needed
- Throw for unrecoverable errors
- Use error codes for expected error conditions
- Clean up resources in all cases
- For transient errors (locked files, network issues)
- With exponential backoff
- Maximum retry limits
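A retry loop with exponential backoff and a retry limit can be sketched as below. The name retryWithBackoff is hypothetical, and the operation is assumed to return true on success and false on a transient failure; deciding which failures count as transient is left to the caller.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Call `operation` until it succeeds or `maxAttempts` is reached,
// doubling the wait between attempts.
bool retryWithBackoff(const std::function<bool()>& operation,
                      int maxAttempts,
                      std::chrono::milliseconds initialDelay) {
    std::chrono::milliseconds delay = initialDelay;
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (operation()) return true;
        if (attempt == maxAttempts) break;  // no sleep after the final failure
        std::this_thread::sleep_for(delay);
        delay *= 2;                         // exponential backoff
    }
    return false;
}
```

A typical call would wrap an open attempt, for example a lambda that tries to open a locked file and returns whether the stream is usable.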
- Log errors with context
- Include file paths and operations
- Consider user-friendly error messages for end users
- Safety first: Validate inputs, handle errors, clean up resources
- Use modern C++ features: RAII, filesystem library, smart pointers
- Consider performance: Buffer appropriately, use right access patterns
- Write maintainable code: Clear naming, modular design, good documentation
- Test thoroughly: Unit tests, integration tests, edge cases
- Think about security: Validate paths, set permissions, handle sensitive data
- Plan for portability: Use standard libraries, handle platform differences
What is the difference between text and binary file modes?
Text mode may translate newlines (for example, \n to \r\n on Windows); binary mode reads and writes raw bytes, which is critical for portable binary formats.
Why check `fail()` vs `eof()` after reading?
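fail() returns true when failbit or badbit is set, meaning an extraction or I/O operation failed; eof() only means the end of input was reached. A failed extraction at end of file sets both failbit and eofbit, so loop on the read itself (or check fail()) rather than on eof().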
What does `seekg` do?
Sets the get (read) position of an input stream, either to an absolute position or to an offset relative to ios::beg, ios::cur, or ios::end.
How to read entire file into `std::string`?
Use std::ifstream with ostringstream and rdbuf(), or query the size with C++17 std::filesystem::file_size, size the string, and read() directly into it.
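The rdbuf() approach can be sketched as follows. The name readWholeFile is hypothetical; std::optional is used here only to distinguish "could not open" from "file is empty".

```cpp
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Read a whole file into a string, or return std::nullopt if it cannot be opened.
std::optional<std::string> readWholeFile(const std::filesystem::path& path) {
    std::ifstream in(path, std::ios::binary);  // binary: no newline translation
    if (!in) return std::nullopt;
    std::ostringstream buffer;
    buffer << in.rdbuf();                      // copy the entire stream buffer
    return buffer.str();
}
```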
What is RAII for `fstream`?
The stream destructor closes the file automatically, so a manual close() is only needed when reusing the same stream object or when you want to check the stream state after closing.
Note: These questions cover core interview topics. This page lists 15 basic and 10 tricky questions.