cpp | advanced | 90 minutes

Advanced Multi-Threaded File Search and Filter Utility in C++

Build a high-performance, multi-threaded C++ utility to recursively search through directories for files matching specific content and metadata filters, then output a summary report. This mini-project will test your architectural design, concurrency control, file I/O, and advanced C++ skills.

Challenge prompt

Create a C++ program that recursively searches a given directory for files containing a specified keyword within their content. Your utility must support filtering results by file extension(s), minimum and maximum file size, and last modified date range. Implement efficient multi-threading to utilize all available CPU cores for scanning files concurrently. At the end, generate a summary report listing all matched files with their path, size, last modified timestamp, and a snippet of the matched content surrounding the keyword in each file. Handle errors gracefully and optimize for large directory structures with potentially thousands of files.

Guidance

  • Use std::filesystem for directory traversal and metadata extraction.
  • Implement thread pools or std::async with concurrency safety to parallelize file reading and filtering.
  • To extract snippets with the keyword, read partial file content around the first match instead of loading entire files into memory.
  • Carefully design data structures to safely aggregate results from different threads and avoid race conditions.
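The snippet-extraction point above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: the helper name findKeywordSnippet, its context and chunkSize parameters, and their default values are all introduced here for the example. It reads the file in fixed-size chunks, keeps a short tail of the previous chunk so that a keyword straddling a chunk boundary is still found, and stops at the first match.

```cpp
#include <algorithm>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper (not part of the starter code): finds the first
// occurrence of `keyword` and copies up to `context` bytes on each side of it
// into `snippet`. Matching is byte-wise and case-sensitive.
bool findKeywordSnippet(const std::filesystem::path& filePath,
                        const std::string& keyword,
                        std::string& snippet,
                        std::size_t context = 40,
                        std::size_t chunkSize = 64 * 1024) {
    if (keyword.empty() || chunkSize == 0) return false;
    std::ifstream in(filePath, std::ios::binary);
    if (!in) return false;  // unreadable file: report "no match"

    // Bytes carried over between chunks: enough to complete a keyword split
    // across a boundary, plus the leading context for the snippet.
    const std::size_t keep = keyword.size() - 1 + context;
    std::string window;
    std::string chunk(chunkSize, '\0');

    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        const auto got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;
        window.append(chunk, 0, got);

        const std::size_t pos = window.find(keyword);
        if (pos != std::string::npos) {
            const std::size_t matchEnd = pos + keyword.size();
            // Top up the trailing context if the match sits near the chunk end.
            if (in && window.size() < matchEnd + context) {
                std::string extra(context, '\0');
                in.read(extra.data(), static_cast<std::streamsize>(extra.size()));
                window.append(extra, 0, static_cast<std::size_t>(in.gcount()));
            }
            const std::size_t begin = pos > context ? pos - context : 0;
            const std::size_t end = std::min(window.size(), matchEnd + context);
            snippet = window.substr(begin, end - begin);
            return true;
        }
        // No match yet: drop everything except the carried-over tail.
        if (window.size() > keep) window.erase(0, window.size() - keep);
    }
    return false;
}
```

With this approach memory use stays bounded by roughly chunkSize plus the carried-over tail, regardless of file size. A real solution might also want case-insensitive matching or newline trimming in the snippet; neither is attempted here.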

Hints

  • Consider using std::mutex, std::lock_guard, or concurrent queues for thread-safe result storage.
  • Minimize disk I/O by filtering on metadata (extension, size, last modified time) before reading file content wherever possible.
  • Split directory traversal and file reading/filtering into separate phases to improve concurrency and error isolation.

Starter code

#include <cstdint>
#include <filesystem>
#include <iostream>
#include <vector>
#include <string>
#include <mutex>
#include <thread>
#include <future>

// Define a struct to hold file match information
struct FileMatch {
    std::filesystem::path filePath;
    std::uintmax_t fileSize;
    std::filesystem::file_time_type lastModified;
    std::string snippet;
};

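// Function declarations
// Returns the files found under dir, including those in subdirectories.
std::vector<std::filesystem::path> recursiveFileSearch(const std::filesystem::path& dir);
// Returns true if the file contains keyword; on a match, snippet receives the text around it.
bool fileContainsKeyword(const std::filesystem::path& filePath, const std::string& keyword, std::string& snippet);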

int main() {
    // TODO: Implement argument parsing, threading logic, filtering, and reporting
    std::cout << "Implement the multi-threaded file search and filter utility here." << std::endl;
    return 0;
}
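One possible shape for the threading part of the TODO is sketched below. It is an assumption-laden illustration rather than the intended solution: scanInParallel and the matches callback are hypothetical names, and the callback stands in for whatever content check you implement. Worker threads claim the next file through an atomic counter and append hits to a shared vector under a std::mutex.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical driver: calls `matches(path)` for every candidate from a fixed
// set of worker threads and returns the paths for which it returned true.
// `matches` must be safe to call from several threads at once.
std::vector<fs::path> scanInParallel(
    const std::vector<fs::path>& candidates,
    const std::function<bool(const fs::path&)>& matches) {
    std::vector<fs::path> results;
    std::mutex resultsMutex;
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        for (;;) {
            // Each index is handed out exactly once, so no file is scanned twice.
            const std::size_t i = next.fetch_add(1);
            if (i >= candidates.size()) return;

            bool hit = false;
            try {
                hit = matches(candidates[i]);
            } catch (...) {
                // A failure on one file should not stop the other workers.
            }
            if (hit) {
                std::lock_guard<std::mutex> lock(resultsMutex);
                results.push_back(candidates[i]);
            }
        }
    };

    // hardware_concurrency() may return 0 when the value is unknown.
    const unsigned threadCount = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    pool.reserve(threadCount);
    for (unsigned t = 0; t < threadCount; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();

    // Threads finish in arbitrary order; sort for a stable report.
    std::sort(results.begin(), results.end());
    return results;
}
```

The lock is held only for the push_back, so file reading itself runs without contention. On some toolchains the program must be linked with thread support (for example -pthread with GCC or Clang).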

Expected output

Summary Report:
Matched Files: 3
1. /path/to/file1.txt | Size: 2048 bytes | Modified: 2024-05-20 15:32 | Snippet: "...keyword example inside file1..."
2. /path/to/file2.cpp | Size: 4096 bytes | Modified: 2024-05-18 09:12 | Snippet: "...code snippet with keyword..."
3. /path/to/notes.md | Size: 1024 bytes | Modified: 2024-05-22 11:03 | Snippet: "...documentation mentioning keyword..."
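Printing the "Modified" column in the format shown above takes a little care, because C++17 offers no portable conversion from std::filesystem::file_time_type to calendar time. The sketch below uses a common workaround that re-bases the file time against the current readings of both clocks; formatTimestamp is a hypothetical helper name, and the result is approximate (the two clocks are sampled at slightly different moments) and rendered in local time, which is an assumption about what the report should show.

```cpp
#include <chrono>
#include <ctime>
#include <filesystem>
#include <iomanip>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: renders a file time as "YYYY-MM-DD HH:MM" in local time.
std::string formatTimestamp(fs::file_time_type ftime) {
    using namespace std::chrono;

    // Express the file time as an offset from "now" on the file clock, then
    // apply that offset to "now" on the system clock.
    const auto sysTime = time_point_cast<system_clock::duration>(
        ftime - fs::file_time_type::clock::now() + system_clock::now());
    const std::time_t tt = system_clock::to_time_t(sysTime);

    // std::localtime uses shared static storage, so call this from a single
    // thread (for example while writing the report after the workers finish).
    std::tm local{};
    if (const std::tm* p = std::localtime(&tt)) local = *p;

    std::ostringstream out;
    out << std::put_time(&local, "%Y-%m-%d %H:%M");
    return out.str();
}
```

A report line can then be assembled from the FileMatch fields, for example by streaming filePath.string(), fileSize, formatTimestamp(lastModified), and snippet separated by " | ". In C++20, std::chrono::clock_cast provides an exact conversion and may be preferable where available.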

Core concepts

  • Multithreading and concurrency
  • Filesystem operations and metadata
  • File I/O optimization
  • Data synchronization and thread safety
