Advanced Multithreaded File Processing and Data Aggregation in C++
Create a performant C++ application that reads multiple large text files concurrently, processes the extracted data to compute aggregate statistics, and outputs a sorted summary report.
Challenge prompt
Build a C++ program that accepts a list of file paths, reads each file in parallel using multithreading, extracts integer values from each line, and calculates the total sum, average, maximum, and minimum values across all files. Finally, output a summary report sorted by file name that includes these statistics for each file and a combined aggregate for all files.
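The prompt does not say whether a line holds one integer or several. The sketch below takes the more general reading and assumes a line may contain several tokens separated by spaces or tabs, which is an assumption rather than part of the challenge. The helper name `extractInts` is hypothetical, and `std::from_chars` requires C++17.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper: returns every token in `line` that is entirely a
// base-10 int. Tokens are assumed to be separated by spaces, tabs, or '\r'.
std::vector<int> extractInts(std::string_view line) {
    std::vector<int> values;
    const char* p = line.data();
    const char* end = p + line.size();
    auto isSeparator = [](char c) { return c == ' ' || c == '\t' || c == '\r'; };

    while (p < end) {
        while (p < end && isSeparator(*p)) ++p;            // skip separators
        const char* tokenEnd = p;
        while (tokenEnd < end && !isSeparator(*tokenEnd)) ++tokenEnd;
        if (p == tokenEnd) break;                          // nothing left on the line

        int value = 0;
        auto [ptr, ec] = std::from_chars(p, tokenEnd, value);
        // Accept the token only if it parsed without error and was consumed fully,
        // so "40x" and out-of-range values are skipped.
        if (ec == std::errc() && ptr == tokenEnd) values.push_back(value);
        p = tokenEnd;
    }
    return values;
}
```

With this reading, a line such as `12 abc -7` contributes 12 and -7 and silently skips `abc`. If the input is known to hold exactly one integer per line, a single conversion per line is enough.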
Guidance
- Use C++11 or later thread support libraries (e.g., std::thread, std::mutex) for concurrent file reading.
- Design thread-safe data structures or use synchronization primitives to aggregate data safely.
- Optimize file reading and parsing to handle large files without excessive memory usage.
Hints
- Consider having each thread process its file and store statistics locally before merging results.
- Use locks or atomic operations only when updating shared aggregate data to avoid performance bottlenecks.
- Use standard algorithms from <algorithm> for computing min, max, and sorting results.
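The hints above can be sketched roughly as follows. Each worker writes its result into its own pre-allocated slot, so the only shared mutable state is one atomic index, and the merge and sort happen after all threads have joined. The names `FileResult`, `runAll`, and `combine` are hypothetical, and the per-file work is passed in as a callable so the pattern stays independent of how a file is parsed. The callable is assumed not to throw, since an exception escaping a thread would terminate the program.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <limits>
#include <string>
#include <thread>
#include <vector>

// Hypothetical per-file result record.
struct FileResult {
    std::string name;
    long long sum = 0;
    long long count = 0;
    int max = std::numeric_limits<int>::min();
    int min = std::numeric_limits<int>::max();
};

// Runs `work` once per name on up to `workerCount` threads and returns the
// results sorted by name. No mutex is needed: every index is claimed by
// exactly one thread through the atomic counter.
std::vector<FileResult> runAll(
        const std::vector<std::string>& names,
        const std::function<FileResult(const std::string&)>& work,
        unsigned workerCount) {
    std::vector<FileResult> results(names.size());
    std::atomic<std::size_t> next{0};

    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= names.size()) break;
            results[i] = work(names[i]);   // each slot is written by one thread only
        }
    };

    workerCount = std::max(1u, workerCount);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < workerCount; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();

    std::sort(results.begin(), results.end(),
              [](const FileResult& a, const FileResult& b) { return a.name < b.name; });
    return results;
}

// Merges the per-file results into one aggregate after all threads have finished.
FileResult combine(const std::vector<FileResult>& results) {
    FileResult total;
    total.name = "Combined";
    for (const auto& r : results) {
        total.sum += r.sum;
        total.count += r.count;
        total.max = std::max(total.max, r.max);
        total.min = std::min(total.min, r.min);
    }
    return total;
}
```

A fixed worker count (for example `std::thread::hardware_concurrency()`, which may return 0 and is therefore clamped to 1 here) avoids spawning one thread per file when the file list is long.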
Starter code
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <thread>
#include <mutex>
#include <map>
#include <numeric>
#include <limits>

// Running statistics for one file (or for all files combined).
// If no integer is ever recorded, max and min keep their sentinel values.
struct Statistics {
    long long sum = 0;
    int count = 0;
    int max = std::numeric_limits<int>::min();
    int min = std::numeric_limits<int>::max();
};

std::mutex mtx;                               // guards fileStats and totalStats
std::map<std::string, Statistics> fileStats;  // std::map keeps keys sorted by file name
Statistics totalStats;

// Reads one file line by line, accumulates statistics locally without locking,
// then merges them into the shared state under a single lock.
void processFile(const std::string& filename) {
    std::ifstream file(filename);
    if (!file.is_open()) {
        std::cerr << "Failed to open " << filename << std::endl;
        return;
    }

    Statistics localStats;
    std::string line;
    while (std::getline(file, line)) {
        try {
            int number = std::stoi(line);
            localStats.sum += number;
            localStats.count++;
            if (number > localStats.max) localStats.max = number;
            if (number < localStats.min) localStats.min = number;
        } catch (...) {
            continue; // skip lines that do not start with an integer or overflow int
        }
    }

    // One lock per file: publish the per-file result and update the totals.
    std::lock_guard<std::mutex> lock(mtx);
    fileStats[filename] = localStats;
    totalStats.sum += localStats.sum;
    totalStats.count += localStats.count;
    if (localStats.max > totalStats.max) totalStats.max = localStats.max;
    if (localStats.min < totalStats.min) totalStats.min = localStats.min;
}

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " <file1> [file2 ...]" << std::endl;
        return 1;
    }

    // Start one thread per file, then wait for all of them to finish.
    std::vector<std::thread> threads;
    for (int i = 1; i < argc; ++i) {
        threads.emplace_back(processFile, argv[i]);
    }
    for (auto& t : threads) {
        t.join();
    }

    // Output sorted summary report (structured bindings require C++17)
    std::cout << "File Stats (sorted by file name):" << std::endl;
    for (auto& [filename, stats] : fileStats) {
        double avg = stats.count ? static_cast<double>(stats.sum) / stats.count : 0;
        std::cout << filename << ": sum=" << stats.sum << ", avg=" << avg
                  << ", max=" << stats.max << ", min=" << stats.min << std::endl;
    }

    double totalAvg = totalStats.count ? static_cast<double>(totalStats.sum) / totalStats.count : 0;
    std::cout << "Combined: sum=" << totalStats.sum << ", avg=" << totalAvg
              << ", max=" << totalStats.max << ", min=" << totalStats.min << std::endl;
    return 0;
}
Expected output
File Stats (sorted by file name):
file1.txt: sum=123456, avg=123.45, max=999, min=1
file2.txt: sum=234567, avg=234.56, max=999, min=2
...
Combined: sum=358023, avg=179.01, max=999, min=1
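The sample averages are shown with two decimal places, which default stream formatting does not guarantee. Assuming two decimals is the intended format, a report line could be built along these lines; the helper name `formatLine` is hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: renders one report line with the average fixed to
// two decimal places. An empty input (count == 0) reports an average of 0.
std::string formatLine(const std::string& label, long long sum, long long count,
                       int max, int min) {
    const double avg =
        count ? static_cast<double>(sum) / static_cast<double>(count) : 0.0;
    std::ostringstream out;
    out << label << ": sum=" << sum
        << ", avg=" << std::fixed << std::setprecision(2) << avg
        << ", max=" << max << ", min=" << min;
    return out.str();
}
```

For example, `formatLine("file1.txt", 10, 4, 7, -1)` yields `file1.txt: sum=10, avg=2.50, max=7, min=-1`. `std::fixed` and `std::setprecision` affect only floating-point output, so the integer fields are printed unchanged.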