The 'Small Files' Problem in Data Lakes: Why Your Kafka Sink is Slow
The 'Small Files' Problem: The Data Lake Killer
Streaming data from Kafka into a data lake built on object storage (such as Amazon S3 or Azure Blob Storage) seems simple. However, if you write data as soon as it arrives, you will quickly hit the S…
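To show how quickly "write as soon as it arrives" multiplies file counts, here is a minimal Python simulation. It makes no real Kafka or S3 calls. SimulatedSink, simulate, flush_bytes, and the 8 MiB threshold are hypothetical names and values chosen only for illustration; a flush_bytes of 0 models writing one file per record.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedSink:
    """Hypothetical sink that 'writes' one file each time its buffer is flushed.

    flush_bytes is an assumed size threshold; 0 means every record is
    flushed immediately (the write-as-soon-as-it-arrives pattern).
    """
    flush_bytes: int = 0
    buffer: list = field(default_factory=list)
    buffered_size: int = 0
    files_written: list = field(default_factory=list)  # byte size of each file

    def write(self, record: bytes) -> None:
        # Accumulate the record, then flush once the threshold is reached.
        self.buffer.append(record)
        self.buffered_size += len(record)
        if self.buffered_size >= self.flush_bytes:
            self.flush()

    def flush(self) -> None:
        # Emit the buffered records as a single "file" and reset the buffer.
        if not self.buffer:
            return
        self.files_written.append(self.buffered_size)
        self.buffer.clear()
        self.buffered_size = 0


def simulate(num_records: int, record_size: int, flush_bytes: int) -> list:
    """Push identical records through a SimulatedSink and return file sizes."""
    sink = SimulatedSink(flush_bytes=flush_bytes)
    for _ in range(num_records):
        sink.write(b"x" * record_size)
    sink.flush()  # write out any remainder, as a sink would on shutdown
    return sink.files_written


# 100,000 records of 1 KiB each (roughly 100 MB in total).
eager = simulate(100_000, 1024, flush_bytes=0)
batched = simulate(100_000, 1024, flush_bytes=8 * 1024 * 1024)
print(len(eager), len(batched))  # 100000 13
```

The same data lands as 100,000 one-kilobyte objects in the eager case but only 13 objects (twelve of 8 MiB plus one smaller remainder) when records are buffered. Every one of those objects costs a separate write request now and a separate open or list call for whatever reads the lake later.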