Tokutek makes variants of both MySQL and MongoDB using fractal tree indexes. The fractal tree is supposed to provide a number of benefits and enhancements to these products, which you can read about in detail on their page. They offer both open source and closed/enterprise versions of each.
The single biggest feature that caught my eye with their product was compression for MongoDB. Mongo has been sorely lacking compression; it has been an outstanding item on their list for quite some time, and they seem to have little to no interest in fixing it. If TokuMX has it, then it's very likely going to be something I start using.
I figured it would be quite easy and simple to deploy TokuMX since, by all claims, it's just a fork of MongoDB with a much enhanced storage system. I already have Puppet manifests to deploy MongoDB and handle building up replica sets, etc., so it should be a snap to try out TokuMX. But it's not. Tokutek doesn't actually offer rpms or debs, either to download or in a repo, but you can get a pre-compiled tarball... yuck. Who deploys and scales this way any more? Maybe I can just check out their git repo and use the included packager script to bundle up an rpm for deployment... nope, they didn't actually modify that script to work with their fork.
So, advertised benefits aside, there seems to be no quick and easy way to deploy this for testing. Even if the testing proved great, I don't really want to have to maintain a fork of their code just so I can bundle rpms. At a minimum, if there were a source tarball and a spec file that worked, I'd be fine with that, but as it stands now the touted benefits aren't worth the extra work to make it simple to deploy and scale.
Tokutek, think of it this way. With the 10gen repo for MongoDB and a few lines of Puppet code I can deploy a dozen servers, with multiple replica sets ready for sharding, in an hour's time. If I can't do that with TokuMX, then that puts a pretty high barrier in front of it being worthwhile to swap out our existing MongoDB.
Saturday, October 12, 2013
Friday, July 12, 2013
GridFS HTML5 Video Streaming
One of the projects I'm working on uses MongoDB's GridFS to store video files to be played online. Due to the size of the files it's not possible to simply load them into RAM and send them to the client; they need to be read and streamed out. Also, in order to support seeking with HTML5 video, we need to account for range requests being sent to the server.
This code example should provide enough to see how the process works. It attempts to handle errors and range requests in a sane way.
// Look up the requested file in GridFS
$this->mdb = connect_mongodb();
$this->gdb = $this->mdb->getGridFS();
$fh = $this->gdb->findOne(array('filename' => $filename));
if ($fh == null) {
    header("HTTP/1.1 404 Not Found");
    return false;
}
$stream = $fh->getResource();
$length = $fh->file['length'];
// Check for a range request
if (isset($_SERVER['HTTP_RANGE'])) {
    $range = explode('=', $_SERVER['HTTP_RANGE'], 2);
    if (strtolower(trim($range[0])) != 'bytes') {
        header('HTTP/1.1 416 Requested Range Not Satisfiable');
        return false;
    }
    if (substr_count($range[1], ',') >= 1) {
        //TODO handle multi ranges
    }
    if ($range[1][0] == '-') {
        // Suffix range: the client wants the last N bytes of the file
        $req_bytes = substr($range[1], 1);
        $start = (int)$length - (int)$req_bytes;
        $end = (int)$length - 1;
    }
    if (is_numeric(substr($range[1], 0, 1)) === true) {
        $dash_position = strpos($range[1], '-');
        $str_length = strlen($range[1]);
        if (($str_length - 1) == $dash_position) {
            // Dash is at the end, so we want everything from start to EOF
            $start = (int)$range[1];
            $end = (int)$length - 1;
        } else {
            // There's something after the dash: an explicit start-end range
            $r_parts = explode('-', $range[1], 2);
            $start = (int)$r_parts[0];
            $end = (int)$r_parts[1];
        }
    }
    $c_length = $end - $start + 1;
    header('HTTP/1.1 206 Partial Content');
    header("Content-Range: bytes {$start}-{$end}/{$length}");
    header("Content-Length: {$c_length}");
    fseek($stream, $start);
    // From here on, $end tracks the remaining bytes to send (minus one),
    // which is what the output loop below expects
    $end = $end - $start;
} else {
    header("Content-Length: " . $fh->file['length']);
    $end = $length - 1;
}
// $matches[3] is assumed to hold the container type (mp4, webm, etc.)
// captured by an earlier regex on the requested filename
header("Content-type: video/{$matches[3]}");
header("Content-Disposition: inline");
header("Accept-Ranges: bytes");
header("X-Accel-Buffering: no");
$out = 0;
$bytes = 8192;
// Stream the file out in 8K reads, flushing each chunk as it's sent
ob_implicit_flush(true);
ob_end_flush();
set_time_limit(0);
do {
    if ($end < $bytes) {
        $bytes = $end + 1;
    }
    echo fread($stream, $bytes);
    flush();
    $out = $out + $bytes;
    $end = $end - $bytes;
    if ($end <= 0) {
        break;
    }
    if (connection_status() != CONNECTION_NORMAL) {
        break;
    }
} while (true);
fclose($stream);
While this example doesn't handle every possible combination of range requests, in my testing it handles every case I've come across.
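One easy addition, if you want to be stricter about bad ranges, is a bounds check before sending the 206 headers. This is just a sketch using the same $start, $end, and $length variables from the example above; it rejects impossible offsets and clamps ranges that run past the end of the file:
// Sketch of a bounds check, using $start, $end, and $length from above,
// placed before the 206 headers are sent
if ($start < 0 || $start >= $length || $end < $start) {
    header('HTTP/1.1 416 Requested Range Not Satisfiable');
    header("Content-Range: bytes */{$length}");
    return false;
}
if ($end >= $length) {
    // Clamp a range that runs past the end of the file
    $end = $length - 1;
}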
Saturday, July 6, 2013
GridFS Chunk Size
I've been working on a project that involves storing large files in MongoDB's GridFS. The files ranged in size from a few hundred megabytes up to several gigabytes. These files are then streamed out of MongoDB by using the GridFS driver's file handle methods. I began to wonder if the chunk size had any measurable effect on the performance of either reading or writing. Searching around, I was not able to find anyone who could provide anything beyond anecdotes about chunk size, so I decided to run some tests myself.
This is a very basic test, attempting only to determine whether chunk size has any measurable effect on reading and writing, and, secondarily, whether the amount read per fread call affects performance when reading through a file handle. My test files were 50 2GB files generated from /dev/urandom. I ensured that MongoDB had already mmapped enough data files to contain my entire test set, so I wouldn't be waiting on new file creation. The basic procedure was to insert all of the files into GridFS and then read them back out, measuring the time it took for each file. Below are the scripts I used to generate the files and to perform the read/write tests.

File Generator
#!/usr/bin/php
<?php
$tmp_location = '/mnt/temp';
$tmp_filename = 'testfile';
$num_files = 50;
// Generate the test files with dd, skipping any that already exist
for ($i = 0; $i < $num_files; $i++) {
    if (is_file("{$tmp_location}/{$tmp_filename}.{$i}")) {
        continue;
    }
    exec("dd if=/dev/urandom of={$tmp_location}/{$tmp_filename}.{$i} bs=1M count=2000", $out, $ret);
    var_dump($out);
}
?>
Write Test Script
#!/usr/bin/php
<?php
$tmp_location = '/mnt/temp';
$tmp_filename = 'testfile';
$def_chunk = 262144;
$chunk_mul = 1;
$num_files = 50;
$c = new Mongo('mongodb://216.81.208.140');
$gfs = $c->selectDB('gridtest')->getGridFS();
$res = $gfs->drop();
echo sprintf("Write Test Chunk Size %u\n", $def_chunk * $chunk_mul);
// Store each file in GridFS with the chosen chunk size, timing each insert
for ($i = 0; $i < $num_files; $i++) {
    $start = microtime(true);
    $res = $gfs->storeFile("{$tmp_location}/{$tmp_filename}.{$i}", array('chunkSize' => $def_chunk * $chunk_mul));
    $end = microtime(true);
    echo sprintf("%f\n", $end - $start);
}
?>
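As a quick sanity check that the chunkSize option actually took effect, you can pull one of the stored files' metadata back out of GridFS. This is just a sketch, assuming the same server, database, and file naming as the test scripts:
#!/usr/bin/php
<?php
// Sanity-check sketch: assumes the same server, database, and file naming
// as the write test script above
$tmp_location = '/mnt/temp';
$tmp_filename = 'testfile';
$c = new Mongo('mongodb://216.81.208.140');
$gfs = $c->selectDB('gridtest')->getGridFS();
$res = $gfs->findOne(array('filename' => "{$tmp_location}/{$tmp_filename}.0"));
echo sprintf("chunkSize: %u length: %u\n", $res->file['chunkSize'], $res->file['length']);
?>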
Read Test Script
#!/usr/bin/php
<?php
$tmp_location = '/mnt/temp';
$tmp_filename = 'testfile';
$def_chunk = 262144;
$chunk_mul = 1;
$read_size = 8192;
$num_files = 50;
$c = new Mongo('mongodb://216.81.208.140');
$gfs = $c->selectDB('gridtest')->getGridFS();
$null_fp = fopen('/dev/null', 'w');
echo sprintf("Read Test Chunk Size %u fread size %u\n", $def_chunk * $chunk_mul, $read_size);
// Stream each file back out of GridFS to /dev/null, timing each read
for ($i = 0; $i < $num_files; $i++) {
    $find = array('filename' => "{$tmp_location}/{$tmp_filename}.{$i}");
    $res = $gfs->findOne($find);
    $fstream = $res->getResource();
    $length = $res->file['length'];
    $start = microtime(true);
    while (!feof($fstream)) {
        fwrite($null_fp, fread($fstream, $read_size));
    }
    $end = microtime(true);
    fclose($fstream);
    echo sprintf("%f\n", $end - $start);
}
?>
Results
This is the average time it took to write all 50 files to GridFS.
This is the average time it took to read all 50 files back out. For each chunk size I performed 3 tests with different sizes passed to fread.
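The scripts print one time per file; if you want to average them the same way, a trivial helper like this (not part of the original tests) will do the math when you pipe a script's output into it:
#!/usr/bin/php
<?php
// Hypothetical helper, not part of the original tests: averages the per-file
// times printed by the read/write scripts, one value per line on stdin
$times = array();
while (($line = fgets(STDIN)) !== false) {
    if (is_numeric(trim($line))) {
        $times[] = (float)trim($line);
    }
}
if (count($times) > 0) {
    echo sprintf("files: %u average: %f\n", count($times), array_sum($times) / count($times));
}
?>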
Conclusion
It seems that altering the chunk size for GridFS does have at least some effect for writing. At sizes of 2 megs and greater there is a marked increase in write speed. This may be due to how the various pieces of the disk subsystem, the ext4 journal, and the journal in MongoDB are interacting. Read speeds don't seem to be affected to a noticeable degree. There's a slight drop at 4 and 8 megs, but except for the one odd skew, read differences are sub-second and don't indicate a general trend.
Although this test didn't attempt to simulate multiple readers/writers to GridFS, and since files were written in order and then read out in order this is kind of a best case scenario for read-ahead optimization, it seems that chunk size makes no measurable difference when reading.