  1. #1

    Compression Ratio

    mysql> select count(*) from TEST;
    +----------+
    | count(*) |
    +----------+
    | 50260238 |
    +----------+
    1 row in set (0.00 sec)

    mysql> SELECT table_name AS "Table",round(((data_length + index_length)/1024 / 1024 / 1024), 2) "Size in GB" FROM information_schema.TABLES WHERE table_schema = "TDA" AND table_name = "TEST";
    +-------+------------+
    | Table | Size in GB |
    +-------+------------+
    | TEST  |       2.74 |
    +-------+------------+
    1 row in set (0.02 sec)

    mysql> select count(*) from TEST1;
    +-----------+
    | count(*)  |
    +-----------+
    | 211332246 |
    +-----------+
    1 row in set (0.01 sec)

    mysql> SELECT table_name AS "Table",round(((data_length + index_length)/1024 / 1024 / 1024), 2) "Size in GB" FROM information_schema.TABLES WHERE table_schema = "TDA" AND table_name = "TEST1";
    +-------+------------+
    | Table | Size in GB |
    +-------+------------+
    | TEST1 |       2.11 |
    +-------+------------+
    1 row in set (0.00 sec)

    Here the TEST table is loaded with 50260238 rows and its compressed size is 2.74 GB, while the TEST1 table is loaded with 211332246 rows (about 4 times more data) but its compressed size is only 2.11 GB.
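
To put the two results on a common footing, the figures reported above can be converted to compressed bytes per row. This is only a rough sketch: it assumes the "Size in GB" column is exact to two decimals and, as in the query, treats 1 GB as 1024^3 bytes. The helper name `bytes_per_row` is made up for illustration.

```python
# Compressed bytes per row, using the row counts and sizes reported above.
GIB = 1024 ** 3  # the query divides by 1024 three times, so "GB" here is GiB


def bytes_per_row(size_gib, rows):
    """Return the average compressed size of one row, in bytes."""
    return size_gib * GIB / rows


# (size in GB as reported, row count) for each table in the post
tables = {
    "TEST": (2.74, 50_260_238),
    "TEST1": (2.11, 211_332_246),
}

if __name__ == "__main__":
    for name, (size_gib, rows) in tables.items():
        print(f"{name}: {bytes_per_row(size_gib, rows):.1f} bytes per row")
```

This works out to roughly 58.5 bytes per row for TEST against about 10.7 for TEST1, so TEST1 is compressing more than five times better per row.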


    How is this possible?

  2. #2
    All compression ratios depend on multiple things.
    1. Data Type (varchar, int, datetime, ...)
    2. Data Distribution
    3. Data Uniqueness

    A good way to picture this: take a table with 1M rows whose name field is unordered, and suppose it compresses by 35%. Take that same data and order it by name, so that the same name appears more than once in a data pack. Each data pack now holds fewer distinct values, and the compression ratio goes up with each value that repeats in this way. Some data compresses better than others because more reusable patterns can be found in it; some simply does not compress well. And because we break the data down into data packs and re-apply the algorithms based on the data in each one, compression is tuned pack by pack, so data pack 1 will compress differently from data pack 2.
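
The effect of ordering described above can be reproduced in a small experiment. The sketch below is not Infobright's implementation: it uses Python's general-purpose `zlib` as a stand-in for the engine's own algorithms, compresses a made-up name column in independent chunks, and compares the unordered column with the same column sorted. The chunk size `PACK_SIZE`, the generated names, and the helper names are all assumptions chosen for illustration.

```python
import random
import zlib

# Illustrative chunk size standing in for a "data pack"; an assumption
# made for this sketch, not a value taken from the thread.
PACK_SIZE = 65_536


def packed_size(values, pack_size=PACK_SIZE):
    """Compress values chunk by chunk and return the total compressed bytes."""
    total = 0
    for start in range(0, len(values), pack_size):
        chunk = "\n".join(values[start:start + pack_size]).encode("utf-8")
        # Each chunk is compressed on its own, so it can only exploit
        # patterns that occur inside that chunk.
        total += len(zlib.compress(chunk))
    return total


def demo(rows=200_000, distinct=5_000, seed=42):
    """Return (raw, unordered, ordered) sizes in bytes for a synthetic column."""
    rng = random.Random(seed)
    # Hypothetical name values; each one repeats many times in the column.
    names = [f"name_{rng.randrange(10**9):09d}" for _ in range(distinct)]
    column = [rng.choice(names) for _ in range(rows)]
    raw = sum(len(v) + 1 for v in column)  # +1 for the separator byte
    return raw, packed_size(column), packed_size(sorted(column))


if __name__ == "__main__":
    raw, unordered, ordered = demo()
    print(f"raw bytes:            {raw}")
    print(f"unordered compressed: {unordered}")
    print(f"ordered compressed:   {ordered}")
```

The rows are identical in both runs; only their order changes. In the sorted column, equal names sit next to each other inside a chunk, which is why the compressor finds far more to reuse.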

  3. #3
    Can you show how many columns you have in each table and the first few rows from each?

  4. #4
    You can grab any one of these tools to help you accomplish this:

    http://www.codeproject.com/Articles/...cs-Viewer-MVVM

    http://www.codeproject.com/Articles/...istics-Utility

    http://www.codeproject.com/Articles/...lity-Cplusplus

    All of these were written by interns here at Infobright.

 

 
