Zip HTML before inserting into DB
Esopo
« on: December 30, 2006, 05:10:34 AM »

Hi,

As related to this post:
http://www.expertsrt.net/main/forum/topic,1066.0/

I am working on a DB that among small pieces of data is storing entire HTML pages, and already I can predict it will be getting much bigger than I want it to. The HTML info will not be necessary for most queries, only for the occasional report.

I was thinking it would be reasonable to zip the content and insert it as binary. Have I lost my mind?

Thank you,

Esopo.


BTW, I'm only in the test stage, so I can implement anything. So far in testing I have about 60 records and already 3 MB (50 KB per record), and I'm not even storing everything I need to put in each record yet. I expect each record to use about 80 KB, and I expect no fewer than 15k records a month, which works out to about 1.1 GB per month. Not something I want to deal with.
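The estimate above can be checked with a few lines of arithmetic. This is only a sketch in Python (not the PHP used in the thread); the figures are the ones quoted in the post, and the variable names are made up for illustration.

```python
# Back-of-the-envelope storage estimate using the figures quoted in the post.
KB = 1024  # bytes per kilobyte (binary convention, an assumption)

records_per_month = 15_000      # "no less than 15k records a month"
bytes_per_record = 80 * KB      # "about 80ks" per record

monthly_bytes = records_per_month * bytes_per_record
monthly_gib = monthly_bytes / KB**3

print(f"{monthly_gib:.2f} GiB per month")  # prints "1.14 GiB per month"
```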

How does Google handle keeping a copy of the entire Internet?
VGR
« Reply #1 on: December 30, 2006, 11:58:43 AM »

Hmm, I suppose you're on Windows. Anyway, you can use gzip compression on the server side to compress your HTML data before storing it, a bit like the server's compression option that sends gzipped data to the browser.

This way I ***guess*** you could simply send the gzipped data out to the browser when reading it back from the DB.

If that's not doable, then I think you will have to compress/expand the data yourself, probably using your own algorithm, a PHP library ("extension"), or an external call via exec() or system().

Not a big deal technically, but you have to test the various solutions extensively against the major browsers.
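A rough sketch of the "send the stored bytes straight to the browser" idea, written in Python rather than PHP. It assumes the page is stored in gzip format (the format browsers expect for `Content-Encoding: gzip`); the function names `pack_page` and `response_for` are hypothetical, and the Accept-Encoding check is deliberately simplistic.

```python
import gzip

def pack_page(html, level=5):
    """Compress a page into gzip format before storing it."""
    return gzip.compress(html.encode("utf-8"), compresslevel=level)

def response_for(stored, accept_encoding):
    """Return (headers, body) for a stored gzip blob.

    If the client advertises gzip support, the stored bytes are passed
    through untouched; otherwise they are expanded on the server first.
    Note: this naive substring check ignores q-values such as "gzip;q=0".
    """
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if "gzip" in accept_encoding.lower():
        headers["Content-Encoding"] = "gzip"
        return headers, stored
    return headers, gzip.decompress(stored)

page = "<html><body>" + "<p>hello</p>" * 200 + "</body></html>"
stored = pack_page(page)
headers, body = response_for(stored, "gzip, deflate")
print(headers, len(page), len(body))
```

The fallback branch matters for clients that do not accept gzip, which is one reason testing against the major browsers is still needed.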

techie overlord, answers all kind of questions on http://www.europeanexperts.org
Esopo
« Reply #2 on: December 30, 2006, 08:22:25 PM »

I've run some tests using PHP's gzcompress() and gzdeflate(). They both work reasonably well, return strings (so I can swiftly add them to the code when inserting/retrieving info from the DB), and are fast enough even for constant use (IMO).
I'm shooting for 70%+ compression with a very low toll on the server. gzcompress() seems to be faster than gzdeflate().
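For readers who want to compare the two formats outside PHP, here is a small Python sketch. PHP's gzcompress() emits the zlib container (RFC 1950) and gzdeflate() emits raw DEFLATE (RFC 1951); Python's `zlib` module can produce both. The helper names and the synthetic sample data are assumptions for illustration, not the poster's test page.

```python
import zlib

def zlib_pack(data, level=5):
    """zlib container: DEFLATE data plus a 2-byte header and Adler-32 checksum."""
    return zlib.compress(data, level)

def raw_deflate_pack(data, level=5):
    """Raw DEFLATE stream with no header or checksum (negative wbits)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    return comp.compress(data) + comp.flush()

def raw_deflate_unpack(blob):
    """Inverse of raw_deflate_pack()."""
    return zlib.decompress(blob, -zlib.MAX_WBITS)

# Synthetic, repetitive HTML; real pages will compress differently.
html = ("<tr><td class='cell'>row</td><td>value</td></tr>\n" * 600).encode("utf-8")

z = zlib_pack(html)
d = raw_deflate_pack(html)
print(len(html), len(z), len(d))
```

Both variants run the same compression algorithm; the zlib container only adds a small wrapper with a checksum, which is useful for detecting a corrupted row.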

Of course, the down side is that I wouldn't be able to use MySQL to execute queries on that info, but I'm not storing it for that purpose anyway.

Here is a test page I put together:
http://www.netbulge.com/misc/test_page2.php

I do notice something strange: gzcompress() at lvl5 seems to be faster than both lvl1 and lvl9.
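A single timing on a shared host can be noisy, so one way to compare levels is to repeat each measurement and keep the best run. The sketch below does that in Python with `zlib.compress` (the same zlib format as gzcompress()); the function name `time_level`, the repeat count, and the synthetic sample page are all assumptions, and it makes no claim about what the original test page measured.

```python
import time
import zlib

def time_level(data, level, repeats=50):
    """Return (best_seconds, fraction_saved) for one compression level."""
    best = float("inf")
    packed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        best = min(best, time.perf_counter() - start)
    return best, 1 - len(packed) / len(data)

# Roughly 30 KB of synthetic HTML; substitute a real page for meaningful numbers.
sample = "".join(
    f"<div id='item{i}'><a href='/page/{i}'>Link {i}</a></div>\n" for i in range(550)
).encode("utf-8")

for level in (1, 5, 9):
    seconds, saved = time_level(sample, level)
    print(f"level {level}: {seconds * 1000:.3f} ms, {saved:.0%} saved")
```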
rdivilbiss
« Reply #3 on: January 10, 2007, 05:40:03 PM »

Quote from Esopo: "I was thinking it would be reasonable to zip the content and insert it as binary. Have I lost my mind?"

Yes, I think so. How is this going to be better and faster than using the file system?

Care to say how your tests are going?

Rod
Esopo
« Reply #4 on: January 10, 2007, 06:39:12 PM »

I have settled on gzcompress() lvl5. It is faster than the other levels I tried and compresses as well as lvl9. Since it returns a string, I can just insert it into a normal text field.

I get 75% compression on a 30k HTML page in 0.0015 seconds on my HostGator shared host.
Since I'm looking at about 500 queries per day, in theory the toll on operations should be less than a second per day (500 x 0.0015 s = 0.75 s)... not bad at all.

In reality it takes a bit to load the library, so it should add up to more than a second, perhaps up to 5 or 10 per day. Still well within reason. The best thing is that the DB size stays manageable. The field of course can't be searched, but I'm only storing it as a reference in case some calculations need to be done in the future for reporting purposes.
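The store-and-retrieve round trip might look roughly like the sketch below. It is Python with an in-memory SQLite database standing in for MySQL so that it runs on its own; the table, column, and function names are hypothetical. Because compressed output is arbitrary bytes rather than text, the sketch assumes a BLOB column; whether a text column is safe depends on how the column's character set is handled.

```python
import sqlite3
import zlib

# In-memory SQLite is a stand-in for the MySQL table discussed in the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT NOT NULL, html_z BLOB NOT NULL)"
)

def save_page(conn, url, html, level=5):
    """Compress the page (zlib format, level 5 as in the post) and insert it."""
    packed = zlib.compress(html.encode("utf-8"), level)
    cur = conn.execute("INSERT INTO pages (url, html_z) VALUES (?, ?)", (url, packed))
    return cur.lastrowid

def load_page(conn, page_id):
    """Fetch the compressed bytes for one row and expand them back to text."""
    row = conn.execute("SELECT html_z FROM pages WHERE id = ?", (page_id,)).fetchone()
    if row is None:
        raise KeyError(page_id)
    return zlib.decompress(row[0]).decode("utf-8")

html = "<html><body>" + "<p>report data</p>" * 500 + "</body></html>"
page_id = save_page(conn, "http://example.com/report", html)
print(load_page(conn, page_id) == html)
```

The small, frequently queried fields stay in ordinary columns, and only the rarely needed HTML pays the compress/expand cost.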