Recently, I have been building a large database for the new House-Buddy project. The site uses a huge collection of data to identify the sweet spots around a particular location. It is hosted on shared hosting which, in theory, offered unlimited DB storage, but once the provider noticed the database had grown to 47 GB, they put their foot down and asked me to make it smaller. Granted, I was a bit careless when it came to storing data, but I was still determined to build and host this site without changing my hosting provider.
I was blissfully oblivious to the internal workings of MSSQL Server when I started this project. For example, I didn't know that MSSQL doesn't shrink its files automatically, and if you are in a shared environment you may lack the privileges to do it manually using DBCC SHRINKFILE. I had to build a database that was 'small' from the very beginning. This article lists the key steps I took to reduce the size of the database.
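For reference, this is roughly what the manual shrink looks like when you do have the rights ('HouseBuddy_log' is a placeholder logical file name; look yours up in sys.database_files first):

-- Look up the logical file names of the current database
SELECT name, type_desc, physical_name FROM sys.database_files;

-- Shrink the chosen file down to a target size in MB
-- ('HouseBuddy_log' is a placeholder; use your own logical name)
DBCC SHRINKFILE ('HouseBuddy_log', 100);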
Watch Space
There are really two main scripts needed to watch your space: one for the database size and one for the tables. 'Database size' is technically the sum of the data file and its logs.
SELECT
    f.name,
    ROUND((CAST(f.size AS FLOAT) * 8.00 / 1024) / 1024, 3) AS File_Size_GB,
    ROUND(CAST(f.size AS FLOAT) / 128, 2) AS Reserved_MB,
    ROUND(CAST(FILEPROPERTY(f.name, 'SpaceUsed') AS FLOAT) / 128, 2) AS Used_MB,
    ROUND((CAST(f.size AS FLOAT) / 128) - (CAST(FILEPROPERTY(f.name, 'SpaceUsed') AS FLOAT) / 128), 2) AS Free_MB,
    f.physical_name
FROM sys.database_files f
LEFT JOIN sys.filegroups fg ON f.data_space_id = fg.data_space_id
You can use the following script to assess which tables are causing the biggest issues:
SELECT
    t.name AS TableName,
    s.name AS SchemaName,
    p.rows AS RowCounts,
    SUM(a.total_pages) * 8 AS TotalSpaceKB,
    SUM(a.used_pages) * 8 AS UsedSpaceKB,
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB
FROM sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE t.name NOT LIKE 'dt%'
  AND t.is_ms_shipped = 0
  AND i.object_id > 255
GROUP BY t.name, s.name, p.rows
ORDER BY t.name;
The total space taken by the tables should (roughly) equal the Used_MB returned by the first query (note that the table query reports kilobytes while the file query reports megabytes). However, as mentioned before, SQL Server counts the logs as part of the database and doesn't automatically shrink the files, so there may be some drift.
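If you want to line the two numbers up directly, a quick sketch like this sums the used pages across all user tables and converts them to MB (this helper is my own addition, not one of the two scripts above; compare its output against Used_MB of the data file):

-- Sum used pages across all user tables and convert to MB
-- (pages are 8 KB each, so used_pages * 8 / 1024 gives MB)
SELECT ROUND(SUM(a.used_pages) * 8 / 1024.0, 2) AS Total_Used_MB
FROM sys.tables t
INNER JOIN sys.indexes i ON t.object_id = i.object_id
INNER JOIN sys.partitions p ON i.object_id = p.object_id AND i.index_id = p.index_id
INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE t.is_ms_shipped = 0;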
Prevent the log from growing
Outcome: 7 GB down to 300 MB
MSSQL databases really consist of two files: the data and its log. The size of the log has a considerable impact on the 'total' size of the database if it is not being watched. How the log behaves is set by the 'recovery model'. There are three main types:
- Full - stores every transaction until either a transaction log backup occurs or the transaction log is truncated.
- Simple - stores the transaction while it is being executed. Once the transaction is committed, the log space is reclaimed.
- Bulk-logged - a mix between the two. Broadly behaves like Full except for certain operations, such as BULK INSERT, where it behaves like Simple.
Therefore, to reduce the log, one needs to use either SIMPLE or BULK-LOGGED. I decided on SIMPLE since seeding data wasn't a critical operation and I could always go back and reseed if necessary. Do read the Microsoft documentation before changing your recovery model; it may not be the right option for you.
ALTER DATABASE dbName SET RECOVERY SIMPLE;
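If you are unsure which model a database currently uses, you can check it in sys.databases (DB_NAME() simply returns the name of the current database):

-- Show the active recovery model of the current database
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();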
The trick here is that switching between SIMPLE and FULL causes the log to be truncated.
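Put together, the whole sequence looks roughly like this ('dbName' and 'dbName_log' are placeholders, and the DBCC SHRINKFILE step assumes your host grants you that privilege):

-- Switch to SIMPLE; the inactive part of the log is truncated at the next checkpoint
ALTER DATABASE dbName SET RECOVERY SIMPLE;

-- Optionally give the reserved disk space back to the OS
DBCC SHRINKFILE ('dbName_log', 100);

-- Switch back to FULL if you need point-in-time recovery again
ALTER DATABASE dbName SET RECOVERY FULL;

Keep in mind that after switching back to FULL you need to take a full backup to restart the log backup chain; until then the database effectively keeps behaving as if it were in SIMPLE.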