If Excel is your tool of choice and you need to handle datasets of 30,000+ rows to produce your analyses, the tips below can help you improve and optimize your spreadsheet usage:
1. Do change the default CSV connection properties!
The following defaults cause more problems than they solve:
• Prompt for file name on refresh: achieves nothing except making the data-refresh process more prone to human error
• Adjust column width: creates extra work for the analyst by resizing certain columns on every refresh, thereby hindering visibility of others
• Insert cells for new data, delete unused cells: causes problems with formulas referencing the external data – as cells are deleted, formula references break, producing #REF! errors.
Therefore, it’s much better to turn these settings off, and to make sure that your connection overwrites and clears unused cells instead (“Overwrite existing cells with new data, clear unused cells”), which in no way impacts the formulas referencing the external data.
2. Clear your spreadsheet’s unused ranges regularly
An unused range is simply residual, redundant space – Excel remembers the last cell in which you ever had data, even after that data is long gone (press Ctrl+End to see where it is). As a result, large unused ranges in your workbook add to your file size! Clearing them regularly – delete the unused rows and columns, then save – helps keep enormous files lean.
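If you have many sheets to clean, this chore can be scripted. Below is a minimal sketch using the third-party openpyxl library (an assumption on my part – any tool that rewrites the sheet will do): it finds the last row that actually holds data and deletes everything after it, so the saved file no longer carries the stale used range.

```python
# Sketch: trim trailing empty rows so the saved "used range" shrinks.
# Assumes openpyxl is installed; the demo workbook below is in-memory.
from openpyxl import Workbook

def last_data_row(ws):
    """Return the index of the last row that actually holds a value."""
    for row in range(ws.max_row, 0, -1):
        if any(cell.value is not None for cell in ws[row]):
            return row
    return 0

def trim_unused_rows(ws):
    """Delete every row beyond the last real data row."""
    last = last_data_row(ws)
    if ws.max_row > last:
        ws.delete_rows(last + 1, ws.max_row - last)

# Demo: simulate a cell Excel still "remembers" after its data is gone
wb = Workbook()
ws = wb.active
ws["A1"] = "header"
ws["A2"] = 42
ws["A500"] = "stale"   # data once lived here...
ws["A500"] = None      # ...and is long gone, but the range remains

trim_unused_rows(ws)   # ws.max_row drops from 500 back to 2
```

After trimming, save the workbook – the size reduction only materializes on disk.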
3. Use nested formulas over one formula per column
Nested formulas re-calculate way faster than the same logic split into sub-formulas across separate columns!
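To illustrate (the sheet, range and threshold below are made up for the example): instead of splitting a lookup and a test across two helper columns, the same logic can live in one nested formula.

```
Helper-column approach (two formulas recalculating in separate columns):
  B2: =VLOOKUP(A2, Data!$A$2:$C$5000, 3, FALSE)
  C2: =IF(B2 > 100, "High", "Low")

Nested alternative (one formula, one column):
  B2: =IF(VLOOKUP(A2, Data!$A$2:$C$5000, 3, FALSE) > 100, "High", "Low")
```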
4. Avoid keeping calculation formulas in all your records!
Live formulas in every row slow down your spreadsheet – whether you’re opening the file, updating a formula or re-filtering the data, it all comes at a very high performance cost!
Pro tip: keep your formulas in the first row of your dataset only, so that they can be easily checked or tweaked if needed, and paste the formulas in all subsequent rows as values.
5. Avoid formulas that point to external workbooks!
Those are just not sustainable!
They (a) break easily, (b) slow down formula re-calculation, and (c) unnecessarily increase spreadsheet size.
6. Where possible, do use pivot tables that share the same pivot cache!
A pivot cache is a snapshot of your source data, held in Excel’s memory. When pivot tables built on the same dataset each keep their own cache, your spreadsheet size inflates. Therefore, it’s good practice to ensure that pivot tables based on the same data share one pivot cache – the simplest way is to copy an existing pivot table rather than inserting a new one from the same source.
7. Excel is not a storage place!
Do not use it as a data consolidation platform, especially if you’re dealing with heaps of data! A worksheet can hold at most 1,048,576 rows, and the closer you get to that limit, the more poorly Excel performs.
Do consider exporting historically generated data to either MS Access or a Power Pivot model instead.
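As a miniature sketch of the idea, the snippet below archives rows exported from Excel as CSV into a database – SQLite stands in here for Access or Power Pivot, and the table and column names are illustrative assumptions:

```python
# Sketch: move historical rows out of the live workbook into a database.
# SQLite is a stand-in for MS Access / Power Pivot; names are illustrative.
import csv
import io
import sqlite3

# A tiny stand-in for rows exported from Excel as CSV
csv_data = io.StringIO("order_id,amount\n1,100\n2,250\n3,75\n")

conn = sqlite3.connect(":memory:")  # use a file path in practice
conn.execute("CREATE TABLE orders_history (order_id INTEGER, amount REAL)")

reader = csv.DictReader(csv_data)
conn.executemany(
    "INSERT INTO orders_history VALUES (?, ?)",
    ((row["order_id"], row["amount"]) for row in reader),
)
conn.commit()

# The workbook then only needs to query summaries, not hold the rows
count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders_history"
).fetchone()
print(count, total)  # 3 425.0
```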
8. Save as .xlsb
.xlsb stands for Excel Binary Workbook: such files read and write data much faster, and take up much less space, than a regular .xlsx or .xlsm spreadsheet. Unlike .xlsx, .xlsb files can also contain macros, allowing you to preserve any VBA functionality you’ve implemented whilst keeping your file size as small as possible.
9. Separate your data modelling and data preparation layers from the presentation layer
It does help spreadsheet size if you handle all data preparation in one workbook, and do all data crunching via pivot tables and charts in another. The approach that has worked best for me in the past was to import my ready-to-crunch dataset into MS Access and then feed it into another spreadsheet via an ODBC connection. This helped me bring the file size down by half.
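The same split can be sketched in miniature: a preparation step lands the cleaned dataset in a database, and the presentation layer only ever pulls crunched summaries. In Excel this would be your ODBC connection; SQLite and the sales table below are illustrative stand-ins.

```python
# Sketch of the two-layer split; SQLite and the table are stand-ins.
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path in practice

# Preparation layer: land the cleaned, ready-to-crunch dataset
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("North", 80.0), ("South", 50.0)],
)
conn.commit()

# Presentation layer: pull only the summary, never the raw rows
summary = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('North', 200.0), ('South', 50.0)]
```

Because the presentation workbook holds only aggregates, its size stays small no matter how large the underlying dataset grows.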
10. Keep a version history of every design change you make to your spreadsheet, and why you’re making it
This helps you keep track of the dependencies in your workbook, which will in turn stand you in good stead for any major spreadsheet revamp that may be needed in the future!