We all do it but we all scratch our heads and wonder how to do it well. I mean downloading data from the internet. Just look at the two screenshots: before, during and after the download of part of a large database of accounting data.
From the Forbes Magazine web site, I wanted to download their Forbes Global 2000 database:
When I copied and pasted this, it was a complete mess in Excel so I copied and Pasted Special as Text, this is the result:
You can see the problems: the rank, the company name and the County have all been pasted vertically into the same column as each other whilst the quantitative data have been pasted correctly.
I used a variety of techniques to clean this initial worksheet: i needed to set up columns to separate out rank, company and country. Then I needed to put the columns in the right order … I could have done the organisations of columns first, of course.
You will then see that each cell contains a Dollar sign and the letter B or M … this latter point means that some data is shown as $ billions and some are shown as $ millions. I then needed to identify which is which and then develop a strategy for dealing with that. As an exercise you should try to solve this problem yourself … look at this blog and you will see that I have had questions from friends and delegates asking me to show them how to sort this out.
What I did is not difficult but you have to be systematic. Here is the result of my efforts with the database now being ready for analysis:
There you are! If you get stuck with this but want to know how I did what I did, go to this page … http://excel2007master.com/rough-to-smooth/: Rough to Smooth!
Duncan Williamson