Introduction

The latest versions of Excel have the Stock and Geography Data Types in them and this page discusses how to use Data Type Stocks to great effect with a real example.

I came across a page that showed data relating to the world’s top 25 Pharmaceutical companies, as at mid March 2018. At first, I was just interested in the table as raw data that I might use one day. Then I appreciated I might use Data Types to help me out a bit more

The Data

The data come from a site called Manufacturing Chemist and the following screenshot shows what is on offer: if you go to the page, you will notice it is an image and not a table of data but I used OCR to read the data into Excel.

Top 25 Pharamaceutical Companies (extract)

As with a lot of data taken from a web site, it needs to be cleaned and this was no exception. My biggest problem was that the decimal points were often difficult to read by the OCR software so a lot of the numbers were shown as 123,145,789 instead of, say, 123,456 or even 12.3456. I used copy paste special for some of these corrections where it was easiest to do so.

Once I had the data, I copied the tab and then I needed to convert it to become an Excel Table: very simple, like this:

Data as an Excel Table

Before I created the Excel Table, I deleted the Ranking Columns (A and B) from the data and I moved all of the other data, apart from the company name, to the right of the table, leaving about 10 columns between the column of names and the rest of the data. Now I created my Excel Table …

Finally, I selected the Excel Table and clicked Data … then Data Type … Stocks and Excel converted the company names into Data Type names. Well, not quite, because there were about five names it did not recognise so I had to explore a little to see what was wrong with the names that Stocks seemed not to know. My best approach was to find the appropriate Ticker Symbol for the companies and for that I had to search on line for just a few minutes until I found them all. I use http://www.morningstar.com for that but http://www.ft.com would also help, as would yahoo finance and many other sites.

I found all of the companies but there were two problems:

1 Notice that there are TWO entries for Merck and I had to find out whether that was an error: t was not, even though the data I had said that both companies had the same Ticker! In the end, you will see that I used the Ticker Symbol MRK and MGKAF.

2 Shire’s Ticker Symbol did not work so I clicked on the link provided by Manufacturing Chemist and found that it had been taken over by Takeda, the final company in the list. I deleted Shire from the table as a result of that information.

In the end, then, I made a few changes to the Ticker Symbols and was then able to update the data for these companies. The screenshot above, shows the information I added:

  • Headquarters: so that I knew country and region
  • Number of employees
  • Updated P/E Ratio
  • Shares Outstanding
  • Updated Market Capitalisation
  • Beta Values
  • Currency

I extracted the Country code, US, GB etc and added Region. That completed the data I wanted/had. At this point, I merged my Excel Table with the original data that I had moved earlier, to give me my 24 rows * 16 columns data set.

Now I was ready to create my pivot table so that I could analyse the data by

  • Country
  • Region
  • Old and New Market Capitalisation
  • Old and new P/E ratios
  • Number of companies
  • and so on

Here is a screenshot of one of my pivot tables:

Pivot Table of the Top Pharma Companies

Reboot that pivot table to show the other data analysed by company, country and so on or copy this piot table and use that as a template.

I copied my pivot table and recreated it as follows:

Top Pharma: some ratios

Other Analysis

In cases such as this, I would also normally carry out additional analysis, such as creating histograms, creating a correlation matrix and preparing some descriptive statistics. As per the following:

Three Histograms to illustrate the nature of the data
Correlation Matrix with conditional formatting showing the extreme values of each variable
Dscriptive Statistics of the variables, showing mean, median and skewness all highlighted

Conclusions

There you are, a combination of using OCR to read from an image of a table of data; using Data Types … Stocks to enhance the data we have captured and several other key features of Excel used to explore and enhance the data: Excel Table, Pivot Table, Correlation Matrix, Descriptive Statistics, Histogram.

I am not providing the Excel file since the data are not mine but I have provided the link to the source and then all of the other features, formulas and functions I have used are built into Excel.

Duncan Williamson

26th September 2019