The data used to generate this report came from a TXT file of the company’s client birthdays.
The TXT file is made up of two columns. The first column contains birth dates and the second contains the number of clients born on each date. In other words, it shows how many of the company’s clients share each birthday. For example, eight clients were born on 1900/1/1.
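Reading the file into (date, count) pairs is the first step for every calculation that follows. The sketch below makes two assumptions that the report does not confirm. It assumes the two columns are separated by whitespace (spaces or a tab), and that dates are written as year/month/day, as in the 1900/1/1 example.

```python
from datetime import date

def parse_birthdays(lines):
    """Parse 'YYYY/M/D count' lines into (date, count) pairs.

    Assumes the date and the count are separated by whitespace; the real
    file's delimiter is not stated, so adjust split() if it differs.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        date_text, count_text = line.split()
        year, month, day = (int(part) for part in date_text.split("/"))
        records.append((date(year, month, day), int(count_text)))
    return records

# Hypothetical sample rows; only the first mirrors the example in the text.
sample = ["1900/1/1 8", "1900/1/2 3"]
records = parse_birthdays(sample)
print(records)
```

With the real file, you would pass an open file handle instead, e.g. `with open("birthdays.txt") as f: records = parse_birthdays(f)`. Here `birthdays.txt` is a hypothetical file name.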
How do we mine this data for interesting insights?
To mine the data in the TXT file for interesting insights, I decided to use descriptive statistics. First, I calculated the time period covered, the total number of clients, the average and standard deviation of the number of clients per birth date, and the maximum and minimum number of clients per birth date. My thinking was that this information would give me a general picture of the data. Second, I plotted a chart of the birth dates shared by more than 375 clients, because I wanted to see which birthdays had the most people. Third, I took the first fourteen birth dates from each of the years 1900, 1901 and 1902 and compared them with one another, because I wanted to see whether the client base changed at any point during the 99-year period.
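The first two steps above can be sketched as follows. The function names are illustrative, and the rows are toy data rather than real rows from the file. The report does not say whether its standard deviation is the population or the sample version, so this sketch assumes the population form (`statistics.pstdev`). The threshold of 375 is the one used for the chart.

```python
import statistics
from datetime import date

def describe(records):
    """Summary statistics for a list of (birth_date, client_count) pairs."""
    dates = [d for d, _ in records]
    counts = [c for _, c in records]
    first, last = min(dates), max(dates)
    return {
        "first_date": first,
        "last_date": last,
        "years_spanned": last.year - first.year,
        "total_clients": sum(counts),
        "mean_per_date": statistics.mean(counts),
        # Population standard deviation (assumption); statistics.stdev
        # would give the sample version instead.
        "stdev_per_date": statistics.pstdev(counts),
        "min_per_date": min(counts),
        "max_per_date": max(counts),
    }

def busy_birthdays(records, threshold=375):
    """Rows whose client count is strictly greater than the threshold."""
    return [(d, c) for d, c in records if c > threshold]

# Hypothetical toy data, not rows from the real file.
records = [(date(1900, 1, 1), 8), (date(1960, 1, 1), 400), (date(1999, 12, 31), 3)]
summary = describe(records)
busy = busy_birthdays(records)
print(summary)
print(busy)
```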
What insights can we take away from the data?
The 99-year time period between 1900 and 1999 covered by the company’s client birthdays suggests that this company is at least 99 years old. Since that is a long time for a company to exist, it says something about the company’s longevity. How was it able to survive an entire century? Did it have institutional backing? Does it still exist today, in 2021?
The large number of clients the company had over the 99-year period gave me a sense of how large the data set is. I counted the rows, and the TXT file contains 29,814 of them, a large number relative to other data sets I have worked with.
The standard deviation of 64.2 tells me that the data are spread widely around the average of 69.4 clients per birth date; the standard deviation is almost as large as the average itself. In other words, the number of clients sharing a birth date varies a great deal from one date to the next. One might expect this, given that the data span 99 years.
The minimum number of clients sharing a birth date is 1. This is expected, since a date is only recorded if at least one client was born on it. The maximum is 541, meaning that 541 clients were born on 1986/1/1. If the company sent birthday cards to its clients, then January 1 would have been a busy day every year for the department overseeing client birthdays.
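Finding which date holds the maximum, and not just the maximum value, requires keeping the date alongside the count. The following is a minimal sketch of that step. The 541 row mirrors the figure above, and the other rows are hypothetical. When several dates tie, Python's `min` and `max` return the first one they encounter. Many dates may share the minimum of 1, so only the first such date is reported.

```python
from datetime import date

def extremes(records):
    """Return the (date, count) rows with the smallest and largest counts.

    Ties are resolved by taking the first such row in the input order.
    """
    lowest = min(records, key=lambda row: row[1])
    highest = max(records, key=lambda row: row[1])
    return lowest, highest

# Hypothetical rows; only the 1986/1/1 count comes from the text.
records = [(date(1900, 1, 1), 8), (date(1986, 1, 1), 541), (date(1999, 6, 2), 1)]
low, high = extremes(records)
print(low, high)
```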
What do we see in the graph above? It appears that the 1960s were a busy time for the company. I base this on the fact that about 67% (18 of 27) of the birth dates in the chart above fall in the 1960s. What was happening during the 1960s? In the decades after World War II, while many countries struggled financially, the USA experienced an economic boom. Perhaps the company in question is an American company that benefited from this economic growth and gained clients as a result, which might account for the large number of clients in that decade.
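The decade percentage can be reproduced by grouping the charted dates by decade. In this sketch, `decade_shares` is a hypothetical helper, and its input is assumed to be the list of dates with more than 375 clients. For the figures quoted above, 18 / 27 is about 0.667.

```python
from collections import Counter
from datetime import date

def decade_shares(rows):
    """Fraction of (date, count) rows in each decade, keyed by the decade's first year."""
    if not rows:
        return {}
    decades = Counter((d.year // 10) * 10 for d, _ in rows)
    total = len(rows)
    return {decade: n / total for decade, n in sorted(decades.items())}

# Share of charted dates in the 1960s, using the counts quoted in the text.
share_1960s = 18 / 27

# Hypothetical busy dates, not rows from the real chart.
rows = [(date(1961, 1, 1), 400), (date(1965, 1, 1), 380), (date(1972, 1, 1), 390)]
print(decade_shares(rows))
```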
Another interesting pattern in the chart above is that January appears to be the month with the most client birthdays. Why might this be? I can only speculate, but counting nine months back from January lands in April or May, which is springtime in the northern hemisphere. Does the conception rate increase during springtime? It certainly does for many animals, but what about humans?
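The chart only shows the busiest dates. To check the January pattern across the whole file, you could sum client counts by calendar month over all years. The sketch below does that with hypothetical rows.

```python
from collections import Counter
from datetime import date

def clients_per_month(records):
    """Total client count for each calendar month (1-12), summed over all years."""
    totals = Counter()
    for d, count in records:
        totals[d.month] += count
    return totals

# Hypothetical rows, not data from the real file.
records = [(date(1960, 1, 1), 400), (date(1961, 1, 2), 380), (date(1962, 5, 3), 90)]
totals = clients_per_month(records)
busiest_month, busiest_total = totals.most_common(1)[0]
print(busiest_month, busiest_total)
```

Months have different numbers of days, so dividing each total by the number of days in that month would give a fairer comparison. January has 31 days, for example, and February has only 28 or 29.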
Looking at the number of clients for the first 14 birth dates of 1900, 1901 and 1902 shows that the client base changed from one year to the next. If the client base had stayed the same across these three years, the graphs would be identical. Did this pattern continue throughout all 99 years? One might assume so, but more research is needed to be sure.
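The year-to-year comparison can be sketched by collecting the first n dates recorded in each selected year. Dates with no clients may be missing from the file, so each row keeps its month and day. That way, rows from different years can be matched by calendar date. The helper `first_n_per_year` and the toy rows are hypothetical. The sketch also assumes that "first fourteen birth dates" means the earliest fourteen dates present in each year.

```python
from collections import defaultdict
from datetime import date

def first_n_per_year(records, years, n=14):
    """Map each selected year to its first n (month, day, count) rows in date order."""
    by_year = defaultdict(list)
    for d, count in sorted(records):
        if d.year in years and len(by_year[d.year]) < n:
            by_year[d.year].append((d.month, d.day, count))
    return dict(by_year)

# Hypothetical rows, not data from the real file.
records = [
    (date(1900, 1, 1), 8), (date(1900, 1, 2), 5), (date(1900, 1, 3), 7),
    (date(1901, 1, 1), 6), (date(1901, 1, 2), 9), (date(1903, 1, 1), 1),
]
sample = first_n_per_year(records, {1900, 1901, 1902}, n=2)
for year, rows in sorted(sample.items()):
    print(year, [count for _, _, count in rows])
```

If the client base were unchanged, the count lists for each year would match. Any difference between them is what the comparison above describes.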