Creating a date list
The other night, a friend called asking for some Excel help. I was in the car—hands-free—on the way home after a pretty lousy day. And the vision of some Excel trickery in the evening perked me up no end.
Down the side he had a bunch of sites. Across the top he had column headings for various activities that were to take place. And in the body of the table were dates—the date on which each activity would occur for each site.
It was a great way of capturing the data, but with 100 sites and 15 activities, once populated it was difficult to figure out what was due to happen on a day-by-day basis. He wanted a list of activities across all sites in chronological order. And he wanted this list to update (and re-order) with changes to the underlying data.
The remaining 20 minutes of the car journey saw me pondering. Getting a list of the 1500 activity–site combinations tagged with the date would be trivial. The INDEX function would allow me to number the rows and columns in the source data and pick the relevant value for each row. The tricky bit would be having that data auto-sort whenever anything changed in the source. So an activity front-loaded in the plan would be near the top of the list. But if someone suddenly shifted it back six months, the requirement was for it to automatically plummet down towards the bottom of the list.
I figured that I could use the RANK function to ascertain where the date sat in relation to the others. But then I faced the issue of multiple entries sharing exactly the same date—with 1,500 activities spread across six months, this was common. And then I remembered my post on breaking rank.
Breaking rank is an inelegant yet at the same time adorable method I came up with to differentiate between identical values for the purpose of sorting. If you add a random number between zero and one—RAND()—to every single date in the list, then the dates will not change, but their underlying values will. Dates are stored as numbers, 1 representing 1 January 1900, 2 representing 2 January 1900 and so on. (Today is 40,627 for what it’s worth.) So by adding a value between zero and one to a date, all you’re doing is being more precise about the time at which it is done. 40,627.5 was midday today, as an example, the 0.5 representing half a day.
But given that the dates are being displayed as dates rather than specific timestamps, the decimal extension matters not. But crucially, it makes every single date’s number unique in the list, and so the RANK function can identify the rank of each uniquely.
So in my interim working sheet, I had 1,500 rows. My dates were in column F, starting in F2. In G2 I simply created a random number: =RAND(). And after pasting the values over the random numbers, H2 became F2+G2. Now columns F and H looked identical (formatted as they were as dates), but column H had the artificial decimal precision. Finally, in column A, I ranked the dates against the range of dates: =RANK(H2,$H$2:$H$1501,1). The “1” at the end merely informs Excel to give the lowest ranking (1) to the earliest date, as opposed to the other way round. Columns C and E contained the site and the activity respectively.
In my final presentational sheet, I simply created a list of sequence numbers (in order), from 1 to 1,500. The subsequent three columns (date, site and activity) simply looked the sequence number in the Ranks column of the working sheet and brought back the appropriate value using a VLOOKUP.
Whenever dates in the master sheet are changed, the ranks in the working sheet will adjust accordingly and the final presentational sheet will reorder accordingly.
All in all, about half-an-hour’s work. (It’s taken longer to write this post.) But more importantly, as well as helping out a friend, it put some rather lovely fondant on what had previously been a pretty shit day.
Here’s an anonymised version of the finished article.