Tuesday, June 25, 2013

Citi Bike at Age 4 (weeks): By the Numbers

Citi Bike, New York's much anticipated, sometimes feared, and greatly needed bike share system, has now been in operation for four weeks. I have all manner of qualitative observations from this first month of service, but for now I'm going to lay out some quantitative analysis of the system's debut and early adoption.

I'm working with data published daily on the Citi Bike blog, including some useful figures (daily trips, new membership sales) and some of questionable value (daily miles, average trip duration, most popular stations). That latter category is not totally useless: over the weekend, the system reached the fun milestone of 1 million miles traveled, which makes for good press, and I suppose you could calculate the average systemwide speed. [EDIT: I did this just for kicks, and it's a mess. See below.] But it's the former category on which I'll focus, in an effort to track Citi Bike adoption and penetration rates.

Let's begin with the raw data. Every day, the blog provides sum totals of annual memberships sold, miles traveled, and trips taken. Antonio D'souza has been graphing these numbers:

In fact, it was Antonio's graph that inspired me to dig a little deeper into the Citi Bike numbers. Graphing the total figures for each day is good if you just want to show that the system is getting use, but it's not particularly revealing. Since these are totals since inception, they will of course only ever increase. Instead, we want to examine daily differences - the first derivative of the total numbers - to see whether the rate of growth is increasing.
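
For anyone who wants to follow along in code rather than a spreadsheet, here is a minimal sketch of that differencing step. The function name and the sample totals are made up for illustration; they are not figures from the Citi Bike blog.

```python
def daily_increments(cumulative_totals):
    """Return day-over-day differences of a running total.

    The result has one fewer entry than the input, because the first
    day has no earlier total to subtract.
    """
    return [
        later - earlier
        for earlier, later in zip(cumulative_totals, cumulative_totals[1:])
    ]


# Hypothetical running totals of annual memberships (illustration only).
totals = [100, 130, 170, 190]
print(daily_increments(totals))  # [30, 40, 20]
```

A flat run in the increments means steady growth in the totals; rising increments mean the growth itself is accelerating.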

Some daily increments are given on the blog (such as daily trips taken), and it's easy to derive the rest by subtracting, for example, total annual members on June 22 from total annual members on June 23. Here, then, is a chart of new membership sales and trips taken per day:
As you can see, this is quite a noisy graph. The fact that there is so much daily variation makes it difficult to discern longer-term trends. For instance, was the system somehow four times better or more appealing on 6/9 than 6/7? Certainly not; 6/7 was very rainy, and 6/9 was a beautiful Sunday. We will examine the weather/trips relationship in a future post, but let's now deal with the weekend effect - the spike in ridership that occurs, unsurprisingly, on Saturdays and Sundays.

(It's interesting to note in passing that daily ridership initially tracks quite closely with daily 24-hour passes sold, but has begun to diverge in the past week. This new trend, along with the recent strong weekday ridership, could be an indication that annual members are making more extensive use of the system - in particular for utility trips during the work week. Enabling such utility trips is a key goal of the system.)

To neutralize the weekly periodic fluctuation in Citi Bike data, we will calculate a rolling average over 7 days around each data point. This way, each number will reflect the average value over each day of the week. Financial applications usually use a trailing average, in which each point takes the average of the preceding 7 days, while scientific analysis generally employs a centered average, taking the mean of the week centered on the date in question. Here's what each of those options looks like with our data:
This chart makes clear the advantage of the centered rolling average: as Wikipedia notes, it "ensures that variations in the mean are aligned with the variations in the data rather than being shifted in time." (The trailing average is three days behind the centered average, which results in consistent underestimation when the data are generally increasing, as is happily the case with Citi Bike usage.) The chart also demonstrates how the rolling average neutralizes the periodic peaks and troughs on weekends and weekdays.
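
The two flavors of rolling average can be sketched in a few lines of plain Python. This is an illustration rather than the spreadsheet formulas behind the charts: the function names are mine, the sample numbers are invented, and I am assuming the trailing window ends on (and includes) the day in question, which is what produces the three-day lag.

```python
def trailing_average(values, window=7):
    """Mean of each day plus the (window - 1) days before it.

    Days without a full window of history get None.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def centered_average(values, window=7):
    """Mean of the window centered on each day (window must be odd).

    Days too close to either end of the series get None.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            out.append(None)
        else:
            chunk = values[i - half : i + half + 1]
            out.append(sum(chunk) / window)
    return out


# Invented daily trip counts with a weekend-style bump (illustration only).
trips = [10, 12, 9, 30, 28, 11, 13, 14, 10, 35]
trailing = trailing_average(trips)
centered = centered_average(trips)

# Same numbers, shifted: the trailing value on day i + 3 equals the
# centered value on day i.
print(trailing[6] == centered[3])  # True
```

The price of the centered version is that the three most recent days have no value yet, so the newest points on the chart always arrive a few days late.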

(By the way, ridership has been substantially greater on weekends than weekdays. Here is the average number of trips for each day of the week:
  1. Sunday: 23,615
  2. Saturday: 18,196
  3. Thursday: 17,030
  4. Friday: 14,359
  5. Wednesday: 14,214
  6. Tuesday: 12,729
  7. Monday: 11,480
Monday is probably dragged down somewhat by opening day, which had only 6,000 trips, and Friday by that very rainy day on 6/7; I'm not sure why Thursday is so much more popular than the rest of the work week!)
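
A per-weekday average like the one above can be computed with a simple grouping step. Here is a rough sketch; the function name and the trip counts are hypothetical, and only the dates' weekdays are real.

```python
from collections import defaultdict
from datetime import date

WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def average_by_weekday(trips_by_date):
    """Group daily trip counts by day of the week and average each group."""
    groups = defaultdict(list)
    for day, trips in trips_by_date.items():
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        groups[WEEKDAY_NAMES[day.weekday()]].append(trips)
    return {name: sum(counts) / len(counts) for name, counts in groups.items()}


# Invented trip counts for three real dates (two Sundays and a Friday).
sample = {
    date(2013, 6, 7): 50,
    date(2013, 6, 9): 100,
    date(2013, 6, 16): 200,
}
print(average_by_weekday(sample))
```

With only four weeks of data, each weekday's average rests on about four observations, so a single outlier (opening day, a washout) moves it a lot.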

Now that we have identified the data of interest (measures of change over time) and smoothed them to eliminate periodic noise, we can get down to assessing the success of Citi Bike's rollout in more detail. Let's look first at sales trends: how many new annual, weekly, and 24-hour memberships are being sold? Ideally, we'd want to see the 24-hour and weekly numbers steadily increasing, and the annual numbers holding steady or falling only slowly. (It's unreasonable to expect that annual sales won't drop off as the buzz from the launch subsides and the early adopters complete their early adoption.)
The data suggest that Citi Bike is performing quite well as measured by sales. Daily sales of 24-hour memberships bounced around 2,100 for some time but have recently skyrocketed; it remains to be seen whether that trend will hold. Weekly memberships initially struggled, dwindling to only 170 sales on 6/10, but have since gained traction, hovering around 400 daily sales since 6/19. Sales of annual memberships did fall fairly rapidly and steadily after launch, but they stabilized around 6/10 and remained encouragingly flat through 6/20; since then, they appear to have begun another slow decline. Even so, many hundreds of New Yorkers continue to join Citi Bike daily, nearly a month after the system's launch. This is heartening.

What about ridership? We know that people are buying memberships, but are they actually getting out and riding the big blue bikes? Let's compare the rolling average of daily trips to the number of active members (all annual members + new 24-hour members + 7-day passes purchased within the past week) on any given day. This figure - trips per active member - will serve as a rough proxy for Citi Bike's total mode share. (Rough, because 1 trip on Citi Bike does not necessarily imply 1 fewer trip on other modes, especially when the Citi Bike trip is recreational in nature. For instance, I recently took two trips on public transit in order to make several Citi Bike trips; but that's another story.)
This is perhaps the most heartening chart yet. As the total number of active members climbs steadily, the number of daily trips has dramatically risen (from about 12,500 on 6/10 to nearly 30,000 on 6/23), with a concomitant surge in trips per active member from a low of 0.32 to a recent high of 0.57. We may hope to see the trips-per-active-member figure continue to rise (surmounting 1.0 would be a great milestone), but the data so far suggest that Citi Bike is already becoming more and more important in the daily lives of its members.
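
For the curious, the trips-per-active-member calculation might be sketched like this. The function names and sample figures are hypothetical, and I am assuming "within the past week" means the seven days ending on the day in question.

```python
def active_members(annual_total, day_passes_sold, weekly_passes_sold):
    """Estimate active members for each day.

    annual_total:       cumulative annual members as of each day
    day_passes_sold:    24-hour passes sold on each day
    weekly_passes_sold: 7-day passes sold on each day
    """
    active = []
    for day in range(len(annual_total)):
        # Assumption: a 7-day pass counts as active on its purchase day
        # and the six days after it.
        start = max(0, day - 6)
        weekly_active = sum(weekly_passes_sold[start : day + 1])
        active.append(annual_total[day] + day_passes_sold[day] + weekly_active)
    return active


def trips_per_active_member(trips, active):
    """Divide (smoothed) daily trips by active members, day by day."""
    return [t / a if a else None for t, a in zip(trips, active)]


# Invented two-day example (illustration only).
active = active_members([100, 110], [10, 20], [5, 5])
print(active)  # [115, 140]
print(trips_per_active_member([46, 70], active))
```

Feeding in the 7-day rolling average of trips rather than the raw daily counts keeps the weekend spikes from dominating the ratio.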

What's next in Citi Bike data analytics? The interactive charts in this post will continue to update as more data becomes available over time, but of course my commentary will not automatically adapt to reflect new trends. I'll try to check in every so often and address recent developments as Citi Bike continues to expand its presence in NYC. You can always visit my Google Spreadsheet to see the latest numbers.

Of course, I'm not the only one exploring Citi Bike data. Greg is doing some great work analyzing the live station activity feed, and he's identified the most popular neighborhoods and come up with some good insights on best balancing practices. OpenPlans and betaNYC are holding a Citi Bike data night tomorrow. And there's a lot more data on the way, if NYC Bike Share makes good on its promise to follow Capital Bikeshare in releasing a web dashboard (though I hope they don't bury their data in a Silverlight application).

EDIT: For fun, I calculated the daily average systemwide speed (based on the otherwise useless miles per day and average trip duration data points). Here's the graph:
This is troubling, because the values are so implausible. During the first week, the claimed average speed (on those bulky bikes) is around what I attain when I'm riding my road bike through the city in a hurry (I do stop for red lights, though) - that is, it's unrealistically high. After that, though, it gets even worse, as the speed is almost always given as 7.46 +/- 0.02. I find it very difficult to believe that Citi Bike users just happen to achieve almost exactly the same average speed day after day (albeit a speed that sounds realistic given the bikes and their riders).
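
The arithmetic behind that speed figure is just total distance divided by total riding time. A minimal sketch, assuming the blog's distance is in miles and its average trip duration is in minutes (the function name and sample numbers are mine, not the blog's):

```python
def average_speed_mph(miles, trips, avg_trip_minutes):
    """Systemwide average speed for one day, in miles per hour.

    Total riding time is estimated as trips * average trip duration.
    """
    total_hours = trips * avg_trip_minutes / 60
    return miles / total_hours


# Invented figures: 30,000 miles over 10,000 trips averaging 24 minutes each.
print(average_speed_mph(30000, 10000, 24))  # 7.5
```

Written this way, it is easy to see how a near-constant speed could arise: if the reported miles were themselves computed as riding time multiplied by a fixed assumed speed, dividing back out would return that same speed every day.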

I'm not sure what to make of this suspicious data. It certainly shows the need for independent validation of blog data (using, say, the JSON feed). I'll keep an eye on this, though, and maybe inquire about it at tomorrow's bike share data evening.


  1. Impressive analysis Drew. Now please find a way to use this to get more stations in Williamsburg. :)

    1. Probably the best piece of quantitative evidence that a rapid rollout of stations in North Brooklyn is a good investment is Greg's Williamsburg Report Card: https://sites.google.com/site/citibikestats/nabereports#TOC-Williamsburg The few stations that are currently sited in the neighborhood are outperforming Brooklyn as a whole by nearly 50%.

      You should sign (and get your friends to sign) this petition from Council Member Steve Levin requesting the Department of Transportation's speedy implementation of Citi Bike in Williamsburg and Greenpoint: http://www.change.org/petitions/new-york-city-department-of-transportation-bring-citibike-to-greenpoint-and-north-williamsburg

      I can't imagine it's a question of DOT being reluctant to complete the full 600-station rollout. We have seen delays before and will probably see them again, but I'll see if I can get some info from DOT about next steps.

  2. Speed: I suspect that the distance reported on the Citibike site is inferred from the trip times, since it's not clear that they can measure distance. Initial reports said there'd be GPS on the bikes, but I suspect that's not the case.

    Days: The per day data on the blog is a bit weird since it's measured 5pm-5pm. So Sunday's data actually includes Saturday evening.

    1. Interesting point! The FAQ http://a841-tfpweb.nyc.gov/bikeshare/faq/ claims in several places that the bikes have GPS, and that the GPS is used to record the bikes' routes. But I will see if I can get confirmation of this claim.

      Yes, I wish the data were 12:00 am-11:59 pm, but since I'm calculating 7-day rolling averages it doesn't make much of a difference in most of the charts.