Bike Rebalance Simulator

Simulation / Optimization
Toronto Bike Share optimization

A simulation and optimization project to forecast station shortages and generate rebalance plans, including event-driven demand signals.

Toronto Bike Share has a simple failure mode that shows up every single day: the wrong stations go empty or full at the wrong times. When a station hits zero bikes, riders can’t start trips. When a station hits zero docks, riders can’t end trips, and the whole network starts to degrade. Those failures aren’t random. They follow patterns that come from commuting flow, land use, and station “roles” across the city.

I built this project because most bike share work stops too early. It’s easy to make dashboards and post-hoc charts. It’s much harder to answer the operational question that actually matters: given a specific day, what should an operator do, ahead of time, to keep stations from failing?

This simulator is meant to be a full loop from data to action. It takes real trip records, simulates station inventory through time, and then tries different planning strategies to reduce shortages. The focus isn’t just on prediction — it’s on building a system that can propose interventions, and then verify that those interventions actually improve outcomes when replayed against real trip flow.

Modeling inventory as a day simulation

At the base of the system is a station-state simulator. Every trip is treated as two events: a departure that removes a bike from the start station, and an arrival that adds a bike to the end station (bounded by station capacity). With that, you can simulate station inventory forward minute by minute or in fixed time buckets.

I chose fixed buckets, typically 15 minutes, because they’re a good balance between realism and compute. The simulator reads all trips for a given date, converts them into time-stamped events, and updates the station bike counts as time moves forward. The output is a full day replay: bikes and empty docks at every station for every time bucket. That baseline replay is important because it makes the rest of the project measurable. You can’t improve a system you can’t reliably reproduce.
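
A minimal sketch of that replay loop, assuming trips arrive as (start_station, end_station, start_minute, end_minute) tuples; the function and variable names here are illustrative, not the project’s actual API:

```python
from collections import defaultdict

BUCKET_MIN = 15                    # bucket width in minutes
BUCKETS = 24 * 60 // BUCKET_MIN    # 96 buckets per day

def simulate_day(trips, capacity, init_bikes):
    """Replay one day of trips in fixed 15-minute buckets.

    trips:      list of (start_id, end_id, start_min, end_min) tuples
    capacity:   {station_id: dock count}
    init_bikes: {station_id: bikes at 00:00}
    Returns {station_id: [bike count per bucket]}.
    """
    # Group each trip's two events by (bucket, station).
    departs = defaultdict(int)
    arrives = defaultdict(int)
    for s, e, t0, t1 in trips:
        departs[(t0 // BUCKET_MIN, s)] += 1
        arrives[(t1 // BUCKET_MIN, e)] += 1

    bikes = dict(init_bikes)
    series = {sid: [] for sid in capacity}
    for b in range(BUCKETS):
        for sid in capacity:
            # Departures can't take more bikes than the station holds;
            # arrivals are bounded by capacity (overflow docks elsewhere).
            bikes[sid] = max(0, bikes[sid] - departs[(b, sid)])
            bikes[sid] = min(capacity[sid], bikes[sid] + arrives[(b, sid)])
            series[sid].append(bikes[sid])
    return series
```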

The simulator supports a simple default midnight initialization, where stations start at a uniform fill ratio of their capacity. But it also supports a stronger mode where the midnight bike distribution is explicitly provided. That matters because in real operations, the day’s outcome is heavily influenced by where bikes start at midnight — not just what trucks do later.

Learning station “behavior fingerprints” from trip history

Once the baseline simulation worked, the next piece was understanding station behavior. Stations aren’t just points on a map; they act differently depending on their context. A station in a residential neighborhood often drains bikes in the morning and fills back up in the evening. A station downtown often does the opposite. Some stations have nightlife patterns where late-night departures dominate. These patterns are stable enough that you can treat them like station identities.

To capture that, I built hourly profiles for every station using trip logs. For each station, I count how many trips depart from it during each hour of the day, and how many trips arrive at it during each hour. That produces two 24-dimensional histograms per station. Then, instead of clustering on raw counts (which would mostly cluster by volume), I normalize each station’s histograms into distributions. The point is to capture the shape of activity, not how busy the station is.

By concatenating the normalized departure distribution and normalized arrival distribution, each station becomes a 48-dimensional signature vector. It’s a compact fingerprint of how that station behaves across a full day.
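
As a sketch, assuming the hourly counts are already aggregated per station (the helper name and the eps guard are my own):

```python
import numpy as np

def station_signature(dep_counts, arr_counts, eps=1e-9):
    """Turn raw hourly counts into a 48-dim behavior fingerprint.

    dep_counts, arr_counts: length-24 arrays of trips per hour.
    Each histogram is normalized to a distribution so the vector
    captures the *shape* of activity, not the station's volume.
    """
    dep = np.asarray(dep_counts, dtype=float)
    arr = np.asarray(arr_counts, dtype=float)
    dep_dist = dep / (dep.sum() + eps)   # eps guards all-zero stations
    arr_dist = arr / (arr.sum() + eps)
    return np.concatenate([dep_dist, arr_dist])   # shape (48,)
```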

Clustering stations into operationally meaningful groups

With signatures built, I cluster stations using KMeans. This step takes hundreds of individual station patterns and compresses them into a small set of station archetypes. The goal isn’t academic clustering for its own sake — it’s to create a tool that makes the system more controllable.

Once the clusters exist, you can start making decisions that look like real operations. A shortage at a residential AM-outbound station is not the same as a shortage at a downtown AM-inbound station. The response timing and urgency are different. The clustering step gives the planner a way to treat station types differently without hardcoding a giant list of exceptions.

To keep clusters interpretable, I also generate simple summaries of each cluster by measuring how much mass exists in “night departures,” “AM departures,” “PM departures,” and so on. That makes it possible to attach labels later like residential outbound, downtown inbound, nightlife-heavy, or commuter-balanced.
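
A sketch of that step using scikit-learn’s KMeans; the day-part windows and k=8 are illustrative choices, not the project’s tuned values:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_stations(signatures, station_ids, k=8, seed=0):
    """Cluster 48-dim signatures into k archetypes and summarize each."""
    X = np.vstack(signatures)
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(X)

    # Day-part mass in the departure half (hours 0-23) of the mean signature.
    windows = {"night_dep": range(0, 6), "am_dep": range(6, 11),
               "midday_dep": range(11, 16), "pm_dep": range(16, 21)}
    summaries = {}
    for c in range(k):
        mean_sig = X[labels == c].mean(axis=0)
        summaries[c] = {name: float(mean_sig[list(hrs)].sum())
                        for name, hrs in windows.items()}
    return dict(zip(station_ids, labels)), summaries
```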

Midnight optimization: fixing the starting conditions

Before even planning trucks, I built a midnight optimizer. This handles a surprisingly large amount of the problem because a bad starting distribution guarantees failures later no matter how smart the truck plan is. If you start the day with bikes in the wrong places, you spend the entire morning playing catch-up.

The midnight optimizer takes a fixed number of bikes in the whole system and decides how they should be allocated across stations at 00:00. It works by simulating each station’s inventory trajectory over the day using bucketized net flow (arrivals minus departures). It then scores each station by the time spent too empty or too full relative to thresholds, like “below 10% full” or “above 90% full.”

From there, it runs a greedy 1-bike swap algorithm. In each step it finds the station that would benefit most from receiving one bike, and the station that would be least harmed (or most improved) by donating one bike, and moves a single bike. Because each station’s trajectory depends only on its own starting inventory and its own flow series, the optimizer can update costs efficiently without recomputing the entire day for every station on every move.
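
A compact sketch of the swap loop, assuming per-station net-flow sequences; the helper names, the thresholds, and the max_swaps cap are illustrative:

```python
def station_cost(start, net_flow, cap, lo=0.10, hi=0.90):
    """Count buckets spent below lo*cap or above hi*cap for one station,
    given its starting inventory and per-bucket net flow (arrivals - departures)."""
    level, bad = start, 0
    for f in net_flow:
        level = min(cap, max(0, level + f))
        bad += (level < lo * cap) or (level > hi * cap)
    return bad

def optimize_midnight(start, net_flow, cap, max_swaps=500):
    """Greedy 1-bike swaps: each step moves one bike from the station hurt
    least by donating to the station helped most by receiving, and stops
    when no swap lowers total cost.
    start/cap: {sid: int}; net_flow: {sid: per-bucket net-flow sequence}."""
    start = dict(start)
    cost = {s: station_cost(start[s], net_flow[s], cap[s]) for s in start}
    for _ in range(max_swaps):
        gain = {s: cost[s] - station_cost(start[s] + 1, net_flow[s], cap[s])
                for s in start if start[s] < cap[s]}      # value of +1 bike
        pain = {s: station_cost(start[s] - 1, net_flow[s], cap[s]) - cost[s]
                for s in start if start[s] > 0}           # harm of -1 bike
        if not gain or not pain:
            break
        recv = max(gain, key=gain.get)
        donor = min(pain, key=pain.get)
        if donor == recv or gain[recv] - pain[donor] <= 0:
            break  # no remaining swap reduces total cost
        start[recv] += 1
        start[donor] -= 1
        cost[recv] = station_cost(start[recv], net_flow[recv], cap[recv])
        cost[donor] = station_cost(start[donor], net_flow[donor], cap[donor])
    return start
```

Because each station’s trajectory depends only on its own start and its own flow series, each candidate swap touches exactly two per-station costs, which is what keeps the loop cheap.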

The result is a midnight distribution that reduces downstream failures across the day without using trucks at all. It’s a clean example of how modeling the system properly often matters more than adding complexity.

Whole-day truck planning with a global cost objective

Truck moves are the active control layer. The goal isn’t to greedily “fix the worst station right now,” because that tends to waste moves. The real objective is global: choose a small number of moves throughout the service window that reduce overall system failure across the day.

The planner represents a truck move as a timed action: move N bikes from a source station to a sink station at a specific time bucket. It enforces realistic constraints like a max truck capacity per move, donor stations not being drained below a minimum, and receivers not being filled beyond capacity. There’s also an optional travel realism constraint using station lat/lon, including distance penalties and hard rejection of overly long transfers.
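
A sketch of how such a move and its feasibility checks might look; the field names, the min_left floor, and the truck_cap default are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class TruckMove:
    """One timed rebalance action, clamped by the planner's constraints."""
    src: str        # donor station id
    dst: str        # receiving station id
    bucket: int     # time bucket at which the move is applied
    bikes: int      # number of bikes moved

def feasible(move, level, cap, min_left=2, truck_cap=20,
             max_km=None, dist_km=None):
    """Check truck capacity, donor floor, receiver headroom, and
    (optionally) a hard cap on transfer distance."""
    if move.bikes > truck_cap:
        return False
    if level[move.src] - move.bikes < min_left:        # don't drain the donor
        return False
    if level[move.dst] + move.bikes > cap[move.dst]:   # don't overfill the sink
        return False
    if max_km is not None and dist_km(move.src, move.dst) > max_km:
        return False
    return True
```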

The planning algorithm is greedy but global. At each step, it evaluates a set of candidate moves and selects the one that produces the largest reduction in total system cost. The cost is computed from the simulation series, and crucially, when evaluating a candidate move at time b0, only two stations are affected from that time onward. That lets the planner recompute only the cost of the source and sink rather than resimulating the entire city for every candidate.
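
The selection loop itself can then stay small. In this sketch, total_cost_delta, candidates, and commit are assumed callables wrapping the simulation state; the local recompute lives inside total_cost_delta, which re-simulates only the two affected stations:

```python
def plan_moves(total_cost_delta, candidates, commit, budget=10):
    """Greedy global planning: each step scores all candidate moves, picks
    the one with the largest total-cost reduction, and commits it.

    total_cost_delta(m) -> cost reduction if m were applied; only m.src and
        m.dst change from m.bucket onward, so just two stations re-simulate
    candidates()        -> iterable of currently feasible TruckMove options
    commit(m)           -> apply m to the planner's station state
    """
    plan = []
    for _ in range(budget):
        scored = [(total_cost_delta(m), m) for m in candidates()]
        if not scored:
            break
        best_delta, best = max(scored, key=lambda t: t[0])
        if best_delta <= 0:
            break  # no remaining move reduces total system cost
        commit(best)
        plan.append(best)
    return plan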

This makes it feasible to plan across hundreds of stations with a small move budget, while still optimizing for total network health rather than one local metric.

The real upgrade: planning for buffer, not thresholds

Threshold penalties like “avoid empty” and “avoid full” are useful, but they miss the real operational intuition. Stations don’t fail because they are slightly low or slightly high. They fail when they can’t absorb what’s about to happen next.

So I upgraded the objective to be time-aware and demand-aware using a lookahead window. Instead of only penalizing stations for being near-empty or near-full, I compute upcoming departures and arrivals over the next few hours. If a station has an outbound pickup wave coming, it needs bike buffer ahead of time. If it’s about to receive a wave of arrivals, it needs dock buffer ahead of time.
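
A sketch of that lookahead penalty, assuming per-station demand arrays; the 8-bucket (2-hour) window is an illustrative default:

```python
import numpy as np

def buffer_cost(level, departs, arrives, cap, b, lookahead=8):
    """Penalty for lacking buffer against the next `lookahead` buckets
    (8 buckets = 2 hours at 15-minute resolution).

    level:            bikes currently at the station
    departs, arrives: per-bucket demand arrays for this one station
    """
    departs, arrives = np.asarray(departs), np.asarray(arrives)
    upcoming_out = departs[b:b + lookahead].sum()   # pickup wave ahead
    upcoming_in = arrives[b:b + lookahead].sum()    # arrival wave ahead
    bike_shortfall = max(0.0, upcoming_out - level)          # can't serve departures
    dock_shortfall = max(0.0, upcoming_in - (cap - level))   # can't absorb arrivals
    return bike_shortfall + dock_shortfall
```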

That changes the planner from “react to being empty” into “pre-position inventory to prevent emptiness.” It’s closer to how real dispatch works: the best move is often the one you make before the station actually breaks.

Making the planner cluster-aware

Once clusters exist, the buffer objective can be scaled by station type and time of day. A residential station in the morning is extremely sensitive to bike availability. A downtown station in the morning is extremely sensitive to dock availability. A nightlife cluster is sensitive late at night. By applying mild multipliers based on (cluster_id, hour), the planner stops treating the city as uniform.
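
As a sketch, with hypothetical multiplier tables keyed by (cluster_id, hour), and cluster ids standing in for the learned archetypes:

```python
# Hypothetical multipliers keyed by (cluster_id, hour); default weight is 1.0.
# e.g. cluster 0 ~ residential AM-outbound, 1 ~ downtown AM-inbound, 2 ~ nightlife.
BIKE_WEIGHT = {(0, h): 2.0 for h in range(6, 10)}   # protect morning bike buffer
DOCK_WEIGHT = {(1, h): 2.0 for h in range(6, 10)}   # protect morning dock buffer
DOCK_WEIGHT.update({(2, h): 1.5 for h in range(22, 24)})  # late-night arrivals

def weighted_buffer_cost(level, departs, arrives, cap, b, cluster_id, hour):
    """buffer_cost split into bike and dock sides, scaled per (cluster, hour)."""
    out = sum(departs[b:b + 8])
    inc = sum(arrives[b:b + 8])
    bike_side = max(0.0, out - level)
    dock_side = max(0.0, inc - (cap - level))
    return (BIKE_WEIGHT.get((cluster_id, hour), 1.0) * bike_side +
            DOCK_WEIGHT.get((cluster_id, hour), 1.0) * dock_side)
```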

That makes the results more stable and more realistic, and it opens the door to tuning with operational feedback. The cluster system becomes a compact way to encode behavior patterns without hardcoding a station-by-station rule list.

Replay mode: verifying the plan actually works

Planning is only half the job. A plan that looks good on paper but can’t be verified is worthless.

After generating a schedule of TruckMove actions, I replay them through the simulator on the same trip day. Planned moves are applied at their exact times and clamped to feasibility. Then the same station-state outputs are regenerated. This makes evaluation clean: same trips, same day, same baseline — only the intervention policy changes.
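
In usage terms, assuming the earlier simulate_day sketch were extended with a moves= argument that injects each TruckMove at its bucket (and a hypothetical day_cost scorer over one station’s series), the comparison looks like:

```python
# Same trips, same day, same midnight inventory -- only the policy changes.
baseline = simulate_day(trips, capacity, init_bikes)
planned = simulate_day(trips, capacity, init_bikes, moves=plan)

baseline_cost = sum(day_cost(series) for series in baseline.values())
planned_cost = sum(day_cost(series) for series in planned.values())
print(f"failure cost: {baseline_cost} -> {planned_cost}")
```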

That is what turns the project from analysis into a true system. You can compare baseline and planned outcomes in a way that’s honest, repeatable, and measurable.

Where this goes next

This simulator is already useful as a modeling and planning sandbox, but the most interesting next step is adding event-driven demand signals. Baseline commuter patterns are predictable, but large events create spikes that break normal assumptions. Incorporating concerts, games, and other scheduled events would let the planner pre-position inventory for abnormal flow.

The other next step is turning planner outputs into operator-facing signals instead of raw CSVs. The long-term goal is not “generate data.” The goal is to surface the right interventions, explain why they help, and make it easy to validate outcomes.

That’s the whole theme of the project: take messy real-world movement, make it measurable, and then build systems that can actually act on it.