Fast break AI: How Databricks helped the Pacers slash ML costs 12,000X while speeding up insights




Stats might be everything in basketball — but for Pacers Sports and Entertainment (PS&E), data about fans is just as valuable. 

Yet while the parent company of the Indiana Pacers (NBA), the Indiana Fever (WNBA) and the Indiana Mad Ants (NBA G League) was pumping untold amounts of it into a $100,000-a-year machine learning (ML) platform to generate predictive models around such factors as pricing and ticket demand, the insights weren’t coming fast enough. 

Jared Chavez, manager of data engineering and strategy, set out to change that, making the move to Databricks on Salesforce a year-and-a-half ago. 

Now? His team is performing the same range of predictive projects with careful compute configurations to gain critical insights into fan behavior — for just $8 a year. It’s a jaw-dropping, seemingly unthinkable decrease Chavez credits largely to his team’s ability to reduce ML compute to near-infinitesimal amounts.  
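For a sense of scale, the two annual figures quoted above can be compared directly. The sketch below is only that arithmetic, using the $100,000 and $8 figures from this article; the variable names are illustrative.

```python
# Annual ML cost figures as quoted in the article.
previous_annual_cost = 100_000  # dollars per year, prior ML platform
current_annual_cost = 8         # dollars per year, ML compute today

# How many times cheaper the current setup is.
reduction_factor = previous_annual_cost / current_annual_cost
# Fraction of the old spend that remains.
remaining_share = current_annual_cost / previous_annual_cost

print(f"Reduction factor: {reduction_factor:,.0f}X")
print(f"Share of previous cost still spent: {remaining_share:.3%}")
```

The ratio works out to 12,500X, which the headline rounds to 12,000X.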

“We’re very good at optimizing our compute and figuring out exactly how far we can push down the limit to get our models to run,” he told VentureBeat. “That’s really what we’ve been known for with Databricks.” 

Cutting OpEx by 98%

In addition to its three basketball teams, the Indianapolis-based PS&E operates a Pacers Gaming esports business, hosts March Madness games and runs a busy, 300-plus day event business through the Gainbridge Fieldhouse arena (concerts, comedy shows, rodeos, other sporting events). Further, the company just last month announced plans to build a $78 million Indiana Fever Sports Performance Center, which will be connected by skybridge to the arena and a parking garage (expected to open in 2027). 

All this makes for a mind-boggling amount of data — and data sprawl. From a data infrastructure standpoint, Chavez pointed out that, up until two years ago, the organization hosted two completely independent warehouses built on Microsoft Azure Synapse Analytics. Different teams across the business all used their own form of analytics, and tooling and skill sets varied wildly. 

While Azure Synapse did a great job connecting to external platforms, it was cost-prohibitive for an organization of PS&E’s size, he explained. Also, integrating the company’s ML platform with Microsoft Azure Data Studio led to fragmentation. 

To address these problems, Chavez switched over to Databricks AutoML and the Databricks Machine Learning Workspace in August 2023. The initial focus was to configure, train and deploy models around ticket pricing and game demand. 

Both technical and non-technical users immediately found the platforms helpful, Chavez noted, and they quickly sped up the ML process (and sent costs plummeting). 

“It dramatically improves response times for my marketing team, because they don’t have to know how to code,” said Chavez. “It’s all buttons for them, and all that data comes back down to Databricks as unified records.”

Further, his team organized the company’s 60-some-odd systems into Salesforce Data Cloud. Now, he reports that they have 440X more data in storage and 8X more data sources in production. 

PS&E today operates at just under 2% of its previous annual OpEx. “We saved hundreds of thousands a year just on operations,” said Chavez. “We reinvested it into customer data enrichment. We reinvested into better tooling for not just my team, but the analytics units around the company.” 

Continued refinement, deep understanding of data

How did his team get compute so staggeringly low? Databricks has continually refined cluster configurations, enhanced connectivity options to schemas and integrated model outputs back into PS&E’s data tables, Chavez explained. The powerful ML engine is “continuously enriching, refining, merging and predicting” on PS&E’s customer records across every system and revenue stream. 

This leads to better-informed predictions with each iteration. In fact, an AutoML model occasionally makes it straight to production without any further tweaking from his team, Chavez reported. 

“Truthfully, it’s just knowing the size of the data going in, but also roughly how long it is going to take to train,” said Chavez. He added: “It’s on the smallest cluster size you could possibly run, it might just be a memory-optimized cluster, but it’s just knowing Apache Spark fairly well and knowing which way we could store and read the data fairly optimally.”

Who’s most likely to buy season tickets?

One way Chavez’s team is using data, AI and ML is in propensity scoring for season ticket packages. As he put it: “We sell an ungodly number of them.”

The goal is to determine which customer characteristics influence where they choose to sit. Chavez explained that his team is geo-locating addresses they have on file to make correlations between demographics, income levels and travel distances. They’re also analyzing users’ purchase histories across retail, food and beverage, mobile app engagement and other events they might attend on PS&E’s campus. 
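As an illustration of the travel-distance piece, a geocoded address can be turned into a straight-line distance to the arena with the haversine formula. This is a minimal sketch, not PS&E’s pipeline: the arena coordinates are approximate, the customer records are made up, and a real model might prefer drive time over straight-line miles.

```python
import math

# Approximate coordinates for Gainbridge Fieldhouse (assumed for illustration).
ARENA_LAT, ARENA_LON = 39.764, -86.1555
EARTH_RADIUS_MILES = 3958.8

def travel_distance_miles(lat, lon, dest_lat=ARENA_LAT, dest_lon=ARENA_LON):
    """Great-circle (haversine) distance in miles from a point to the arena."""
    phi1, phi2 = math.radians(lat), math.radians(dest_lat)
    d_phi = phi2 - phi1
    d_lambda = math.radians(dest_lon - lon)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Hypothetical geocoded customer records (not real data).
customers = [
    {"id": "A", "lat": 39.9784, "lon": -86.1180},  # roughly Carmel, Ind.
    {"id": "B", "lat": 39.1653, "lon": -86.5264},  # roughly Bloomington, Ind.
]

for c in customers:
    # Store the distance as a feature that a propensity model could use.
    c["miles_to_arena"] = round(travel_distance_miles(c["lat"], c["lon"]), 1)
    print(c["id"], c["miles_to_arena"])
```

The resulting `miles_to_arena` value could sit alongside demographic and purchase-history features when scoring a customer.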

Further, they’re pulling in data from StubHub, SeatGeek and other vendors outside of Ticketmaster to evaluate price points and determine how well inventories are moving. This can all be married with everything they know about a given customer to figure out where they’re going to sit, Chavez explained. 

Armed with that data, they could then, for instance, upsell a given customer from Section 201 to Section 101 center court. “Now we’re able to not only resell his seat in the higher deck, we can also sell another smaller package on the same seats he purchased in the mid-season, using the same characteristics for another person,” said Chavez. 

Similarly, data can be used to enhance sponsorships, which are critical to any sports franchise. 

“Of course, they want to align with organizations who overlap with theirs,” said Chavez. “So can we better enrich? Can we better predict? Can we do custom segmentation?”

Ideally, the goal is an interface where any user could ask questions like: ‘Give me a section of the Pacers fan base in their mid-to-late 20s with disposable income.’ Going even further: ‘Look for those that make more than $100K a year and have an interest in luxury vehicles.’ The interface could then bring back a percentage that overlap with sponsor data. 

“When our partnership teams are trying to close these deals, they can, on-demand, just pull information without having to rely on an analytics team to do it for them,” said Chavez. 
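The kind of question described above boils down to a filter followed by a set overlap. The sketch below shows the idea on a handful of invented records; the field names, thresholds and sponsor ID list are all hypothetical, and a production version would run against governed tables rather than in-memory dictionaries.

```python
# Hypothetical fan records; field names and values are invented for illustration.
fans = [
    {"id": 1, "age": 27, "income": 120_000, "interests": {"luxury vehicles", "golf"}},
    {"id": 2, "age": 25, "income": 85_000, "interests": {"gaming"}},
    {"id": 3, "age": 29, "income": 150_000, "interests": {"luxury vehicles"}},
    {"id": 4, "age": 41, "income": 200_000, "interests": {"luxury vehicles"}},
]
# Hypothetical set of customer IDs a sponsor has matched on its side.
sponsor_ids = {3, 4, 9}

def segment(records, min_age, max_age, min_income, interest):
    """Return the IDs of fans that satisfy every filter."""
    return {
        r["id"] for r in records
        if min_age <= r["age"] <= max_age
        and r["income"] > min_income
        and interest in r["interests"]
    }

def overlap_pct(segment_ids, other_ids):
    """Percentage of the segment that also appears in the other ID set."""
    if not segment_ids:
        return 0.0
    return 100 * len(segment_ids & other_ids) / len(segment_ids)

# "Mid-to-late 20s, more than $100K a year, interested in luxury vehicles."
target = segment(fans, 24, 29, 100_000, "luxury vehicles")
print(sorted(target), overlap_pct(target, sponsor_ids))
```

A natural-language interface would sit in front of something like `segment`, translating the user’s question into the filter arguments.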

To further support this goal, his team is looking to build out a data clean room, or a secure environment that allows for the sharing of sensitive data. This can be particularly helpful with sponsors, as well as collaborations with other teams and the NCAA (which is headquartered in Indianapolis). 

“The name of the game for us right now is response time, whether that’s customer facing or internal,” said Chavez. “Can we dramatically lessen the required knowledge to cut up information and sort through it using AI?”

Data collection and AI to understand traffic patterns, improve signage

Another area of focus for Chavez’s team is examining where people are at any given time across PS&E’s campus (which comprises a three-tier arena with an outdoor plaza). Chavez explained that data capture capabilities are in place throughout its network infrastructure via WiFi access points. 

“When you walk into the arena, you are pinging off all of them, even if you don’t log into them, because your phone’s checking for WiFi,” he said. “I can see where you’re moving. I don’t know who you are, but I can see where you’re moving.” 

This can eventually help guide people around the arena — say, if someone wants to buy a pretzel and is looking for a concession stand — and help his team determine where to position food and merchandise kiosks. 

Similarly, location data can help determine optimal spots for signage, Chavez explained. One interesting way to identify signage impression counts is placing vision gradients at spots equivalent to average fan height. 

“Then let’s calculate how well somebody would have seen this walking through with the number of people around them,” said Chavez. “So I can tell my sponsor you got 5,000 impressions on this, and 1,200 of them were pretty good.” 
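One way to picture the split between total and “pretty good” impressions is a simple visibility score per walk-by. The sketch below is purely hypothetical: the scoring formula, thresholds and sample data are invented for illustration and are not the method Chavez’s team uses.

```python
from dataclasses import dataclass

@dataclass
class WalkBy:
    distance_m: float   # closest distance to the sign, in meters
    angle_deg: float    # angle off the sign's face; 0 means looking straight at it
    people_nearby: int  # crowd count around the person at that moment

def visibility(w, max_distance_m=30.0, max_angle_deg=90.0, crowd_cap=20):
    """Hypothetical 0-1 score: closer, more head-on and less crowded is better."""
    distance_term = max(0.0, 1 - w.distance_m / max_distance_m)
    angle_term = max(0.0, 1 - abs(w.angle_deg) / max_angle_deg)
    crowd_term = max(0.0, 1 - w.people_nearby / crowd_cap)
    return distance_term * angle_term * crowd_term

def count_impressions(walk_bys, good_threshold=0.5):
    """Return (total impressions, good impressions) for a list of walk-bys."""
    scores = [visibility(w) for w in walk_bys]
    total = sum(1 for s in scores if s > 0)              # sign was visible at all
    good = sum(1 for s in scores if s >= good_threshold)  # sign was seen well
    return total, good

# Invented sample data: two clear views, one crowded oblique view, one too far away.
walk_bys = [WalkBy(3, 10, 2), WalkBy(15, 45, 10), WalkBy(40, 0, 0), WalkBy(6, 0, 1)]
total, good = count_impressions(walk_bys)
print(total, good)
```

Reporting both numbers mirrors the quote above: a raw impression count, plus the subset that cleared a quality bar.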

Similarly, when fans are in their seats, they are surrounded by signs and digital displays. Location data can help determine the quality (and amount) of impressions based on the angle of where they’re sitting. As Chavez noted: “If this ad was only on the screen for 10 seconds in the third quarter, who would have seen it?”

Once PS&E has adequate locational data to help answer these types of questions, his team plans to work with Indiana University’s VR lab to model the entire campus. “Then we’re just going to have a very fun sandbox to go run around in and answer all these 3D space questions that have been bugging me for the last two years,” said Chavez. 


