First Release of NCAA WBB 2025-26 Roster Data

Author

Derek Willis

Published

October 19, 2025

A few years ago I started publishing roster data for women’s college basketball, thanks to the assistance of students in my Sports Data Analysis and Visualization class at Maryland. The first few times were a painstaking and lengthy process, mostly due to slight but substantive variations in how colleges presented roster information on their websites. While many NCAA teams use SIDEARM Sports to serve their websites, not all of them do, and even then there are differences in how the data is structured. The end result was that we usually weren’t able to produce a complete dataset until well into the season, even for Division I teams.

That’s mostly a solved problem now, thanks to the assistance of AI coding agents like Copilot and Claude Code. While I’ve had some frustrating journeys into dead-ends with these tools, for the most part they have dramatically increased the efficiency and speed of doing this work. Which is why I’m glad to say that today I’m publishing a preliminary release of 2025-26 NCAA women’s basketball rosters, including all but one D-I school (South Carolina State, which hasn’t published its roster yet).

The dataset, available for free on GitHub as a CSV file, includes information on more than 10,600 players from 730 NCAA teams. That leaves about 200 teams remaining to add, mostly from Divisions II and III. I plan to continue updating the dataset as more rosters are published, and will release a final version once all rosters are available, ideally early in the season. You can also see the Python code that produces this information; fair warning: it’s more than 3,000 lines long.

The information in the data file only includes what has been published on team sites; I haven’t yet standardized any of the values or added anything to them, but that will happen once I have a complete dataset. For now, the data includes player names, positions, heights, class years, hometowns and previous schools (high school or otherwise). So if a team doesn’t make it simple to pull information about a player’s high school or previous college, those details might not appear in this dataset. But the goal is to publish a much more complete one during the early part of the season.
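To give a sense of what standardizing a field could involve, here is a minimal sketch that converts height strings into inches. It is an illustration only, not the code behind this dataset: the column names (`team`, `name`, `height`), the sample rows and the `height_to_inches` helper are all hypothetical, and the real CSV may be laid out differently.

```python
import csv
import io
import re

# Hypothetical sample rows; the column names and values are placeholders
# and are NOT taken from the actual dataset.
SAMPLE_CSV = """team,name,height
Team X,Player A,5-10
Team X,Player B,6'2
Team Y,Player C,6-4
Team Y,Player D,
"""

# Accepts forms such as 5-10, 6'2", 5' 10'' and 6-0 (assumed formats).
HEIGHT_RE = re.compile(r"^\s*(\d)\s*(?:-|'|ft\.?)\s*(\d{1,2})?\s*(?:\"|''|in\.?)?\s*$")


def height_to_inches(raw):
    """Convert a height string to total inches.

    Returns None when the value is blank or in an unrecognized format.
    """
    match = HEIGHT_RE.match(raw or "")
    if not match:
        return None
    feet = int(match.group(1))
    inches = int(match.group(2) or 0)
    if inches > 11:
        # Something like "5-13" is more likely a data error than a height.
        return None
    return feet * 12 + inches


# Read the sample rows and add a standardized column next to the raw one.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    row["height_inches"] = height_to_inches(row["height"])

for row in rows:
    print(row["team"], row["name"], row["height"], row["height_inches"])
```

Keeping the original `height` value alongside the derived one means unparsed entries (the ones that come back as `None`) can be reviewed by hand rather than silently dropped.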

There’s another aspect of the data that’s coming, but isn’t ready just yet. This year, the Sports Data class students are working on cleaning and publishing coaching history data for current Division I coaches. This is a more complex task, since it involves extracting detailed coaching histories from what is often narrative text on bio pages. Again, large language models should be able to help with that work.

In theory, doing this for other sports should be possible, even though it will involve changes to the scrapers to accommodate different attributes for each sport. But if that’s relatively straightforward, then you’ll soon see more sports roster data releases.