HKJC Data 101: Raw scrapes vs. cleaned datasets
Why scraped data and properly cleaned datasets are fundamentally different — encoding, gaps, duplicates.
Coming soon
Resources
Free race-day sample CSV, full dataset reference, and a tailored proposal request for your team.
Full-day Race Results sample from Happy Valley on 2025-07-16. Load directly in Python or R to verify schema and field completeness.
Send us your data scope, use case, and budget context, and we will prepare a one-page PDF proposal tailored to your team, setting out coverage, delivery, and engagement terms.
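As a rough illustration of what "verify schema and field completeness" can mean for the race-day sample CSV, here is a minimal Python sketch using only the standard library. The column names, the duplicate key, and the inline rows are all assumptions made up for the example; the real sample's header may differ, so adjust `EXPECTED_COLUMNS` and `key` to match the dataset reference.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; replace with the header documented for the real sample.
EXPECTED_COLUMNS = [
    "race_date", "venue", "race_no", "horse_no",
    "horse_name", "finish_pos", "finish_time",
]


def check_sample(handle, expected_columns=EXPECTED_COLUMNS,
                 key=("race_date", "race_no", "horse_no")):
    """Report schema mismatches, empty fields, and duplicate keys in a CSV."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    rows = list(reader)

    # Schema check: columns we expected but did not find, and the reverse.
    missing_columns = [c for c in expected_columns if c not in header]
    extra_columns = [c for c in header if c not in expected_columns]

    # Completeness check: count blank or absent values per column.
    empty_counts = {
        c: sum(1 for r in rows if not (r.get(c) or "").strip())
        for c in header
    }

    # Duplicate check: the same (date, race, horse) key appearing more than once.
    key_counts = Counter(tuple(r.get(k) for k in key) for r in rows)
    duplicate_keys = [k for k, n in key_counts.items() if n > 1]

    return {
        "row_count": len(rows),
        "missing_columns": missing_columns,
        "extra_columns": extra_columns,
        "empty_counts": empty_counts,
        "duplicate_keys": duplicate_keys,
    }


# Made-up rows for demonstration only; they are not taken from the real sample.
SAMPLE = """race_date,venue,race_no,horse_no,horse_name,finish_pos,finish_time
2025-07-16,HV,1,1,Example Horse A,1,1:10.25
2025-07-16,HV,1,2,Example Horse B,2,
2025-07-16,HV,1,2,Example Horse B,2,
"""

report = check_sample(io.StringIO(SAMPLE))
for name, value in report.items():
    print(name, value)
```

To run the same check against a downloaded file, open it with `open(path, newline="", encoding="utf-8-sig")` and pass the handle to `check_sample`; the `utf-8-sig` codec strips a byte-order mark if one is present, which is an assumption about how the file may be encoded rather than a documented property of the sample.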
Datasets
Four categories, 30+ products, delivered per season.
Pre-Race
Race info · Racecards · Race readings · Predicted pacing · Analyst comments · and more
Post-Race
Results · Sectional position / time / pacing · Dividends · Pool sizes · Comments · Replay video
General
Horse info · Veterinary · Trackwork · News · Barrier trials (info / results / comments / video)
Odds
Win · Place · Quinella · Quinella Place · Trio · Double, each available as final odds and as a time series
Blog
Why scraped data and properly cleaned datasets are fundamentally different — encoding, gaps, duplicates.
Coming soon
How look-ahead bias sneaks into feature pipelines, and how to design a proper point-in-time (PIT) store.
Coming soon
Building a low-cost, reproducible HK racing feature store with Polars + DuckDB.
Coming soon
Request schemas, samples, or a tailored proposal — we reply within 24 hours.