Dataframe 1.0.0.0

https://news.ycombinator.com/rss Hits: 6
Summary

It’s been roughly two years of work on this, and I think things are in a good enough state that it’s worth calling this v1.

Features

Typed dataframes

We got there eventually, and I think we got there in a way that still looks nice. There is now a DataFrame.Typed API that tracks the entire schema of the dataframe: mistakes in column names, misapplied operations, etc. are now compile-time failures, and you can easily move between exploratory and pipeline work. This is in large part thanks to maxigit and mcoady (GitHub user names) for their feedback.

    $(DT.deriveSchemaFromCsvFile "Housing" "./data/housing.csv")

    main :: IO ()
    main = do
        df <- D.readCsv "./data/housing.csv"
        let df' = either (error . show) id (DT.freezeWithError @Housing df)
        let df'' = df'
                & DT.derive @"rooms_per_household" (DT.col @"total_rooms" / DT.col @"households")
                & DT.impute @"total_bedrooms" 0
                & DT.derive @"bedrooms_per_household" (DT.col @"total_bedrooms" / DT.col @"households")
                & DT.derive @"population_per_household" (DT.col @"population" / DT.col @"households")
        print df''

Calling dataframe from Python

There’s an implementation of Apache Arrow’s C Data Interface, along with an example of how to pass dataframes between Polars and Haskell. Find that here.

Getting data from Hugging Face

You can explore Hugging Face datasets. Example:

    df <- D.readParquet "hf://datasets/Rafmiggonpaz/spain_and_japan_economic_data/data/train-00000-of-00001.parquet"

Larger than memory files

The Lazy/query-engine-like implementation is now pretty fast. It can compute the One Billion Row Challenge in about 10 minutes on a Mac and about 30 minutes on a 12-year-old Dell (without running out of memory). You’ll have to generate the data yourself, but the code is here.

Better ergonomics with numeric promotion and null awareness

Introduced more lenient operators that make happy-path computation much easier. E.g.:

    D.derive "bmi" (F.lift2 (\m h -> (/) <$> m <*> fmap ((^2) . (/100) . realToFrac) h) mass height) df

is now instead:

    D.derive "bmi" (mass ./ (height ./ 100) .^ 2) df

Wh...
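The idea behind the lenient operators (promote mixed numeric types, propagate missing values) can be pictured on plain Maybe values using only base. This is a minimal sketch under stated assumptions: the operators `./?` and `.^?` and the function `bmi` are hypothetical names made up for illustration, and nothing here is taken from the dataframe library's actual definitions of `./` and `.^`.

```haskell
module Main where

-- Hypothetical null-aware operators on plain Maybe values.
-- Illustrative stand-ins only; not the dataframe library's implementation.

infixl 7 ./?
infixr 8 .^?

-- Divide two possibly-missing values, promoting both sides to Double.
-- The result is Nothing if either operand is Nothing.
(./?) :: (Real a, Real b) => Maybe a -> Maybe b -> Maybe Double
x ./? y = (/) <$> fmap realToFrac x <*> fmap realToFrac y

-- Raise a possibly-missing value to a non-negative integer power,
-- promoting the base to Double first.
(.^?) :: Real a => Maybe a -> Int -> Maybe Double
x .^? n = (^ n) . realToFrac <$> x

-- BMI from mass in kilograms (Double) and height in centimetres (Int).
-- Mixed numeric types are accepted without manual conversion.
bmi :: Maybe Double -> Maybe Int -> Maybe Double
bmi mass height = mass ./? ((height ./? Just (100 :: Int)) .^? 2)

main :: IO ()
main = do
  print (bmi (Just 70) (Just 175))  -- a Just value: both inputs present
  print (bmi Nothing (Just 175))    -- Nothing: a missing input propagates
```

The point of the sketch is that the caller never writes `fmap`, `<*>`, or `realToFrac`; the operators absorb both the null handling and the Int-to-Double promotion, which is what makes the one-line `bmi` expression possible.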

First seen: 2026-03-23 11:03

Last seen: 2026-03-23 16:08