Dataframes explained: the modern in-memory data science format

import pandas as pd

data = {
    "Title": ("Blade Runner", "2001: a space odyssey", "Alien"),
    "Year": (1982, 1968, 1979),
    "MPA Rating": ("R", "G", "R"),
}
df = pd.DataFrame(data)
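To make the row-and-column structure concrete, here is a minimal sketch that rebuilds the same dataframe and then inspects it. The variable names (n_rows, years, r_rated) are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same illustrative data as the snippet above: each key is a column name,
# and each tuple holds that column's values.
data = {
    "Title": ("Blade Runner", "2001: a space odyssey", "Alien"),
    "Year": (1982, 1968, 1979),
    "MPA Rating": ("R", "G", "R"),
}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# Selecting a single column by name returns a pandas Series
years = df["Year"]

# Filter rows with a boolean mask, then sort the result by year
r_rated = df[df["MPA Rating"] == "R"].sort_values("Year")

print(df)
print(r_rated["Title"].tolist())
```

Each dictionary key becomes a named column and each position in the tuples becomes a row, which is why the column-wise filter and sort read so naturally.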

Applications that use dataframes

As I mentioned earlier, virtually every data science library or framework supports some kind of dataframe-like structure. The R language is generally credited with popularizing the dataframe concept (although it existed in other forms before then). Spark, one of the first widely popular platforms for processing data at scale, has its own dataframe system. The Pandas data library for Python and its speed-optimized cousin Polars both provide dataframes. And the analytics database DuckDB combines the convenience of dataframes with the power of a full-fledged database system.

Note that a given application may support dataframe data formats that are specific to it. For example, Pandas provides data types for sparse data structures in a dataframe. By contrast, Spark does not have an explicit sparse data type, so any sparse-format data needs an additional conversion step before it can be used in a Spark dataframe.
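As a rough illustration of the Pandas side of this, the sketch below stores a mostly-zero column using a sparse dtype and then converts it back to a dense dataframe. The column name and data are made up for the example. Densifying is shown only as one plausible form the "additional conversion step" could take, not as the required procedure for Spark.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10,000 values, of which only 10 are non-zero.
values = np.zeros(10_000, dtype="float64")
values[::1000] = 1.0
dense = pd.DataFrame({"clicks": values})

# Convert to a sparse dtype; zeros become the implicit fill value and
# only the non-zero entries are physically stored.
sparse = dense.astype(pd.SparseDtype("float64", fill_value=0.0))

# The .sparse accessor reports the fraction of values actually stored.
density = sparse["clicks"].sparse.density

# Converting back to dense is one way to prepare the data for a system
# that has no sparse column type (an assumption made for illustration).
roundtrip = sparse.sparse.to_dense()

print(sparse.dtypes)
print(f"density: {density:.4f}")
print(roundtrip.equals(dense))
```

The sparse version keeps the same logical contents while storing far fewer values. The trade-off is that the format is specific to Pandas and has to be undone or translated before the data moves elsewhere.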

So, while some dataframe libraries are more popular than others, there is no single definitive version of a dataframe. Dataframes are one concept implemented by many different applications. Each implementation is free to do things differently under the hood, and some implementations also differ in end-user details.