Home
Matt Williams edited this page Apr 16, 2018 · 1 revision
I'm trying to approach the design of plotlib in a slightly more rigorous way than some other plotting libraries. The main features, as I see it, are:
- Continuous/discrete data: All data is in some way constructed of 1 or more dimensions where each dimension is always either continuous or discrete. For example, a time series of temperatures is 2 dimensional with both being continuous, while a data set of average height by country is discrete in the country dimension but continuous in the height dimension. For want of a better term, let's call this the dimension type.
- Continuous/discrete plots: Similar to data, plots drawn on paper have N dimensions, each being continuous or discrete. For example, a scatter plot is 2D with both being continuous. A histogram is 2D and is likewise continuous/continuous. A bar chart, by comparison, is continuous in its counting dimension (usually plotted in the y-direction) but is purely discrete in the categories.
- Dimension mapping: Since both data and plots have compatible structure, the idea is that a plot can be defined as being, for example, discrete in the horizontal direction and continuous in the vertical. Any data that is compatible should then be able to be mapped to it. In this example, both bar charts and box plots would 'fit'. It shouldn't matter if the data set has more dimensions or if the dimensions are in the apparently 'wrong' order; the dimensions can be mapped to make the data fit the plot. We can also support multiple data sets being drawn on the same plot as long as they match dimension types. Since a plot is peeking into a certain section of the data, we call each subplot a view.
- Data subsets: As well as views being able to select only certain dimensions of a data set, they can also select subsets of the data to be drawn in each dimension. For continuous dimensions, this means a sub-range of the data (e.g. the data is from -100 to 100 but plot from 10 to 60) and for discrete dimensions, it means choosing some subset of the possible categories.
- Data is independent of view: Data should only care about its internal value. The way it gets drawn to the paper should be defined by a higher level which performs the transformations. This allows, for example, a line chart to be plotted as a radial plot or for 3D data to be drawn on a 2D page with any projection.
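The dimension-type, mapping and subset ideas above can be sketched in a few lines of Rust. This is only an illustration of the concept, not plotlib's actual API: `DimType`, `check_mapping`, `select_range` and `select_categories` are hypothetical names invented for this example.

```rust
use std::ops::RangeInclusive;

/// The "dimension type": every dimension is either continuous or discrete.
/// (Hypothetical type, not taken from plotlib.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DimType {
    Continuous,
    Discrete,
}

/// Check that data dimensions can be mapped onto plot dimensions.
/// `mapping[i]` is the index of the data dimension drawn on plot dimension `i`,
/// so extra data dimensions and a "wrong" order are both allowed.
fn check_mapping(data: &[DimType], plot: &[DimType], mapping: &[usize]) -> Result<(), String> {
    if mapping.len() != plot.len() {
        return Err(format!(
            "plot has {} dimensions but the mapping names {}",
            plot.len(),
            mapping.len()
        ));
    }
    for (plot_idx, &data_idx) in mapping.iter().enumerate() {
        let data_type = data
            .get(data_idx)
            .ok_or_else(|| format!("data has no dimension {}", data_idx))?;
        if *data_type != plot[plot_idx] {
            return Err(format!(
                "data dimension {} is {:?} but plot dimension {} is {:?}",
                data_idx, data_type, plot_idx, plot[plot_idx]
            ));
        }
    }
    Ok(())
}

/// Subset of a continuous dimension: keep only values inside a sub-range.
fn select_range(values: &[f64], range: &RangeInclusive<f64>) -> Vec<f64> {
    values.iter().copied().filter(|v| range.contains(v)).collect()
}

/// Subset of a discrete dimension: keep only the wanted categories,
/// preserving the order in which they appear in the data.
fn select_categories<'a>(all: &[&'a str], wanted: &[&str]) -> Vec<&'a str> {
    all.iter()
        .copied()
        .filter(|c| wanted.iter().any(|w| *w == *c))
        .collect()
}
```

With data typed as `[Discrete, Continuous, Continuous]` and a bar-chart-like plot typed as `[Discrete, Continuous]`, the mapping `[0, 2]` is accepted while `[1, 0]` is rejected, because the types no longer line up.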
- The layers: The current model works on a number of layers, from the raw data up to the final product:
  - Some data: The raw data in some form. This may be external to plotlib and could be almost anything, though it is likely to be something like a `Vec<f64>`.
  - Representation: The lowest layer that's part of plotlib itself. This is the point at which the data has its dimensionality strictly defined and each dimension has its type denoted. We also add in some information about how it should be drawn at this point by including a style object. The style is still quite abstract and must be interpreted by later layers, for example "a histogram with blue bars". The representation may refer to some external data source, or it may copy or move the data inside itself.
  - View: The part that defines how the representation should be laid out in physical space. It is defined by a set of dimensions and a number of views which map onto those dimensions.
  - Page: A single view on the page. It might have a few views placed on it in the traditional 'subplot' model.
  - Rendering: The final output in some explicit format, e.g. an SVG file, a PNG or a text file.
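One way to picture the layers described above is as a chain of Rust types, each wrapping the one below it. The sketch below is an assumption-laden illustration rather than plotlib's real types: `Style`, `Representation`, `View`, `Page` and `render_text` are hypothetical, and it assumes that a view holds the representations mapped onto its dimensions and that a representation borrows its raw data.

```rust
/// Hypothetical dimension type, as in the design notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DimType {
    Continuous,
    Discrete,
}

/// Abstract drawing hints, e.g. "a histogram with blue bars".
/// Later layers decide what these actually look like.
#[derive(Debug, Clone, PartialEq)]
struct Style {
    kind: String,
    colour: String,
}

/// Representation: raw data plus fixed dimensionality, dimension types and a style.
/// Here it borrows external data; it could equally own a copy.
struct Representation<'a> {
    data: &'a [f64],
    dims: Vec<DimType>,
    style: Style,
}

/// View: a set of dimensions plus whatever is mapped onto them (assumed here
/// to be representations).
struct View<'a> {
    dims: Vec<DimType>,
    representations: Vec<Representation<'a>>,
}

impl<'a> View<'a> {
    /// Accept a representation only if its dimension types match the view's.
    fn add(&mut self, rep: Representation<'a>) -> Result<(), String> {
        if rep.dims != self.dims {
            return Err(format!("expected {:?}, got {:?}", self.dims, rep.dims));
        }
        self.representations.push(rep);
        Ok(())
    }
}

/// Page: one or more views, in the traditional 'subplot' model.
struct Page<'a> {
    views: Vec<View<'a>>,
}

/// Rendering: the only layer that commits to an explicit format,
/// here a plain-text summary.
fn render_text(page: &Page<'_>) -> String {
    let mut out = String::new();
    for (i, view) in page.views.iter().enumerate() {
        out.push_str(&format!("view {}: {} representation(s)\n", i, view.representations.len()));
        for rep in &view.representations {
            out.push_str(&format!(
                "  {} {} with {} values\n",
                rep.style.colour,
                rep.style.kind,
                rep.data.len()
            ));
        }
    }
    out
}

/// Walk some raw data through every layer.
fn build_example(raw: &[f64]) -> Result<String, String> {
    let dims = vec![DimType::Continuous, DimType::Continuous];
    let rep = Representation {
        data: raw,
        dims: dims.clone(),
        style: Style { kind: "histogram".into(), colour: "blue".into() },
    };
    let mut view = View { dims, representations: Vec::new() };
    view.add(rep)?;
    Ok(render_text(&Page { views: vec![view] }))
}
```

Nothing below `render_text` mentions an output format, so a different renderer could be swapped in without touching the other layers.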
- Many possible output formats: As much as possible, the layers below rendering should know nothing about the pixels or bits which will be output. This allows the renderer to interpret things as it sees fit. This does mean that some elements will not be possible in some output formats. There must be flexibility here. We want the possibility to output in crazy formats like interactive JavaScript/HTML pages or STL files for 3D printing. The whole project started with wanting to draw graphs in a Linux terminal so I would like to keep that possibility.
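Keeping the lower layers format-agnostic could look something like the following Rust sketch, where the same abstract bars are handed to two different renderers. The `Bar` primitive, the `Renderer` trait, `SvgRenderer` and `TextRenderer` are all hypothetical names made up for this example, and the unit-coordinate convention (0.0 to 1.0) is an assumption.

```rust
/// A format-agnostic primitive in unit coordinates (0.0 to 1.0).
/// It knows nothing about pixels or characters.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bar {
    x: f64,
    width: f64,
    height: f64,
}

/// Each output format interprets the abstract primitives as it sees fit.
trait Renderer {
    fn render(&self, bars: &[Bar]) -> String;
}

/// Scales unit coordinates to pixels and emits SVG rectangles.
struct SvgRenderer {
    width_px: f64,
    height_px: f64,
}

impl Renderer for SvgRenderer {
    fn render(&self, bars: &[Bar]) -> String {
        let mut out = format!(
            "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"{}\" height=\"{}\">\n",
            self.width_px, self.height_px
        );
        for b in bars {
            let h = b.height * self.height_px;
            // SVG's y axis points down, so the bar starts at (page height - bar height).
            out.push_str(&format!(
                "  <rect x=\"{}\" y=\"{}\" width=\"{}\" height=\"{}\"/>\n",
                b.x * self.width_px,
                self.height_px - h,
                b.width * self.width_px,
                h
            ));
        }
        out.push_str("</svg>\n");
        out
    }
}

/// Draws one character column per bar for a terminal. It ignores `x` and
/// `width`, an example of an element that one format cannot express.
struct TextRenderer {
    rows: usize,
}

impl Renderer for TextRenderer {
    fn render(&self, bars: &[Bar]) -> String {
        let mut out = String::new();
        // Print the top row first, working down to row 0.
        for row in (0..self.rows).rev() {
            for b in bars {
                let filled = (b.height * self.rows as f64).round() as usize;
                out.push(if row < filled { '#' } else { ' ' });
            }
            out.push('\n');
        }
        out
    }
}
```

Because both renderers implement the same trait, the calling code can hold a `&dyn Renderer` and stay unaware of whether it is producing an SVG file or terminal text.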