Visualization in MATLAB

2018-12-01

Import Tabular Data

Tabular data can be in a single spreadsheet or a delimited text file
readtable function:
- myTable = readtable('tablename.txt');
- access the variable using dot notation: P=myTable.Pressure;
- when the text file starts with extra header lines, pass an additional input giving the number of header lines to skip: myTable = readtable('tablename.txt','HeaderLines',5)
- use the CommentStyle option to ignore lines that begin with a specific symbol sequence: myTable = readtable('tablename.txt','CommentStyle','##');
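The readtable options above can be combined. The following is a minimal sketch, assuming a small hypothetical comma-separated file (written to a temporary location here so the example is self-contained) with two header lines and one '##' comment line; the column names Time and Pressure are illustrative only.

```matlab
% Write a small hypothetical data file to a temporary location.
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'Sensor log\n');            % header line 1
fprintf(fid,'Station 7\n');             % header line 2
fprintf(fid,'Time,Pressure\n');         % variable names
fprintf(fid,'1,101.2\n');
fprintf(fid,'## calibration pause\n');  % comment line to be ignored
fprintf(fid,'2,101.5\n');
fclose(fid);

% Skip the two header lines and ignore lines starting with '##'.
myTable = readtable(fname,'Delimiter',',','HeaderLines',2, ...
    'ReadVariableNames',true,'CommentStyle','##');

% Extract one variable with dot notation.
P = myTable.Pressure;

delete(fname);   % clean up the temporary file
```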
Representing Discrete Categories:
- By default, text values drawn from a finite set of labels are imported as a cell array of character vectors, which can use a lot of memory.
- Storing such data in a categorical array is therefore more efficient.
- x = categorical(x)
- can specify all the possible values as additional inputs to categorical: the second input indicates the unique category values in the original array, and the third input indicates the names that correspond to these categories:
  v = [ 10 5 0 0 ];
  levels = { 'beg' 'mid' 'last' };
  categorical(v,[0 5 10],levels)
  ans =
  last mid beg beg
- to preserve the inherent ordering of the categories, use the Ordinal option:
  v = [2 4 1 1];
  levels = {'tiny','small','big','huge'};
  c = categorical(v,[1 2 3 4],levels,'Ordinal',true)
  c =
  small huge tiny tiny
- Categorical arrays support == and ~= for comparison, e.g. y=='small'; the relational operators > and < additionally require an ordinal categorical array.
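As a short sketch of comparisons on an ordinal categorical array, reusing the values from the example above (the names isSmall and biggerThanSmall are illustrative):

```matlab
v = [2 4 1 1];
levels = {'tiny','small','big','huge'};

% Ordinal categories carry the order tiny < small < big < huge.
c = categorical(v,[1 2 3 4],levels,'Ordinal',true);   % small huge tiny tiny

isSmall = (c == 'small');          % true only for the first element
biggerThanSmall = (c > 'small');   % true only for 'huge'
```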

Preprocessing Data

Calculations Involving NaNs:
- mean without NaN: y=mean(y,'omitnan')
- median without NaN: y=median(y,'omitnan')
- to test two arrays for equality while treating NaNs in matching positions as equal, use isequaln: test = isequaln(x,y)
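A small sketch of how NaN affects these calculations, using made-up vectors:

```matlab
x = [1 NaN 3];
y = [1 NaN 3];

plainMean = mean(x);             % NaN, because one element is NaN
cleanMean = mean(x,'omitnan');   % 2, the NaN is ignored

sameStrict  = isequal(x,y);      % false, since NaN is never equal to NaN
sameWithNaN = isequaln(x,y);     % true, NaNs in the same positions match
```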
Locating Missing Data:
- replace the NaN value with zero: x(isnan(x)) = 0
- delete the NaN value: x(isnan(x)) = [];
- use ismissing on a table to identify location of any kind of missing values: missingDataLocations = ismissing(tableName);
- Use the any function to determine which rows contain at least one true value: trueRows = any(missingDataLocations,2). The second input, 2, tells any to operate along the 2nd dimension (across the columns of each row), so the result has one element per row.
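The ismissing and any steps can be chained to drop incomplete rows. A minimal sketch, assuming a small hypothetical table with the variables Val and Label:

```matlab
% Hypothetical table: row 2 has a missing number, row 3 a missing label.
T = table([1;NaN;3;4],{'a';'b';'';'d'},'VariableNames',{'Val','Label'});

missingDataLocations = ismissing(T);           % 4x2 logical array
rowsWithMissing = any(missingDataLocations,2); % one logical value per row

Tclean = T(~rowsWithMissing,:);                % keep only complete rows
```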
Categories and Set Operations:
- categories function returns the unique categories within a categorical array: cats = categories(variableName);
- setdiff function performs the set difference between the first and the second input: d = setdiff(a,b), returned variable d has values in a that are not present in b.
- merge different categories in a categorical array: a = mergecats(a,{'small' 'medium' 'large'},'size')
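A brief sketch of these three functions working together, on a made-up categorical array (the names other and merged are illustrative):

```matlab
a = categorical({'small','medium','large','huge','small'});

cats = categories(a);            % unique categories, as a cell array
other = setdiff(cats,{'huge'});  % every category except 'huge'

% Merge the remaining categories into a single category named 'size'.
merged = mergecats(a,other,'size');
numMerged = nnz(merged == 'size');
```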
Discretizing Continuous Data:
- discretize function to categorize values into discrete bins: binNum = discretize(x,0:0.2:1).
- Note that NaNs and values outside the range of the bin edges are left unclassified (the result is NaN). Add -Inf or Inf to the vector of bin edges if you want bins that capture values beyond the edges.
- To discretize data into categories, use the Categorical option: cats = {'on','off'}; binNum = discretize(x,0:0.5:1,'Categorical',cats);
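A small sketch of the three discretize variants described above, using made-up sample values:

```matlab
x = [0.1 0.45 0.95 1.3 NaN];

% Bins [0,0.2), [0.2,0.4), ..., [0.8,1]; 1.3 and NaN fall outside.
binNum = discretize(x,0:0.2:1);

% Adding -Inf and Inf creates catch-all bins at both ends.
binAll = discretize(x,[-Inf 0:0.2:1 Inf]);

% Two bins labeled with category names instead of bin numbers.
state = discretize([0.2 0.7],0:0.5:1,'categorical',{'on','off'});
```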

Graphics Formatting Function

A plot in MATLAB is a collection of graphics objects. You can change the properties of the graphics object by providing additional inputs to the function that created the graphics object.
Plot line properties:
- plot(x,y,'*','MarkerSize',8,'MarkerFaceColor',[0.5 0.5 1])
Scatter plot:
- scale the size of the markers by supplying it as the third optional input which must be either a scalar or the same length as the input values: scatter([4 5 6 7],[9 11 13 15],[25 50 75 100])
- specify marker style: scatter(xData,yData,15,'kd') to plot with black diamond markers
- fill the marker: scatter(xData,yData,15,'kd','filled')
Functions for Customizing Appearance:
- xlim function will change the limits of the x-axis: xlim([1 10])
- grid command to control whether or not grid lines are displayed: grid('on') or grid('minor')
- axis command to change the style of the axes: axis('tight') axis('square')
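Putting the formatting functions together, a minimal sketch with made-up data:

```matlab
xData = 1:10;
yData = xData.^2;

figure
scatter(xData,yData,15,'kd','filled')   % filled black diamonds, size 15

xlim([1 10])     % restrict the x-axis range
grid('on')       % show major grid lines
axis('square')   % make the axes region square
```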

Importing Data from Multiple Files

Create Datastores:
- A datastore is just a reference to a file or a set of files. Creating a datastore does not automatically import any data into MATLAB.
- use datastore function with the file or folder location as the input: ds = datastore('dirName/fileName.txt'); At this point, we have only created a reference to the data file.
- the preview function shows the first few lines of data in the file: preview(datastoreVariable)
- Since the datastore variable does not contain any data but only the information about the file, we can access this information through its properties: ds.propertyName. The properties can be VariableNames, Files, NumHeaderLines, MissingValue
Modify Datastore Properties
- ignore lines that begin with the character sequence '//' by modifying the CommentStyle property: ds.CommentStyle = '//'
- set the ReadVariableNames property to false if there isn't a line containing the variable names: ds.ReadVariableNames = false
- set the variable names: ds.VariableNames = {'color','size','act','age','inflated'}
Import Data into MATLAB
- read and readall: to read data using datastore: data = ds.read;
- the read function will read data up to the number of lines specified by the ReadSize property of the datastore (20000 by default).
- If File1.txt has more than 20000 rows, only the first 20000 are read. Calling read again returns the next block of rows.
- reset function: reset the datastore to the beginning of the first data file.
- After resetting, use the function readall to read all the data.
Importing Datatypes Directly
- The TextscanFormats property returns a cell array with the format used to read each column of data: fmt = ds.TextscanFormats
- By default, numeric columns are represented with %f and non-numeric columns are interpreted as strings, denoted by %q. The format specifier for a categorical is %C, and for a datetime it is %D.
- To modify the datatype of a variable while importing, use curly braces to index into a cell of TextscanFormats and set it to the appropriate value: ds.TextscanFormats{1} = '%q'
Skipping Columns of Data
- to import only a subset of columns, use SelectedVariableNames. Only the variables listed in the SelectedVariableNames property are imported: ds.SelectedVariableNames = {'Name','Date'}
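The datastore workflow above can be sketched end to end. This example assumes a small hypothetical CSV file, created in a temporary location so the code is self-contained; the column names Name, Date and Score are illustrative, and ReadSize is set to 2 only to make the block-wise reading visible.

```matlab
% Create a small hypothetical data file with five rows.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Name,Date,Score\n');
for k = 1:5
    fprintf(fid,'item%d,2018-12-%02d,%d\n',k,k,10*k);
end
fclose(fid);

% Creating the datastore only references the file; no data is read yet.
ds = datastore(fname,'Type','tabulartext');
ds.ReadSize = 2;                              % rows returned per read
ds.SelectedVariableNames = {'Name','Score'};  % skip the Date column

first  = read(ds);     % rows 1-2
second = read(ds);     % rows 3-4, continuing where the last read stopped

reset(ds);             % back to the beginning of the file
allData = readall(ds); % all five rows at once

delete(fname);
```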

Analyzing Groups within Data

Find unique groups of data
- findgroups function can group the values in an array and get the group numbers for each value: v = {'tiger' 'lion' 'lion' 'tiger'}; grpNums = findgroups(v)
- return the group values from findgroups by requesting a second output: [grpNum,grpVal] = findgroups(v), which gives grpVal = {'lion','tiger'}
- histcounts can count the number of observations in each group: counts = histcounts(grpNum,'BinMethod','integers');
- findgroups allows for grouping with multiple inputs. In addition to the group number, it can also return the group values from each input variable: [grpNum,petVals,genderVals] = findgroups(pets,gender)
Aggregating Grouped Data
- function splitapply to perform different operations on groups of data: splitapply(@min,data,grpNums)
- after that, you can plot the grouped results using bar chart: [gNum1,gName1] = findgroups(mnth); avgWS = splitapply(@mean,hurrs.Windspeed,gNum1); bar(avgWS);xticklabels(gName1)
- monthNum2Name (a user-defined helper rather than a built-in MATLAB function) converts month numbers to month names: xticklabels(monthNum2Name(gName1)); xtickangle(45)
Aggregating Grouped Data into a Prescribed Format
- You might want to see the correlations between the groups by aggregating groups and storing the results in a particular structure. accumarray can do that.
- The first input to accumarray is a matrix of subscripts built from the findgroups results, with one column per grouping variable. The second input is the data to be aggregated. The third input (the output size) is left empty, and the fourth input is the function used for aggregation: avgP = accumarray([G1 G2],Price,[],@mean)
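A minimal sketch of the grouping functions in this section, using made-up pet data (the variables pets, gender and price are illustrative):

```matlab
pets   = {'cat';'dog';'cat';'dog';'cat'};
gender = {'F';'F';'M';'M';'F'};
price  = [10;20;30;40;80];

[G1,petVals]    = findgroups(pets);     % cat -> 1, dog -> 2
[G2,genderVals] = findgroups(gender);   % F -> 1, M -> 2

% Number of observations in each pet group.
counts = histcounts(G1,'BinMethod','integers');

% One aggregated value per pet group.
avgByPet = splitapply(@mean,price,G1);

% Pet-by-gender grid of means: rows follow petVals, columns genderVals.
avgP = accumarray([G1 G2],price,[],@mean);
```

Here avgP(1,2) is the mean price of male cats, so the layout makes it easy to compare groups side by side.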

Customizing Graphics Objects

Accessing Graphics Objects
- To modify the properties of a graphics object, the first step is to obtain a variable (sometimes called a handle) that refers to the particular graphics object.
- Obtain the graphics object variable by assigning output from the graphics functions: f = figure
- By assigning output from the plot command, you can obtain a line object variable: p=plot(x,y)
- To get the graphics object variables for a plot that is already created, use the functions gcf, gca, and gco to obtain the current figure, axes and selected object (“get current figure/axes/object”): fig = gcf;
Querying and Modifying Properties:
- use dot notation with the property name to return object property value, e.g.: ax = gca; fw = ax.FontWeight
- use the dot notation to assign a value to an object property: ax.FontWeight = 'bold', ax.XTick = [1,4,8,12]
- modify the data values of an existing plot: p.XData = linspace(0,1,12)
- remember that to change a property of the axes (rather than the figure or a line), you need the axes object, which gca returns.
  % create a figure containing 2 axes and 2 line plots
  fig = figure;
  ax1 = axes;
  l1 = plot(t,y1)
  axis tight;
  ax2 = axes('Position',[.6 .6 .25 .25])
  l2 = plot(ax2,t,y2);
The Graphics Object Hierarchy
- All graphics objects are part of a hierarchy that starts with the root, the main display containing the MATLAB environment. You can make use of the graphics object hierarchy to obtain a specific graphics object after a plot is created.
- besides using gca to get axes object, we can also get the axes graphics object using the Children property of the figure: ax = fig.Children
- The scatter and line plots are the children of the axes: p = ax.Children
- The x- and y-axis rulers and labels are reached through properties of the axes (XAxis, YAxis, XLabel, YLabel):
  xLab = ax.XLabel;
  xLab.FontName = 'Garamond'; % changes only the font of the x-axis label
  xAx = ax.XAxis;
  xAx.TickDirection = 'out';
  xAx.FontName = 'Courier';
- After plotting a bar chart with two or more bar series, to change a property of only one of them:
  ax = gca;
  b = ax.Children;
  b(1).FaceColor = [1 0 0];
  
  xAx = ax.XAxis;
  xAx.FontWeight = 'bold';
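An alternative to walking the hierarchy is to keep the handles that the plotting function returns. A minimal sketch with made-up data, assuming a grouped bar chart with two series:

```matlab
fig = figure;

% A matrix input gives one Bar object per column (two series here).
b = bar([1 2; 3 4; 5 6]);

b(1).FaceColor = [1 0 0];   % recolor only the first series

ax = gca;
xAx = ax.XAxis;             % the x-axis ruler is a property of the axes
xAx.FontWeight = 'bold';

numChildren = numel(ax.Children);   % the two Bar objects are children of the axes
```

Indexing the output of bar avoids having to work out the order in which the objects appear in ax.Children.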

The following is a summary of this section:

ds = datastore('fuelEconomy2.txt')
ds.ReadSize = 362;
data = ds.read;
 
[gNum, gNames] = findgroups(data.NumCyl )
avgMPG = splitapply(@mean, data.CombinedMPG, gNum)
 
b = bar(avgMPG);
xlabel('Number of cylinders')
title('Average MPG')

% Customize the chart
f = gcf;
a = gca;

f.Color = [0.81 0.87 0.9];

a.Color = [0.81 0.87 0.9];
a.Box = 'off';
a.YAxisLocation = 'right';
a.YGrid = 'on';
a.GridColor = [1 1 1];
a.GridAlpha = 1;
a.XTickLabel = gNames;
a.YLim = [0 40];

ax = a.XAxis;
ax.TickDirection = 'out';

b.FaceColor = [0,0.31,0.42];
b.BarWidth = 0.5;

Images and 3-D Surface Plots

Making grid
- meshgrid function converts vectors of points into matrices that can represent a grid of points in the x-y plane: [X,Y] = meshgrid(x,y)
Interpolating Scattered Data
- Interpolating irregularly located data onto a regular grid involves two steps: 1) using the scattered data to create an interpolating function and 2) evaluating the interpolant at the desired locations. The griddata function performs both steps in a single call.
- griddata function: the first three inputs represent the original data and the next two inputs contain the locations at which you would like to get the interpolated data: zInterp = griddata(xOrig,yOrig,zOrig, xNew,yNew);
Visualizing Surfaces:
- create a surface plot and keep the surface object for later changes: s = surf(X,Y,Z)
- The color of the lines between the patches is determined by the EdgeColor property: s.EdgeColor = 'interp'
Colormaps and Indexed Colors:
- The colors in a surface are determined by indexing into a color lookup table associated with the parent figure window, called a colormap.
- Each point on a surface has a color data value. These values are stored in the CData property of the surface.
- The color data value is mapped to a range of values. The range is set by the axes using the CLim property of the axes: c = ax.CLim
- The default colormap of a figure is called parula. You can modify this using the colormap function: colormap(jet)
Creating Indexed-Color Images
- Using pcolor: the pcolor function has the same syntax as surf. In fact, it actually creates a flat surface with ZData all set to 0 and CData set to Z.
- The direction of the y-axis can be changed using the axis command: axis xy or axis ij
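The grid, interpolation and surface steps can be sketched together. The scattered data below is synthetic (random points on a paraboloid), chosen only for illustration:

```matlab
rng(0);                              % reproducible random points
xOrig = rand(50,1);
yOrig = rand(50,1);
zOrig = xOrig.^2 + yOrig.^2;         % synthetic scattered measurements

% Regular 20-by-20 grid covering the unit square.
[X,Y] = meshgrid(linspace(0,1,20),linspace(0,1,20));

% Interpolate onto the grid; points outside the data's convex hull are NaN.
Z = griddata(xOrig,yOrig,zOrig,X,Y);

figure
s = surf(X,Y,Z);
s.EdgeColor = 'none';                % hide the lines between patches
colormap(jet)
colorbar
```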

Import Unstructured Data

Low-Level File I/O
- fopen function: to open a file. This does not open the file in an editor but instead opens a connection between MATLAB and the file for reading and writing the data: fi = fopen('economy.txt');
- The returned value is a unique file identifier used to reference the open file.
- Once the file is open, you can use the file identifier to read data from it. The fgetl function takes the file identifier as input and reads the next line of the file, which is the first line right after opening: fgetl(fi) returns 'date, unrate, gdp, feddebt' for this file.
- Subsequent file reads will read in subsequent lines. The file position indicator moves to the beginning of the next line after each read. So when using fgetl the second time, it will return the second line of the file.
- use frewind command to move the file position indicator back to the beginning of the file: frewind(fi)
- To close a file that has been opened by fopen, use the fclose function: fclose(fi)
- if a file was opened, but an identifier was not stored, use fclose('all') to close all opened files.
Importing a Block of Formatted Data:
- You can read in data from a file with arbitrary formatting using the textscan function
- textscan function has two required inputs: a file identifier and a format specification string, e.g. data = textscan(fid,'%D%q%f%f%f'), this is to convert the first unit of data to a date, %D, the second to a string, %q, and the next three units to double precision numbers, %f. This pattern is repeated indefinitely, so the sixth unit is a date, the seventh is a string and so on.
- use cell indexing to extract the data from a column: secondColumn = data{2}
- the %q specifier reads text, optionally enclosed in double quotes; a numeric-looking field read with %q is returned as text, not as a number
- Data is read from the file sequentially in blocks delimited by whitespace (by default) or a specific delimiting character (if provided): data = textscan(fid,'%D%f%q%C','Delimiter','\t')
- reading stops as soon as the file contents no longer match the format, even if there are further matches later in the file. In general, textscan applies the format string pattern as many times as possible until a match is not found.
- When a text file contains header lines, you can still import data by instructing textscan to skip a number of lines before attempting to read data: data = textscan(fid,'%D%f','HeaderLines',5);
- To read only a specific number of lines from a file, specify the number as a third input to textscan. The following code will read five rows: data = textscan(fid,'%D%f%f',5); Remember that textscan also advances the file position indicator, so a second call does not start from the beginning of the file but from the point where the previous call stopped.
- Reading Sections with Headers:
Parsing Data in Text:
- use the strfind function to determine the index values where certain phrases or characters appear: iv = strfind('cat,dog,goat',',') returns iv = [4 8]
- strsplit function can split up a line of text into individual strings in a cell array. This function will split on whitespace unless a delimiter is provided as an optional input value: C = strsplit('cat dog goat') returns C = {'cat','dog','goat'}
- strcmp function can find a word within a list of strings in a cell array. This function will compare values and return a logical array: strcmp(C,'goat') returns [0 0 1]
- deblank function can remove trailing whitespace: data = deblank(data)
Processing Data in Blocks:
- You may need to programmatically adjust the format specification string that you use to read in data with textscan. The repmat function can help: formatSpecString = repmat('%q',1,5) returns '%q%q%q%q%q', and fstr = ['%D' repmat('%f',1,3)] returns '%D%f%f%f'
- feof function can test if a file has reached the end. This can be used in a while loop to read in data until the end of the file: while ~feof(fid) ... end
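The low-level functions above can be combined into a block-wise reader. A minimal sketch, assuming a small hypothetical comma-delimited file written to a temporary location; the explicit date format '%{yyyy-MM-dd}D' is used in place of plain %D so the example does not rely on automatic format detection.

```matlab
% Write a small hypothetical file: one header line and three data rows.
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'date,unrate,gdp\n');
fprintf(fid,'2018-01-01,4.1,19.9\n');
fprintf(fid,'2018-02-01,4.0,20.1\n');
fprintf(fid,'2018-03-01,3.9,20.3\n');
fclose(fid);

fid = fopen(fname);
headerLine = fgetl(fid);              % first line holds the column names
names = strsplit(headerLine,',');

% One date column followed by a %f for every remaining column.
fstr = ['%{yyyy-MM-dd}D' repmat('%f',1,numel(names)-1)];

unrate = [];
while ~feof(fid)
    % Read at most two rows per call; each call resumes where the last stopped.
    block = textscan(fid,fstr,2,'Delimiter',',');
    unrate = [unrate; block{2}];      % second column of this block
end
fclose(fid);
delete(fname);
```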

Review of data types

Extract Data from a Table:
- use {} to extract the contents in their underlying data type (for example, a numeric array)
- use () to extract a subset that is still a table
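A two-line sketch of the difference, on a made-up table with variables A and B:

```matlab
T = table([1;2;3],[10;20;30],'VariableNames',{'A','B'});

vals = T{:,'B'};   % curly braces: 3x1 double array
sub  = T(:,'B');   % parentheses: 3x1 table containing variable B
```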
Merge data:
- join merges two tables by matching rows on their key variables, which are the variables common to both tables that uniquely identify each observation (row): T12 = join(T1,T2)
- innerjoin: just select the observations that have key variables common to both tables: C = innerjoin(A,B);
- outerjoin: include every single observation (row) from both tables: C = outerjoin(A,B)
- Setting the MergeKeys option to true returns a table in which the key values of A and B are merged into a single variable in C: C = outerjoin(A,B,'MergeKeys',true);
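A small sketch contrasting the two joins, using two made-up tables that share the key variable Key:

```matlab
A = table([1;2;3],{'a';'b';'c'},'VariableNames',{'Key','Left'});
B = table([2;3;4],[20;30;40],'VariableNames',{'Key','Right'});

% Only keys present in both tables (2 and 3).
Cin = innerjoin(A,B);

% Every key from either table (1 to 4), with a single merged Key variable;
% entries with no match are filled with missing values.
Cout = outerjoin(A,B,'MergeKeys',true);
```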
Represent Dates and Times:
- To convert a cell array of strings of dates to a datetime array, use the datetime function: dates = datetime(dates);
- the hour function returns the hour component of each datetime value: h = hour(dates)
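A short sketch of both steps, with made-up timestamps and an explicit input format (an assumption made here so the conversion does not depend on automatic format detection):

```matlab
dates = {'2018-12-01 08:30:00';'2018-12-01 17:45:00'};

% Convert the cell array of text to a datetime array.
d = datetime(dates,'InputFormat','yyyy-MM-dd HH:mm:ss');

h = hour(d);   % hour component of each value
```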