
Lecture Administration and visualization: Chapter 6 - Tools for data visualization

File type: PDF | Pages: 33


Lecture "Administration and visualization: Chapter 6 - Tools for data visualization" covers: three kinds of visualization; mathematical visualization; scientific visualization; information visualization; an introduction to pandas and numpy; an introduction to matplotlib.


Text content: Lecture Administration and visualization: Chapter 6 - Tools for data visualization

  2. Chapter 6: Tools for Data Visualization. Lecture 0: Introduction to Course
  3. Outline
     1. Overview
     2. Introduction to pandas, numpy
     3. Introduction to matplotlib
  4. 1. Overview
     • Three kinds of visualization
     • Mathematical Visualization
     • Scientific Visualization
     • Information Visualization
  5. Mathematical Visualization
     • Data results from a mathematical equation
     • Missing data can be readily generated by a computer program
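The point about regenerating missing data can be sketched as follows. This is a minimal illustration, not part of the original slides: the equation y = sin(x), the interval, and the sample counts are all assumptions chosen for the example.

```python
import numpy as np

def f(x):
    # Illustrative equation (an assumption, not from the lecture): y = sin(x)
    return np.sin(x)

# Coarse sampling of the equation on [0, 2*pi]
x_coarse = np.linspace(0.0, 2.0 * np.pi, 5)
y_coarse = f(x_coarse)

# "Missing" points between the coarse samples are obtained by
# simply evaluating the equation again on a denser grid
x_fine = np.linspace(0.0, 2.0 * np.pi, 50)
y_fine = f(x_fine)

print(x_coarse.shape, x_fine.shape)
```

Because the data comes from a formula rather than a measurement, any gap can be filled exactly instead of being estimated.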
  6. Scientific Visualization
     • Visualization of scientific data
     • Data are measured by real-world scientific devices or come from expensive simulations
     • Coordinate data
     • Spatial coordinates
     • Temperature, pressure
     • Time
  7. Information Visualization
     • Visualization of more abstract, non-coordinate data
     • Process abstract data into a more concrete form that can be more effectively perceived by an observer
  8. Modes of Visualization
     Interactive Visualization
     • Used for discovery
     • Intended for a single investigator
     • Re-renders based on user input
     Presentation Visualization
     • Used for communication
     • Intended for large group or mass audience
     • Does not support user input
  9. Goal of visualization
     • Comparison
     • Distribution
     • Relationship
  10. Data Visualization Framework
  11. Data Types
     • Discrete
     • Continuous
     • Ordered (values are comparable)
     • Unordered (values are not comparable)
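One possible way to express the ordered/unordered distinction in code is a pandas `Categorical`. This is a hedged sketch only: the slides do not mention `Categorical`, and the category names below are hypothetical.

```python
import pandas as pd

# Hypothetical example values, not taken from the lecture
sizes = pd.Categorical(
    ["small", "large", "medium"],
    categories=["small", "medium", "large"],
    ordered=True,            # ordered: values are comparable
)
colors = pd.Categorical(["red", "blue", "red"])   # unordered by default

# Comparisons such as min/max are only meaningful for ordered data
print(sizes.min(), sizes.max())
print(sizes.ordered, colors.ordered)
```

Asking for the minimum of the unordered `colors` would raise an error, which mirrors the "values are not comparable" note on the slide.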
  12. 2. Introduction to numpy, pandas
     • Python provides several libraries for manipulating data
     • numpy: a basic library for working with arrays
     • pandas: another library with more functionality
  13. Numpy
     • Stands for "Numerical Python" or "Numeric Python"
     • Introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects
     • Provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance
     • Many other Python libraries are built on NumPy
     Link: http://www.numpy.org/
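A small sketch of what vectorization means in practice, assuming made-up values (nothing here comes from the slides beyond the `numpy` library itself):

```python
import numpy as np

values = np.arange(1, 6, dtype=np.float64)    # [1. 2. 3. 4. 5.]

# Element-by-element Python loop
squared_loop = np.array([v * v for v in values])

# Vectorized equivalent: one expression applied to the whole array
squared_vec = values ** 2

# A statistical operation on a 2-D array (matrix)
matrix = values[:4].reshape(2, 2)             # [[1. 2.] [3. 4.]]
column_means = matrix.mean(axis=0)            # mean of each column

print(squared_vec)
print(column_means)
```

Both approaches give the same result; the vectorized form pushes the loop into compiled code, which is where the performance gain comes from on large arrays.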
  14. Pandas
     • Adds data structures and tools designed to work with table-like data (Series and DataFrame objects, similar to data frames in R)
     • Provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation, etc.
     • Allows handling of missing data
     Link: http://pandas.pydata.org/
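The manipulation tools listed above might look like this on a tiny table. This is a sketch under stated assumptions: the column names `group` and `score` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical table; column names and values are illustrative only
df = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "score": [3.0, np.nan, 1.0, 4.0],
})

filled = df.fillna({"score": 0.0})                # handle missing data
ordered = filled.sort_values("score")             # sorting
subset = ordered[ordered["score"] > 0]            # slicing by condition
totals = filled.groupby("group")["score"].sum()   # aggregation

print(totals)
```

Replacing the missing score with 0.0 is only one option; dropping the row with `dropna()` would be an equally valid choice depending on the analysis.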
  15. Loading Python Libraries
      In [ ]:
          # Import Python libraries
          import numpy as np
          import scipy as sp
          import pandas as pd
          import matplotlib as mpl
          import seaborn as sns
  16. Reading data using pandas
      In [ ]:
          # Read a CSV file
          df = pd.read_csv(URI)
      Note: URI contains the link to the data file.
      The above command has many optional arguments to fine-tune the data import process.
      There are a number of pandas commands to read other data formats:
          pd.read_excel('myfile.xlsx', sheet_name='Sheet1', index_col=None, na_values=['NA'])
          pd.read_stata('myfile.dta')
          pd.read_sas('myfile.sas7bdat')
          pd.read_hdf('myfile.h5', 'df')
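To try `pd.read_csv` without a real data file, an in-memory string can stand in for the URI. The sketch below is illustrative only: the CSV contents are made up, and `na_values` and `index_col` are just two of the optional arguments mentioned above.

```python
import io
import pandas as pd

# In-memory stand-in for a data file; contents are made up for illustration
csv_text = "name,score\nalice,7\nbob,NA\ncarol,9\n"

df = pd.read_csv(
    io.StringIO(csv_text),   # a file path or URL string is also accepted here
    na_values=["NA"],        # strings to treat as missing values
    index_col=None,          # do not use any column as the row index
)

print(df.shape)
print(df["score"].isna().sum())
```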
  17. Exploring data frame
          import pandas as pd
          import seaborn as sns
          import matplotlib.pyplot as plt
          iris = pd.read_csv("../input/Iris.csv")
          iris.head()  # show the first five rows
  18. Data Frame data types
      Pandas Type | Native Python Type | Description
      object | string | The most general dtype. Assigned to a column that has mixed types (numbers and strings).
      int64 | int | Integer values. 64 refers to the number of bits allocated to hold each value.
      float64 | float | Numeric values with decimals. If a column contains numbers and NaNs (see below), pandas will default to float64, because NaN is itself a floating-point value.
      datetime64, timedelta64[ns] | N/A (but see the datetime module in Python's standard library) | Values meant to hold time data. Look into these for time series experiments.
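The dtype rules in the table can be checked on a small frame. This is a minimal sketch: the column names and values are hypothetical, chosen so that each column triggers one row of the table.

```python
import numpy as np
import pandas as pd

# Hypothetical columns, one per row of the dtype table
df = pd.DataFrame({
    "mixed": [1, "two", 3],          # numbers and strings -> object
    "whole": [1, 2, 3],              # integers only -> int64
    "with_nan": [1, np.nan, 3],      # integers plus NaN -> float64
    "when": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
})

print(df.dtypes)
```

The exact resolution shown for the datetime column (for example `[ns]`) can differ between pandas versions, so it is safer to test for "any datetime64 dtype" than for one exact string.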
  19. Data Frame data types
      In [4]:
          # Check a particular column type
          df['salary'].dtype
      Out[4]: dtype('int64')
      In [5]:
          # Check types for all the columns
          df.dtypes
      Out[5]:
          rank          object
          discipline    object
          phd            int64
          service        int64
          sex           object
          salary         int64
          dtype: object
  20. Data Frame attributes
      df.attribute | description
      dtypes | list the types of the columns
      columns | list the column names
      axes | list the row labels and column names
      ndim | number of dimensions
      size | number of elements
      shape | return a tuple representing the dimensionality
      values | numpy representation of the data
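Each attribute in the table can be tried on a small frame. The sketch below assumes a hypothetical 3x2 table; the column names `phd` and `salary` echo the earlier slide, but the values are invented.

```python
import pandas as pd

# Hypothetical 3x2 table for illustration
df = pd.DataFrame({"phd": [10, 20, 30], "salary": [100, 200, 300]})

print(df.dtypes)     # type of each column
print(df.columns)    # column names
print(df.axes)       # list holding the row labels and the column names
print(df.ndim)       # number of dimensions: 2
print(df.size)       # number of elements: 6
print(df.shape)      # (rows, columns): (3, 2)
print(df.values)     # NumPy array holding the data
```

Note that these are attributes, not methods, so they are accessed without parentheses.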