Save
Upgrade to remove ads
Busy. Please wait.
Log in with Clever
or

show password
Forgot Password?

Don't have an account?  Sign up 
Sign up using Clever
or

Username is available taken
show password


Make sure to remember your password. If you forget it there is no way for StudyStack to send you a reset link. You would need to create a new account.
Your email address is only used to allow you to reset your password. See our Privacy Policy and Terms of Service.


Already a StudyStack user? Log In

Reset Password
Enter the associated with your account, and we'll email you a link to reset your password.
focusNode
Didn't know it?
click below
 
Knew it?
click below
Don't Know
Remaining cards (0)
Know
0:00
Embed Code - If you would like this activity on your web page, copy the script below and paste it into your web page.

  Normal Size     Small Size show me how

MIS 665 - MT1

Midterm 1 MIS 665 Business Analytics

TermDefinition
importing pandas package import pandas as pd
Read csv file df = pd.read_csv('data/file.csv')
save to csv df.to_csv('data/output.csv')
Show first 5 rows df.head()
show column names df.columns
show count of (rows, columns) df.shape
basic statistics df.describe()
descriptive statistics (mean, median, etc.) df['col'].mean()
select single column df['col']
select multiple columns df[['col1', 'col2']]
filter rows df.loc[df['col'] == 'value'] # (wrap in len() for count)
sort values d.sort_values('col')
count values (freq. table) df['col'].value_counts()
group and count values df.groupby('col').size()
convert to list df.values.tolist()
Steps to visit a website 1. IMPORT packages, 2. get CHROME driver, 3. enter the site with DRIVER.GET('link')
xpath by class //div[@class=='example']
xpath by attribute //td[@date-stat='player]
find elements driver.find_elements(By.XPATH, '//div') # returns list row.find_element # returns single element
get url or class element.get_attribute('href') # or 'class'
initialize dictionary data = {'Col1': [], 'Col2': []}
loop through rows for row in driver.find_elements( By.XPATH, //div[@class=='item']']:
convert to dataframe df = pd.DataFrame(data)
try / except try / except
check data types df.info()
count missing df.isnull().sum()
drop missing rows df.dropna()
remove $ df['Price'].str.replace('$', '')
to numeric df['Col'] = pd.to_numeric(df['Col'])
create new columns df['new_col'] = df['sales'] * 0.34
3 blocks import --> driver --> get URL
ETL Extract, Transform, Load
fill missing values df.fillna(['Col': 'value'}) OR df['Col'] = df['Col'].fillna(df['Col'].median())
remove columns (1 means columns) df.drop('Col1', axis=1)
check for duplicates df.duplicated().sum()
drop duplicates df.drop_duplicates()
reset index after removing df.reset_index(drop=True)
remove whitespace str.strip()
replace chars str.replace(,)
search text str.contains()
merge data pd.merge(df1, df2, on='key')
join only matching rows how='inner'
join all from left + matches how='left'
join all from right+ matches how='right'
join all rows from both how='outer'
what to check first df.info() and df.isnull.sum()
how to find inconsistent text values value_counts()
advanced filtering and &
advanced filtering or |
advanced filtering not ~
check if value in list .isin
check between ranges .between
query SQL-like syntax df.query('Dept == "Sales" and Age > 25')
count rows per group df.groupby('Col').size()
aggregations .agg(['sum', 'mean', 'min'])
pivot table df.pivot_table(index='Row', columns='Col', values='Data', aggfunc='sum')
time series - set date as index df.set_index('date'), df.sort_index()
resample by time period df.resample('ME').sum() # Monthly, yearly is YE
filter by date range df.loc['2026-06']
count row per group groupby().size()
multiple aggregations at once .agg()
numerical data chart (distribution) df['col'].plot.hist() OR .box()
categorical data (counts) df.groupby('col').size().plot.bar()
Two numerical columns (relationship) df.plot.scatter(x=, y=)
Categorical + numerical (mean values) df.groupby('cat')['num'].mean().plot.bar()
count by category chart df.groupby('').size().plot.bar()
use .unstack() when doing a multi-column groupby chart
seaborn violin plot sns.violinplot(x=, y=, data=)
correlation matrix df.corr(numeric_only=True)
correlation with target column df.corr()['Target'].sort_values()
charts for numerical distribution histogram / boxplot
chart for categorical counts / comparisons bar chart
use .unstack() after groupby for grouped bars use .unstack() after groupby for grouped bars
stacked=True for stacked bar charts stacked=True for stacked bar charts
Divide by row totals for proportion charts Divide by row totals for proportion charts
correlation overview with seaborn sns.heatmap(df.corr())
statistically significant p-value p-value < 0.05
What is SQL Structured Query Language
steps to connect to database in python import package, connect with create_engine, .get_table_names(), pd.read_sql()
steps in time series set date time index, assign date time index as index, sort index
SQL Query Structure SELECT [DISTINCT] <columns> FROM <table> [WHERE <condition>] [GROUP BY <columns>] [HAVING <condition>] [ORDER BY <columns>] [LIMIT <n>]
SELECT * FROM table df
SELECT col1, col2 FROM table df[['col1', 'col2']]
DISTINCT .unique()
UPPER() .str.upper()
WHERE clause filtering
AND &
OR |
IS NULL .isna()
IS NOT NULL .notna()
COUNT(*), SUM(), AVG(), MAX(), MIN() SQL aggregate functions
SELECT col, COUNT(*) FROM t GROUP BY col df.groupby('col').size()
GROUP BY HAVING .groupby() # filter after groupby
ORDER BY col ASC .sort_values()
LIMIT 5 .head()
SQL: SELECT * FROM t1 INNER JOIN t2 ON t1.key = t2.key Pandas: pd.merge(df1, df2, on='key', how='')
AS new_name .rename(columns={})
# display the people from 'Kansas' df.loc[df['location'] == 'Kansas'].head()
Created by: lexi.welte
 

 



Voices

Use these flashcards to help memorize information. Look at the large card and try to recall what is on the other side. Then click the card to flip it. If you knew the answer, click the green Know box. Otherwise, click the red Don't know box.

When you've placed seven or more cards in the Don't know box, click "retry" to try those cards again.

If you've accidentally put the card in the wrong box, just click on the card to take it out of the box.

You can also use your keyboard to move the cards as follows:

If you are logged in to your account, this website will remember which cards you know and don't know so that they are in the same box the next time you log in.

When you need a break, try one of the other activities listed below the flashcards like Matching, Snowman, or Hungry Bug. Although it may feel like you're playing a game, your brain is still making more connections with the information to help you out.

To see how well you know the information, try the Quiz or Test activity.

Pass complete!
"Know" box contains:
Time elapsed:
Retries:
restart all cards