# Get Started Analyzing Survey Data with SQL & Python


## Code-along | 2024-01-23 | Analyzing Survey Data with SQL & Python | Richie Cotton

This code-along analyses data from a survey about the growth of Finnish companies. The data records top managers' perceptions of growth, innovativeness, and the capacity for renewal.

#### Where is the data from?

• Suominen & Pihlajamaa, 2022
• The dataset

#### What will I learn today?

• How to summarize and visualize questions with a numeric response using a histogram.
• How to determine whether there is a difference between two groups of numeric responses using a Mann-Whitney U test.
• How to summarize and visualize questions with a categorical response using a bar plot.

For this analysis we need the `plotly.express` package for drawing histograms and bar plots.

We'll also need the `mannwhitneyu` function from the `scipy.stats` package to perform the Mann-Whitney U test.

#### Instructions

Import the following packages.

• Import `plotly.express` using the alias `px`.
• From `scipy.stats` import the `mannwhitneyu` function.
```python
# Import plotly.express using the alias px
import plotly.express as px

# From scipy.stats import the mannwhitneyu function
from scipy.stats import mannwhitneyu
```
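As background for the Mann-Whitney U test, here is a minimal plain-Python sketch of the U statistic the test is built on. For every pair made of one response from each group, it counts how often the first group's response is larger, with ties counted as half. The ratings and the `mann_whitney_u` helper are hypothetical and only illustrate the idea. The code-along itself uses `mannwhitneyu` from `scipy.stats`, which also returns a p-value.

```python
def mann_whitney_u(x, y):
    """Return the U statistic for sample x: the number of (x_i, y_j) pairs
    in which x_i is larger than y_j, counting each tie as half a pair."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical 1-5 ratings from two groups of firms (not the real survey data)
growth = [4, 5, 3, 5, 4]
non_growth = [2, 3, 1, 3, 2]

u_growth = mann_whitney_u(growth, non_growth)
u_non_growth = mann_whitney_u(non_growth, growth)

# The two U values always sum to len(x) * len(y), the total number of pairs
print(u_growth, u_non_growth)
```

A U value close to 0 or close to the total number of pairs suggests that one group tends to give larger responses than the other. A value near half the number of pairs suggests little difference.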

### Task 1: Import the Survey Dataset

The survey data is contained in a CSV file named `"What_does_it_take_to_generate_new_growth_Survey_data.csv"`.

#### Data dictionary

The dataset contains the following columns.

• `Growth_Firm`: Is the company (firm) currently classified as a growth company under OECD definitions?
• `question_2_row_1_transformed`: The responses to question 2, part 1 (with some pre-applied transformation).
• `question_2_row_2_transformed`: The responses to question 2, part 2 (with some pre-applied transformation).
• `question_3_row_1`: The responses to question 3, part 1.
• ...
• `question_7_row_1`: The responses to question 7, part 1.

The details of each question are fully described in `survey_questions.csv`, and we'll cover the details of the specific questions that we look at as we come to them in the tasks here.

#### Instructions

Use SQL to import the survey data.

• Select everything from `survey_data.csv`.
• The file uses European-style CSV settings, so the default CSV reading settings won't work.
• Set the column delimiter to a semicolon.
• Set the decimal separator to a comma.
• Set the null string to a space.
• Assign to a DataFrame named `survey`.
Code hints

• Workspace lets you import from a CSV file into a SQL query by calling DuckDB's `read_csv_auto()` in the `FROM` clause.

• `delim` sets the column delimiter.

• `decimal_separator` sets the decimal separator.

• `nullstr` sets the value used for NULLs (missing values).

The query result is available as a DataFrame variable named `survey`.
```sql
-- Select everything from survey_data.csv
-- String options use single quotes, the standard SQL string literal
SELECT *
FROM read_csv_auto('survey_data.csv', delim=';', decimal_separator=',', nullstr=' ')
```
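To make the three CSV settings concrete, here is a minimal sketch in plain Python of what they mean when parsing a European-style file. It does not use DuckDB and does not reproduce its type inference. The sample text, the column names, and the `parse_cell` helper are all hypothetical.

```python
import csv
import io

# Hypothetical sample in the European style: semicolon-delimited columns,
# a comma as the decimal separator, and a single space for missing values.
raw = "firm;score;growth\nA;3,5;1\nB; ;0\nC;4,25;1\n"


def parse_cell(text):
    """Convert one cell: a single space becomes None, a comma decimal becomes a float."""
    if text == " ":
        return None
    try:
        return float(text.replace(",", "."))
    except ValueError:
        return text  # leave non-numeric text unchanged


# delimiter=";" plays the role of the column delimiter setting
reader = csv.reader(io.StringIO(raw), delimiter=";")
header = next(reader)
rows = [[parse_cell(cell) for cell in row] for row in reader]

print(header)
print(rows)
```

With the default comma delimiter, each line of this sample would be split at its decimal commas instead of at the semicolons, which is why the defaults have to be overridden for this kind of file.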

The dataset doesn't contain the actual questions that were asked. To find out what the questions are, we can look up the column titles in the data dictionary contained in `survey_questions.csv`.

#### Instructions

Use SQL to import the data dictionary for the survey questions.

• Select everything from `survey_questions.csv`.
• This uses the default read CSV settings.
Code hints

• If you are importing a file from CSV with the default `read_csv_auto()` settings, then Workspace lets you simply type the file name in the `FROM` clause.

The query result is available as a DataFrame variable named `dictionary`.
```sql
-- Select everything from survey_questions.csv
SELECT * FROM 'survey_questions.csv'
```