Alexia Bailey/

Project: Analyzing Students' Mental Health in SQL


Does going to university in a different country affect your mental health? A Japanese international university surveyed its students in 2018 and published a study the following year that was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties than the general population, and that social connectedness (belonging to a social group) and acculturative stress (stress associated with joining a new culture) are predictive of depression.

Explore the students data using PostgreSQL to find out if you would come to a similar conclusion for international students and see if the length of stay is a contributing factor.

Here is a data description of the columns you may find helpful.

Field Name       Description
inter_dom        Type of student (international or domestic)
japanese_cate    Japanese language proficiency
english_cate     English language proficiency
academic         Current academic level (undergraduate or graduate)
age              Current age of student
stay             Current length of stay in years
todep            Total score of depression (PHQ-9 test)
tosc             Total score of social connectedness (SCS test)
toas             Total score of acculturative stress (ASSIS test)
-- Run this code to save the CSV file as students
SELECT * 
FROM 'students.csv';

The purpose of this project is to determine whether I agree with the overall findings of the study. The first finding was that international students have a greater risk of depression. While I found some stats that conflict with that finding, there are two points where I have to agree: the acculturative stress scores are higher for international students in terms of both the max and the average, and the international students are more than twice as likely as their domestic counterparts to score as severely depressed.

As for length of stay, I found that the overall patterns break at years 5 and 7; otherwise, depression scores go up and acculturative stress goes down over time. I think the detailed patterns are interesting and worth a look.

With that said, if you read below, you'll see the progress of my exploration of the data and interim hypotheses and conclusions.

--

First, let's uncover a few basic facts about the data. We know from the query above that there are 286 rows of data, so that means 286 students. How many of these are international vs. domestic students?

-- Count students by type; rows with a NULL inter_dom form their own group
SELECT COUNT(*),
       CASE WHEN inter_dom = 'Inter' THEN 'International'
            WHEN inter_dom = 'Dom' THEN 'Domestic'
            ELSE NULL END AS type
FROM students
GROUP BY inter_dom;

Okay. We have about three times as many international students as domestic students in this population.
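
To put a number on that ratio, each group's share of the total can be computed in the same query. This is a sketch rather than part of the original analysis: the student_count and pct_of_total aliases are my own, and it assumes the window-function syntax that PostgreSQL supports.

```sql
-- Sketch: count per group plus each group's share of all rows.
-- SUM(COUNT(*)) OVER () adds the per-group counts into a grand total.
SELECT inter_dom,
       COUNT(*) AS student_count,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct_of_total
FROM students
GROUP BY inter_dom
ORDER BY student_count DESC;
```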

So let's see how our students score on the diagnostic tests, starting with the depression test (PHQ-9). I'm leaving out the rows where inter_dom is NULL for now, since the goal is to compare international and domestic students and we don't know the status of those 18 students.

BTW, I wasn't familiar with the PHQ-9 test, so I Googled it and learned that PHQ-9 scores of 5, 10, 15, and 20 represent mild, moderate, moderately severe, and severe depression, respectively. So the averages below, which fall between 8 and 9, land between the mild and moderate cutoffs, and our students with the MAX scores are severely depressed.
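
Those cutoffs translate directly into a CASE expression. The sketch below buckets each student's todep score into a band and counts students per band and group. The band labels and the phq_band and student_count aliases are my own, and labeling scores below 5 as 'minimal' is an assumption, since the cutoffs above only start at 5.

```sql
-- Sketch: bucket PHQ-9 totals using the 5/10/15/20 cutoffs described above.
SELECT inter_dom,
       CASE WHEN todep >= 20 THEN 'severe'
            WHEN todep >= 15 THEN 'moderately severe'
            WHEN todep >= 10 THEN 'moderate'
            WHEN todep >= 5  THEN 'mild'
            ELSE 'minimal'   -- assumed label for scores below 5
       END AS phq_band,
       COUNT(*) AS student_count
FROM students
WHERE inter_dom IS NOT NULL
  AND todep IS NOT NULL      -- keep missing scores out of the 'minimal' band
GROUP BY inter_dom, phq_band
ORDER BY inter_dom DESC, MIN(todep);
```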

SELECT inter_dom,
       MIN(todep) AS min_phq,
       MAX(todep) AS max_phq,
       ROUND(AVG(todep), 2) AS avg_phq
FROM students
WHERE inter_dom IS NOT NULL
GROUP BY inter_dom
ORDER BY inter_dom DESC;

Interesting. It looks like the max PHQ is slightly higher for international students, but the average is higher for domestic students. The smaller number of domestic students could be at least a partial explanation, since a few extreme scores can shift the average of a small group more than that of a large one.
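
One way to check whether a handful of extreme scores is driving that average is to put the median and the spread next to it. This is a minimal sketch, assuming the query engine supports PostgreSQL's ordered-set aggregate PERCENTILE_CONT and the STDDEV_SAMP aggregate; the column aliases are my own.

```sql
-- Sketch: compare the average with the median and the spread per group.
SELECT inter_dom,
       COUNT(*) AS student_count,
       ROUND(AVG(todep), 2) AS avg_phq,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY todep) AS median_phq,
       ROUND(STDDEV_SAMP(todep), 2) AS sd_phq
FROM students
WHERE inter_dom IS NOT NULL
GROUP BY inter_dom
ORDER BY inter_dom DESC;
```

If the median and the average sit close together in both groups, outliers are probably not the explanation.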

The project guide doesn't ask for it, but I'm wondering if there's a difference between the genders. Let's take a look, shall we?

SELECT gender,
       inter_dom,
       MIN(todep) AS min_phq,
       MAX(todep) AS max_phq,
       ROUND(AVG(todep), 2) AS avg_phq
FROM students
WHERE inter_dom IS NOT NULL
GROUP BY gender, inter_dom
ORDER BY inter_dom DESC;

Hmmm. So the max PHQ is higher for the international students but the average, again, is higher for domestic students. Gender doesn't seem like it plays the role I might have thought - the male averages have the biggest swing, with the lowest average scores for international males and the highest average scores for domestic males. The domestic females scored higher on average than the international ones. So it really still feels like the international students are doing better, on average. Is it because they are sticking together and feel like they are part of a community of international students? Perhaps the highest scores are outliers that will happen regardless of international/domestic status? How many of those are there?

-- Count students at or above the PHQ-9 cutoff for severe depression (20)
SELECT inter_dom,
       COUNT(*) AS severe_count
FROM students
WHERE todep >= 20
GROUP BY inter_dom
ORDER BY inter_dom DESC;
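
Because the two groups differ in size, raw counts are hard to compare directly. The sketch below expresses the same count as a share of each group. It assumes the aggregate FILTER clause that PostgreSQL supports, and the aliases are my own.

```sql
-- Sketch: severe cases (PHQ-9 >= 20) as a count and as a percentage of each group.
SELECT inter_dom,
       COUNT(*) AS student_count,
       COUNT(*) FILTER (WHERE todep >= 20) AS severe_count,
       ROUND(100.0 * COUNT(*) FILTER (WHERE todep >= 20) / COUNT(*), 2) AS severe_pct
FROM students
WHERE inter_dom IS NOT NULL
GROUP BY inter_dom
ORDER BY inter_dom DESC;
```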

Hmmm. There are 3X as many international students in this study, but only 2 more international students who are severely depressed; with equal risk, I'd expect roughly three times as many. So it still doesn't sound to me like an open-and-shut case that the international students are particularly disadvantaged. Maybe it will show up in one of the other scores. Let's take a look at the tosc data and see how social connectedness plays into things.

SELECT inter_dom,
       MIN(tosc) AS min_tosc,
       MAX(tosc) AS max_tosc,
       ROUND(AVG(tosc), 2) AS avg_tosc
FROM students
WHERE inter_dom IS NOT NULL
GROUP BY inter_dom
ORDER BY inter_dom DESC;

I was also unfamiliar with SCS scores, so I Googled it and learned that the score is calculated by summing the self-reported responses to 20 questions, each on a scale from 1 (strongly disagree) to 6 (strongly agree). The higher the score, the greater the sense of social connectedness. The numbers above look pretty close, with the largest delta in the minimum scores. This suggests that there's a higher floor on how disconnected the international students are, but otherwise the two populations aren't that different. That fits with my theory about the international students intentionally bonding with each other over the shared experience of being immersed in a new culture. It's also a good segue, since the next thing to explore is the toas column, which is a measure of acculturative stress.
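
Before moving on, the study's claim that social connectedness is predictive of depression can be probed with a simple correlation. This is only a sketch: it assumes the CORR aggregate is available (it is in PostgreSQL), the corr_dep_tosc alias is my own, and a correlation is a much weaker check than the study's own analysis.

```sql
-- Sketch: correlation between depression (todep) and social connectedness (tosc) per group.
-- CORR returns double precision, so cast to numeric before rounding to 2 places.
SELECT inter_dom,
       ROUND(CAST(CORR(todep, tosc) AS numeric), 2) AS corr_dep_tosc
FROM students
WHERE inter_dom IS NOT NULL
GROUP BY inter_dom
ORDER BY inter_dom DESC;
```

A negative value would mean that students who feel more connected tend to report lower depression scores.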



