Workspace
Symon Jasmin/

Exploring Astronaut Activities in SQL - Webinar

0
Beta
Spinner

Extravehicular activity (EVA)

is any activity done by an astronaut in outer space outside a spacecraft. In the absence of a breathable Earthlike atmosphere, the astronaut is completely reliant on a space suit for environmental support. EVA includes spacewalks and lunar or planetary surface exploration (commonly known from 1969 to 1972 as moonwalks). In a stand-up EVA (SEVA), an astronaut stands through an open hatch but does not fully leave the spacecraft.[1] EVAs have been conducted by the Soviet Union/Russia, the United States, Canada, the European Space Agency and China.

On March 18, 1965, Alexei Leonov became the first human to perform a spacewalk, exiting the Voskhod 2 capsule for 12 minutes and 9 seconds. On July 20, 1969, Neil Armstrong became the first human to perform a moonwalk, outside his lunar lander on Apollo 11 for 2 hours and 31 minutes. In 1984, Svetlana Savitskaya became the first woman to perform a spacewalk, conducting EVA outside the Salyut 7 space station for 3 hours and 35 minutes. On the last three Moon missions, astronauts also performed deep-space EVAs on the return to Earth, to retrieve film canisters from the outside of the spacecraft. American Astronauts Pete Conrad, Joseph Kerwin, and Paul Weitz also used EVA in 1973 to repair launch damage to Skylab, the United States' first space station.

EVAs may be either tethered (the astronaut is connected to the spacecraft; oxygen and electrical power can be supplied through an umbilical cable; no propulsion is needed to return to the spacecraft), or untethered. Untethered spacewalks were only performed on three missions in 1984 using the Manned Maneuvering Unit (MMU), and on a flight test in 1994 of the Simplified Aid For EVA Rescue (SAFER), a safety device worn on tethered U.S. EVAs.

Exploring our data

Let's start by looking at the table we will be working with.

Unknown integration
DataFrameavailable as
df
variable
SELECT *
FROM evas
ORDER BY duration DESC
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Let's inspect the purpose column in greater detail.

Unknown integration
DataFrameavailable as
df
variable
SELECT purpose
FROM evas
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

What are the most common types of EVAs?

extravehicular activity (EVA), otherwise known as a spacewalk

Unknown integration
DataFrameavailable as
df
variable
SELECT purpose,
	CASE WHEN purpose ILIKE '%photos%' 
		THEN 1
		ELSE 0
		END as photography,
	CASE WHEN purpose ILIKE '%collect%' 
		THEN 1
		ELSE 0
		END as collection,
	CASE WHEN purpose ILIKE '%construct%' OR purpose ILIKE '%install%' OR purpose ILIKE '%assembl%'
		THEN 1
		ELSE 0
		END as installation,
	CASE WHEN purpose ILIKE '%replace%' OR purpose ILIKE '%repair%' OR purpose ILIKE '%service%'
		THEN 1
		ELSE 0
		END as repair
FROM evas;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

building from the previous query!

Unknown integration
DataFrameavailable as
df
variable
WITH purposes AS (
	SELECT purpose,
		CASE WHEN purpose ILIKE '%photos%' 
			THEN 1
			ELSE 0
			END as photography,
		CASE WHEN purpose ILIKE '%collect%' 
			THEN 1
			ELSE 0
			END as collection,
		CASE WHEN purpose ILIKE '%construct%' OR purpose ILIKE '%install%' OR purpose ILIKE '%assembl%'
			THEN 1
			ELSE 0
			END as installation,
		CASE WHEN purpose ILIKE '%replace%' OR purpose ILIKE '%repair%' OR purpose ILIKE '%service%'
			THEN 1
			ELSE 0
			END as repair
	FROM evas
)

SELECT SUM(photography) AS count,
		'photograpy' AS type
FROM purposes
UNION
SELECT SUM(collection) AS count,
		'collection' AS type
FROM purposes
UNION
SELECT SUM(installation) AS count,
		'installation' AS type
FROM purposes
UNION
SELECT SUM(repair) AS count,
		'repair' AS type
FROM purposes
ORDER BY count

This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

How much material has been extracted from EVAs?

Skimming through the purpose column, we also saw numerous references to extracting rock/dust or geological material. In this case, it will be difficult to extract the total quantity across the columns. Regular expressions to the rescue!

We will define a pattern to extract the total pounds extracted per EVA, and then sum them up. Let's first do a sense check of the data.

Unknown integration
DataFrameavailable as
df
variable
SELECT purpose
FROM evas
WHERE purpose ILIKE '%geologic%' or purpose ILIKE '%rocks%'
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Okay, we now know that the format of the pounds extracted is always number lbs of rock/geologic. construct a pattern to detect this and extract the number!

To do so we will make use of:

  • \d+ to match one or more digits.
  • \.? to match zero or more periods.\
  • * to match zero or more digits following the optional decimal place.
  • () to specify we only want the digits.
  • \s to match the whitespace (i.e., spaces).
  • [] and | to specify we either want to match "geologic" or "rock".
Unknown integration
DataFrameavailable as
df
variable
SELECT purpose,
		SUBSTRING(purpose, '(\d+\.?\d*) lb of ((rock)|(geologic))') as weight
FROM evas
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Now we can use a CTE to calculate the total amount!

Unknown integration
DataFrameavailable as
df
variable
WITH weights AS (
	SELECT purpose,
			SUBSTRING(purpose, '(\d+\.?\d*) lb of ((rock)|(geologic))') as weight
	FROM evas
)

SELECT SUM(weight ::NUMERIC) 
FROM weights
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

Which astronauts have the most time in EVAs?

We also have information on how much time each EVA took, as well as the astronauts who participated. Let's use this information to try and calculate totals for each astronaut!

Unknown integration
DataFrameavailable as
df
variable
SELECT 
	crew,
	SPLIT_PART(crew, ',', 3) AS third_astronaut
FROM evas
WHERE SPLIT_PART(crew, ',', 3) != ''
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

we know how we can extract the time for each astronaut! Let's create a CTE we can then use to piece together duration information for each astronaut.

Unknown integration
DataFrameavailable as
df
variable
WITH astrounauts_split AS (	
	SELECT 
		crew,
		SPLIT_PART(crew, ',', 1) AS frist_astronaut,
		SPLIT_PART(crew, ',', 2) AS second_astronaut,
		SPLIT_PART(crew, ',', 3) AS third_astronaut,
		duration
	FROM evas
)
SELECT *
FROM astrounauts_split
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

splitting apart this table and appending each set of results to one table.

Unknown integration
DataFrameavailable as
df
variable
WITH astronauts_split AS (	
	SELECT 
		crew,
		SPLIT_PART(crew, ',', 1) AS first_astronaut,
		SPLIT_PART(crew, ',', 2) AS second_astronaut,
		SPLIT_PART(crew, ',', 3) AS third_astronaut,
		duration
	FROM evas
),
astronauts_duration AS (
	SELECT 
		first_astronaut AS astronaut,
		duration
	FROM astronauts_split
	WHERE first_astronaut != ''
	UNION ALL
	SELECT 
		second_astronaut AS astronaut,
		duration
	FROM astronauts_split
	WHERE second_astronaut != ''
	UNION ALL
	SELECT 
		third_astronaut AS astronaut,
		duration
	FROM astronauts_split
	WHERE third_astronaut != ''
)
SELECT 
	astronaut,
	SUM(duration) AS total_duration
FROM astronauts_duration
GROUP BY astronaut
ORDER BY total_duration DESC
LIMIT 10
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.

cumulative amount of time spent in EVAs over time?

Finally, let's take a look at the cumulative time spent in EVAs by year and space program.

Unknown integration
DataFrameavailable as
df
variable
SELECT
	TO_DATE(year::TEXT, 'YYYY') as year,
	program,
	duration,
	SUM(duration) OVER (PARTITION BY program ORDER BY year) AS cumulative_duration
FROM(
	SELECT
		year,
		program,
		SUM(duration) AS duration
	FROM public.evas
	GROUP BY year, program
	ORDER BY year, program
	) AS sub
ORDER BY year, program;
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.