Author here: I should clarify the satellite is not running Windows. Instead, it’...

barbegal · on Sept 25, 2024

Could I ask you to clarify why avoiding safemode is so important? In a non-satellite system, safemode means everything is driven to a safe state, which is fine during testing in the lab.

Also do you not run these tests in an even more simulated environment where there is only the flight computer and no real hardware at all?

linebeck · on Sept 25, 2024

Having discussed this same question with the more experienced members of my team, the only conclusion I can draw is that the customer (US Government) is incredibly risk averse. Any unexpected entry into safemode would require a report, multiple meetings with the customer, and them being pretty angry. Their line of reasoning seems to be "Safemode->Something is wrong->Why is something wrong? We're not paying you to be wrong". I'm personally of the opinion that safemode isn't that bad. It's fully recoverable and shows the system is working properly.

We normally have a Functional Test Assembly (real computer and some other hardware for testing) to run our tests against, but we only have one setup and it is consistently unreliable. This particular CLT was unable to get a clean run in the lab but it was decided that the issues were related to the lab setup rather than the actual test, so we moved forward to run on the satellite (against our team's protests).

This to me is the real crux of the issue: if we can't even trust our own testing environment, what's the point of having it at all? If the customer is so risk averse, why would we take this chance? Needless to say, I don't think we'll be running anything on the satellite without full FTA vetting anytime in the near future.

Jtsummers · on Sept 25, 2024

> Any unexpected entry into safemode would require a report, multiple meetings with the customer, and them being pretty angry. Their line of reasoning seems to be "Safemode->Something is wrong->Why is something wrong? We're not paying you to be wrong". I'm personally of the opinion that safemode isn't that bad. It's fully recoverable and shows the system is working properly.

To the last part first: Good that safe mode kicked in and did the right thing, but now what? What caused it to enter safe mode in the first place?

That's why they care when it happens. If they don't know why it's entering safe mode, they can't correct the actual problems in the system.

axus · on Sept 25, 2024

"Safemode is when all non critical functions are automatically shut down and the satellite becomes entirely focused on generating power by pointing its solar panels towards the Sun and trying to reestablish any communication that was lost."

The non-critical functions are all the things the customer actually bought the satellite for. Cool that it's still alive, but now the Space Internet / death lasers / etc. are offline.

linebeck · on Sept 26, 2024

There are fault IDs that trip if certain telemetry goes outside of a normal range. If a safemode were to occur, we would investigate which faults tripped and at what time, and use those to construct a "story" of what happened on the satellite before it entered safemode. We're also constantly recording every telemetry point that comes down, so we could reference any telemetry we wanted as far back as months in the past.

To your point, yes you're correct. The cause of the safemode is much more interesting than the fact we entered it.
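As a rough illustration of the fault-ID idea described above (not the actual ground software), here is a minimal Python sketch. It checks recorded telemetry samples against per-channel limits and returns the first trip of each fault in time order, which is the raw material for the "story". The fault IDs, channel names, limits, and sample values are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultRule:
    fault_id: str   # hypothetical fault identifier
    channel: str    # hypothetical telemetry channel name
    low: float      # lowest value considered normal
    high: float     # highest value considered normal


@dataclass(frozen=True)
class Sample:
    time_s: float   # time of the sample, in seconds
    channel: str
    value: float


def tripped_faults(samples, rules):
    """Return (time_s, fault_id, value) for the first out-of-range
    sample seen by each rule, ordered by time."""
    first_trip = {}
    for s in sorted(samples, key=lambda s: s.time_s):
        for r in rules:
            if r.channel != s.channel or r.fault_id in first_trip:
                continue
            if not (r.low <= s.value <= r.high):
                first_trip[r.fault_id] = (s.time_s, r.fault_id, s.value)
    return sorted(first_trip.values())


# Invented limits and telemetry, purely for illustration.
RULES = [
    FaultRule("FAULT_BUS_UNDERVOLT", "bus_voltage_v", 26.0, 34.0),
    FaultRule("FAULT_WHEEL_OVERTEMP", "wheel_temp_c", -10.0, 60.0),
]
SAMPLES = [
    Sample(0.0, "bus_voltage_v", 28.1),
    Sample(1.0, "wheel_temp_c", 61.5),
    Sample(2.0, "bus_voltage_v", 25.2),
    Sample(3.0, "bus_voltage_v", 24.9),
]

timeline = tripped_faults(SAMPLES, RULES)
for time_s, fault_id, value in timeline:
    print(f"t={time_s:.1f}s {fault_id} tripped at value {value}")
```

Reading the resulting timeline top to bottom gives the order in which things went wrong; a real investigation would then pull the surrounding telemetry from the archive for each entry.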

minetest2048 · on Sept 26, 2024

> We normally have a Functional Test Assembly (real computer and some other hardware for testing) to run our tests against, but we only have one setup and it is consistently unreliable

It's interesting to see that someone with a 2B budget has the same problem as someone with a 5 million budget... we have an engineering model for our cubesats but it's flaky

yashap · on Sept 26, 2024

I enjoyed the humour, and the content. Personally I wouldn’t change it - it’s kind of a click-bait title, but I never would have read the article if it had a boring title, and I am glad I read it.

akira2501 · on Sept 26, 2024

Can you speak at all as to how the development on this software is done? Is it distributed with centralized version control? Does the release and engineering process interact with the version control at all? Are there mechanisms that link defect reports, corrections, and sign-offs back to version control and into the build system?

I got lost recently in how the Shuttle software was managed, mostly through IBM mainframes, and z/OS's facilities for all the above. I'm curious how modern development looks in comparison.

jdiez17 · on Sept 26, 2024

> I got lost recently in how the Shuttle software was managed, mostly through IBM mainframes, and z/OS's facilities for all the above. I'm curious how modern development looks in comparison.

Do you have any references for this? I also recently went down a research rabbit hole of the history of computing on Earth and in space - super interesting stuff. And the parallels are quite obvious when you look at it.

akira2501 · on Sept 28, 2024

> Do you have any references for this?

Oh yea. https://ntrs.nasa.gov/citations/20090001334

> And the parallels are quite obvious when you look at it.

The insane level of detail and strategy when writing the shuttle software is something to behold. The testing laboratory SAIL was a full scale orbiter that actually flew test missions. "Day of use I-Loads" are one of my favorite things. They couldn't change the software load, but they could move some constants around before launch, really useful for feeding wind data into the shuttle before it launched.

jdiez17 · on Sept 30, 2024

Thanks!

linebeck · on Sept 26, 2024

FSW development is done by a different team than mine, but I believe it's just managed through GitLab. Releases are done through tags, and any updates that need to be made have tickets created for them and are developed by the FSW team. Final approval is given by certified product engineers and then a new tag is created for that release. Like I said, this is a different team, but from what I've seen the process is fairly modern given how old our hardware is. I'm not sure of the exact process of how it's loaded onto the satellite, though.

topspin · on Sept 25, 2024

I understood you were using an analogy. Didn't even occur to me that Windows was actually being used.

However, I did come away thinking there are other dysfunctions at play in all of this. Perhaps an excessive amount of wheel re-inventing.

wrs · on Sept 26, 2024

Technical blog pro tip: Assume that many of your readers are VERY literal-minded, and many of your other readers like their humor obscure and as deadpan as possible. Sorry.