Your SQL's probably not that bad (compared to most people)

Jul 4, 2024

Happy July 4th to those who celebrate.

Just the other day on the Discord we were having a chat about one of the most inane interview questions that could be asked of a data position: "where would you rate your SQL skills?" and it set off... some feelings amongst a bunch of people, myself included.

Even from a purely objective viewpoint, these sorts of questions are crap. There have been plenty of studies and reports showing that most people who aren't cis white men tend to rate themselves lower when talking about their skills and competence.

But aside from that, the people who worry about whether they write good or bad SQL are probably already well ahead of the curve in terms of writing good SQL. The fact they care to even ask the question means they've put more thought into it than a lot of people who just touch SQL briefly without any performance or runtime constraints.

Most of us in the discussion are aware of people who write really really good SQL, either in terms of solving complex problems or writing highly optimized code to run databases. And thanks to that, many of us had trouble figuring out just what to say as to our skill level. If the best people we know are considered 'experts' then does that make us 'intermediate'? But then what about all the beginners, all the people who don't use SQL every day of their professional lives? Surely we must be better than those folk.

So today, I just wanted to briefly ramble a bit – not about how to run a SQL interview because I explicitly do not give SQL interviews, but about "when can I comfortably say 'I know enough SQL to do my job'" at an interview.

The bar for SQL competence is... probably both lower and higher than you can imagine.

Lower because SQL is a relatively simple language that is pretty easy to learn. At work, when I have to interview for a Quant UX Researcher position, we explicitly do not require SQL knowledge. Most people already have it, but we literally believe that anyone who can wield a general purpose programming language (R, Python, C, etc. which IS one of the requirements) can learn SQL so quickly it's irrelevant to the interview process. The functionality space that we have to test for in any kind of SQL test is quite small – you can count off the most relevant topics with just two hands.

Higher because, despite its simplicity, I still run across people who say they can use SQL but then can't write a simple join or group-by. There's a lot of bad, nearly broken SQL that's being written and used every day by SWEs and that most of us won't ever see and compare ourselves to.

When I think about SQL interviews, I always wind up looking up TJ Murphy's post about how he does SQL interviews. While I'm no slouch with the language, TJ's about one or two orders of magnitude better at it than I am. The stuff he tests for is pretty good at pinpointing gaps in a person's knowledge of the language.

TJ Murphy on LinkedIn: I have done 100s of analytical SQL coding interviews. My standards are… | 10 comments

But if you look at what he's looking for, a lot of that stuff should already be in your toolbox – joins, CTEs, filtering, aggregation. Things can get a bit more exotic with range joins and window functions, but those are put in the "hard" section.
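To make that toolbox concrete, here's a rough sketch that squeezes a join, a filter, an aggregation, a CTE, and one window function into a single query. Everything in it is made up for illustration: the `users` and `orders` tables, their columns, and the sample rows are my own assumptions and not taken from TJ's interview. It runs against an in-memory SQLite database through Python's built-in `sqlite3` module, and assumes the bundled SQLite is 3.25 or newer, since that's when window functions were added.

```python
import sqlite3

# Hypothetical toy schema and data, invented purely for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER,
                     amount INTEGER, status TEXT);
INSERT INTO users  VALUES (1, 'US'), (2, 'US'), (3, 'CA');
INSERT INTO orders VALUES (10, 1, 20, 'paid'),
                          (11, 1, 30, 'paid'),
                          (12, 2, 15, 'paid'),
                          (13, 2, 99, 'refunded'),
                          (14, 3, 40, 'paid');
""")

query = """
WITH paid_totals AS (              -- CTE
    SELECT u.user_id,
           u.country,
           SUM(o.amount) AS total_paid          -- aggregation
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.user_id   -- join
    WHERE o.status = 'paid'                     -- filtering
    GROUP BY u.user_id, u.country
)
SELECT user_id,
       country,
       total_paid,
       RANK() OVER (PARTITION BY country
                    ORDER BY total_paid DESC) AS rank_in_country  -- window function
FROM paid_totals
ORDER BY country, rank_in_country;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (3, 'CA', 40, 1)
# (1, 'US', 50, 1)
# (2, 'US', 15, 2)
```

If you can read that and explain why the refunded order doesn't count toward user 2's total, you've already touched most of the list.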

More importantly, there's not a lot left in the SQL language to test outside of those topics. SQL has always been about composing the basic ideas in complex ways to get correct results. The difficulty is more in mentally juggling all the columns, joins, and data transformations without introducing unwanted duplication in the output.
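Here's a small sketch of the kind of unwanted duplication I mean, using a one-to-many join that quietly inflates a sum. The `orders` and `order_items` tables and their rows are hypothetical, invented just for this example, and the "fix" shown (aggregating the many side in a CTE before joining) is only one of several reasonable ways to handle it.

```python
import sqlite3

# Hypothetical toy data: order 1 has three line items, order 2 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE order_items (order_id INTEGER, sku TEXT);
INSERT INTO orders      VALUES (1, 100), (2, 50);
INSERT INTO order_items VALUES (1, 'A'), (1, 'B'), (1, 'C'), (2, 'D');
""")

# Naive version: joining to the many side repeats each order row once per
# item, so order 1's amount gets counted three times.
naive_total = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.order_id
""").fetchone()[0]

# Safer version: collapse order_items to one row per order first, then join.
safe_total, item_count = conn.execute("""
    WITH item_counts AS (
        SELECT order_id, COUNT(*) AS n_items
        FROM order_items
        GROUP BY order_id
    )
    SELECT SUM(o.amount), SUM(c.n_items)
    FROM orders AS o
    JOIN item_counts AS c ON c.order_id = o.order_id
""").fetchone()

print(naive_total)             # 350, inflated by the fan-out (100*3 + 50)
print(safe_total, item_count)  # 150 4
```

Neither query throws an error, which is exactly why this kind of bug is easy to ship. The only real defense is knowing the grain of each table before you join it.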

Most data practitioners who have to wrangle with unwieldy SQL databases on a regular basis are going to have decent, "good enough to get the job done" SQL skills. You're practically guaranteed to have worked through all the common operations of selecting, joining, filtering, CTEs, subqueries, aggregation, and maybe some window functions. And you know what? That's usually more than enough because it gets the job done. If you already know these solidly intermediate/advanced parts of SQL, you can learn any of the more obscure stuff if the need ever arises.

And let's be honest, "good enough to do the job people pay me to do" is... good enough. Knowing that much already means you're comfortable enough with SQL to answer arbitrary questions from stakeholders. Sometimes you need to use an obscure window function and have to spend a day looking it up and testing it. Does it really matter that you can't remember the exact syntax off the top of your head? No, it shouldn't.

So yeah, go ahead and be confident that you know SQL. If you can be confident that you can field random requests from teams that need data, no matter how weird their request is, you can be confident in saying you know how to work with SQL.