Databases are a necessity for any modern application. You might choose to go the traditional relational route, or choose a more hip NoSQL option. As developers we are often taught to load/query based on primary keys because it is more performant. The downside to this approach is that you might be leaking sensitive information that you never intended to share. How is this possible?
SQL / Relational Example
Ids in relational databases are usually integers or some variation of that, which allows us to get really succinct and clear urls in our web apps. Let’s take the following url as an example.
That is a clean url, but let’s look at it from an outsider’s perspective. What might we be able to guess from the url?
- We are looking at User #100
- We are looking at Order #1000
We might also be able to glean how much each order is worth based on site information. This site might be selling $20 trinkets. If order ids come from a single sequence, order #1000 suggests roughly 1,000 orders so far. So let’s break it down.
1,000 orders * $20 = $20,000
I now know that this business has likely taken in about $20,000, spread across roughly 100 users. This is sensitive information that the business might not want getting out, but as developers we may have just spilled the beans.
Let’s take Tech.Pro for example. I know that they use SQL Server on the backend and each post is given a numeric id. This post’s id is 1538. It is a reasonable guess that there are roughly 1,538 posts on Tech.Pro as of this writing.
NoSQL (RavenDB) Example
The default behavior of RavenDB is to associate the domain model name with your Id.
If you are using sharding, your id might look like this.
If this Id is publicly visible, what can we tell about our application?
- You have a Shard based on the geographic region of “USA”
- You have a domain model of Customers
- You potentially have 1000+ customers
The last point of 1000+ customers is iffy, since RavenDB uses a HiLo algorithm to hand out ids, but it might be close enough to get an understanding of the user base. Additionally, you know that the developer has a domain object of customers. Knowing how a developer is structuring an app might give you insight into how to break their app.
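To make the point concrete, here is a minimal Python sketch of what an outsider could do with such an id. The id string `USA/customers/1000` and its `shard/collection/number` layout are assumptions made purely for illustration, pieced together from the three observations above; they are not taken from a real system, and `inspect_id` is a hypothetical helper.

```python
from typing import NamedTuple


class LeakedInfo(NamedTuple):
    shard: str       # e.g. a geographic region
    collection: str  # name of the domain model
    number: int      # numeric suffix, a rough hint at record count


def inspect_id(public_id: str) -> LeakedInfo:
    """Split a hypothetical 'shard/collection/number' id into its parts."""
    shard, collection, number = public_id.split("/")
    return LeakedInfo(shard, collection, int(number))


# Hypothetical id, assumed for illustration only.
info = inspect_id("USA/customers/1000")
print(info.shard)       # USA
print(info.collection)  # customers
print(info.number)      # 1000
```

Three lines of string handling are enough to recover all three facts, which is why an id with this much structure is best kept internal.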
The solution is really simple in concept, but complex in its implementation.
You need to generate your own public facing Ids.
Some ingredients you might use to generate them include the following.
- Date and Time
- Guid (non-sequential)
- Counts relative to that specific user (if user data)
- Slugs based on public data
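A few of the options above can be sketched in Python using only the standard library. The function names (`random_guid`, `random_token`, `slugify`) and the sample title are hypothetical, and this is one possible approach rather than a recommended implementation.

```python
import re
import secrets
import uuid


def random_guid() -> str:
    """Version 4 UUIDs are random rather than sequential."""
    return str(uuid.uuid4())


def random_token(num_bytes: int = 16) -> str:
    """URL-safe random token drawn from the OS's secure random source."""
    return secrets.token_urlsafe(num_bytes)


def slugify(title: str) -> str:
    """Build a slug from public data such as a post title."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


print(random_guid())             # a different value on every call
print(random_token())            # a different value on every call
print(slugify("Hello, World!"))  # hello-world
```

Random ids reveal nothing about how many records exist, while slugs only repeat information that is already public. Note that a slug alone may not be unique, so it is usually paired with some other component.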
The above solution will work for both RDBMS and NoSQL databases, since your public Id will likely be part of your primary or secondary indexes.
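As a rough illustration of the relational case, the sketch below uses Python's built-in `sqlite3` module to keep an integer primary key internal while exposing only a random public id. The `orders` table, its columns, and the helper functions are all assumptions made for this example.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id        INTEGER PRIMARY KEY,   -- internal key, never exposed
        public_id TEXT NOT NULL UNIQUE,  -- random id, safe to put in URLs
        total     REAL NOT NULL
    )
    """
)


def create_order(total: float) -> str:
    """Insert an order and return only its public id."""
    public_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO orders (public_id, total) VALUES (?, ?)",
        (public_id, total),
    )
    return public_id


def find_order(public_id: str):
    """Look an order up by public id (the UNIQUE constraint is index-backed)."""
    return conn.execute(
        "SELECT total FROM orders WHERE public_id = ?", (public_id,)
    ).fetchone()


pid = create_order(20.0)
print(find_order(pid))  # (20.0,)
```

The integer key is still available for joins and internal queries, so the only extra cost is one indexed column and one lookup by `public_id` at the edge of the application.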
Next time you think about exposing a default Id from your database, think about the implications to your business and ask yourself: “Do I really want people knowing this information?”
P.S. Sorry for the terrible Adult diapers joke in the title 🙂