CognitiveCarbon’s Content – The guy with the technical know-how AND communication skills.

MongoDB: What is it, and how did it come into play as part of the shocking disclosure that took place at “The Pit”?

On August 13, 2022, Catherine Engelbrecht and Gregg Phillips described what they found last year on an unsecured server in China: personal information for 1.8 million US election workers…and more.


Catherine Engelbrecht of True the Vote and Gregg Phillips of OPSEC organized an event on Saturday, August 13th, in Arizona called “the Pit.”

The purpose of the event was to bring together 100–150 journalists, social media influencers, researchers and other interested parties to discuss and disclose new information related to election “irregularities” that True the Vote had found. In part, this was to shed additional light on the results of the 2000 Mules movie, which relied on Gregg’s geolocation tracking of cell phones around the time of the November 2020 election; and in part it was to share new information they had come across in the interim.

This article is based upon my notes and recollection of what was shared at the event; any errors in relaying the story accurately are entirely my own.  

I am going to focus primarily on the technical aspect of what was shared; there are others who are writing about the broader issues and shockingly serious implications of the FBI’s engagement (or lack thereof) with the data that Gregg and Catherine attempted to bring to their attention. 

But my area of expertise is in the technical details, so I’ll attempt to shed some light there that others haven’t yet covered.

As you may know if you saw the movie, the GPS location data from cell phones that Gregg obtained —a small portion of which was used in 2000 Mules —was processed so as to help identify “patterns of life” for people who were caught illegally stuffing ballot boxes in many cities across the nation. 

These so-called “Mules” were ballot traffickers. As the movie depicts, they were deployed, in some coordinated fashion, by as-yet-unknown entities across various swing-state cities during November of 2020, evidently to illegally influence the 2020 election results.

Gregg’s team or other parties who were working with him were continuing to take a hard look at election patterns in the 2022 primaries and beyond, and they were also looking more broadly at a wide spectrum of election technology providers to understand better how the actions of the Mules fit into the bigger picture of executing the fraud that was exposed in 2000 Mules.

As part of this Arizona Pit event, Gregg and Catherine described an incident that arose during research that their teams had been conducting on various suppliers of software to US election agencies.

During the course of probing one such software provider, in early 2021 Gregg and his team stumbled across an IP address for a server that was purportedly associated with a company named Konnech, at least according to the records of services that track IP address ownership and location.

That IP address, it turns out, was located in China; it was evidently used by some instances of the software application for a period of time before switching to a new IP address in Grand Rapids, Michigan.

Geolocation tools that I used suggest that the server hosting this address in China was somewhere near Hangzhou, possibly close to Zhejiang University.

Konnech makes software to service various parts of the election process for the US and other countries. One of their modules is called PollChief; this is a resource management tool that helps election agencies manage the poll workers who staff polling locations on election days.

It manages, among other things, the schedules of poll workers, and includes the details necessary to recruit, retain and pay them. It, and the broader suite of software, can be used to keep track of all sorts of logistics information about election equipment, such as where it is physically located: for inventory purposes during non-election season, but also when and where that equipment is deployed for running elections. There are also modules marketed by Konnech that are involved in the process of casting ballots themselves for certain groups of people.

The software itself isn’t immediately concerning at a surface level; but the fact that one instance of it was apparently connecting to a server in China certainly raised some eyebrows—which warranted a closer look.

While Gregg and his team were investigating, they ran some routine cybersecurity checks to see what services were running behind that Chinese IP address. One of these routine “scans” showed an open port at that IP address, 27017, which is typically used by a database application called “MongoDB.” (I’ll explain some of these terms more fully in just a bit.)

That was interesting and somewhat unexpected, since the applications from Konnech were ostensibly using SQL databases (more on that in a bit) and therefore this find was worth exploring.

Let’s take a technical look at what MongoDB is before we get to what it is that they reportedly discovered.

But first, we’ll take a step back to bring you up to speed on some basics and explain some of the terms. Databases are software systems that store large collections of data for fast lookup, correlation, reporting, and retrieval by software applications.  

When you see a web-based application these days, somewhere behind it is a database server serving up the “personalized” information that you see displayed to you on the web pages of the application.

A popular form of a database system in use these days is called an “SQL” server (pronounced “See-Kwel”.) Microsoft is one provider of SQL servers, but there are many others.

For those who are curious, SQL servers are based on the mathematics of set theory (if you’ve used a Venn diagram, or if you took some advanced math in high school or college, you probably have a basic idea of what sets are) and on something called relational algebra: a small collection of operations, such as selection, projection and join, for slicing and combining tables of rows. This is what makes them fast and efficient at manipulating data.

If you’re familiar with Microsoft Excel, then you know that a spreadsheet is just a simple form of a 2D matrix of information: a sheet in Excel has rows and columns of cells which contain and organize data.  

A SQL server is analogous in some ways to a spreadsheet, but it is more sophisticated. It stores data across many “tables” (think sheets in Excel) — and tables that are meant to link together in some fashion are stored in collections that are called, well, “databases.” 

[Image: an example of an Excel table]

SQL servers are designed for types of data that are relatively clean and ‘well-structured’; one typically puts data into various tables, which are then linked to each other in a ‘relationship’ by some field that is common to both tables. In this way you can “lookup” data in one table using a related value in a column in some other table. 

SQL servers are typically used for very large, very well-structured data when you need really fast results to make your software run smoothly.

[Image: a relationship diagram for various tables in a SQL database]
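To make the “lookup via a relationship” idea concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and field names (workers, assignments, worker_id) are hypothetical illustrations, not any vendor’s actual schema.

```python
import sqlite3

# An in-memory database with two tables, like two sheets in Excel.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE workers (worker_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE assignments (worker_id INTEGER, precinct TEXT)")
cur.execute("INSERT INTO workers VALUES (1, 'Jane Doe'), (2, 'John Roe')")
cur.execute("INSERT INTO assignments VALUES (1, 'Precinct 7'), (2, 'Precinct 12')")

# The shared worker_id field is the 'relationship': a JOIN looks up data
# in one table using the related value stored in the other.
row = cur.execute(
    "SELECT w.name, a.precinct "
    "FROM workers w JOIN assignments a ON w.worker_id = a.worker_id "
    "WHERE a.precinct = 'Precinct 7'"
).fetchone()
print(row)  # ('Jane Doe', 'Precinct 7')
```

Notice that both tables had to be declared up front with fixed columns before any data could go in; that rigidity is exactly what the next section is about.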

However, not all problems that you find in the real world are a good match for ‘well structured’ databases like SQL. Sometimes you need to work with data that is messier or only loosely structured, or structured in a way that may frequently change because you haven’t worked out all the final details of your application just yet.

In SQL databases, a single ‘record’ has the same format (fields) as every other record (for example, think of a single row of an Excel spreadsheet table as a ‘record’). For instance, imagine a ‘pollworker’ record for a polling place: it might have a field for name, address, phone number, email address and so on, with one record for every poll worker. You might also have information for emergency contacts, or other family members related to that poll worker.

SQL databases are common and useful, but there is also a type of database called a “NoSQL” database. These kinds of databases are different; they allow each record to have a potentially different number (and size) of fields, with perhaps only a few fields per record kept in common. For example, maybe one worker’s record has some extra fields that other workers’ records don’t typically have, like secondary phone numbers or extra email addresses.

When you have less well-structured databases, or data that doesn’t “fit well” into a SQL table—or if you have a need to quickly add new fields as you’re developing your software without having to rebuild your database every time you make some major change—it is now common to use one of these “No SQL” databases. 

They are just easier for programmers to work with—if the software requirements for data changes, it is often faster and easier to “upgrade” the application when the data is in a NoSQL database than if it is in a more rigidly structured SQL database. 

Software development gets done faster, which saves the company money. You just tack on a few new fields for certain new records, leave the old records alone, and program your application to recognize when it sees a new format for data records and act appropriately.
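That “tack on new fields and let the application adapt” pattern can be sketched with plain Python dictionaries, which is roughly what a MongoDB “document” looks like to a programmer. The field names here are hypothetical examples, not Konnech’s actual schema.

```python
# Two 'documents' in the same collection with different sets of fields;
# a rigid SQL table would need a schema change to allow the second one.
workers = [
    {"name": "Jane Doe", "phone": "555-0100"},
    # This record carries extra fields that the first record lacks:
    {"name": "John Roe", "phone": "555-0199",
     "phone_secondary": "555-0200",
     "emails": ["jr@example.com", "jr.backup@example.com"]},
]

# The application inspects which fields each record actually has and
# acts appropriately, rather than assuming a fixed set of columns.
contact_counts = {
    w["name"]: 1 + ("phone_secondary" in w) + len(w.get("emails", []))
    for w in workers
}
print(contact_counts)  # {'Jane Doe': 1, 'John Roe': 4}
```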

These kinds of databases are also conceptually easier for less skilled programmers to use; they “connect up” to modern web applications more easily, and they have grown in popularity over the last decade. MongoDB was one of the first, and it remains one of the more popular ones in use these days.

Sometimes, a “NoSQL” database is used during early software development because of the flexibility to rapidly change the data structures as the program gets fleshed out; but once the program reaches the final product stage, it is converted to a SQL database, because the data structures aren’t expected to change much after the program goes on the market, and SQL servers often perform faster once the schema is fixed.

The ease of use of NoSQL databases like MongoDB, however, is a problem: because less skilled (and less expensive) programmers can and do use them, critical security settings are often overlooked out of sheer lack of knowledge.

So now let’s get to what Gregg’s team found.

Recall that they found an IP address in China associated with what appeared to be one of Konnech’s products (according to the DNS records for that IP, anyway) that was used for one or more US voting agencies; and that one of the ports that was in use at that IP address was typically used for MongoDB, if it happened to be installed on that server. 

Let’s briefly use a “house” analogy here for a moment to make this easier to grasp. Think of the “house” as the IP address, and the “ports” are the doors and windows on the house—ways to get into or out of the house.

A common practice for cybersecurity professionals who are exploring a network is to “test the locks” when they find “open windows or doors” as they walk around a “building” of interest, and in this case, they did a quick check on the MongoDB port (“rattled the windows”) to see if it responded. 
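“Rattling the windows” on a single port can be as simple as attempting a TCP connection and seeing whether anything answers. Here is a minimal sketch using only Python’s standard library; the host name below is a placeholder, not the actual server from this story.

```python
import socket

MONGODB_PORT = 27017  # MongoDB's well-known default port

def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False

if __name__ == "__main__":
    # Placeholder host for illustration only.
    print(port_is_open("db.example.com", MONGODB_PORT))
```

An open port only tells you a service is listening; whether it demands credentials is the next question, which is exactly what the team checked.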

When it did, they next tried a pretty basic thing: they tested to see if they could log into it with the default, “out of the box” username and password. That would be a pretty dumb thing for the owner of this machine to have left in place, but it is surprisingly common. 

In other words, as the cyber team rattled the windows and doors, they found a boneheaded error on the MongoDB installation that only a novice would be expected to make. The doors and windows weren’t even locked. In fact, they were wide open.

You see, when MongoDB is freshly installed, it doesn’t have proper security rules set up to restrict who can read and write data; unless the person configuring that MongoDB installation takes the extra necessary steps (and knows how to do it), MongoDB requires no credentials at all: no username, no password. Older versions would even accept connections from anywhere on the Internet by default.
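For the curious, this is the actual switch in MongoDB’s configuration file (mongod.conf) that an operator must turn on explicitly, and pair with created user accounts, before the database demands any credentials. Leave it out, and a reachable server answers to anyone:

```yaml
# mongod.conf: access control is OFF unless this is explicitly enabled
security:
  authorization: enabled
```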

So Gregg’s team was able to “walk in the front door”, as it were, because there was no lock on the door, and “look around the place.”

What they found was shocking: they found data that included personal details of nearly 1.8 million US poll workers. Details like their names, phone numbers, addresses, etc. Even the names of family members: things that might routinely be collected when you hire someone and issue them a paycheck.

But they also reportedly found rich details about where election machines were located, including floorplans of buildings used in elections. Nominally, this information would be of use to the election agencies, because the application they were using helped them track their election machine inventory.

But none of this should have been left out in the open for just anyone to see; and it sure as hell shouldn’t have been done in China. In short, it was a serious data spill. 

China apparently has a law that any data found on its Internet belongs to the government, so in effect, China has “custody” of anything that existed on this server.

Perhaps because Chinese programmers know this CCP policy, some don’t bother much with securing their database servers; it could be a cultural thing, a “what’s the point, it all belongs to them anyway” attitude among some workers. It could also be that the more talented programmers take jobs in other countries, while less experienced ones stay behind to work in China.

So how did it come to pass that this information was in China? Well, it is common practice these days for companies in the US to outsource software development to India, China, Pakistan, Armenia, Russia and elsewhere, because the wages are lower for the same level of software talent they might find in the US. Moreover, the demand for programmers is so high in all sectors of the economy that US companies sometimes can’t even find people locally with the right skills to do the work they need done.

This is why companies like Microsoft and others make such extensive use of the H-1B visa program: the demand for software talent far outstrips the supply.

So what may have happened is that Konnech employed a programmer or two in China to do the development and testing of pieces of the Konnech application suite, with the idea that when it came time, they would bring the final application (and database) back onshore to Grand Rapids for final deployment and use in actual elections. 

However, having worked on client/server applications like this myself, I know that “network administration” and “network security” skills are often lacking or absent in junior programmers, and they can make basic errors when configuring and deploying production systems, too (like leaving all the doors and windows unlocked).

They might also fail to change the IP addresses properly after the “Beta” phase is done but before going “live” in some US city. They might also copy “live production” data back to a development system in China, so that the programmers can continue to refine the application and remove bugs even as the production system is in use.

A reasonably likely explanation, therefore, is that Konnech (or maybe a contractor it was using) outsourced work to China, and the US program manager and the Chinese labor programmers it used were grossly incompetent, leaving either test or live production data unprotected on a MongoDB server in China. 

It isn’t in dispute that someone incompetent did this; the data was exposed. What isn’t known is who exactly did it, and what relationship they had to the vendor Konnech.

However, the situation is even worse than it first appears: because the MongoDB database was *completely* unsecured, it was also possible for interlopers to not only read all of its data—but also potentially add, overwrite and change data. 

For instance, someone could have added a few dozen unscreened poll workers who were unvetted and were acting as plants sent in to do someone’s bidding on the election machines or ballots.

But that brings us back to an earlier point: the CCP views any data on China’s networks as belonging to the government: whether it was exposed due to incompetence, or not. They don’t care; it belongs to them.

Since Gregg and his team were able to “walk in the front door” because the doors and windows weren’t locked, it stands to reason that China’s own cybersecurity teams may have done so as well…and therefore the CCP could have, and likely did, come into possession of this same US poll worker data.

What else might have been on those servers? Could it have included voter registration data? What might the CCP have done with this data? Could they have used it to bribe or blackmail poll workers because they had all of their details, including phone numbers? 

Could they have used it to inject their own plants as poll workers? Could they have used the information about blueprints and election machine locations to make hacking into wireless network connected election machines easier, since they knew exactly where to target their efforts? 

We can’t know all these details at this point. But there is certainly reason to suspect that they could, and did, use this information to their own advantage.

What was particularly disturbing was that, when Gregg and Catherine discovered all of this, they tried to do the right thing and get the FBI involved. But because of the politicization of so many Federal agencies, their attempt to inform and involve the FBI was not only rebuffed; the FBI then attempted to turn things around and make Gregg and his team look like the bad guys for having downloaded (from an unsecured server) and come into possession of this dataset of 1.8 million US poll workers.

Gregg basically did the equivalent of picking up an abandoned hard drive left lying openly on the ground in China, and the FBI wants to accuse him of having it in his possession because of what was found on the drive!

They’re not interested in seeing whether that data was misused by China to hijack our elections? Isn’t foreign interference in US elections an act of war?

It remains to be seen how many other data spills like this were taken advantage of by China, and whether there is any direct evidence that data of this type was used to alter election outcomes, above and beyond what was already documented to have occurred in the ‘2000 Mules’ ballot stuffing cases.

We’re not at the end of this investigation into election fraud; we’re simply at the end of the beginning. The Pit was a turning point: a time for Gregg and Catherine to engage a wider community of researchers and journalists in probing the whole spectrum of election integrity issues, and to involve the citizenry in ways it needs to be, but hasn’t yet been, involved.

Without trustworthy elections, our Republic is at risk of being pulled down by those who prefer communism and socialism. We won’t stand idly by. All Americans must find a way to get more involved in the election process; become engaged and involved in protecting and securing our Republic.
