Lesson Progress:

Data Structures

Lesson Content

Inquire: How Do They Know?

Overview

Have you ever wondered how social media sites are able to keep track of your posts and comments? How are they able not only to remember who you have selected as friends, but also suggest additional ones based on who you know and what you like? This is possible because of databases. All the data you generate by your online actions allows for the collection, storage, and analysis of that data by companies who use the data for advertisements and other purposes. The Internet runs on databases, and so do businesses, universities, and laboratories around the world. They are what let us truly harness data.

Once you complete this lesson, you will be able to define database, understand common data structures, and explain how they affect the functionality of a database. You will also be able to define the benefits and limitations of both SQL and NoSQL.

Decorative

Big Question

With so much data available in the world, how are we able to make sense of it and harness its power?

Read: Data What?

What is a Database?

Data refers to the raw unprocessed bits of facts and figures that make up information. A database is an organized collection of data stored, organized, and accessed electronically. Databases come in a number of configurations that depend on the type of data and the intent of data processing. Some are small, simple collections of data stored as plain text files, while others exist as complex collections of related tables of data. In all cases, a database serves to make a collection of data useable by combining different pieces of data together and giving context. Once this step is complete, companies, governments, and individuals can use the data.

Flat Data

Flat file databases are the simplest kind, consisting of a collection of data stored in a single plain text or binary file. Typically, a user records a single record of data per line or row, with multiple columns and fields allowing you to organize that data more easily. Flat files are the easiest type of database to create as you can generate them using any spreadsheet software or a word processing software.

DecorativeAs an example, a spreadsheet file (Microsoft Excel or Google Sheets) listing all the personnel working at a company with their contact information would be a flat file. Each row records one individual, with columns/fields labeled Name, Address, Phone Number, Email Address, and so on.

We interact with this kind of simple database most often when looking at things like outlines, calendars, address books, and webpage bookmark lists. Many types of basic software use these files to store basic configuration information. Their simplicity, however, makes them less useful for dealing with large amounts of data, and they are not capable of supporting more complex database models.

Relational Databases

The relational data models use data by storing and organizing data within a matrix comparing the relationship between types of data and the instances of that data. Relational databases organize data in rows and columns like a flat file, but they do so across many tables of related data using unique keys to identify each row. This allows each table to represent a single entity type such as a person or product. Each row then represents an instance of that type of entity, and each column represents a single attribute.

DecorativeEach row or instance can associate itself with many tables and enable a large number of attributes and other entities thanks to its unique identity keys. This allows for far greater depth in the amount of data that one can store and process when compared to a flat file.

Almost all relational databases use the Structured Query Language (SQL) to maintain and query or search the database. Queries are a method of filtering the data in a database based on different attributes to see only specific subsets of the data available as results. You can make this even easier by creating indexes, or preset queries, which essentially create shortcuts to the parts of the database that you use most often.

Relational databases remain the most common databases in use today. Nearly any website you go to is likely running on a relational database. Facebook, for instance, is one large relational database connecting you to your friends list, your posts, and the posts of others that you have liked or commented on. While the Internet may still run on relational databases, many businesses have started switching to object databases where possible.

Object Databases

Object databases are more complicated. They combine data into a single object instead of using tables, rows, and columns to break data up. You can connect that single object to various other objects in a multitude of ways depending on how you need to use the data. This allows for the storage of very complex data relationships without the need for the mapping of rows and columns. This allows them to process extremely complex data more efficiently and effectively than other database types.

However, an object database is intrinsically linked with a specific programming language and thus is often application dependent. Also, unlike relational databases that are fairly standardized in using SQL, there is not a single standard for object databases at this date.

The fields of engineering, telecommunications, physics, and molecular biology use these databases heavily. The ability to contain and process complex data makes them excellent for applications related to computer aided design or complex system modeling.

Takeaways

Data on its own is useless until you combine it with additional data and context. Then, it can provide useful information, and databases are one of the main tools to facilitate this process. The type of data and how you intend to use it will determine the type of data structure and database you will need to use. For the simplest data, such as basic lists, a flat file database would be typical. When you need to be able to track the relationships between different types of data, such as company customer lists, sales tracking, and employee commissions, you would use a relational database. People and companies use object databases primarily for tasks that require extremely complex data structures like cell phone networks, and advanced engineering and medical research.

Reflect: Databases

Poll

Thinking about your daily interactions with the computers and the Internet, between personal use and business, which kind of database do you think you come into contact with the most?

Expand: SQL vs. NoSQL

What’s SQL?

SQL — structured query language — and the relational databases that use it have been the primary data storage means since their creation in the 1970s. Before the Internet, and even before personal computers, a pair of computer scientists at IBM designed SQL to query databases in a way that would be as easy to read as English. Their focus was to foster growth in the nascent computer industry by creating a method of interacting with data that didn’t require an extensive background in mathematics or computer programming.

SQL, What is it Good For?

As a relational database, SQL makes it easy to sort through related data, even when running complex queries, and is capable of limitless indexing. It has strong vertical scalability through to the server containing the database. Its nature means data integrity is strong and updates are fast and easy. The standardized structure of the data also means that it is relatively easy to combine multiple tables or move data between tables. SQL is also a well-known and extremely stable platform.

However, any relational database has its limits. If you want to be able to index and search a few thousand documents, an SQL powered relational database is an easy choice. But, what if you want to index a million websites and their related data? This is especially challenging if webpage designers structure the data of every site in a different way. The lack of an ability to scale horizontally and spread data across machines is also an issue. When the Internet was born and began to flourish, these questions became very real, and so came a transition to NoSQL, or distributed non-relational databases.

NoSQL

DecorativeNoSQL has actually existed as long as SQL, but did not come into vogue until the Internet began to show the limitations of SQL databases when dealing with large amounts of real-time data from websites and web applications. Google and Amazon both began developing new NoSQL database solutions which led to a resurgence in interest in the format.

A NoSQL database is non-relational and offers many important benefits. It handles large volumes of distributed non-relational data well, and is able to process simple queries of large batches of data very quickly. Data structure is not a requirement, and you can add additional fields and attributes at any time. Many versions allow for the handling of relational data in a way similar to SQL. NoSQL scales well horizontally across multiple machines, providing additional processing power. The software is easily available for free from many sources.

However, it has several limitations. The distributed nature of its data means that it has limited support for combining different tables, as the same data may be present but in different formats. This unstructured data also means it requires mass updates distributed across the database and can result in data not updating at the same time. When a query is run close to the time of an update, it may result in outdated results. This also means there is less data integrity. NoSQL databases are harder to index, and more complex queries can require more time to accomplish.

Which is Better?

Both database formats solve different problems and have different uses. SQL is excellent for applications that require rapid complex queries and is best when dealing with related data that has a predefined structure. NoSQL is better for very large amounts of nonstructural and distributed data. It also allows for greater scalability across multiple machines.

While NoSQL acted as a primary driver of the Internet throughout the 2000s, SQL has regained ground and has once again become the Internet’s backbone as a result of the inherent limitations of NoSQL. Many companies use NoSQL because of the inherent speed advantage it has when dealing with simple queries.

In the end, both serve their purposes. As they evolve, it is likely they will continue to see use as integral components of the world’s databases.

Check Your Knowledge

Use the quiz below to check your understanding of this lesson’s content. You can take this quiz as many times as you like. Once you are finished taking the quiz, click on the “View questions” button to review the correct answers.

Lesson Resources

Lesson Toolbox

Additional Resources and Readings

Early History of SQL

An article written by one of the original creators of SQL detailing the history of its creation

Flat File vs Relational Database Models

A video explaining the difference between flat files and relational databases, as well as the advantages of each

A Gentle Introduction to Relational and Object Oriented Databases

A research paper explaining the differences between relational and object database models in more depth

Relational Databases

More information on relational databases

Lesson Glossary

Terms

AJAX progress indicator
  • data
    the raw, unprocessed bits of facts and figures that make up information
  • database
    an organized collection of data that is stored, organized, and accessed electronically
  • information
    facts and figures arranged with context to provide usable data

License and Citations

Content License

Lesson Content:

Authored and curated by David Thomas for The TEL Library. CC BY NC SA 4.0

Media Sources

 LinkAuthorPublisherLicense
DecorativeServer Cloud DevelopmentColossusCloudPixabayCC 0
DecorativeInternet Network OnlineWokandapixPixabayCC 0
DecorativeDatabase Schema Data TablesmcmurryjuliePixabayCC 0
DecorativeNautical Almanac spreadsheet screenshotAndrés RuizWikimedia CommonsPublic Domain