How to Populate Database from CSV file in 2 Simple Steps

In the previous Part 1, Part 2 & Part 3 tutorial series, we developed REST APIs, secured one API with Basic authentication and all other APIs with OAuth2 JWT authentication, and documented them using OpenAPI 3 spec. In this article, we are gonna initialize the database with data from the CSV file on application startup.

Introduction

We may need to initialize the application database on application startup. For instance, in our application, we are creating the admin users and roles on application startup. Likewise, let’s assume that we have a table called movies used to store some movie details and we have an academy_awards.csv file that contains a list of academy award nominees. So, we want to read the file and get the list of nominees for the “Best Picture” award & if they won the oscar and store these in the database. While there are many ways to do this, we are gonna use the commons-csv library for this purpose. Let’s go ahead and implement it.

What you’ll do?

Populate the database with data from the CSV file on application startup.

What you’ll need?

  • IntelliJ or any other IDE of your choice
  • JDK 17
  • MySQL Server 8

Populating Database

Add Dependencies

Firstly, let’s add the commons-csv library for reading the CSV file in the pom.xml.

pom.xml

		<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-csv -->
		<dependency>
			<groupId>org.apache.commons</groupId>
			<artifactId>commons-csv</artifactId>
			<version>1.9.0</version>
		</dependency>

Modifying SetupDataLoader

SetupDataLoader.java

Secondly, In SetupDataLoader, we are already creating default users and roles on application startup. In addition, we are gonna introduce one more method in which we will read the CSV file and store the records in the database. Also, let’s copy the academy_awards.csv file into src/main/resources directory.

package com.javachinna.config;

import com.javachinna.model.Movie;
import com.javachinna.model.Role;
import com.javachinna.model.User;
import com.javachinna.repo.MovieRepository;
import com.javachinna.repo.RoleRepository;
import com.javachinna.repo.UserRepository;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.csv.CSVFormat;
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextRefreshedEvent;
import org.springframework.core.io.ClassPathResource;
import org.springframework.security.crypto.password.PasswordEncoder;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Transactional;

import java.io.FileReader;
import java.io.Reader;

/**
 * Class is responsible for initializing the database with users and movies from CSV file on application startup
 */
@Slf4j
@Component
@RequiredArgsConstructor
public class SetupDataLoader implements ApplicationListener<ContextRefreshedEvent> {

    private boolean alreadySetup = false;

    private final UserRepository userRepository;
    private final RoleRepository roleRepository;
    private final MovieRepository movieRepository;
    private final PasswordEncoder passwordEncoder;

    @Override
    @Transactional
    public void onApplicationEvent(final ContextRefreshedEvent event) {
        if (alreadySetup || userRepository.findAll().iterator().hasNext()) {
            return;
        }

        // Create user roles
        var userRole = createRoleIfNotFound(Role.ROLE_USER);
        var adminRole = createRoleIfNotFound(Role.ROLE_ADMIN);

        // Create users
        createUserIfNotFound("[email protected]", passwordEncoder.encode("user@@"), // "user"
                userRole, "User");
        createUserIfNotFound("[email protected]", passwordEncoder.encode("admin@"), // "admin"
                adminRole, "Administrator");
        insertMoviesFromCSV();
        alreadySetup = true;
    }

    @Transactional
    void createUserIfNotFound(final String email, final String password, final Role role, final String displayName) {
        User user = userRepository.findByEmailIgnoreCase(email);
        if (user == null) {
            user = new User(email, password);
            user.addRole(role);
            user.setEnabled(true);
            user.setDisplayName(displayName);
            userRepository.save(user);
        }
    }

    @Transactional
    Role createRoleIfNotFound(final String name) {
        Role role = roleRepository.findByName(name);
        if (role == null) {
            role = new Role(name);
            role = roleRepository.save(role);
        }
        return role;
    }

    void insertMoviesFromCSV() {
        try (Reader in = new FileReader(new ClassPathResource("academy_awards.csv").getFile())) {
            CSVFormat.RFC4180.builder()
                    .setAllowMissingColumnNames(true).setHeader("Year", "Category", "Nominee", "Additional Info", "Won?")
                    .setSkipHeaderRecord(true).build().parse(in).forEach(record -> {
                        if ("Best Picture".equals(record.get("Category"))) {
                            Movie movie = new Movie();
                            movie.setTitle(record.get("Nominee"));
                            movie.setWonOscar("Yes".equalsIgnoreCase(record.get("Won?")));
                            movieRepository.save(movie);
                        }
                    });
        } catch (Exception e) {
            log.error("Unable to read CSV file", e);
        }
    }
}
Let’s go into detail to understand what we are doing here:
  • Firstly, we have created an instance of CSVFormat with a predefined format RFC4180 which is a DEFAULT format but does not ignore empty lines in the file. Likewise, we can parse the CSV file with any one of the predefined formats given below in the table.
  • Secondly, we have set allowMissingColumnNames to true since our CSV file may not have names for all the columns. If we don’t set it to true and if the column name is missing in the file, then it will throw an IllegalArgumentException.
  • Thirdly, we have to call the setHeader() method which sets the header to the given values. The header can either be parsed automatically from the input file with builder.setHeader() or specified manually with builder.setHeader("name", "email", "phone"). Though our file contains the header record, we have specified the header names manually just to show how it can be done.
  • Finally, we have to skip the header record with setSkipHeaderRecord(true). However, If your file does not contain a header record, then you don’t need to skip it but you need to explicitly specify the header names as mentioned above.

Predefined CSV Formats

CSV FormatDescription
DEFAULTStandard Comma Separated Value format, as for RFC4180 but allowing empty lines.
EXCELExcel file format (using a comma as the value delimiter). Note that the actual value delimiter used by Excel is locale-dependent, it might be necessary to customize this format to accommodate your regional settings.
For example for parsing or generating a CSV file on a French system the following format will be used:
  CSVFormat fmt = CSVFormat.EXCEL.withDelimiter(‘;’);
INFORMIX_UNLOADDefault Informix CSV UNLOAD format used by the UNLOAD TO file_name operation.
This is a comma-delimited format with an LF character as the line separator. Values are not quoted and special characters are escaped with ‘\’. The default NULL string is “\\N”.
INFORMIX_UNLOAD_CSVDefault Informix CSV UNLOAD format used by the UNLOAD TO file_name operation (escaping is disabled.)
This is a comma-delimited format with an LF character as the line separator. Values are not quoted and special characters are escaped with ‘\’. The default NULL string is “\\N”.
MYSQLDefault MySQL format used by the SELECT INTO OUTFILE and LOAD DATA INFILE operations.
This is a tab-delimited format with an LF character as the line separator. Values are not quoted and special characters are escaped with ‘\’. The default NULL string is “\\N”.
RFC4180Comma-separated format as defined by RFC 4180.
ORACLEDefault Oracle format used by the SQL*Loader utility.
This is a comma-delimited format with the system line separator character as the record separator. Values are double quoted when needed and special characters are escaped with ‘”‘. The default NULL string is “”. Values are trimmed.
POSTGRESQL_CSVDefault PostgreSQL CSV format used by the COPY operation.
This is a comma-delimited format with an LF character as the line separator. Values are double quoted and special characters are escaped with ‘”‘. The default NULL string is “”.
POSTGRESQL_TEXTDefault PostgreSQL text format used by the COPY operation.
This is a tab-delimited format with an LF character as the line separator. Values are double quoted and special characters are escaped with ‘”‘. The default NULL string is “\\N”.
TDFTab-delimited format.

Testing

Finally, let’s start the application and check if the database is populated as expected

Output

CSV to Database insert statements log
Loading data from CSV file to Database

Source Code

https://github.com/JavaChinna/spring-boot-oauth2-jwt

Conclusion

In conclusion, we have initialized our database with data from a CSV file on application startup.

Thank you for reading.

Leave a Reply