Creating Connections

After you explore a data source and understand its purpose, the next step is to connect to it and make full use of its data.

To create a connection, you must specify the following properties:

  • Connection Type: The type of data source that the connection targets, such as Hive, HDFS, or Snowflake. The type determines which connection properties you must provide.
  • Connection Details: The details required to connect to the data source. A connection is always associated with exactly one data asset, while a data asset can have more than one connection.
  • Analytics Service: The analytics pipeline lets you view retrieved data, such as profile information of a data asset, sample data, and tagged data. The analytics pipeline also runs data quality jobs to ensure that the data in the source system is in order.
Note: Each analytics service or pipeline needs to be physically connected to the source system.
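
The relationship between connections and data assets described above (one data asset per connection, any number of connections per data asset) can be pictured with a small data model. This is only an illustrative Python sketch: the class names Connection and DataAsset and the add_connection helper are hypothetical and do not correspond to any product API.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical model of a connection."""
    name: str
    connection_type: str
    asset_name: str  # a connection belongs to exactly one data asset


@dataclass
class DataAsset:
    """Hypothetical model of a data asset that owns its connections."""
    name: str
    connections: list = field(default_factory=list)

    def add_connection(self, name, connection_type):
        # Every new connection is tied to this asset and to no other.
        conn = Connection(name, connection_type, self.name)
        self.connections.append(conn)
        return conn
```

For example, calling add_connection twice on the same DataAsset yields two Connection objects that both point back to that single asset.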

Create a New Connection

To create a connection, do the following:

  1. Click the Data Sources tab. The Create Data Source wizard is displayed.

  2. Select a Data Source type.

  3. Click the Create New tab under Provide Connection Configuration. The table below provides information about the supported connection types and their properties.

    Connection Type: Description and Properties
    Hive: A Hive connection connects to a Hive database.
    1. Connection Name: Specify the name for the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. JDBC URL: Specify the Java Database Connectivity (JDBC) URL which is used to locate the database schema. The URL uses the following format:
      • jdbc:hive2://<hostname>:<port>/<database name>
    4. JDBC Username: Specify the username to connect to the Hive database.
    5. JDBC Password: Specify the password to connect to the Hive database.
    HBase: An HBase connection creates a connection to an HBase table. It is a NoSQL connection.
    1. Connection Name: Specify a name for the connection. It is a required field which is case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters.
    2. Connection Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. Properties: Drag and drop or click the box to upload the site-config.xml file.
    HDFS: The Hadoop Distributed File System (HDFS) connection type accesses data from a Hadoop cluster.
    1. Connection Name: Specify the name of the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters as well.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. Access Id: Specify the access ID.
    4. Access key: Specify the access key or password.
    Redshift: An Amazon Redshift connection lets you work with data in your cluster by using Amazon Redshift JDBC drivers.
    1. Connection Name: Specify the name of the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters as well.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. JDBC URL: Specify the Java Database Connectivity (JDBC) URL which is used to locate the database schema. The JDBC URL has the following format:
      • jdbc:redshift://<cluster>.<hostname>.<region>.redshift.amazonaws.com:<port>/<database>
    4. JDBC Username: Specify the username to connect to the Redshift database.
    5. JDBC Password: Specify the password to connect to the Redshift database.
    Azure SQL: The Azure SQL connection type uses the Microsoft JDBC Driver for SQL Server to connect to an Azure SQL database.
    1. Connection Name: Specify a name for the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters as well.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. JDBC URL: Specify the Java Database Connectivity (JDBC) URL that is used to locate the database schema. The JDBC URL has the following format:
      • jdbc:sqlserver://[serverName[\instanceName][:portNumber]][;property=value[;property=value]]
    4. JDBC Username: Specify the username to connect to the Azure database.
    5. JDBC Password: Specify the password to connect to the Azure database.
    Snowflake: The Snowflake connection type is used to connect to a Snowflake data source.
    1. Connection Name: Specify a name for the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters as well.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. JDBC URL: Specify the Java Database Connectivity (JDBC) URL that is used to locate the database schema. The JDBC URL has the following format:
      • jdbc:snowflake://<accountname>
    4. JDBC Username: Specify the username to connect to the Snowflake database.
    5. JDBC Password: Specify the password to connect to the Snowflake database.
    AWS S3: The Amazon Web Services connection type enables AWS integrations.
    1. Connection Name: Specify a name for the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters as well.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. Access Key: Specify the AWS access key ID.
    4. Secret Key: Specify the AWS secret access key.
    5. AWS Region: Specify the AWS region for the Glue data lake, for example us-east-1.
    GCS: The Google connection type adds a connection to the Google Cloud Platform (GCP).
    1. Connection Name: Specify a name for the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters as well.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. Credentials File: Locate the service account credentials file and upload it.
    4. Email: Specify the Google service account email ID.
    Kafka: The Kafka connection type accesses streaming data pipelines and data sources.
    1. Connection Name: Specify a name for the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters as well.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. BootStrapServers: Specify a comma-separated list of Kafka bootstrap servers (the bootstrap.servers setting).
    4. Security Protocol: Select one of the provided security protocols, such as PLAINTEXT (no authentication, not encrypted) or SASL_PLAINTEXT (SASL authentication, not encrypted).
    MySQL: The MySQL connection type is used to connect to a MySQL database.
    1. Connection Name: Specify a name for the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters as well.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. JDBC URL: Specify the Java Database Connectivity (JDBC) URL that is used to locate the database schema. The JDBC URL has the following format:
      • jdbc:mysql://<server name>/<database name>
    4. JDBC Username: Specify the username to connect to the MySQL database.
    5. JDBC Password: Specify the password to connect to the MySQL database.
    MemSQL: The MemSQL connection type is used to connect to a MemSQL database.
    1. Connection Name: Specify a name for the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters as well.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. JDBC URL: Specify the Java Database Connectivity (JDBC) URL that is used to locate the database schema. The JDBC URL has the following format:
      • jdbc:acceldata:mysql://hostname:port;databaseName=<db_name>
    4. JDBC Username: Specify the username to connect to the MemSQL database.
    5. JDBC Password: Specify the password to connect to the MemSQL database.
    PostgreSQL: The PostgreSQL connection type is used to connect to a PostgreSQL database.
    1. Connection Name: Specify a name for the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters as well.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. JDBC URL: Specify the Java Database Connectivity (JDBC) URL that is used to locate the database schema. The JDBC URL has the following format:
      • jdbc:postgresql://host:port/database
    4. JDBC Username: Specify the username to connect to the PostgreSQL database.
    5. JDBC Password: Specify the password to connect to the PostgreSQL database.
    Oracle: The Oracle connection type is used to connect to an Oracle database.
    1. Connection Name: Specify a name for the connection. It is a required field which is not case sensitive and must be unique in the domain. It should not exceed 128 characters and can contain special characters as well.
    2. Description: Describe the purpose of the connection. The description cannot exceed 4000 characters.
    3. JDBC URL: Specify the Java Database Connectivity (JDBC) URL that is used to locate the database schema. The JDBC URL has the following format:
      • jdbc:oracle:<drivertype>:@<database>, where <drivertype> is the driver type, for example thin.
    4. JDBC Username: Specify the username to connect to the Oracle database.
    5. JDBC Password: Specify the password to connect to the Oracle database.
  4. Click Next to select an analytics service for the connection.

  5. Select an Analytics Service from the drop-down list.

  6. Click Test Connection to verify that the connection works.
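
The field limits and JDBC URL formats listed in the table can be checked before the values are entered in the wizard. The following is a minimal Python sketch, not the product's own validation logic: ConnectionForm, validate, and JDBC_PREFIXES are hypothetical names introduced for illustration, and only the length limits and URL prefixes are taken from the table. It treats connection names as not case sensitive, so it does not model the HBase connection type, whose names are described as case sensitive.

```python
from dataclasses import dataclass

# Hypothetical lookup of the JDBC URL prefix documented for each
# JDBC-based connection type; types without a JDBC URL are omitted.
JDBC_PREFIXES = {
    "hive": "jdbc:hive2://",
    "redshift": "jdbc:redshift://",
    "azure sql": "jdbc:sqlserver://",
    "snowflake": "jdbc:snowflake://",
    "mysql": "jdbc:mysql://",
    "memsql": "jdbc:acceldata:mysql://",
    "postgresql": "jdbc:postgresql://",
    "oracle": "jdbc:oracle:",
}

MAX_NAME_LENGTH = 128          # limit stated for Connection Name
MAX_DESCRIPTION_LENGTH = 4000  # limit stated for Description


@dataclass
class ConnectionForm:
    """Illustrative stand-in for the values entered in the wizard."""
    connection_type: str
    name: str
    description: str = ""
    jdbc_url: str = ""


def validate(form, existing_names):
    """Return a list of problems; an empty list means the form passes."""
    problems = []
    if not form.name:
        problems.append("Connection Name is required")
    elif len(form.name) > MAX_NAME_LENGTH:
        problems.append("Connection Name exceeds 128 characters")
    # Names are not case sensitive here, so compare them in lower case.
    elif form.name.lower() in {n.lower() for n in existing_names}:
        problems.append("Connection Name must be unique in the domain")
    if len(form.description) > MAX_DESCRIPTION_LENGTH:
        problems.append("Description exceeds 4000 characters")
    # Only check the URL for types that have a documented JDBC prefix.
    prefix = JDBC_PREFIXES.get(form.connection_type.lower())
    if prefix is not None and not form.jdbc_url.startswith(prefix):
        problems.append(f"JDBC URL should start with {prefix}")
    return problems
```

A caller would build a ConnectionForm from the intended values, pass the names of the connections that already exist in the domain, and correct any reported problems before clicking Test Connection. The prefix check is deliberately loose: it confirms only the scheme part of the URL, not the host, port, or database name.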