Redshift Not Connecting to Host via Python Script


I currently have a .csv file in an S3 bucket that I'd like to append to a table in a Redshift database using a Python script. The file parsing and the upload to S3 are handled by separate scripts that work just fine.

The code I have for connecting to the cluster and copying into the table is below. When I run it, I get the following error message:

OperationalError: (psycopg2.OperationalError) could not connect to server: Connection timed out (0x0000274C/10060) Is the server running on host "redshift_cluster_name.unique_here.region.redshift.amazonaws.com" (18.221.51.45) and accepting TCP/IP connections on port 5439?

I can confirm the following:

  • Port is 5439

  • Not encrypted

  • Cluster name/DB name/username/password are all correct

  • Publicly accessible set to "Yes"

What should I be fixing to make sure I can copy my file from S3 into Redshift?

I have also looked around on Stack Overflow and Server Fault, but the existing questions either deal with MySQL-to-Redshift or offer solutions (like the linked Server Fault CIDR fix) that did not work for me.

Thank you for any help!

import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

DATABASE = "db"
USER = "user"
PASSWORD = "password"
HOST = "redshift_cluster_name.unique_here.region.redshift.amazonaws.com"
PORT = "5439"
SCHEMA = "public"
S3_FULL_PATH = 's3://bucket/file.csv'
ARN_CREDENTIALS = 'arn:aws:iam::aws_id:role/myRedshiftRole'
REGION = 'region'

############ CONNECTING AND CREATING SESSIONS ############
# The redshift+psycopg2 dialect requires the sqlalchemy-redshift package.
connection_string = f"redshift+psycopg2://{USER}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}"
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = f"SET search_path TO {SCHEMA}"
s.execute(SetPath)
###########################################################

############ RUNNING COPY ############
copy_command = f'''
copy category from '{S3_FULL_PATH}'
credentials 'aws_iam_role={ARN_CREDENTIALS}'
delimiter ',' region '{REGION}';
'''
s.execute(copy_command)
s.commit()
######################################

############ CLOSE SESSION ############
s.close()
#######################################
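Since the failure is a TCP timeout rather than an authentication error, a bare socket test (a minimal sketch using only the standard library, reusing HOST and PORT from above) can separate network reachability from anything psycopg2-related:

import socket

# If this also times out, the problem is network-level (security group,
# routing, public accessibility), not the driver or the credentials.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(10)
try:
    sock.connect((HOST, int(PORT)))
    print("TCP connection on port 5439 succeeded")
except OSError as exc:  # socket.timeout is a subclass of OSError
    print(f"TCP connection failed: {exc}")
finally:
    sock.close()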
Tags: python · amazon-web-services · amazon-s3 · amazon-redshift
asked on Stack Overflow Oct 15, 2019 by TJ15 • edited Oct 16, 2019 by John Rotenstein

1 Answer


Connecting from a Python program requires the same connectivity as connecting from an SQL client.
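
To illustrate (a minimal sketch, not part of the original answer; all connection values are placeholders), a direct psycopg2 connection exercises exactly the same network path as a desktop SQL client:

import psycopg2

# DNS -> VPC routing -> security group -> port 5439: the same path an SQL
# client takes. connect_timeout makes the failure fast instead of hanging.
conn = psycopg2.connect(
    host="redshift_cluster_name.unique_here.region.redshift.amazonaws.com",
    port=5439,
    dbname="db",
    user="user",
    password="password",
    connect_timeout=10,
)
print("Connected:", conn.get_dsn_parameters()["host"])
conn.close()

If this times out, no amount of COPY-command tuning will help; the network path has to be fixed first.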

I created a new cluster so I could document the process for you.

Here are the steps I took:

  • Created a VPC with a CIDR of 10.0.0.0/16. I didn't really need another VPC, but I wanted to avoid any problems with prior configurations.
  • Created a Subnet in the VPC with a CIDR of 10.0.0.0/24.
  • Created an Internet Gateway and attached it to the VPC.
  • Edited the default Route Table to send 0.0.0.0/0 traffic to the Internet Gateway. (I'm only creating a public subnet, so I don't need a separate route table for a private subnet.)
  • Created a Redshift Cluster Subnet Group with the single subnet I created.
  • Launched a 1-node Redshift cluster into the Cluster Subnet Group, with Publicly accessible = Yes and the default Security Group.
  • Went back to the VPC console and edited the default Security Group, adding an Inbound rule for Redshift from Anywhere.
  • Waited for the cluster to become ready.
  • Used DbVisualizer to log in to the database. Success!

The above steps made a publicly-available Redshift cluster and I connected to it from my computer on the Internet.
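
The same sequence can also be scripted. Below is a minimal boto3 sketch of those steps (boto3 was not part of the original walkthrough, and the region, names, and password are placeholders to replace):

import boto3

REGION = 'us-east-2'  # placeholder region
ec2 = boto3.client('ec2', region_name=REGION)
redshift = boto3.client('redshift', region_name=REGION)

# VPC with a public subnet
vpc_id = ec2.create_vpc(CidrBlock='10.0.0.0/16')['Vpc']['VpcId']
subnet_id = ec2.create_subnet(VpcId=vpc_id, CidrBlock='10.0.0.0/24')['Subnet']['SubnetId']

# Internet Gateway, attached to the VPC
igw_id = ec2.create_internet_gateway()['InternetGateway']['InternetGatewayId']
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)

# Default (main) route table: send 0.0.0.0/0 to the Internet Gateway
rt_id = ec2.describe_route_tables(
    Filters=[{'Name': 'vpc-id', 'Values': [vpc_id]}]
)['RouteTables'][0]['RouteTableId']
ec2.create_route(RouteTableId=rt_id, DestinationCidrBlock='0.0.0.0/0', GatewayId=igw_id)

# Default Security Group: inbound Redshift (TCP 5439) from Anywhere
sg_id = ec2.describe_security_groups(
    Filters=[{'Name': 'vpc-id', 'Values': [vpc_id]},
             {'Name': 'group-name', 'Values': ['default']}]
)['SecurityGroups'][0]['GroupId']
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{'IpProtocol': 'tcp', 'FromPort': 5439, 'ToPort': 5439,
                    'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}],
)

# Cluster Subnet Group with the single subnet, then a 1-node public cluster
redshift.create_cluster_subnet_group(
    ClusterSubnetGroupName='my-subnet-group',  # placeholder name
    Description='Single public subnet for Redshift',
    SubnetIds=[subnet_id],
)
redshift.create_cluster(
    ClusterIdentifier='my-cluster',            # placeholder name
    ClusterType='single-node',
    NodeType='dc2.large',
    MasterUsername='user',                     # placeholder
    MasterUserPassword='Password1',            # placeholder
    ClusterSubnetGroupName='my-subnet-group',
    VpcSecurityGroupIds=[sg_id],
    PubliclyAccessible=True,
)

Once the cluster reports Available, the psycopg2 connection from the question should succeed.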

answered on Stack Overflow Oct 16, 2019 by John Rotenstein

User contributions licensed under CC BY-SA 3.0