Showing posts with label postgres. Show all posts
Showing posts with label postgres. Show all posts

Wednesday, November 23, 2011

copy command in postgres rails

 Use copy SQL command instead of large insert query in postgres and also improved performance.

   query=<<-SQL
      SELECT brands1.id, #{self.id}
      FROM brands brands1
      LEFT OUTER JOIN brands brands2
        ON brands1.parent_id = brands2.id
      WHERE (#{name_conditions})
      AND #{specialty_conditions}
    SQL

search_contacts_bulk_insert_via_copy(query,"brands_searches (brand_id, search_id)")


  private

def search_contacts_bulk_insert_via_copy(query,iquery)
search_contacts_path="#{RAILS_ROOT}/tmp/bulk_insert/search_contacts_#{self.id}.copy"
File.delete(search_contacts_path) if FileTest.exists?(search_contacts_path)
search_contacts_path1 = File.new(search_contacts_path, "w")
File.chmod(0777, search_contacts_path)
query ="COPY (#{query}) TO '#{search_contacts_path}'"
ActiveRecord::Base.connection.execute(query)
   logger.error(query)
clinsert_query="COPY #{iquery} FROM '#{search_contacts_path}';"
connection.execute(clinsert_query)
logger.error(clinsert_query)
File.delete(search_contacts_path) if FileTest.exists?(search_contacts_path)
end


Saturday, November 19, 2011

Searching in integer Arrays in postgres

SELECT * FROM sal_emp WHERE 10000 = ANY (pay_by_quarter);
In addition, you can find rows where the array has all values equal to 10000 with:

SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);
Alter natively, the generate_subscripts function can be used. For example
 SELECT * FROM
   (SELECT pay_by_quarter,
           generate_subscripts(pay_by_quarter, 1) AS s
      FROM sal_emp) AS foo
 WHERE pay_by_quarter[s] = 10000;
 
Real time example:
Structure 
contactids, companyids,added_by_search_contactids,added_by_search_companyids => integer array  
1. SELECT * FROM  (SELECT contactids,generate_subscripts(contactids, 1) AS s 
   FROM lists limit 50000 ) AS foo 
   WHERE contactids[s]  IN (select id from contacts where current_details_id IS NULL); 

2. UPDATE lists SET contactids = coalesce(contactids,ARRAY[]::INT[]) - coalesce(ARRAY(select id 
   from contacts where current_details_id IS NULL),ARRAY[]::INT[]),
   added_by_search_contactids = coalesce(added_by_search_contactids,ARRAY[]::INT[]) - 
   coalesce(ARRAY(select id from contacts where current_details_id IS NULL),ARRAY[]::INT[]),
   companyids = coalesce(companyids,ARRAY[]::INT[]) - coalesce(ARRAY(select id from
   companies where current_details_id IS NULL),ARRAY[]::INT[]),
   added_by_search_companyids = coalesce(added_by_search_companyids,ARRAY[]::INT[]) - 
   coalesce(ARRAY(select id from companies where current_details_id IS NULL),ARRAY[]::INT[]) 
   WHERE id IN (SELECT id FROM (SELECT id,contactids,generate_subscripts(contactids, 1)AS s FROM lists
   WHERE current_step >= 4 AND folder_id IS NOT NULL and is_deleted IS FALSE LIMIT 20000 OFFSET 80000) AS foo 
   WHERE contactids[s] = ANY(select id from contacts where current_details_id IS NULL) UNION SELECT id 
   FROM (SELECT id,added_by_search_contactids,generate_subscripts(added_by_search_contactids, 1)AS s FROM lists WHERE 
   current_step >= 4 AND folder_id IS NOT NULL and is_deleted IS FALSE LIMIT 20000 OFFSET 80000) AS foo 
   WHERE added_by_search_contactids[s] = ANY(select id from contacts where current_details_id IS NULL) 
   UNION SELECT id FROM (SELECT id,companyids,generate_subscripts(companyids, 1)AS s FROM lists 
   WHERE current_step >= 4 AND folder_id IS NOT NULL and is_deleted IS FALSE LIMIT 20000 OFFSET 80000) AS foo WHERE 
   companyids[s] = ANY(select id from companies where current_details_id IS NULL) UNION SELECT id FROM 
   (SELECT id,added_by_search_companyids,generate_subscripts(added_by_search_companyids, 1)AS s FROM lists 
   WHERE current_step >= 4 AND folder_id IS NOT NULL and is_deleted IS FALSE LIMIT 20000 OFFSET 80000) AS foo 
   WHERE added_by_search_companyids[s] = ANY(select id from companies where current_details_id IS NULL)) 
  RETURNING lists.id

More information here
http://www.postgresql.org/docs/current/static/arrays.html
http://www.postgresql.org/docs/current/static/functions-srf.html#FUNCTIONS-SRF-SUBSCRIPTS.
If you have  any quires please post your comments here.

Friday, November 18, 2011

PostgreSQL row sort with NULL values and empty string at last

ORDER BY NULLIF(first_name, '') DESC NULLS LAST
More examples here

What is the performance difference distinct vs group by SQL command


Don't use distinct command.It performance way is slow. group by command performance very high compare to distinct command.Please see the real time difference in the below example query.

1) select count(*) from (select distinct contact_id from contacts_lists) a;

 count
 65474

 1 row(s)

 Total runtime: 40,800.186 ms

2) select count(*) from (select contact_id from contacts_lists group by contact_id) a;

 count
 65474

 1 row(s)

 Total runtime: 32,302.381 ms