SQL Query to Randomize and Randomize Again

By: Rick Dobson   |   Updated: 2020-04-01


Problem

In this tip we look at different examples of getting random values using the SQL Server RAND function to give you a better idea of how this works and when and how to use it.

Solution

The SQL Server RAND function allows you to generate a pseudo-random sequence of numbers. The Microsoft SQL Docs site presents basic examples illustrating how to invoke the function. A pseudo-random sequence is one that is determined according to precise rules, but which appears to be random. The values are often uniformly distributed over some range of values. MSSQLTips.com offers several prior tips comparing the RAND function to other ways of generating pseudo-random sequences and demonstrating extensions to the RAND function. This tip focuses on the basics of how to invoke the function, especially highlighting the role of seed values, along with coordinated demonstrations.

Basic Operation of the RAND function

The RAND function operates with or without a seed value; the seed determines whether a sequence of output values can be repeated. With or without a seed, the function returns a value from 0 through 1, exclusive. The return value from the function has a float data type. If the seed value is identical for successive invocations of the function, then the return value is the same for each successive run of the function. Seed values can have any of the following three data types: tinyint, smallint, int. If you do not specify a seed, and no earlier invocation in the same connection specified one, then SQL Server assigns a random seed value in the background.
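As a quick check of these points, the following minimal sketch reports the data type of the value returned by RAND and passes a seed through a variable. The @seed variable name and the column aliases are only for illustration; SQL_VARIANT_PROPERTY is a built-in function used here to inspect the type.

```sql
-- report the base data type of the value returned by rand()
select sql_variant_property(rand(), 'BaseType') [rand return type]

-- a seed can be supplied through a tinyint, smallint, or int variable
declare @seed smallint = 1   -- @seed is an illustrative name
select rand(@seed) [rand with smallint seed of 1]
```

The first query should report float, and the second should return the same value on every run because the seed never changes.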

The following example shows the syntax for running the function six successive times with the same seed value (1) for each invocation.

-- initial run
select rand(1) [rand with seed of 1]

-- rand with same seed
select rand(1) [rand with seed of 1]

-- rand with same seed
select rand(1) [rand with seed of 1]

-- rand with same seed
select rand(1) [rand with seed of 1]

-- rand with same seed
select rand(1) [rand with seed of 1]

-- rand with same seed
select rand(1) [rand with seed of 1]

The output from the preceding script shows the returned values. Each return value is the same because each select statement relies on the same seed value of 1. The function's output appears as a float value (0.713…).

Rand Syntax and Uses fig_1

If you specify a seed value only for the initial invocation of the RAND function and follow that with five more invocations of the function without a seed, then you can retrieve a repeatable list of pseudo-random values. The list is repeatable in the sense that re-running a script returns the same list of pseudo-random values.

Here's a script that shows an initial RAND function invocation with the same seed as in the preceding script. However, the initial invocation is followed by five additional RAND function invocations without a seed value.

-- initial run
select rand(1) [rand with seed of 1]

select rand() [first run of rand without seed]

select rand() [second run of rand without seed]

select rand() [third run of rand without seed]

select rand() [fourth run of rand without seed]

select rand() [fifth run of rand without seed]

The following table shows the results sets from two consecutive runs of the preceding script. Notice that each run returns the exact same sequence of values. This output confirms that the list output by the script repeats the same sequence of pseudo-random values. The feature keeping the results the same across consecutive runs is the seed value for the first invocation of the RAND function. If you change the seed value for the initial invocation of the RAND function, then the pseudo-random sequence in the results set also changes.

Results set from first run of preceding script: Rand Syntax and Uses fig_2

Results set from second run of preceding script: Rand Syntax and Uses fig_3
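To see the effect of changing the seed, a shortened variation of the seeded script above can be run twice. This is only a sketch, and the seed value of 2 is an arbitrary choice. Each run should repeat its own list, and that list should differ from the one produced with a seed of 1.

```sql
-- initial run with a different seed
select rand(2) [rand with seed of 2]

-- unseeded invocations continue the sequence started by the seeded call
select rand() [first run of rand without seed]

select rand() [second run of rand without seed]
```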

The next script shows six consecutive invocations of the RAND function, but this script does not specify a seed value for the initial function invocation nor any of the other invocations. When you do not specify a seed value, SQL Server automatically assigns a random value in the background. Therefore, each run of the script returns a list of six pseudo-random values, but the list does not repeat across successive runs of the script.

-- initial run with no seed value
select rand() [first run of rand without seed]

select rand() [second run of rand without seed]

select rand() [third run of rand without seed]

select rand() [fourth run of rand without seed]

select rand() [fifth run of rand without seed]

select rand() [sixth run of rand without seed]

Here's the output from two consecutive runs of the preceding script. Notice that the results sets are different across consecutive runs. The pseudo-random numbers differ from one run to the next because there is no user-supplied seed value for the initial invocation of the RAND function.

Results set from first run of preceding script: Rand Syntax and Uses fig_4

Results set from second run of preceding script: Rand Syntax and Uses fig_5

Generating a uniform distribution of random digits

By default, the RAND function returns values with a uniform distribution. A uniform distribution means that the frequency is the same across discrete pseudo-random values as well as across continuous pseudo-random value ranges with the same width. Furthermore, SQL developers can transform the uniform float values from the RAND function to discrete values, such as the integer values of 1 through 10. This section demonstrates the code for accomplishing this kind of transformation for RAND function output. In addition, this section examines the pseudo-random digit output from a script to confirm that the counts for the ten digits are approximately the same across successive script runs.
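Before using the transformation in a loop, it can help to check it at the edges of the RAND output range. The sketch below substitutes fixed float values for RAND output; the sample_value column name is only for illustration, and the variables are declared as int here (an assumption for this sketch) so that ranges wider than a tinyint can be tried.

```sql
declare
 @min_integer int = 1
,@max_integer int = 10

-- rand() returns values from 0 through 1, exclusive, so these fixed
-- inputs stand in for the low end, the middle, and the high end
select
 sample_value
,floor(sample_value*(@max_integer - @min_integer + 1) + @min_integer) [transformed value]
from (values (cast(0.0 as float)), (0.5), (0.999999)) as t(sample_value)
```

The three inputs map to 1, 6, and 10, which shows that the transformed values stay inside the range from @min_integer through @max_integer.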

Here's the script to create a fresh version of a table for storing randomly created digits, populating the table with 1,000 digits, and then counting the frequency of occurrence of each digit value in the set.

  • The script starts with a block of code to drop any prior version of the #rand_digits table. This is the table that stores the 1,000 digits created by a transformation of the output from the RAND function.
    • A try block followed by a catch block conditionally drops the #rand_digits table. If the table does not exist already, a message is printed in the SQL Server Management Studio Messages tab indicating the table is not available to drop.
    • Next, a create table statement creates a fresh copy of #rand_digits. The table has one column named rand_digit with a tinyint data type.
  • The next code block declares and populates some local variables to manage the operation of the rest of the script block.
    • The @min_integer and @max_integer local variables specify the minimum and maximum integer value to be randomly generated. The script in this demonstration transforms the float values returned by the RAND function into integers in the range of 1 through 10.
    • The @loop_ctr and @max_loop_ctr local variables facilitate the control of how many passes to perform through a while loop. Each pass through the loop generates and stores a successive random number.
      • The @loop_ctr variable starts with a value of zero.
      • On each successive pass through a while loop, the value of @loop_ctr is incremented by one.
      • When @loop_ctr equals the value of @max_loop_ctr, the script transfers control to the first statement after the loop.
  • The next block of code transforms float values returned by the RAND function to digits from 1 through 10. In addition, on each pass through the loop, the randomly created digit based on the output of the RAND function is inserted into the #rand_digits table.
    • Begin and end statements delimit the code to be executed on each pass through the while loop.
    • The select statement following an insert statement in the while loop transforms the float values generated by the RAND function to integers from 1 through 10. By re-specifying the values for @min_integer and @max_integer, as well as perhaps their data type, you can designate any other minimum and maximum values your requirements dictate.
    • After a random digit is inserted into #rand_digits, a set statement increments the value of @loop_ctr by one.
  • The final block of code in the script counts each digit created by the 1,000 passes through the while loop.
    • A select statement groups the rows by rand_digit values.
    • The column of count function values is assigned the alias frequency.
    • An order by clause arranges the output from the select statement in ascending order by rand_digit value.
-- This code sample returns a uniform distribution of
-- digits from @min_integer (1) through @max_integer (10)

-- create a fresh copy of #rand_digits
begin try
   drop table #rand_digits
end try
begin catch
   print '#rand_digits not available to drop'
end catch

create table #rand_digits
(
rand_digit tinyint
)

-- declare min and max random digit values
-- and variables values to control loop count
declare
 @min_integer tinyint = 1
,@max_integer tinyint = 10
,@loop_ctr int = 0
,@max_loop_ctr int = 1000

-- loop 1000 times
while @loop_ctr < @max_loop_ctr
begin

   -- generate a random digit from @min_integer through @max_integer
   -- and insert it into #rand_digits
   insert #rand_digits(rand_digit)
   select floor(rand()*(@max_integer - @min_integer + 1) + @min_integer)

   set @loop_ctr = @loop_ctr + 1

end

-- count the number of each randomly computed digit
-- and display the results
select rand_digit, count(*) [frequency]
from #rand_digits
group by rand_digit
order by rand_digit

The following table shows the results sets from three consecutive runs of the preceding script.

  • Notice that each results set has rand_digit and frequency columns.
  • The rand_digit column values extend from 1 through 10 in each results set.
  • The frequency column values vary from one results set to the next. However, you can see that the frequency column values are around 100 for each row. This outcome indicates the frequency values are approximately uniformly distributed within each of the three results sets.
  • The frequency column values are distinct because the pseudo-random values generated from each run of the script are unique even while their overall distribution reflects a uniform distribution.
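One way to put a number on "approximately uniform" is to compare each count with the count expected under a perfectly even split. The following sketch assumes that the #rand_digits table populated by the script in this section still exists in the current session and that every digit occurs at least once; the last two column aliases are illustrative additions.

```sql
-- compare each digit's count with the expected count
-- (total rows divided by the number of distinct digits)
select
 rand_digit
,count(*) [frequency]
,sum(count(*)) over () * 1.0 / count(*) over () [expected frequency]
,count(*) - sum(count(*)) over () * 1.0 / count(*) over () [difference from expected]
from #rand_digits
group by rand_digit
order by rand_digit
```

With 1,000 rows and ten digits, the expected frequency is 100 for each digit, so the last column shows how far each digit strays from an even split in a given run.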

Pseudo-random digit values that are uniformly distributed offer many potential benefits to database applications, including taking a random sample of the rows in a very large table. The next section illustrates one approach to implementing this kind of solution.

Results set from 1st Script Run: Rand Syntax and Uses fig_6

Results set from 2nd Script Run: Rand Syntax and Uses fig_7

Results set from 3rd Script Run: Rand Syntax and Uses fig_8

Selecting a random sample from a very large table

Two prior tips demonstrated how to create a data warehouse of historical stock prices and volumes with data from the first trading date in 2009 through October 7, 2019. The Next Steps section in this tip contains links for learning more about the data warehouse. A fact table in the data warehouse (yahoo_prices_valid_vols_only) contains over 14 million rows. Before demonstrating random sampling techniques for rows from a large table, it will be helpful to query the yahoo_prices_valid_vols_only table to gather a few metrics on its contents. The following script generates these metrics.

Not only does the following script compute the metrics, but it also creates a temporary table (#symbols_with_all_dates) that stores a subset of the stock symbols from the data warehouse, with a distinct symbol_id integer value for each symbol.

  • The script starts by creating a fresh copy of the table (#symbols_with_all_dates).
  • Next, the script reveals the exact number of rows in the fact table (14,620,885).
  • Then, the script shows the total number of symbols in the fact table (8,089).
  • This is followed by another select statement that counts the number of distinct trading dates in the fact table (2,709).
  • Stock markets regularly register new stocks for trading as well as drop existing stocks that are no longer traded. The next query finds the subset of symbols that have a date value for all the distinct trading dates in the data warehouse. The row_number function in a query assigns a symbol_id value to each such symbol. There are 2,614 symbols in the subset. This subset populates the #symbols_with_all_dates table.
  • The final select statement in the script displays the rows in #symbols_with_all_dates.
begin try
   drop table #symbols_with_all_dates
end try
begin catch
   print '#symbols_with_all_dates not available to drop'
end catch
go

-- number of rows (14,620,885) in the yahoo_prices_valid_vols_only table
select count(*) [number of rows]
from for_csv_from_python.[dbo].[yahoo_prices_valid_vols_only]

-- 8089 symbols
select count(distinct symbol) distinct_symbol_count
from for_csv_from_python.[dbo].[yahoo_prices_valid_vols_only]

-- 2709 trading dates
select count(distinct [date]) distinct_date_count
from for_csv_from_python.[dbo].[yahoo_prices_valid_vols_only]

-- 2614 symbols have all trading dates (2709)
select row_number() over (order by symbol) symbol_id, symbol
into #symbols_with_all_dates
from for_csv_from_python.[dbo].[yahoo_prices_valid_vols_only]
group by symbol
having count([close]) =
(
   -- 2709 trading dates
   select count(distinct [date]) distinct_date_count
   from for_csv_from_python.[dbo].[yahoo_prices_valid_vols_only]
)

-- display contents of #symbols_with_all_dates
select * from #symbols_with_all_dates order by symbol

Starting with #symbols_with_all_dates and yahoo_prices_valid_vols_only, the next script demonstrates how to draw two different random samples, each having ten symbols, from the distinct symbols in #symbols_with_all_dates. There are 7,081,326 rows in the target population from which sampling is performed. These rows hold price and volume data for each of 2,614 symbols on each of 2,709 trading dates.

The script for this step has two major code blocks. The first code block is for drawing a random sample for the first set of ten symbols. The second code block is for drawing a sample for the second set of ten symbols. Each block of code commences with a pair of header comment lines denoting the code as for the first or second sample.

  • The script begins by creating a fresh copy of the #sample_1_of_symbols table and populating it. The second code block creates a fresh copy of the #sample_2_of_symbols table and populates it. Both major code blocks conclude by displaying data for the first and last trading date for each symbol in the sample.
  • The code block for each sample uses a different seed value for the initial RAND function invocation that specifies its symbols.
    • The seed value for the first sample is 1.
    • The seed value for the second sample is 2.
    • The pseudo-random integers for each sample are in the range from 1 through 2,614. Each integer corresponds to a distinct symbol. The symbols are stored in #sample_1_of_symbols for the first sample and #sample_2_of_symbols for the second sample.
  • The final two select statements in each major code block display the prices and volumes for each sample of symbols. These select statements draw on yahoo_prices_valid_vols_only and either #sample_1_of_symbols or #sample_2_of_symbols.
    • The first select statement displays data for the first trading date.
    • The second select statement displays data for the last trading date.
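The steps above can be sketched for the first sample as follows. This is a minimal sketch under stated assumptions, not the script from the tip's download file: the local variable names, the skipping of repeated symbol_id values, and the [open], high, low, and volume column names are all assumptions, and the code expects #symbols_with_all_dates to hold at least ten rows.

```sql
-- sketch only: this is not the script from the tip's download file

-- create a fresh, empty copy of #sample_1_of_symbols
begin try
   drop table #sample_1_of_symbols
end try
begin catch
   print '#sample_1_of_symbols not available to drop'
end catch

-- where 1 = 0 copies the column definitions without copying rows
select symbol_id, symbol
into #sample_1_of_symbols
from #symbols_with_all_dates
where 1 = 0

declare
 @seed int = 1            -- use 2 for the second sample
,@sample_size int = 10
,@symbol_count int
,@rand_value float
,@symbol_id int

select @symbol_count = count(*) from #symbols_with_all_dates

-- seed only the initial invocation so the draw is repeatable
set @rand_value = rand(@seed)

while (select count(*) from #sample_1_of_symbols) < @sample_size
begin

   -- map the float to a symbol_id from 1 through @symbol_count
   set @symbol_id = floor(@rand_value*@symbol_count) + 1

   -- skip a symbol_id that was already drawn (assumed handling of repeats)
   if not exists (select 1 from #sample_1_of_symbols where symbol_id = @symbol_id)
      insert #sample_1_of_symbols (symbol_id, symbol)
      select symbol_id, symbol
      from #symbols_with_all_dates
      where symbol_id = @symbol_id

   set @rand_value = rand()

end

-- prices and volumes for the sampled symbols on the first trading date
-- (use max([date]) in the subquery for the last trading date)
select p.[date], p.symbol, p.[open], p.high, p.low, p.[close], p.volume
from for_csv_from_python.[dbo].[yahoo_prices_valid_vols_only] p
inner join #sample_1_of_symbols s
   on s.symbol = p.symbol
where p.[date] =
(
   select min([date])
   from for_csv_from_python.[dbo].[yahoo_prices_valid_vols_only]
)
order by p.symbol
```

A block for the second sample would follow the same pattern with #sample_2_of_symbols and a seed of 2. Because this sketch may handle repeated draws differently from the downloadable script, it is not guaranteed to select the same ten symbols.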

The script displays four results sets: two for the first sample and two more for the second sample. The structure of the pair of results sets for each sample is the same. Therefore, the following screen shot shows only the two results sets for the first sample. Additionally, all four results sets are displayed later in an Excel worksheet and discussed from an analytical perspective.

The next screen shot displays the results sets for the first sample from SQL Server Management Studio.

  • Each results set has three types of data.
    • The first type includes two columns, Date and Symbol, that identify each row by a trading date and a symbol.
    • The second type of data includes four types of prices.
      • The close price is critical in that it reveals the price for a stock at the end of a trading date.
      • The other three prices convey some feel for the path of a stock's price during a trading date on its way to the close price.
        • The open price shows the price at the open of a trading date.
        • The high and the low prices indicate, respectively, the top and the bottom prices on a trading date.
    • The third type of data is in the Volume column. This indicates the number of shares exchanged during a trading date. Generally, analysts ascribe more significance to prices during a trading date when the volume is significantly above average.
  • The first results set for the first sample appears on top. This results set shows the three types of data for the first trading date for each of the symbols belonging to the first sample.
  • The second results set for the first sample appears on the bottom in the screen shot below. This results set displays the three types of data for the last trading date for each of the symbols belonging to the first sample.

Rand Syntax and Uses fig_9

The next screen shot is for an Excel spreadsheet showing a pair of results sets for each sample.

  • The first and second results sets for the first sample appear in rows 3 through 12.
    • The first results set for the first trading date (1/2/2009) appears in columns A through G.
    • The second results set for the last trading date (10/7/2019) appears in columns I through O.
  • The first and second results sets for the second sample appear in rows 20 through 29.
    • Again, the first results set for the first trading date (1/2/2009) appears in columns A through G.
    • Also, the second results set for the last trading date (10/7/2019) appears in columns I through O.
  • The symbols for each sample are listed in alphabetical order in column B.
    • The first symbol in the first sample is ACM, and the last symbol in the first sample is VLT.
    • The first symbol in the second sample is BFIN, and the last symbol in the second sample is PIE.

Several analyses follow to help assess whether the two different samples yield similar results, as would be expected of samples drawn from the same population of price and volume data.

  • Cell Q14 is the average percent gain between the first and last close price across the ten symbols in the first sample. The average close price gain for the first sample is slightly greater than 142 percent. The comparable price gain for the second sample (see cell Q31) is slightly more than 215 percent. Because of the disparity in average close price gain percent values, it is not obvious that both samples are from the same population.
  • Columns S, T, and U show a different kind of comparison between the two samples.
    • The values in column S for rows 3 through 12 in the first sample and rows 20 through 29 in the second sample are
      • 1 when the last close price is more than 5 percent greater per year than the first close price
      • 0 when the last close price is not more than 5 percent greater per year than the first close price
    • The 0's and 1's in column T are assigned as
      • 1 when the last close price is more than 10 percent greater per year than the first close price
      • 0 when the last close price is not more than 10 percent greater per year than the first close price
    • The cut-off value for being 1 in column U is more than 15 percent greater per year (and 0 otherwise).
  • The results across all the symbols are summarized in row 15 for the first sample and in row 32 for the second sample. As you can see, the percent greater than a criterion value is very similar across the two samples.
    • Both the 5 percent per year and 10 percent per year comparisons are exactly the same across the two samples at
      • 60% for the more than 5 percent per year comparison
      • 30% for the more than 10 percent per year comparison
    • In general, you can see there is a tendency for the percent greater than a criterion value to decline as the criterion value rises. This general tendency continues through the 15 percent per year criterion, but the proportion of sample symbols is not exactly the same between the two samples: 10 percent for the first sample and 20 percent for the second sample.
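The worksheet logic for columns S, T, and U could also be expressed in T-SQL. The sketch below is an illustration only: the three rows of symbols and prices are made-up values rather than data from either sample, and it assumes that "percent per year" means a compound annual rate between the first and last trading dates. The source does not spell out the formula, and a simple (non-compounded) rate would assign some flags differently.

```sql
-- years between the first and last trading dates in the data warehouse
declare @years float = datediff(day, '20090102', '20191007') / 365.25

select
 symbol
,(last_close/first_close - 1)*100 [percent gain]
,case when annual_rate > 0.05 then 1 else 0 end [gt 5 pct per year]
,case when annual_rate > 0.10 then 1 else 0 end [gt 10 pct per year]
,case when annual_rate > 0.15 then 1 else 0 end [gt 15 pct per year]
from
(
   select
    symbol
   ,first_close
   ,last_close
   -- assumed definition: compound annual growth rate
   ,power(cast(last_close as float)/first_close, 1.0/@years) - 1 annual_rate
   from (values
    ('AAA', 10.00, 25.00)   -- hypothetical symbols and close prices
   ,('BBB', 20.00, 22.00)
   ,('CCC',  5.00, 30.00)
   ) as v(symbol, first_close, last_close)
) as gains
order by symbol
```

Averaging each flag column across the symbols in a sample would give the kind of summary percentages shown in rows 15 and 32 of the worksheet.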

Rand Syntax and Uses fig_10

The preceding analyses in this section are only a selection of examples for assessing if and how two samples from the same underlying population yield comparable results. As this section confirms, the assessment of whether two samples are similar depends on how you compare them. Therefore, you should tailor your comparisons based on the needs of those requiring the results.

Next Steps

The T-SQL scripts and worksheets for data displays and analyses are available in this tip's download file. After you confirm that you are getting valid results with the code from the download file, try variations to the T-SQL code from this tip.

  • You can re-run the second script in the first section with different seed values to confirm that pseudo-random sequences depend on the seed value for the RAND function.
  • You can also modify the assignments for the @min_integer and @max_integer local variables in the script for the "Generating a uniform distribution of random digits" section. These changes will allow you to confirm your ability to control the minimum and maximum pseudo-random values generated by a RAND function.

If you want to test the code for this tip's last section, then you also need to run scripts from "Collecting Time Series Data for Stock Market with SQL Server" and "Time Series Data Fact and Dimension Tables for SQL Server". Scripts from these two prior tips will re-create the yahoo_prices_valid_vols_only table in your SQL Server instance. You can draw different samples of symbols from those in the last section by specifying different seed values for the initial RAND function invocations.

Of course, the best way to derive value from this tip is by running the code in the download for this tip with your company's data. If you encounter issues, I look forward to answering any questions that you have about how the code should work and/or how to get the code to work for your personal needs.

About the author

Rick Dobson is a Microsoft Certified Technical Specialist and well accomplished SQL Server and Access author.


Source: https://www.mssqltips.com/sqlservertip/6313/generate-unique-random-number-in-sql-server/
