Preview

Description

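Returns a non-deterministic sample of the results from the previous stage.

Two modes are supported:

  • DOCUMENTS mode samples a specified number of documents.
    • This mode is similar to GoogleSQL.RESERVOIR: it outputs a sample of size n in which every possible sample of size n is equally likely.
  • PERCENT mode samples a specified percentage of documents.
    • This mode is similar to GoogleSQL.BERNOULLI: each document is selected independently with the same probability, percent. With percent given as a fraction between 0.0 and 1.0, #documents * percent documents are returned on average.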
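
The reservoir analogy can be illustrated with a small local sketch. It assumes the classic reservoir-sampling scheme (Algorithm R) over a plain Python iterable; the name `reservoir_sample` is hypothetical, and nothing here describes how Firestore implements the stage internally.

```python
import random


def reservoir_sample(stream, n, rng=None):
    """Return up to n items drawn uniformly from an iterable (Algorithm R).

    Hypothetical helper for illustration only; not part of any Firestore SDK.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    rng = rng or random.Random()
    reservoir = []
    for index, item in enumerate(stream):
        if index < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Keep the new item with probability n / (index + 1).
            slot = rng.randint(0, index)
            if slot < n:
                reservoir[slot] = item
    return reservoir


docs = ["SF", "NYC", "CHI", "ATL"]
picked = reservoir_sample(docs, 2, rng=random.Random(7))
print(picked)  # two distinct IDs; which ones depends on the seed
```

Algorithm R on its own does not shuffle the reservoir, so this sketch models only which items are chosen, not the random output order of the stage.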

Syntax

Node.js

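  // Documents mode: sample a fixed number of documents.
  const sampledByCount = await db.pipeline()
    .database()
    .sample(50)
    .execute();

  // Percent mode: select each document independently with probability 0.5.
  const sampledByPercent = await db.pipeline()
    .database()
    .sample({ percent: 0.5 })
    .execute();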

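Behavior

Documents mode

Documents mode retrieves the specified number of documents in random order. The specified number must be a non-negative INT64 value.

For example, given the following collection: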

Node.js

await db.collection('cities').doc('SF').set({name: 'San Francisco', state: 'California'});
await db.collection('cities').doc('NYC').set({name: 'New York City', state: 'New York'});
await db.collection('cities').doc('CHI').set({name: 'Chicago', state: 'Illinois'});

The sample stage in documents mode can be used to retrieve a non-deterministic subset of results from this collection.

Node.js

const sampled = await db.pipeline()
    .collection("/cities")
    .sample(1)
    .execute();

In this example, only 1 document is returned, chosen at random.

  {name: 'New York City', state: 'New York'}

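If the number provided is greater than the total number of documents produced by the previous stage, all documents are returned in random order.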

Node.js

const sampled = await db.pipeline()
    .collection("/cities")
    .sample(5)
    .execute();

This produces the following documents:

  {name: 'New York City', state: 'New York'}
  {name: 'Chicago', state: 'Illinois'}
  {name: 'San Francisco', state: 'California'}
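
The two cases above (a count smaller than the input, and a count larger than the input) can be mimicked locally. This is a sketch under stated assumptions: `sample_documents` is a hypothetical helper built on Python's `random.sample`, not an SDK call, and it models only the observable behavior described in this section.

```python
import random


def sample_documents(docs, n, rng=None):
    """Model of documents mode: up to n documents in random order.

    Hypothetical helper; it mirrors the documented behavior, not the SDK.
    """
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    rng = rng or random.Random()
    # Cap n at the input size so that oversized requests return everything.
    return rng.sample(docs, min(n, len(docs)))


cities = [
    {"name": "San Francisco", "state": "California"},
    {"name": "New York City", "state": "New York"},
    {"name": "Chicago", "state": "Illinois"},
]

one = sample_documents(cities, 1)
everything = sample_documents(cities, 5)
print(len(one), len(everything))  # 1 3
```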

More examples

Web

let results;

// Get a sample of 100 documents in a database
results = await execute(db.pipeline()
  .database()
  .sample(100)
);

// Randomly shuffle a list of 3 documents
results = await execute(db.pipeline()
  .documents([
    doc(db, "cities", "SF"),
    doc(db, "cities", "NY"),
    doc(db, "cities", "DC"),
  ])
  .sample(3)
);

Swift

var results: Pipeline.Snapshot

// Get a sample of 100 documents in a database
results = try await db.pipeline()
  .database()
  .sample(count: 100)
  .execute()

// Randomly shuffle a list of 3 documents
results = try await db.pipeline()
  .documents([
    db.collection("cities").document("SF"),
    db.collection("cities").document("NY"),
    db.collection("cities").document("DC"),
  ])
  .sample(count: 3)
  .execute()

Kotlin

var results: Task<Pipeline.Snapshot>

// Get a sample of 100 documents in a database
results = db.pipeline()
    .database()
    .sample(100)
    .execute()

// Randomly shuffle a list of 3 documents
results = db.pipeline()
    .documents(
        db.collection("cities").document("SF"),
        db.collection("cities").document("NY"),
        db.collection("cities").document("DC")
    )
    .sample(3)
    .execute()

Java

Task<Pipeline.Snapshot> results;

// Get a sample of 100 documents in a database
results = db.pipeline()
    .database()
    .sample(100)
    .execute();

// Randomly shuffle a list of 3 documents
results = db.pipeline()
    .documents(
        db.collection("cities").document("SF"),
        db.collection("cities").document("NY"),
        db.collection("cities").document("DC")
    )
    .sample(3)
    .execute();

Python

# Get a sample of 100 documents in a database
results = client.pipeline().database().sample(100).execute()

# Randomly shuffle a list of 3 documents
results = (
    client.pipeline()
    .documents(
        client.collection("cities").document("SF"),
        client.collection("cities").document("NY"),
        client.collection("cities").document("DC"),
    )
    .sample(3)
    .execute()
)

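Percent mode

In percent mode, each document has the specified percent chance of being returned. Unlike documents mode, the order here is not random; the pre-existing document order is preserved. The percent input must be a double value between 0.0 and 1.0.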

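Because each document is selected independently, the output is non-deterministic. With percent given as a fraction between 0.0 and 1.0, #documents * percent documents are returned on average.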
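
A minimal local simulation can make the "on average" wording concrete. It assumes `percent` is a fraction between 0.0 and 1.0, as stated above; `sample_percent` is a hypothetical helper, not an SDK function, and it models only the documented behavior: independent selection with the input order preserved.

```python
import random


def sample_percent(docs, percent, rng=None):
    """Model of percent mode: keep each document independently.

    Hypothetical helper; `percent` is a fraction between 0.0 and 1.0.
    """
    if not 0.0 <= percent <= 1.0:
        raise ValueError("percent must be between 0.0 and 1.0")
    rng = rng or random.Random()
    # A list comprehension keeps survivors in their original order.
    return [doc for doc in docs if rng.random() < percent]


docs = list(range(1000))
rng = random.Random(42)
sizes = [len(sample_percent(docs, 0.5, rng)) for _ in range(200)]
print(sum(sizes) / len(sizes))  # close to 500, but rarely exactly 500
```

Individual runs vary in size; only the mean over many runs settles near `len(docs) * percent`.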

For example, given the following collection:

Node.js

await db.collection('cities').doc('SF').set({name: 'San Francisco', state: 'California'});
await db.collection('cities').doc('NYC').set({name: 'New York City', state: 'New York'});
await db.collection('cities').doc('CHI').set({name: 'Chicago', state: 'Illinois'});
await db.collection('cities').doc('ATL').set({name: 'Atlanta', state: 'Georgia'});

The sample stage in percent mode can be used to retrieve (on average) 50% of the documents from the collection stage.

Node.js

  const sampled = await db.pipeline()
    .collection("/cities")
    .sample({ percent: 0.5 })
    .execute();

This produces a non-deterministic sample containing (on average) 50% of the documents in the cities collection. The following is one possible output.

  {name: 'New York City', state: 'New York'}
  {name: 'Chicago', state: 'Illinois'}

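In percent mode, because each document is selected independently with the same probability, the result may contain zero documents or all of them.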
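
To put a number on this: under independent selection, the chance of an empty result is (1 - p)^n and the chance of a full result is p^n, where p is the percent value as a fraction and n is the number of input documents. The short calculation below uses the four-document cities collection and p = 0.5 from the example above; the helper name is hypothetical.

```python
def empty_and_full_probability(n_docs, percent):
    """Return (P(no documents), P(all documents)) under independent selection."""
    p_empty = (1.0 - percent) ** n_docs
    p_full = percent ** n_docs
    return p_empty, p_full


p_empty, p_full = empty_and_full_probability(4, 0.5)
print(p_empty, p_full)  # 0.0625 0.0625
```

So for this small collection, roughly one run in sixteen returns nothing and one in sixteen returns everything; both outcomes become far less likely as the input grows.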

More examples

Web

// Get a sample of on average 50% of the documents in the database
const results = await execute(db.pipeline()
  .database()
  .sample({ percentage: 0.5 })
);

Swift

// Get a sample of on average 50% of the documents in the database
let results = try await db.pipeline()
  .database()
  .sample(percentage: 0.5)
  .execute()

Kotlin

// Get a sample of on average 50% of the documents in the database
val results = db.pipeline()
    .database()
    .sample(SampleStage.withPercentage(0.5))
    .execute()

Java

// Get a sample of on average 50% of the documents in the database
Task<Pipeline.Snapshot> results = db.pipeline()
    .database()
    .sample(SampleStage.withPercentage(0.5))
    .execute();

Python

from google.cloud.firestore_v1.pipeline_stages import SampleOptions

# Get a sample of on average 50% of the documents in the database
results = (
    client.pipeline().database().sample(SampleOptions.percentage(0.5)).execute()
)